Jiayuan Chen [Wed, 20 May 2026 02:34:11 +0000 (10:34 +0800)]
selftests: netfilter: add nft_fib_nexthop test
Functional coverage of nft_fib6_eval()'s nexthop enumeration over
three route shapes:
1) single external nexthop (nhid)
2) external nexthop group (nhid -> group)
3) old-style multipath (nexthop ... nexthop ...)
Each scenario places one nexthop on the input device (veth0). For
(2) and (3) the matching nexthop is the second member, so the walk
has to traverse beyond the primary nh. Two nft counters on prerouting
verify the data path: one increments only when fib reports veth0 as
the oif, the other counts "missing" results and must stay at zero.
./nft_fib_nexthop.sh
PASS: single external nexthop (nhid -> veth0)
PASS: nexthop group (dummy0 + veth0)
PASS: old-style multipath (sibling on veth0)
nft_fib6_info_nh_uses_dev() blindly walks &rt->fib6_siblings, causing
an OOB read past the struct nexthop slab when rt->nh is set:
==================================================================
BUG: KASAN: slab-out-of-bounds in nft_fib6_eval+0x1362/0x16c0
Read of size 8 at addr ffff888103a099d0 by task ping/386
Branch by route shape: when rt->nh is set, walk via
nexthop_for_each_fib6_nh() (also covers nh groups, which the original
code missed); otherwise walk fib6_siblings, guarded by READ_ONCE() of
rt->fib6_nsiblings as required by commit 31d7d67ba127 ("ipv6: annotate
data-races around rt->fib6_nsiblings").
Jiayuan Chen [Wed, 20 May 2026 02:34:09 +0000 (10:34 +0800)]
netfilter: nft_fib_ipv6: walk fib6_siblings under RCU
nft_fib6_info_nh_uses_dev() runs from nft_fib6_eval() in softirq under
rcu_read_lock(). fib6_siblings is modified by writers that hold
tb6_lock but do not wait for RCU readers, so the sibling walk should
use list_for_each_entry_rcu(): it adds READ_ONCE() on the ->next
pointer and lets CONFIG_PROVE_RCU_LIST validate the locking.
Florian Westphal [Tue, 19 May 2026 20:52:07 +0000 (22:52 +0200)]
netfilter: ebtables: fix OOB read in compat_mtw_from_user
Luxiao Xu says:
The function compat_mtw_from_user() converts ebtables extensions from
32-bit user structures to kernel native structures. However, it lacks
proper validation of the user-supplied match_size/target_size.
When certain extensions are processed, the kernel-side translation
logic may perform memory accesses based on the extension's expected
size. If the user provides a size smaller than what the extension
requires, it results in an out-of-bounds read as reported by KASAN.
This fix introduces a check to ensure match_size is at least as large
as the extension's required compatsize. This covers matches, watchers,
and targets, while maintaining compatibility with standard targets.
AFAIU this is relevant for matches that need to go though
match->compat_from_user() call. Those that use plain memcpy with the
user-provided size are ok because the caller checks that size vs the
start of the next rule entry offset (which itself is checked vs. total
size copied from userspace).
The ->compat_from_user() callbacks assume they can read compatsize bytes,
so they need this extra check.
Based on an earlier patch from Luxiao Xu.
Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Sat, 16 May 2026 15:23:21 +0000 (23:23 +0800)]
netfilter: disable payload mangling in userns
Several parts of network stack rely on iph->ihl validation
done by network stack before PRE_ROUTING.
Disable this feature for user namespaces for now.
tcp option handling is likely safe even for LOCAL_IN, so this
this leaves tcp option mangling via nft_exthdr.c as-is.
I don't think these are the only means to alter packets, but these
appear to be relatively prominent.
This could be relaxed later. Example:
- allow userns for ingress hook.
- allow userns if base is transport header.
Also, we should revalidate or restrict generally:
- Don't allow linklayer writes to spill into network header
- restrict ipv4 and ipv6 to 'known safe' writes, e.g.
saddr/daddr/check/tos
Florian Westphal [Thu, 14 May 2026 12:21:57 +0000 (14:21 +0200)]
netfilter: nf_conntrack_gre: fix gre keymap list corruption
Quoting reporter:
A race between GRE keymap insertion and destruction can corrupt the
kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a
new keymap pointer before the embedded `list_head` is linked, while
`nf_ct_gre_keymap_destroy()` can concurrently delete and free that
same object. An unprivileged user can reach this through the PPTP
conntrack helper by racing PPTP control messages or helper teardown,
leading to KASAN-detectable list corruption/UAF in kernel context.
## Root Cause Analysis
`exp_gre()` installs GRE expectations for a PPTP control flow and then
adds two GRE keymap entries [..]
The add path publishes `ct_pptp_info->keymap[dir]` before linking the
embedded list node [..]
Concurrent teardown deletes that partially initialized object.
Make add/destroy symmetric: install both, destroy both while under lock.
Furthermore, we should refuse to publish a new mapping in case ct is going
away, else we may leak the allocation.
The "retrans" detection is strange: existing mapping is checked for key
equality with the new mapping, then for "is on the list" via list walk.
But I can't see how an existing keymap entry can be NOT on list.
Change this to only check if we're asked to map same tuple again -- if so,
skip re-install, else signal failure.
Last, add a bug trap for the keymap list; it has to be empty when namespace
is going away.
Reported-by: Leo Lin <leo@depthfirst.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Chris Mason [Tue, 19 May 2026 19:36:14 +0000 (12:36 -0700)]
netfilter: synproxy: refresh tcphdr after skb_ensure_writable
synproxy_tstamp_adjust() rewrites the TCP timestamp option in place
and then patches the TCP checksum via inet_proto_csum_replace4() on
the caller-supplied tcphdr pointer. Both ipv4_synproxy_hook() and
ipv6_synproxy_hook() obtain that pointer with skb_header_pointer()
before calling in, so it may either alias skb->head directly or
point at the caller's on-stack _tcph buffer.
Between obtaining the pointer and using it, the function calls
skb_ensure_writable(skb, optend), which on a cloned or non-linear
skb invokes pskb_expand_head() and frees the old skb->head. After
that point the cached th is stale:
caller (ipv[46]_synproxy_hook)
th = skb_header_pointer(skb, ..., &_tcph)
synproxy_tstamp_adjust(skb, protoff, th, ...)
skb_ensure_writable(skb, optend)
pskb_expand_head() /* kfree(old skb->head) */
...
inet_proto_csum_replace4(&th->check, ...)
/* writes into freed head, or
into the caller's stack copy
leaving the on-wire checksum
stale */
The option bytes are written through skb->data and are fine; only
the checksum update goes through th and so lands in the wrong
place. The result is either a write into freed slab memory or a
packet leaving with a checksum that does not match its payload.
Fix by re-deriving th from skb->data + protoff immediately after
skb_ensure_writable() succeeds, so the subsequent checksum update
targets the linear, writable header.
Hamza Mahfooz [Mon, 11 May 2026 14:43:14 +0000 (10:43 -0400)]
netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check
An unintended behavior in the TCP conntrack state machine allows a
connection to be forced into the CLOSE state using an RST packet with an
invalid sequence number.
Specifically, after a SYN packet is observed, an RST with an invalid SEQ
can transition the conntrack entry to TCP_CONNTRACK_CLOSE, regardless of
whether the RST corresponds to the expected reply direction. The relevant
code path assumes the RST is a response to an outgoing SYN, but does not
validate packet direction or ensure that a matching SYN was actually sent
in the opposite direction.
As a result, a crafted packet sequence consisting of a SYN followed by an
invalid-sequence RST can prematurely terminate an active NAT entry. This
makes connection teardown easier than intended.
So, tighten the state transition logic to ensure that RST-triggered
CLOSE transitions only occur when the RST is a valid response to a
previously observed SYN in the correct direction.
device property: set fwnode->secondary to NULL in fwnode_init()
If a firmware node is allocated on the stack (for instance: temporary
software node whose life-time we control) or on the heap - but using a
non-zeroing allocation function - and initialized using fwnode_init(),
its secondary pointer will contain uninitalized memory which likely will
be neither NULL nor IS_ERR() and so may end up being dereferenced (for
example: in dev_to_swnode()). Set fwnode->secondary to NULL on
initialization.
Cc: stable <stable@kernel.org> Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Xiaolei Wang [Mon, 18 May 2026 07:34:05 +0000 (15:34 +0800)]
misc: rp1: Send IACK on IRQ activate to fix kdump/kexec
After a kexec/kdump reboot, the macb Ethernet controller fails to
receive any packets, causing DHCP to hang indefinitely and the network
interface to be unusable despite link being up.
The root cause is that RP1's level-triggered MSI-X interrupt sources
(such as macb on hwirq 6) may have their internal state machines stuck
in the "waiting for IACK" state. This happens because the previous
kernel crashed before sending the acknowledgment for a pending level
interrupt.
In this stuck state, RP1 will not generate new MSI-X writes even though
the interrupt source remains asserted. Since no new MSI-X is sent, the
GIC never sees a new edge, the chained IRQ handler is never invoked,
and the interrupt is permanently lost.
Fix this by sending MSIX_CFG_IACK in rp1_irq_activate(). This
unconditionally resets the MSI-X state machine back to idle when a
child device requests its interrupt. If the interrupt source is still
asserted, RP1 will immediately issue a new MSI-X with the freshly
configured msg_addr/msg_data, and normal interrupt delivery resumes.
Writing IACK when the state machine is already idle (i.e., on a normal
cold boot) is harmless — it has no effect.
Hongling Zeng [Mon, 18 May 2026 02:29:39 +0000 (10:29 +0800)]
gpib: cb7210: Fix region leak when request_irq fails
When request_irq() fails, the region allocated by request_region()
is not released. Fix this by adding an error handling path with
proper goto labels to release the region.
Ben Hutchings [Tue, 5 May 2026 18:45:12 +0000 (20:45 +0200)]
parport: Fix race between port and client registration
The parport subsystem registers port devices before they are fully
initialised, resulting in a race condition where client drivers such
as lp can attach to ports that are not completely initialised or even
being torn down.
When the port and client drivers are built as modules and loaded
around the same time during boot, this occasionally results in a
crash. I was able to make this happen reliably in a VM with a
PC-style parallel port by patching parport_pc to fail probing:
while true; do
modprobe lp & modprobe parport_pc
wait
rmmod lp parport_pc
done
for a few seconds.
In the long term I think port registration should be changed to put
the call to device_add() inside parport_announce_port(), but since the
latter currently cannot fail this will require changing all port
drivers.
For now, add a flag to indicate whether a port has been "announced"
and only try to attach client drivers to ports when the flag is set.
Guangshuo Li [Tue, 5 May 2026 15:02:56 +0000 (23:02 +0800)]
uio: uio_pci_generic_sva: fix double free of devm_kzalloc() memory
uio_pci_sva allocates struct uio_pci_sva_dev with devm_kzalloc() in
probe(), but then calls kfree(udev) both on the probe() error path
(label out_free) and again in remove().
Because devm_kzalloc() allocations are devres-managed and are freed
automatically when the device is detached (including after a failing
probe() and during driver unbind), the explicit kfree() can lead to a
double free.
If probe() fails after devm_kzalloc(), the error path frees udev and
devres cleanup will free it again when the core unwinds the partially
bound device. On normal driver removal, remove() frees udev and devres
will free it again when the device is detached.
This issue was identified by a static analysis tool I developed and
confirmed by manual review. Fix by removing the manual kfree() calls
and dropping the now-unused label.
Fixes: 3397c3cd859a2 ("uio: Add SVA support for PCI devices via uio_pci_generic_sva.c") Cc: stable <stable@kernel.org> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260505150256.614071-1-lgs201920130244@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zeng Heng [Thu, 21 May 2026 07:30:11 +0000 (15:30 +0800)]
arm64: tlb: Flush walk cache when unsharing PMD tables
When huge_pmd_unshare() is called to unshare a PMD table, the
tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true
but the aarch64 tlb_flush() only checked tlb->freed_tables to
determine whether to use TLBF_NONE (vae1is, invalidates walk
cache) or TLBF_NOWALKCACHE (vale1is, leaf-only).
This caused the stale PMD page table entry to remain in the walk cache
after unshare, potentially leading to incorrect page table walks.
Fix by including unshared_tables in the check, so that when
unsharing tables, TLBF_NONE is used and the walk cache is properly
invalidated.
Here is the detailed distinction between vae1is and vale1is:
| Instruction Combination | Actual Invalidation Scope |
| ------------------------ | --------------------------------------------------|
| `VAE1IS` + TTL=`0` | All entries at all levels (full invalidation) |
| `VAE1IS` + TTL=`2` (L2) | Non-leaf at Level 0/1 + leaf at Level 2 |
| `VALE1IS` + TTL=`0` | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only |
Merge patch series "writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()"
Baokun Li <libaokun@linux.alibaba.com> says:
When a container exits, a race between cgroup_writeback_umount() and
inode_switch_wbs() / cleanup_offline_cgwb() can trigger
"VFS: Busy inodes after unmount" followed by a use-after-free on
percpu counters.
There is a window between inode_prepare_wbs_switch() returning true
(having passed the SB_ACTIVE check and grabbed the inode) and the
subsequent wb_queue_isw() call. If cgroup_writeback_umount() observes
the global isw_nr_in_flight counter as non-zero but flush_workqueue()
finds nothing queued, it returns early -- leaving a held inode
reference that blocks evict_inodes() and a later iput() that hits
freed percpu counters.
Patch 1 closes the race by extending the RCU read-side critical
section to cover the window from inode_prepare_wbs_switch() through
wb_queue_isw(), and adding synchronize_rcu() in the umount path so
that all in-flight switchers complete queueing before
flush_workqueue() runs. rcu_barrier() is intentionally retained so
the same hunk applies cleanly to stable trees that still queue
switches via queue_rcu_work().
Patch 2 removes the now-dead rcu_barrier() that was left over from
the queue_rcu_work() era (replaced by plain queue_work() in commit e1b849cfa6b6 "writeback: Avoid contention on wb->list_lock when
switching inodes"). This is mainline-only.
Patch 3 replaces the global synchronize_rcu()/flush_workqueue() pair
with a per-sb counter (s_isw_nr_in_flight) plus three small helpers
(cgroup_writeback_pin / cgroup_writeback_unpin /
cgroup_writeback_drain), eliminating the global serialization
penalty. This also reverts the RCU extension from patch 1 since the
per-sb counter makes it unnecessary.
Performance
-----------
Measured on a 16 vCPU QEMU guest, all kernels share the same .config.
Background load: 4 ext4 superblocks each running
This drives both inode_switch_wbs() (different cgroups writing the
same inode) and cleanup_offline_cgwb() (dying memcgs), keeping the
global isw_nr_in_flight non-zero throughout the run. Latencies are
wall-clock around umount(8) on a separate target sb; only the target
sb's umount is measured.
Four kernels are compared at each step of the series:
p50 p95 p99 max
base 99.7 ms 112.9 ms 112.9 ms 127.2 ms
+race 110.2 ms 153.8 ms 153.8 ms 160.4 ms
+rmbarrier 67.6 ms 88.3 ms 88.3 ms 96.8 ms
+persb 7.9 ms 10.0 ms 10.0 ms 10.1 ms
Idle target umount under cross-sb cgwb-switch pressure:
p50 p95 p99 max
base 92.0 ms 123.5 ms 136.5 ms 141.3 ms
+race 118.8 ms 154.6 ms 164.7 ms 165.3 ms
+rmbarrier 62.7 ms 95.4 ms 108.1 ms 108.6 ms
+persb 5.3 ms 6.9 ms 7.4 ms 7.4 ms
8 concurrent umounts of idle sbs under the same pressure:
p50 p95 p99 max
base 137.5 ms 166.9 ms 166.9 ms 171.3 ms
+race 162.2 ms 183.9 ms 183.9 ms 217.0 ms
+rmbarrier 61.3 ms 99.5 ms 99.5 ms 113.7 ms
+persb 8.1 ms 9.1 ms 9.1 ms 9.5 ms
A no-pressure baseline run (no background load) measures ~5 ms p50
across all four kernels, validating that the methodology has no
systematic bias.
In-kernel cgroup_writeback_umount() cumulative cost across the same
run (bpftrace, ~340 calls covering all four scenarios):
cgroup_writeback_umount() time
base 21240 ms total (~62 ms / call)
+race (rcu_barrier+sync) 24966 ms total (~73 ms / call)
+rmbarrier (synchronize_rcu) 12371 ms total (~36 ms / call)
+persb (per-sb counter) 1.37 ms total ( ~4 us / call)
Under +persb the wait_var_event() condition is true on entry
whenever the target sb has nothing in flight, so synchronize_rcu()
and flush_workqueue() are never called on this path.
Notes:
- Patch 1 adds ~10-27 ms p50 over base by introducing
synchronize_rcu(). This is the cost of closing the race
correctly and is paid by stable backports as well.
- Patch 2 ("drop rcu_barrier()") was expected to be a pure cleanup
on mainline, but actually removes a real wait: rcu_barrier()
drains call_rcu() callbacks from *all* subsystems, and the
cgroup teardown path keeps that pipeline busy under this
workload. Removing it cuts ~43-101 ms p50 on top of patch 1.
- Patch 3 (per-sb counter) replaces the global wait entirely; the
target sb no longer waits for activity on unrelated sbs,
recovering near-baseline latency in all three scenarios.
* patches from https://patch.msgid.link/20260521095016.2791354-1-libaokun@linux.alibaba.com:
writeback: use a per-sb counter to drain inode wb switches at umount
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
Baokun Li [Thu, 21 May 2026 09:50:16 +0000 (17:50 +0800)]
writeback: use a per-sb counter to drain inode wb switches at umount
Tracking in-flight inode wb switches with a single global counter
(isw_nr_in_flight) plus a synchronize_rcu() based wait in
cgroup_writeback_umount() forces every umount to take a global hit
whenever any other superblock on the system has wb switches in flight,
even if the superblock being unmounted has none of its own.
Replace the global synchronize_rcu()/flush_workqueue() pair with a
per-sb counter, s_isw_nr_in_flight, plus three small helpers:
- cgroup_writeback_pin(sb) - increment counter
- cgroup_writeback_unpin(sb) - decrement and wake drainer if last
- cgroup_writeback_drain(sb) - wait for counter to reach zero
The wiring is:
- inode_prepare_wbs_switch() pins before checking SB_ACTIVE and
grabbing the inode; failure paths unpin before returning. A
lockless SB_ACTIVE check at the top of the function lets us skip
the atomic_inc/smp_mb dance once SB_ACTIVE has been cleared (it
is monotonic and never set back).
- process_inode_switch_wbs() unpins after the matching iput().
- cgroup_writeback_umount() drains the per-sb counter via
wait_var_event().
The smp_mb() pair between inode_prepare_wbs_switch() and
cgroup_writeback_umount() keeps the SB_ACTIVE / counter ordering:
either the umounter sees a non-zero counter and waits, or the
switcher sees SB_ACTIVE cleared and aborts before grabbing the
inode.
The global isw_nr_in_flight is left in place, since it is still used
to throttle in-flight switches via WB_FRN_MAX_IN_FLIGHT.
The rcu_read_lock() extension in inode_switch_wbs() and
cleanup_offline_cgwb() that the race fix added is no longer needed
and is reverted; the synchronize_rcu() that the race fix added to
cgroup_writeback_umount() is dropped as well.
The following numbers were measured on a 16 vCPU QEMU guest with 4
background superblocks each churning "create memcg -> write 1 MiB ->
rmdir memcg" to keep the global isw_nr_in_flight non-zero. Latencies
are wall-clock around umount(8); only the target sb's umount is
measured.
Target sb runs its own cgwb churn:
p50 p95 p99 max
global synchronize_rcu() 67.6 ms 88.3 ms 88.3 ms 96.8 ms
per-sb counter (this) 7.9 ms 10.0 ms 10.0 ms 10.1 ms
Idle target umount latency under cross-sb cgwb-switch pressure:
p50 p95 p99 max
global synchronize_rcu() 62.7 ms 95.4 ms 108.1 ms 108.6 ms
per-sb counter (this) 5.3 ms 6.9 ms 7.4 ms 7.4 ms
no-pressure baseline 4.9 ms 5.9 ms 6.3 ms 6.7 ms
8 concurrent umounts of idle sbs under the same pressure:
p50 p95 max
global synchronize_rcu() 61.3 ms 99.5 ms 113.7 ms
per-sb counter (this) 8.1 ms 9.1 ms 9.5 ms
In-kernel cgroup_writeback_umount() time across the same run
(bpftrace, ~340 calls covering all scenarios):
global synchronize_rcu() 12371 ms total (~36 ms / call)
per-sb counter (this) 1.37 ms total ( ~4 us / call)
Baokun Li [Thu, 21 May 2026 09:50:15 +0000 (17:50 +0800)]
writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()
Commit e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when
switching inodes") replaced the queue_rcu_work() based scheduling of
inode wb switches with a plain queue_work(). Since then no switcher
goes through call_rcu(), so rcu_barrier() in cgroup_writeback_umount()
has no callbacks of its own to wait for. It still drains unrelated
call_rcu() callbacks from other subsystems on busy systems, which
incidentally slows umount down; drop it.
Fixes: e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when switching inodes") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260521095016.2791354-3-libaokun@linux.alibaba.com Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Baokun Li [Thu, 21 May 2026 09:50:14 +0000 (17:50 +0800)]
writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()
When a container exits, the following BUG_ON() is occasionally triggered:
==================================================================
VFS: Busy inodes after unmount of sdb (ext4)
------------[ cut here ]------------
kernel BUG at fs/super.c:695!
CPU: 3 PID: 6 Comm: containerd-shim Tainted: G OE K 6.6 #1
pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : generic_shutdown_super+0xf0/0x100
lr : generic_shutdown_super+0xf0/0x100
Call trace:
generic_shutdown_super+0xf0/0x100
kill_block_super+0x20/0x48
ext4_kill_sb+0x28/0x60
deactivate_locked_super+0x54/0x130
deactivate_super+0x84/0xa0
cleanup_mnt+0xa4/0x140
__cleanup_mnt+0x18/0x28
task_work_run+0x78/0xe0
do_notify_resume+0x204/0x240
==================================================================
The root cause is a race between cgroup_writeback_umount() and
inode_switch_wbs()/cleanup_offline_cgwb(). There is a window between
inode_prepare_wbs_switch() returning true and the subsequent
wb_queue_isw() call. Following is the process that triggers the issue:
CPU A (umount) | CPU B (writeback)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
inode_switch_wbs/cleanup_offline_cgwb
atomic_inc(&isw_nr_in_flight)
inode_prepare_wbs_switch
-> passes SB_ACTIVE check
__iget(inode)
generic_shutdown_super
sb->s_flags &= ~SB_ACTIVE
cgroup_writeback_umount(sb)
smp_mb()
atomic_read(&isw_nr_in_flight)
rcu_barrier()
-> no pending RCU callbacks
flush_workqueue(isw_wq)
-> nothing queued, returns
evict_inodes(sb)
-> Inode skipped as isw still holds a ref.
sop->put_super(sb)
/* destroys percpu counters */
-> VFS: Busy inodes after unmount!
wb_queue_isw()
queue_work(isw_wq, ...)
/* later in work function */
inode_switch_wbs_work_fn
process_inode_switch_wbs
iput() -> evict
percpu_counter_dec() // UAF!
Fix this by extending the RCU read-side critical section in
inode_switch_wbs() and cleanup_offline_cgwb() to cover from
inode_prepare_wbs_switch() through wb_queue_isw(). Since there is
no sleep in this window, rcu_read_lock() can be used. Then add a
synchronize_rcu() in cgroup_writeback_umount() before the existing
rcu_barrier(), so that all in-flight switchers that have passed the
SB_ACTIVE check have completed queue_work() before flush_workqueue()
is called.
The existing rcu_barrier() is intentionally retained so this fix can
be backported unchanged to stable kernels (5.10.y, 6.6.y, ...) that
still queue switches via queue_rcu_work(). It is a no-op on current
mainline (since commit e1b849cfa6b6 ("writeback: Avoid contention on
wb->list_lock when switching inodes")) and is removed in a follow-up
patch.
Matthew Maurer [Fri, 3 Apr 2026 18:18:58 +0000 (18:18 +0000)]
rust_binder: Avoid holding lock when dropping delivered_death
In 6c37bebd8c926, we switched to looping over the list and dropping each
individual node, ostensibly without the lock held in the loop body.
If the kernel were using Rust Edition 2024, the comment would be
accurate, and the lock would not be held across the drop. However, the
kernel is currently using 2021, so tail expression lifetime extension
results in the lock being held across the drop. Explicitly binding the
expression result to a variable makes the lockguard no longer part of a
tail expression, causing the lock to be dropped before entering the loop
body.
This was detected via `CONFIG_PROVE_LOCKING` identifying an invalid wait
context at the drop site.
Reported-by: David Stevens <stevensd@google.com> Signed-off-by: Matthew Maurer <mmaurer@google.com> Cc: stable <stable@kernel.org> Fixes: 6c37bebd8c92 ("rust_binder: avoid mem::take on delivered_deaths") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://patch.msgid.link/20260403-lockhold-v1-1-c332b56cd8ae@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alice Ryhl [Tue, 14 Apr 2026 12:02:34 +0000 (12:02 +0000)]
rust_binder: avoid calling pending_oneway_finished() on TF_UPDATE_TXN
When an outdated transaction is removed from `oneway_todo` due to
`TF_UPDATE_TXN`, its `Allocation` is dropped. The current implementation
of `Allocation::drop` calls `pending_oneway_finished()`, assuming the
transaction was executed. This leads to premature execution of the next
queued one-way transaction.
Fix this by taking the `oneway_node` from the `Allocation` of the
outdated transaction before it is dropped. This prevents
`Allocation::drop` from signaling completion.
We do not call `take_oneway_node()` from `Transaction::cancel` because
it's actually correct to call `pending_oneway_finished()` on cancel if
the transaction did not come from `oneway_todo`. This ensures that if
`BINDER_THREAD_EXIT` is invoked and cancels a oneway transaction, then
the next transaction is taken from `oneway_todo`.
This bug does not lead to any issues in the kernel, but may lead to
Binder delivering transactions to userspace earlier than userspace
expected to receive them.
Enable modular build since the driver now has a proper module_exit()
handler. There's nothing specific to DZ hardware to prevent driver
unloading and reloading from working.
(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-- where a pointer is dereferenced that has been derived from a null
pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because the DZ device is fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the zs
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Conversely only starting the console port so late lets the reset code
fully utilise our delay handlers, so switch from udelay() to fsleep()
for transmitter draining so as to avoid busy-waiting for an excessive
amount of time.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062326540.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switch the driver to using the channel reset rather than hardware reset,
simplifying handling by removing an interference between channels that
causes the other channel to become uninitialised afterwards.
There is little difference between the two kinds of reset in terms of
register settings that result, and we initialise the whole register set
right away anyway. However this prevents a hang from happening should
the console output handler in the firmware try to access the other port
whose transmitter has been disabled and line parameters messed up.
For example this will happen if the keyboard port (port A) is chosen for
the system console, unusually but not insanely for a headless system, as
the port is wired to a standard DA-15 connector and an adapter can be
easily made. Or with the next change in place this would happen for the
regular console port (port B), since the keyboard port (port A) will be
initialised first.
Just remove the unnecessary complication then, a channel reset is good
enough. We still need the initialisation marker, now per channel rather
than per SCC, as for the console port zs_reset() will be called twice:
once early on via zs_serial_console_init() for the console setup only,
and then again via zs_config_port() as the port is associated with a TTY
device.
Calling zs_reset() in the course of setting up the serial device causes
line parameters to be reset and the transmitter disabled. We've been
lucky in that no message is usually produced to the kernel log between
this call and the later call to uart_set_options() in the course of
console setup done by zs_serial_console_init(), or the system would hang
as the console output handler in the firmware tried to access a port the
transmitter of which has been disabled and line parameters messed up.
This will change with the next change to the driver, so fix zs_reset()
such that line parameters are set for 9600n8 console operation as with
the system firmware and the transmitter re-enabled after reset. This
also means zs_pm() serves no purpose anymore, so drop it.
Calling dz_reset() in the course of setting up the serial device causes
line parameters to be reset and the transmitter disabled. We've been
lucky in that no message is usually produced to the kernel log between
this call and the later call to uart_set_options() in the course of
console setup done by dz_serial_console_init(), or the system would hang
as the console output handler in the firmware tried to access a port the
transmitter of which has been disabled and line parameters messed up.
This will change with the next change to the driver, so fix dz_reset()
such that line parameters are set for 9600n8 console operation as with
the system firmware and the transmitter re-enabled after reset. This
also means dz_pm() serves no purpose anymore, so drop it.
serial: dz: Fix bootconsole message clobbering at chip reset
In the DZ interface as implemented by the DC7085 gate array the serial
transmitters are double buffered, meaning that at the time a transmitter
is ready to accept the next character there is one in the transmit shift
register still being sent to the line. Issuing a master clear at this
time causes this character to be lost, so wait an extra amount of time
sufficient for the transmit shift register to drain at 9600bps, which is
the baud rate setting used by the firmware console.
Mind the specified 1.4us TRDY recovery time in the course and continue
using iob() as the completion barrier, since the platforms involved use
a write buffer that can delay and combine writes, and reorder them with
respect to reads regardless of the MMIO locations accessed and we still
lack a platform-independent handler for that.
When called from dz_serial_console_init() this is too early for fsleep()
to work and even before lpj has been calculated and therefore the delay
is actually not sufficient for the transmitter to drain and is merely a
placeholder now. This will be addressed in a follow-up change.
Jacques Nilo [Wed, 13 May 2026 13:30:25 +0000 (15:30 +0200)]
serial: 8250_dw: dispatch SysRq character in dw8250_handle_irq()
dw8250_handle_irq() calls serial8250_handle_irq_locked() with the port
lock held via guard(uart_port_lock_irqsave). The guard destructor is
plain uart_port_unlock_irqrestore(), so a SysRq character captured into
port->sysrq_ch by uart_prepare_sysrq_char() is dropped without ever
being dispatched to handle_sysrq().
This is the same regression pattern as in serial8250_handle_irq(),
introduced when 883c5a2bc934 ("serial: 8250_dw: Rework
dw8250_handle_irq() locking and IIR handling") moved the function to
the guard()-based locking scheme without using the sysrq-aware unlock
helper.
Switch to guard(uart_port_lock_check_sysrq_irqsave) so that captured
sysrq_ch is dispatched on scope exit, matching the fix in
serial8250_handle_irq().
Fixes: 883c5a2bc934 ("serial: 8250_dw: Rework dw8250_handle_irq() locking and IIR handling") Cc: stable@vger.kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Jacques Nilo <jnilo@free.fr> Link: https://patch.msgid.link/ed56fcaf4af24e4ed011a7bce206e0182acb761c.1778675349.git.jnilo@free.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jacques Nilo [Wed, 13 May 2026 13:30:24 +0000 (15:30 +0200)]
serial: 8250: dispatch SysRq character in serial8250_handle_irq()
serial8250_handle_irq() captures a SysRq character into port->sysrq_ch
inside serial8250_handle_irq_locked() via uart_prepare_sysrq_char()
(reached from serial8250_read_char()). Dispatch of that captured
character to handle_sysrq() is expected to happen at port-unlock time,
through uart_unlock_and_check_sysrq[_irqrestore]().
After commit 8324a54f604d ("serial: 8250: Add
serial8250_handle_irq_locked()") the function was reduced to a wrapper
that takes the port lock via guard(uart_port_lock_irqsave) whose
destructor is plain uart_port_unlock_irqrestore(). The sysrq-aware
unlock helper is no longer called, so port->sysrq_ch is captured but
never dispatched: BREAK + SysRq key is consumed silently.
This was the very condition Johan Hovold's 853a9ae29e978 ("serial:
8250: fix handle_irq locking", 2021) introduced
uart_unlock_and_check_sysrq_irqrestore() to address.
Switch to the new guard(uart_port_lock_check_sysrq_irqsave), whose
destructor is the sysrq-aware unlock helper, restoring the pre-split
behaviour. Update the Context: comment on serial8250_handle_irq_locked()
so future HW-specific 8250 wrappers know to use the same guard or the
explicit sysrq-aware unlock.
Verified on RTL8196E with CONFIG_MAGIC_SYSRQ_SERIAL=y: BREAK + 'h' on
the console UART produces the SysRq help dump in dmesg and the brk
counter in /proc/tty/driver/serial increments correctly.
uart_handle_break() and uart_prepare_sysrq_char() (in
include/linux/serial_core.h) capture a SysRq character into
port->sysrq_ch while the port lock is held and rely on the unlock
helper -- uart_unlock_and_check_sysrq_irqrestore() -- to dispatch the
captured character to handle_sysrq() on scope exit.
The existing guard(uart_port_lock_irqsave) cannot be used by IRQ
handlers that process RX, because its destructor calls plain
uart_port_unlock_irqrestore() and silently drops port->sysrq_ch.
Add a dedicated guard(uart_port_lock_check_sysrq_irqsave) variant
whose destructor is the sysrq-aware unlock helper. The lock side is
identical to uart_port_lock_irqsave -- only the unlock-time behaviour
differs. Callers that may capture SysRq characters must use
guard(uart_port_lock_check_sysrq_irqsave); the existing
guard(uart_port_lock_irqsave) keeps its current plain-unlock semantics
for the many callers that do not process RX.
The new macro is placed after the CONFIG_MAGIC_SYSRQ_SERIAL block so
both definitions of uart_unlock_and_check_sysrq_irqrestore() (sysrq
enabled and disabled) are visible at expansion time. When
CONFIG_MAGIC_SYSRQ_SERIAL=n the destructor degenerates to plain
uart_port_unlock_irqrestore(), so there is no overhead.
No functional change on its own; users are converted in the following
patches.
Tudor Ambarus [Fri, 15 May 2026 12:41:21 +0000 (12:41 +0000)]
tty: serial: samsung: Remove redundant port lock acquisition in rx helpers
Sashiko identified a deadlock when the console flow is engaged [1].
When console flow control is enabled (UPF_CONS_FLOW),
s3c24xx_serial_stop_tx() calls s3c24xx_serial_rx_enable() and
s3c24xx_serial_start_tx() calls s3c24xx_serial_rx_disable().
The serial core framework invokes the .stop_tx() and .start_tx()
callbacks with the port->lock spinlock already held. Furthermore, all
internal driver paths that invoke stop_tx (such as the DMA TX
completion handler s3c24xx_serial_tx_dma_complete() or the PIO TX IRQ
handler s3c24xx_serial_tx_irq()) also acquire port->lock prior to
calling it. (Note that s3c24xx_serial_start_tx() is only invoked by the
serial core).
However, s3c24xx_serial_rx_enable() and s3c24xx_serial_rx_disable()
unconditionally attempt to acquire port->lock again using
uart_port_lock_irqsave(). Since spinlocks are not recursive, this
causes a deadlock on the same CPU when console flow control is engaged.
Remove the redundant lock acquisition from both rx helper functions.
Cc: stable <stable@kernel.org> Fixes: b497549a035c ("[ARM] S3C24XX: Split serial driver into core and per-cpu drivers") Reported-by: John Ogness <john.ogness@linutronix.de> Closes: https://sashiko.dev/#/patchset/20260506121606.5805-1-john.ogness%40linutronix.de [1] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://patch.msgid.link/20260515-samsung-tty-flow-control-deadlock-v1-1-93255edbc9bc@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
altera_jtaguart_probe() maps the register window before registering the
UART port, but it ignores failures from uart_add_one_port(). If port
registration fails, probe still returns success and the mapping remains
live until a later remove path that is not part of probe failure cleanup.
Return the uart_add_one_port() error and unmap the register window on
that failure path.
This issue was identified during our ongoing static-analysis research while
reviewing kernel code.
Fixes: 5bcd601049c6 ("serial: Add driver for the Altera JTAG UART") Cc: stable <stable@kernel.org> Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Link: https://patch.msgid.link/20260512065837.79528-1-mhun512@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolas Pitre [Fri, 15 May 2026 03:48:57 +0000 (23:48 -0400)]
vt: merge ucs_is_zero_width()/ucs_is_double_width() into ucs_get_width()
The hot path in vc_process_ucs() asks two independent questions about the
same code point -- "is it double-width?" and "is it zero-width?" -- and
was answering each with its own bsearch over its own table. For anything
past the leading bounds check that meant two scans of the BMP width
tables back to back for what is logically a single lookup.
Replace both with one ucs_get_width(cp) returning 0, 1, or 2 in a single
bsearch, while keeping the total table footprint at the same 2384 B as
before.
To do so, merge the zero-width and double-width ranges per region into
one sorted-by-`first` table. BMP entries stay 4 bytes; per-entry width
is hosted in spare bits of the non-BMP table's `last` field. Non-BMP
code points use only 20 of 32 bits, so each u32 has 12 unused high bits.
Store first/last shifted left by 12 and use the low 12 bits of `last`
for metadata: bit 11 is this entry's own width flag, bits 0..7 host an
8-bit chunk of the BMP double-width bitmap. Because the metadata bits
sit strictly below the lowest cp-scale bit, the bsearch comparator
remains a plain u32 compare on shifted keys with no masking.
In vc_process_ucs() the overwhelmingly common single-width path now
collapses to a single predicted branch:
if (likely(w == 1))
return 1;
Note: scripts/checkpatch.pl complains about "Macros with complex values
should be enclosed in parentheses" for the BMP_*WIDTH and
RANGE_*WIDTH macros. They are deliberately defined to expand to a
comma-separated (first, last) pair so they can populate the two
adjacent fields of a struct initializer; wrapping them in
parentheses would turn that into a comma-expression and defeat
the whole construction. Please ignore.
Marco Felsch [Tue, 19 May 2026 09:57:00 +0000 (11:57 +0200)]
serial: 8250: fix possible ISR soft lockup
There are rare cases in which the host gets stuck in the ISR because it
is flooded with messages during the startup phase.
The reason for the soft lockup in the ISR is the missing FIFO error IRQ
(FIFOE) handling. Not handling it and reporting IRQ_HANDLED triggers
the IRQ immediately again.
Fix this by adding a check for the FIFOE status and clearing the FIFO
if no data is ready (DR).
This behavior was observed on an AM62L device which uses the OMAP 8250
driver. Fix it for all 8250 drivers, since the OMAP driver's special
IRQ setup handling may trigger this behavior more frequently, but it
is not ensured that other 8250 drivers aren't affected.
The plain text binding file was superseded by the YAML schema in
commit d50f974c4f7f ("dt-bindings: serial: Convert rs485 bindings
to json-schema"). The file now contains only a redirect notice.
Remove it, and update references in serial_core.c and
serial-rs485.rst to point to the YAML schema.
Praveen Talari [Mon, 18 May 2026 17:56:55 +0000 (23:26 +0530)]
serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
Add tracepoint support to the Qualcomm GENI serial driver to provide
runtime visibility into driver behavior without requiring invasive debug
patches.
The trace events cover UART termios configuration, clock setup, modem
control state, interrupt status, and TX/RX data, making it easier to
diagnose communication issues in the field.
tty: serial: Use named initializers for arrays of i2c_device_data
While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.
The mentioned robustness is relevant for a planned change to struct
i2c_device_id that replaces .driver_data by an anonymous union.
While touching all these arrays, unify usage of whitespace in the list
terminator.
This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
The clock notifier and matching work_struct in dw8250_data were added
in 2020 for the Baikal-T1 SoC, whose multiple UART ports share a
single reference clock and need to be informed when another consumer
re-rates that clock.
Baikal SoC support has since been removed from the kernel (see e.g.
commit 5d6c477687ae ("clk: baikal-t1: Remove not-going-to-be-supported
code for Baikal SoC") and the matching removals across bus/, mtd/,
PCI/, hwmon/, memory/). No remaining in-tree user needs the
cross-device baudclk rate-change notification path: the only
configuration that wired up the notifier was Baikal-T1's shared
reference clock topology.
Drop the now-unused clock-notifier and its deferred-update worker:
- struct dw8250_data fields clk_notifier and clk_work,
- the clk_to_dw8250_data() and work_to_dw8250_data() helpers,
- the dw8250_clk_work_cb() and dw8250_clk_notifier_cb() callbacks,
- the INIT_WORK / notifier_call setup in dw8250_probe(),
- the clk_notifier_register() / queue_work() in dw8250_probe(),
- the matching clk_notifier_unregister() / flush_work() in
dw8250_remove(),
- the stale comment in dw8250_set_termios() about the worker
blocking,
- the linux/notifier.h and linux/workqueue.h includes that are
no longer used.
dw8250_set_termios() keeps calling clk_set_rate() directly, which is
all the remaining single-UART configurations require.
Stepan Ionichev [Thu, 14 May 2026 14:37:45 +0000 (19:37 +0500)]
serial: 8250_dw: unregister 8250 port if clk_notifier_register() fails
dw8250_probe() registers the 8250 port via serial8250_register_8250_port()
and then, if the device has a clock, registers a clock notifier. If
clk_notifier_register() fails, probe returns the error but leaves the
8250 port registered. The matching serial8250_unregister_port() lives
in dw8250_remove(), which is not called when probe fails, so the port
slot stays occupied until the device is rebound or the system is
rebooted. The devm-allocated driver data is freed while the port still
references it (via the saved private_data and serial_in/serial_out
callbacks), so any access to that port slot before a rebind is a
use-after-free hazard.
Unregister the port on the clk_notifier_register() error path.
Stefan Dösinger [Wed, 13 May 2026 21:34:57 +0000 (00:34 +0300)]
amba/serial: amba-pl011: Bring back zx29 UART support
This is based on code removed in commit 89d4f98ae90d ("ARM: remove zte
zx platform"). I did not bring back the zx29-uart .compatible as the
arm,primecell-periphid does the job.
John Ogness [Mon, 11 May 2026 15:27:02 +0000 (17:33 +0206)]
serial: 8250: Add support for console flow control
The kernel documentation specifies that the console option 'r' can
be used to enable hardware flow control for console writes. The 8250
driver does include code for hardware flow control on the console if
cons_flow is set, but there is no code path that actually sets this.
However, that is not the only issue. The problems are:
1. Specifying the console option 'r' does not lead to cons_flow being
set.
2. Even if cons_flow would be set, serial8250_register_8250_port()
clears it.
3. When the console option 'r' is specified, uart_set_options()
attempts to initialize the port for CRTSCTS. However, afterwards
it does not set the UPSTAT_CTS_ENABLE status bit and therefore on
boot, uart_cts_enabled() is always false. This policy bit is
important for console drivers as a criteria if they may poll CTS.
4. Even though uart_set_options() attempts to initialize the port
for CRTSCTS, the 8250 set_termios() callback does not enable the
RTS signal (TIOCM_RTS) and thus the hardware is not properly
initialized for CTS polling.
5. Even if modem control was properly setup for CTS polling
(TIOCM_RTS), uart_configure_port() clears TIOCM_RTS, thus
breaking CTS polling.
6. wait_for_xmitr() and serial8250_console_write() use cons_flow
to decide if CTS polling should occur. However, the condition
should also include a check that it is not in RS485 mode and
CRTSCTS is actually enabled in the hardware.
Address all these issues as conservatively as possible by gating them
behind checks focussed on the user specifying console hardware flow
control support and the hardware being configured for CTS polling
at the time of the write to the UART.
Since checking the UPSTAT_CTS_ENABLE status bit is a part of the new
condition gate, these changes also support runtime termios updates to
disable/enable CRTSCTS.
John Ogness [Mon, 11 May 2026 15:27:01 +0000 (17:33 +0206)]
serial: 8250: Check LSR timeout on console flow control
wait_for_xmitr() calls wait_for_lsr() to wait for the transmission
registers to be empty. wait_for_lsr() can timeout after a reasonable
amount of time.
When console flow control is active, wait_for_xmitr() additionally
polls CTS, waiting for the peer to signal that it is ready to receive
more data.
If hardware flow control is enabled (auto CTS) and the peer deasserts
CTS, wait_for_lsr() will timeout. If additionally console flow
control is active and while polling CTS the peer asserts CTS, the
console will assume it can immediately transmit, even though the
transmission registers may not be empty. This can lead to data loss.
Avoid this problem by performing an extra wait_for_lsr() upon CTS
assertion if wait_for_lsr() previously timed out.
Stepan Ionichev [Fri, 8 May 2026 18:12:37 +0000 (23:12 +0500)]
tty: serial: 8250: protect against NULL uart->port.dev in register
serial8250_register_8250_port() conditionally copies uart->port.dev
from up->port.dev only when up->port.dev is non-NULL:
if (up->port.dev) {
uart->port.dev = up->port.dev;
...
}
So if both the existing uart slot and up have a NULL ->dev,
uart->port.dev remains NULL. The very next ACPI companion check
then dereferences it unconditionally:
if (!has_acpi_companion(uart->port.dev)) {
has_acpi_companion() reads dev->fwnode without a NULL guard
(include/linux/acpi.h), so this NULL-derefs the kernel for the
remaining no-dev case rather than just skipping the
mctrl_gpio_init() initialisation as intended.
smatch flags the inconsistency:
drivers/tty/serial/8250/8250_core.c:767
serial8250_register_8250_port() error: 'uart->port.dev' could be
null (see line 719)
Guard the call with a NULL check so register continues to work
for callers that legitimately have no parent device (legacy
non-OF/non-ACPI registrations).
No functional change for callers that pass a non-NULL ->dev.
arm64: dts: add support for A9 based Amlogic BY401
Add basic support for the A9 based Amlogic BY401 board, which describes
the following components: CPU, GIC, IRQ, Timer and UART.
These are capable of booting up into the serial console.
Hugo Villeneuve [Thu, 21 May 2026 15:33:29 +0000 (11:33 -0400)]
serial: max310x: fix compile errors if CONFIG_SPI_MASTER is disabled
Since commit 20ffe4b3330a8 ("serial: max310x: allow driver to be built with
SPI or I2C"), if I2C is enabled and SPI_MASTER is disabled, we have these
compile errors:
drivers/tty/serial/max310x.c: In function 'max310x_uart_init':
drivers/tty/serial/max310x.c: error: 'max310x_spi_driver' undeclared...
drivers/tty/serial/max310x.c: In function ‘max310x_uart_init’:
drivers/tty/serial/max310x.c: error: label ‘err_spi_register’
defined but not used...
drivers/tty/serial/max310x.c: error: ‘regcfg’ defined but not used
Fix by properly encapsulating i2c/spi code/variables in their respective
context with IS_ENABLED() macros for CONFIG_I2C and CONFIG_SPI_MASTER.
Also fix link failure with SERIAL_MAX310X=y and I2C=m by modifying Kconfig
depends.
Fixes: 20ffe4b3330a8 ("serial: max310x: allow driver to be built with SPI or I2C") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605121847.N9DVLNg2-lkp@intel.com/ Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com> Link: https://patch.msgid.link/20260521153333.2336642-1-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pawel Laszczak [Thu, 21 May 2026 08:16:24 +0000 (10:16 +0200)]
usb: cdnsp: Add support for device-only configuration
This patch introduces support for the Cadence USBSSP (cdnsp)
controller in hardware configurations where the Dual-Role Device (DRD)
register block is not implemented or is inaccessible.
In such cases, the driver cannot rely on the DRD logic to manage roles
and must operate exclusively in a fixed peripheral/host mode.
The change in BAR indexing (from BAR 2 to BAR 1) is a direct
consequence of the 32-bit addressing used in this specific
DRD-disabled hardware layout, compared to the 64-bit addressing
used in DRD-enabled configurations.
Tested on a PCI platform with a hardware configuration that lacks
DRD support. Platform-side changes are included to support the PCI
glue layer's property injection to handle this specific layout.
Introduce a new generic fallback compatible string 'cdns,cdnsp' for
Cadence USBSSP controllers to support hardware configurations where
the Dual-Role Device (DRD) register block is missing or inaccessible.
Following the maintainer's feedback, avoid generic property-like naming
(such as "-no-drd") and use a clean generic fallback. To keep the schema
resource-driven and strictly validated, define a two-string compatible
matrix using an empty schema ({}) wildcard. This allows future vendor
SoC compatibles to be prepended while safely falling back to the 2-resource
USBSSP configuration.
When 'cdns,cdnsp' is matched:
- The 'otg' register and interrupt resources are not required.
- The 'reg' and 'interrupts' properties are restricted to 2 items
(host and device).
- 'dr_mode' must be explicitly set to either 'host' or 'peripheral'.
The standard 'cdns,usb3' compatible remains unchanged, maintaining
backward compatibility by requiring all 3 resource sets (otg, host, dev).
Maoyi Xie [Thu, 21 May 2026 06:54:28 +0000 (14:54 +0800)]
usb: gadget: aspeed_udc: avoid past-the-end iterator in dequeue
ast_udc_ep_dequeue() declares the loop cursor `req` outside the
list_for_each_entry(). After the loop it tests `&req->req != _req`
to decide whether the request was found. If the queue holds no
match, `req` is past-the-end. It then aliases
container_of(&ep->queue, struct ast_udc_request, queue) via offset
cancellation. Whether that synthetic address equals `_req` depends
on heap layout. The function can return 0 without dequeueing
anything.
Default `rc` to -EINVAL and set it to 0 only inside the match
branch. `req` is no longer read after the loop, so the past-the-end
dereference goes away. No extra cursor variable or post-loop test
is needed.
Suggested-by: Alan Stern <stern@rowland.harvard.edu> Suggested-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com> Link: https://patch.msgid.link/20260521065428.3261238-1-maoyixie.tju@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jisheng Zhang [Wed, 20 May 2026 13:37:11 +0000 (21:37 +0800)]
usb: dwc2: remove WARN in dwc2_hcd_save_data_toggle
The WARN() in dwc2_hcd_save_data_toggle() was introduced in
commit 62943b7dfa35 ("usb: dwc2: host: fix the data toggle error in
full speed descriptor dma"), it looks like the WARN() is to ensure
proper usage of dwc2_hcd_save_data_toggle(): either qtd is provided
for control eps or qh is provided for non-control eps. This check is
good even if there's no such improper usage in current code. But the
WARN() usage in driver is discouraged nowadays: imagine there is an
improper usage, then kernel panic due to warn if 'panic_on_warn' is
enabled.
While emitting the err msg for improper usage is still valueable, so
let's replace the WARN with check and dev_err().
At the same time, it looks a bit strange we check !chan after
dereference of this pointer with
"if (chan->ep_type != USB_ENDPOINT_XFER_CONTROL)".
In fact, when entering the dwc2_hcd_save_data_toggle(), the chan won't
be NULL, because its caller or indirect caller has ensured this,
specifically, it's checked with below line in dwc2_hc_n_intr()
if (!chan) {
dev_err(hsotg->dev, "## hc_ptr_array for channel is NULL ##\n");
return;
}
This addresses the following issue reported by klocwork tool:
- Suspicious dereference of pointer 'chan' before NULL check at
line 518
Pooja Katiyar [Tue, 19 May 2026 18:45:13 +0000 (11:45 -0700)]
usb: typec: ucsi: Enable debugfs for message_out data structure
Add debugfs entry for writing message_out data structure to handle
UCSI 2.1 and 3.0 commands through debugfs interface.
Users writing to the message_out debugfs file should ensure the input
data adheres to the following format:
1. Input must be a non-empty valid hexadecimal string.
2. Input length of hexadecimal string must not exceed 256 bytes of
length to be in alignment with the message out data structure size
as per the UCSI specification v2.1.
3. If the input string length is odd, then user needs to prepend a
'0' to the first character for proper hex conversion.
Below are examples of valid hex strings. Note that these values are
just examples. The exact values depend on specific command use case.
Pooja Katiyar [Tue, 19 May 2026 18:45:12 +0000 (11:45 -0700)]
usb: typec: ucsi: Add support for message_out data structure
Add support for UCSI message_out data structure. The UCSI
interface defines separate message_in and message_out data
structure for bidirectional communication, where commands
like Set PDOs and LPM Firmware Update require writing data
to message_out before command execution.
Add write_message_out operation to ucsi_operations structure
to allow platform drivers to implement message_out data writing
capability.
Update ucsi_sync_control_common to accept message_out parameters
and call write_message_out followed by command execution to
maintain proper sequencing as per the UCSI specification.
Introduce ucsi_write_message_out_command for commands that need
to send message_out data, while maintaining ucsi_send_command
for commands that only require message_in response data.
Airoha SoC use the same register map and logic of the Mediatek xHCI
driver, hence add it to the dependency list to permit compilation also
on this ARCH.
Dmitry Baryshkov [Tue, 19 May 2026 10:48:08 +0000 (13:48 +0300)]
arm64: dts: qcom: pmi632: move vdd-vbus-supply to connector nodes
Instead of specifying the VBUS supply as powering on the Type-C block in
the PMIC, follow the standard schema and use vbus-supply property of the
usb-c connector itself.
Dmitry Baryshkov [Tue, 19 May 2026 10:48:07 +0000 (13:48 +0300)]
arm64: dts: qcom: pm8150b: move vdd-vbus-supply to connector nodes
Instead of specifying the VBUS supply as powering on the Type-C block in
the PMIC, follow the standard schema and use vbus-supply property of the
usb-c connector itself.
Dmitry Baryshkov [Tue, 19 May 2026 10:48:06 +0000 (13:48 +0300)]
arm64: dts: qcom: pm7250b: move vdd-vbus-supply to connector nodes
Instead of specifying the VBUS supply as powering on the Type-C block in
the PMIC, follow the standard schema and use vbus-supply property of the
usb-c connector itself.
Dmitry Baryshkov [Tue, 19 May 2026 10:48:05 +0000 (13:48 +0300)]
arm64: dts: qcom: pm4125: move vdd-vbus-supply to connector nodes
Instead of specifying the VBUS supply as powering on the Type-C block in
the PMIC, follow the standard schema and use vbus-supply property of the
usb-c connector itself.
Dmitry Baryshkov [Tue, 19 May 2026 10:48:04 +0000 (13:48 +0300)]
usb: typec: tcpm: qcom: prefer VBUS supply from the connector node
Current way of specifying VBUS supply (via the device's vdd-vbus-supply
property) is not ideal. In the end, VBUS is supplied to the USB-C
connector rather than the Type-C block in the PMIC. Follow the standard
way of specifying it (via the connector node) and fallback to the old
property if there is no vbus-supply in the connector node.
The Qualcomm PMIC Type-C devices historically provided their own way of
specifying the VBUS regulator, via the device's vdd-vbus-supply node.
This is not ideal as the VBUS is supplied to the connector and not to
the Type-C block in the PMIC. Deprecate this property in favour of the
standard way of specifying it (via the connector's vbus-supply
property).
USB: typec: qcom-pmic-typec: Drop redundant header includes
Unlike other units in this module, this one does not request interrupts
or regulator supplies. It does not use OF graph, USB role switching or
TypeC muxing APIs. Drop redundant header includes to speed up
preprocessor.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Link: https://patch.msgid.link/20260519100014.282058-4-krzysztof.kozlowski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dt-bindings: usb: qcom,pmic-typec: Drop redundant port
The binding defines both "port" and "connector" properties, where the
"port" is claimed to be for "data-role switching messages". There is no
such dedicated data port for this device and role switching is part of
connector ports - the port going to the USB controller.
The driver does not use the "port" property and there is no upstream DTS
which would have it. It looks like it's left-over of early versions of
this patchset and is completely redundant now, so let's drop it.
Seungjin Bae [Mon, 18 May 2026 22:49:02 +0000 (18:49 -0400)]
usb: host: max3421: Reject hub port requests for non-existent ports
The `max3421_hub_control()` function handles USB hub class requests
to the virtual root hub. The `GetPortStatus` case correctly rejects
requests with `index != 1`, since the virtual root hub has only a
single port. However, the `ClearPortFeature` and `SetPortFeature`
cases lack the same check.
Fix this by extending the `index != 1` rejection to both cases,
matching the existing behavior of `GetPortStatus`.
Fixes: 2d53139f3162 ("Add support for using a MAX3421E chip as a host driver.") Suggested-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Seungjin Bae <eeodqql09@gmail.com> Link: https://patch.msgid.link/20260518224901.1887013-3-eeodqql09@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Seungjin Bae [Mon, 18 May 2026 22:49:00 +0000 (18:49 -0400)]
usb: host: max3421: Fix shift-out-of-bounds in max3421_hub_control()
The `max3421_hub_control()` function handles USB hub class requests
to the virtual root hub. In the `default` branches of both the
`ClearPortFeature` and `SetPortFeature` switch statements, it modifies
`max3421_hcd->port_status` by left shifting 1 by the request's `value`
parameter. However, it does not validate whether this shift will exceed
the width of `port_status`.
So if a malicious userspace task with access to the root hub via
/dev/bus/usb/.../001 issues a USBDEVFS_CONTROL ioctl with `wValue`
greater than or equal to 32, the left shift operation invokes
shift-out-of-bounds undefined behavior. This results in arbitrary
bit corruption of `port_status`, including the normally-immutable
change bits, which can bypass internal state checks and confuse the
hub status.
Fix this by rejecting requests whose `value` exceeds the shift width
before performing the shift.
This issue was found using a KLEE-based symbolic execution tool for
kernel drivers that I'm currently developing.
The problem is that all connector locks belong to the same
lockdep lock class, so the following loop:
for (i = 0; i < ucsi->cap.num_connectors; i++)
ucsi_register_port(connector[i])
mutex_lock(&connector[i]->lock)
looks like a recursive acquire of the same mutex. Put each connector
lock into a dedicated lock class so that lockdep doesn't see it as a
possible recursion.
dt-bindings: usb: ci-hdrc-usb2: allow up to 3 clocks for qcom,ci-hdrc
Some Qualcomm SoCs such as apq8064 and msm8960 require a third "fs"
clock in addition to "iface" and "core", needed to propagate resets
through the controller and wrapper logic. Later SoCs such as msm8974
dropped this requirement and only use two clocks.
Note that the existing apq8064 and msm8960 DTS files currently specify
the "iface" and "core" clocks in reverse order compared to most newer
SoCs DTS, which causes dtbs_check warnings for these older SoCs. The
dependent patch series will fix that clock ordering.
Dave Carey [Fri, 15 May 2026 14:19:40 +0000 (10:19 -0400)]
USB: cdc-acm: start bulk-IN polling when ALWAYS_POLL_CTRL is set
The INGENIC 17EF:6161 touchscreen composite device has a ~55-second
watchdog that resets the USB device if the bulk-IN endpoint on the CDC
data interface goes unread. The existing ALWAYS_POLL_CTRL quirk keeps
the notification endpoint (ctrlurb / EP 0x82) polling continuously, but
that alone is insufficient: the firmware monitors bulk-IN activity, not
just notification-endpoint activity.
Add acm_submit_read_urbs() calls to the two ALWAYS_POLL_CTRL paths that
already restart the ctrlurb:
1. acm_probe(): start bulk reads at probe time alongside the ctrlurb,
so the watchdog is satisfied from first bind without requiring a
userspace process to open /dev/ttyACMn.
2. acm_port_shutdown(): restart bulk reads after port close alongside
the ctrlurb restart, so the watchdog keeps running when the last
TTY user closes the port.
acm_read_bulk_callback() already resubmits each URB unconditionally on
normal completion, so once submitted the reads remain active until an
explicit kill (disconnect, suspend). acm_submit_read_urb() is a no-op
for URBs that are already in flight (read_urbs_free bit clear), so the
existing acm_port_activate() call remains correct and races are avoided.
Tested on Lenovo Yoga Book 9 14IAH10 (83KJ): without this patch the
device resets every ~55 s when no TTY is open; with it the device
remains stable indefinitely.
Stepan Ionichev [Sat, 9 May 2026 11:06:36 +0000 (16:06 +0500)]
usb: gadget: goku_udc: avoid NULL deref of dev->driver in INT_USBRESET log
goku_irq() handles a number of bus events under a single ep0 path.
It already guards the gadget driver suspend/resume callbacks against a
NULL ->driver:
If the controller raises INT_USBRESET before any gadget driver has
been bound (or after one has been unbound), dev->driver is NULL and
the printk dereferences NULL.
smatch flags the inconsistency:
drivers/usb/gadget/udc/goku_udc.c:1618 goku_irq() error:
we previously assumed 'dev->driver' could be null (see line 1607)
Fall back to a placeholder when the gadget driver is not bound.
No functional change while a gadget driver is bound.
usb: typec: intel_pmc_mux: Zero initialize num_ports in pmc_usb_probe()
Clang warns (or errors with CONFIG_WERROR=y / W=e):
drivers/usb/typec/mux/intel_pmc_mux.c:740:3: error: variable 'num_ports' is uninitialized when used here [-Werror,-Wuninitialized]
740 | num_ports++;
| ^~~~~~~~~
This should have been initialized to zero. Do so now to clean up the
warning and ensure num_ports does not use uninitialized memory.
Adrian Wowk [Tue, 14 Apr 2026 01:00:50 +0000 (20:00 -0500)]
usbip: vhci_hcd: reduce CONFIG_USBIP_VHCI_NR_HCS upper bound to 32
Each VHCI HC instance registers two USB buses (one HS, one SS).
USB_MAXBUS in drivers/usb/core/hcd.c is hard-coded to 64, giving an
effective maximum of 32 VHCI HC instances (32 * 2 = 64 buses).
The Kconfig range for USBIP_VHCI_NR_HCS currently allows up to 128,
which will cause probe failures for any HC instance beyond the 32nd.
These probe failures trigger the NULL pointer dereference fixed in the
previous commit.
Reduce the upper bound to 32 to reflect the real maximum imposed by
USB_MAXBUS. Note that probe failures can still occur below this limit
if real hardware has already claimed enough USB bus numbers, making
the NULL check fix necessary regardless.
Adrian Wowk [Tue, 14 Apr 2026 01:00:49 +0000 (20:00 -0500)]
usbip: vhci_hcd: fix NULL deref in status_show_vhci
platform_get_drvdata() can return NULL if a VHCI host controller's
probe failed (e.g. due to USB bus number exhaustion). status_show_vhci()
checked for a NULL pdev but not for a NULL hcd returned by
platform_get_drvdata(). Passing NULL to hcd_to_vhci_hcd() does not
return NULL - it returns a pointer offset of 0x260, causing a NULL
pointer dereference when that value is subsequently dereferenced.
Add a NULL check on hcd before calling hcd_to_vhci_hcd(). Move
status_show_not_ready() above status_show_vhci() to make it callable
from the new error path without a forward declaration.
Oliver Neukum [Wed, 29 Apr 2026 09:44:04 +0000 (11:44 +0200)]
usb: core: hcd: fix possible deadlock in rh control transfers
>From within the SCSI error handler memory allocations must not
trigger IO. Handling errors in UAS and the storage driver may
involve resetting a device. The thread doing the reset itself
relies on VM magic. However, that is insufficient, as resetting
a device involves resuming it. Resumption as well as resetting
involves conrol transfers to the parent of the device to be reset.
That may be a root hub. Hence usbcore must heed the flags passed
to usb_submit_urb() processing control transfers to root hubs.
The problem exist since the storage driver has been merged.
Xu Yang [Mon, 27 Apr 2026 07:56:53 +0000 (15:56 +0800)]
usb: chipidea: udc: support dynamic gadget add/remove
An asynchronous vbus_event_work() keep running when switch the role from
device to host. This affects EHCI host controller initialization.
USBCMD.RUNSTOP bit is set at ehci_run() and cleared by following
vbus_event_work() if bus_event_work() run after ehci_run().
The log below shows what happens:
[ 87.819925] ci_hdrc ci_hdrc.0: EHCI Host Controller
[ 87.819963] ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1
[ 87.955634] ci_hdrc ci_hdrc.0: USB 2.0, controller refused to start: -110
[ 87.955658] ci_hdrc ci_hdrc.0: startup error -110
[ 87.955682] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
The problem is that the chipidea UDC driver call usb_udc_vbus_handler() to
pull down data line but it don't wait for completion before host controller
starts running.
Now UDC core can properly delete usb gadget device and make sure that vbus
work is cancelled or completed after usb_del_gadget_udc() is returned. But
the udc.c only call usb_del_gadget_udc() in ci_hdrc_gadget_destroy(). To
avoid above issue, add/remove the gadget device dynamically during USB role
switching.
To support dynamic gadget add/remove, do below steps:
- clear ci->gadget and ci->ci_hw_ep at initialization.
- assign udc_[start|stop]() to rdrv->[start|stop] and properly merge the
operations in udc_id_switch_for_[device|host]() to udc_[start|stop]()
Adjust the order ci_handle_vbus_change() and ci_role_start() to avoid NULL
pointer reference since ci_hdrc_gadget_init() doesn't add gadget anymore.
Wentao Guan [Fri, 22 May 2026 09:13:58 +0000 (17:13 +0800)]
USB: cdc-acm: Fix bit overlap and move quirk definitions to header
The VENDOR_CLASS_DATA_IFACE and ALWAYS_POLL_CTRL quirk flags added in
commit f58752ebcb35 ("USB: cdc-acm: Add quirks for Yoga Book 9 14IAH10
INGENIC touchscreen") were placed inside the acm_ctrl_msg() function
rather than in the header with the other quirk flags. Then, their
values (BIT(9) and BIT(10)) collided with NO_UNION_12 which is already
BIT(9).
Move the definitions to drivers/usb/class/cdc-acm.h where they belong
and shift them to BIT(10) and BIT(11) to avoid the overlap.
Claudio Imbrenda [Tue, 19 May 2026 15:01:14 +0000 (17:01 +0200)]
KVM: s390: Properly reset zero bit in PGSTE
In case of memory pressure, it's possible that a guest page gets freed
and then almost immediately reused by the guest. If CMMA is enabled,
_essa_clear_cbrl() will discard all pages that are either unused or
zero. If a discarded page is reused before _essa_clear_cbrl() is called,
and the pgste.zero bit is not cleared, the page will be discarded
despite not being unused.
When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This
prevents the page from being accidentally discarded when not unused.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Claudio Imbrenda [Tue, 19 May 2026 15:01:13 +0000 (17:01 +0200)]
KVM: s390: vsie: Fix redundant rmap entries
The address passed to the gmap rmap was not being masked. As a
consequence several different (but functionally equivalent) rmap
entries were being created for each shadowed table.
Fix this by properly masking the address depending on the table level.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Claudio Imbrenda [Tue, 19 May 2026 15:01:12 +0000 (17:01 +0200)]
KVM: s390: vsie: Fix unshadowing logic
In some cases (i.e. under extreme memory pressure on the host),
attempting to shadow memory will result in the same memory being
unshadowed, causing a loop.
Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT
tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent
unnecessary unshadowing and perform better checks.
Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did
not unshadow properly when the large page would become unprotected.
Opportunistically add a check in gmap_protect_rmap() to make sure it
won't be called with level == TABLE_TYPE_PAGE_TABLE.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Xu Yang [Mon, 27 Apr 2026 07:57:55 +0000 (15:57 +0800)]
usb: chipidea: core: convert ci_role_switch to local variable
When a system contains multiple USB controllers, the global ci_role_switch
variable may be overwritten by subsequent driver initialization code.
This can cause issues in the following cases:
- The 2nd ci_hdrc_probe() sees ci_role_switch.fwnode as non-NULL even
though the "usb-role-switch" property is not present for the controller.
- When the ci_hdrc device is unbound and bound again, ci_role_switch
fwnode will not be reassigned, and the old value will be used instead.
Convert ci_role_switch to a local variable to fix these issues.
Fixes: 05559f10ed79 ("usb: chipidea: add role switch class support") Cc: stable <stable@kernel.org> Acked-by: Peter Chen <peter.chen@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://patch.msgid.link/20260427075755.3611217-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb: gadget: f_fs: serialize DMABUF cancel against request completion
ffs_epfile_dmabuf_io_complete() calls usb_ep_free_request() on the
completed request but leaves priv->req, the back-pointer that
ffs_dmabuf_transfer() set on submission, pointing at the freed
memory. A later FUNCTIONFS_DMABUF_DETACH ioctl or
ffs_epfile_release() on the close path still sees priv->req
non-NULL under ffs->eps_lock:
if (priv->ep && priv->req)
usb_ep_dequeue(priv->ep, priv->req);
so usb_ep_dequeue() is called on a freed usb_request.
On dummy_hcd the dequeue path only walks a live queue and
pointer-compares, so the freed pointer reads without faulting and
KASAN requires an explicit check at the FunctionFS call site to
surface the use-after-free. On SG-capable in-tree UDCs the
dequeue path dereferences the supplied request immediately:
* chipidea's ep_dequeue() does
container_of(req, struct ci_hw_req, req) and reads
hwreq->req.status before acquiring its own lock.
* cdnsp's cdnsp_gadget_ep_dequeue() reads request->status first.
The narrower option of clearing priv->req via cmpxchg() in the
completion does not close the race: the completion runs without
eps_lock, so a cancel path holding eps_lock can still observe
priv->req non-NULL, race a concurrent completion that clears and
frees, and pass the freed pointer to usb_ep_dequeue(). A slightly
longer fix that moves the free into the cleanup work is needed.
Same class of lifetime race as the recent usbip-vudc timer fix [1].
Take eps_lock in the sole place that mutates priv->req from the
callback direction by moving usb_ep_free_request() out of the
completion into ffs_dmabuf_cleanup(), the existing work handler
scheduled by ffs_dmabuf_signal_done() on
ffs->io_completion_wq. Clear priv->req there under eps_lock
before freeing, and only clear if priv->req still names our
request (a subsequent ffs_dmabuf_transfer() on the same
attachment may have queued a new one).
This keeps the existing dummy_hcd sync-dequeue invariant: the
completion callback is still invoked by the UDC without
eps_lock held (dummy_hcd drops its own lock before calling the
callback), and the callback now takes no f_fs lock at all.
Serialization against the cancel path happens in cleanup, which
runs from the workqueue with no f_fs lock held on entry.
The priv ref count protects the containing ffs_dmabuf_priv:
ffs_dmabuf_transfer() takes a ref via ffs_dmabuf_get(), cleanup
drops it via ffs_dmabuf_put(), so priv stays live for the
cleanup even after the cancel path's list_del + ffs_dmabuf_put.
The ffs_dmabuf_transfer() error path no longer frees usb_req
inline: fence->req and fence->ep are set before usb_ep_queue(),
so ffs_dmabuf_cleanup() (scheduled by the error-path
ffs_dmabuf_signal_done()) owns the free regardless of whether
the queue succeeded.
Reproduced under KASAN on both detach and close paths against
dummy_hcd with an observability hook
(kasan_check_byte(priv->req) immediately before usb_ep_dequeue)
at the two FunctionFS cancel sites to surface the stale-pointer
access; the hook is not part of this patch. The KASAN
allocator / free stacks in the captured splats identify the
same request: alloc in dummy_alloc_request, free in
dummy_timer, fault reached from ffs_epfile_release (close) and
from the FUNCTIONFS_DMABUF_DETACH ioctl (detach). With the
patch applied, both paths are silent under the same hook.
The bug is reached from the FunctionFS device node, which in
real deployments is owned by the privileged gadget daemon
(adbd, UMS, composite gadget services, etc.); it is not
reachable from unprivileged userspace or from a USB host on the
cable. FunctionFS mounts default to GLOBAL_ROOT_UID, but the
filesystem supports uid=, gid=, and fmode= delegation to a
non-root gadget daemon, so on real deployments the attacker may
be a less-privileged service rather than root.
usb: gadget: f_fs: copy only received bytes on short ep0 read
ffs_ep0_read() allocates its control-OUT data buffer with
kmalloc() (not kzalloc) at the Length value from the Setup
packet, then copies that full len to userspace regardless of
how many bytes were actually received:
data = kmalloc(len, GFP_KERNEL);
...
ret = __ffs_ep0_queue_wait(ffs, data, len);
if ((ret > 0) && (copy_to_user(buf, data, len)))
ret = -EFAULT;
__ffs_ep0_queue_wait() returns req->actual, which on a short
control OUT transfer is strictly less than len. The
copy_to_user() call still copies len bytes, so on a short OUT
the last (len - ret) bytes of the kmalloc() buffer --
uninitialised slab residue -- are delivered to the FunctionFS
daemon.
Short ep0 OUT completions are specified USB control-transfer
behavior and are produced by in-tree UDCs:
* dwc2 continues on req->actual < req->length for ep0 DATA OUT
(short-not-ok is the only ep0-OUT stall path).
* aspeed_udc ends ep0 OUT on rx_len < ep->ep.maxpacket.
* renesas_usbf logs "ep0 short packet" and completes the
request.
* dwc3 stalls on short IN but not on short OUT.
A short ep0 OUT is therefore not evidence of a broken UDC; it is
a normal condition f_fs has to cope with. The sibling gadgetfs
implementation in drivers/usb/gadget/legacy/inode.c already does
this correctly via min(len, dev->req->actual) before
copy_to_user(). This patch brings f_fs.c to the same safe
pattern rather than trimming at a defensive layer.
The bug is reached from the FunctionFS device node, which in
real deployments is owned by the privileged gadget daemon
(adbd, UMS, composite gadget services, etc.); it is not
reachable from unprivileged userspace. Linux host stacks
normally reject short-wLength control OUTs before they reach
the gadget, so reproducing this required a build that
bypasses that host-side check. With the bypass in place, a
1-byte payload on a 64-byte Setup produces 63 bytes of
non-canary slab residue in the daemon's read buffer.
Fix by copying only ret (actually received) bytes to
userspace.
Seungjin Bae [Mon, 18 May 2026 23:43:14 +0000 (19:43 -0400)]
usb: gadget: dummy_hcd: Reject hub port requests for non-existent ports
The `dummy_hub_control()` function handles USB hub class requests
to the virtual root hub. The `GetPortStatus` case returns -EPIPE for
requests with `wIndex != 1`, since the virtual root hub has only a
single port. However, the `ClearPortFeature` and `SetPortFeature`
cases lack the same check.
Fix this by extending the `wIndex != 1` rejection to both cases,
matching the existing behavior of `GetPortStatus`.
Hang Cao [Wed, 15 Apr 2026 06:42:38 +0000 (14:42 +0800)]
dt-bindings: usb: Fix EIC7700 USB reset's issue
The EIC7700 USB requires a USB PHY reset operation; otherwise, the USB
will not work. The reason why the USB driver that was applied can work
properly is that the USB PHY has already been reset in ESWIN's U-Boot.
However, the proper functioning of the USB driver should not be dependent
on the bootloader. Therefore, it is necessary to incorporate the USB PHY
reset signal into the DT bindings.
This patch does not introduce any backward incompatibility since the dts
is not upstream yet. As array of reset operations are used in the driver,
no modifications to the USB controller driver are needed.
Fixes: c640a4239db5 ("dt-bindings: usb: Add ESWIN EIC7700 USB controller") Cc: stable <stable@kernel.org> Signed-off-by: Hang Cao <caohang@eswincomputing.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20260415064238.1784-1-caohang@eswincomputing.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usbip: vudc: Fix use after free bug in vudc_remove due to race condition
This patch follows up Zheng Wang's 2023 report of a use-after-free in
vudc_remove(). The original thread stalled on Shuah Khan's request for
runtime testing of the unplug/unbind path. This patch supplies that
testing and keeps Zheng's original fix shape.
In vudc_probe(), v_init_timer() binds udc->tr_timer.timer to v_timer().
usbip_sockfd_store() starts the timer via v_start_timer()/v_kick_timer().
vudc_remove() can then free the containing struct vudc while the timer is
still pending or executing.
KASAN confirms the race on an unpatched x86_64 QEMU guest with
CONFIG_KASAN=y, CONFIG_USBIP_VUDC=y, CONFIG_USB_ZERO=y, and a tight loop
that repeatedly writes a socket fd to usbip_sockfd, closes the socket
pair, and unbinds/rebinds usbip-vudc.0:
BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x8ba/0x8e0
Write of size 8 at addr ffff888001b80740 by task trigger_and_unb/239
Allocated by task 239:
vudc_probe+0x4d/0xaa0
Freed by task 239:
kfree+0x18f/0x520
device_release_driver_internal+0x388/0x540
unbind_store+0xd9/0x100
This lands in the timer core rather than v_timer() itself because the
embedded timer_list is being walked after its containing struct vudc has
already been freed. The underlying lifetime bug is the same one Zheng
reported.
With v_stop_timer() called from vudc_remove() and the timer deleted
synchronously, the same harness completed 5000 bind/unbind iterations
with no KASAN report.
dt-bindings: usb: ti,omap4-musb: Drop duplicate 'usb-phy' property constraints
The deprecated 'usb-phy' property is documented already in usb.yaml and
doesn't need a type definition here. It just needs constraints on how
many entries there are.
As this is a host controller, reference usb-hcd.yaml which then
references usb.yaml.
Fixes: 70fcdc82cf4a ("dt-bindings: usb: ti,omap4-musb: convert to DT schema") Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260508182556.1759173-1-robh@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sam Burkels [Fri, 1 May 2026 13:23:46 +0000 (15:23 +0200)]
usb: storage: Add quirks for PNY Elite Portable SSD
The PNY Elite Portable SSD (USB ID 154b:f009) is a sibling of the
already-quirked PNY Pro Elite SSDs (154b:f00b and 154b:f00d). Like its
siblings, it uses a Phison-based USB-SATA bridge that exhibits
firmware bugs when bound to the uas driver.
Without quirks, the device fails to complete READ CAPACITY commands
when accessed over UAS on a SuperSpeed (USB 3) port. The device
enumerates and reports as a SCSI direct-access device, but reports
zero logical blocks and never finishes spin-up:
usb 2-3: new SuperSpeed USB device number 8 using xhci_hcd
usb 2-3: New USB device found, idVendor=154b, idProduct=f009
usb 2-3: Product: PNY ELITE PSSD
usb 2-3: Manufacturer: PNY
scsi host0: uas
scsi 0:0:0:0: Direct-Access PNY PNY ELITE PSSD 0
sd 0:0:0:0: [sda] Spinning up disk...
[...10+ seconds of polling, no progress...]
sd 0:0:0:0: [sda] Read Capacity(16) failed: hostbyte=DID_ERROR
sd 0:0:0:0: [sda] Read Capacity(10) failed: hostbyte=DID_ERROR
sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
Tested each individual quirk to find the minimum that fixes this:
- US_FL_NO_ATA_1X alone: device hangs on spin-up
- US_FL_NO_REPORT_OPCODES alone: works on USB 2.0, hangs on USB 3.0
- US_FL_NO_ATA_1X | US_FL_NO_REPORT_OPCODES: works on both
With both quirks the device enumerates correctly while still using
the uas driver, and delivers full UAS throughput (~281 MB/s
sequential read on a USB 3.0 Gen 1 port).
The existing PNY Pro Elite entries (f00b, f00d) only set NO_ATA_1X,
but this device additionally chokes on REPORT OPCODES under
SuperSpeed.
Signed-off-by: Sam Burkels <sam@1a38.nl> Acked-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@kernel.org> Link: https://patch.msgid.link/20260501132346.86572-1-sam@1a38.nl Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen J. Fuhry [Wed, 13 May 2026 17:14:19 +0000 (13:14 -0400)]
USB: quirks: add NO_LPM for Lenovo ThinkPad USB-C Dock Gen2 hub controllers
The Lenovo ThinkPad USB-C Dock Gen2 (17ef:a391, 17ef:a392) hub
controllers exhibit link instability when USB Link Power Management
is enabled, similar to the dock's Ethernet adapter (17ef:a387) which
already carries USB_QUIRK_NO_LPM.
When the dock reconnects after a transient disconnect, the hub
controllers enter LPM states between re-enumeration retries, causing
repeated disconnect/reconnect cycles lasting up to two minutes.
Disabling LPM for these devices restores stable enumeration.
Marc Zyngier [Wed, 20 May 2026 09:19:40 +0000 (10:19 +0100)]
KVM: arm64: vgic-v5: Limit support to 64 PPIs
Although we have some code supporting 128 PPIs, the only supported
configuration is 64 PPIs. There is no way to test the 128 PPI code,
so it is bound to bitrot very quickly.
Given that KVM/arm64's goal has always been to stick to non-IMPDEF
behaviours, drop the 128 PPI support. Someone motivated enough and
with very strong arguments can always bring it back -- it's all in
the git history.
Despite adding the necessary infrastructure to identify irq types,
vgic_get_vcpu_irq() treats GICv5 PPIs in a special way, which
impairs the readability of the code.
Use the existing irq classifiers to handle per-CPU irqs for all
vgic types, and let the normal control flow reach global interrupt
handling without any v5-specific path.
Marc Zyngier [Wed, 20 May 2026 09:19:38 +0000 (10:19 +0100)]
KVM: arm64: vgic-v5: Drop defensive checks from vgic_v5_ppi_queue_irq_unlock()
vgic_v5_ppi_queue_irq_unlock() performs a bunch of sanity checks
that are pretty pointless as there is no code path that can
result in these invariants to be violated. And if they are, a nice
crash is just as instructive than a warning.
Drop what is evidently debug code and simplify the whole thing.
vgic_allocate_private_irqs_locked() calls two helpers, oddly named
vgic_{,v5_}allocate_private_irq().
Not only these helpers don't allocate anything, but they also
contain duplicate init code that would be better placed in the
caller.
Consolidate the common init code in the caller, rename the helpers
to vgic_{,v5_}setup_private_irq(), and pass the irq pointer around
instead of the index of the interrupt.