Petr Oros [Tue, 28 Apr 2026 05:22:16 +0000 (22:22 -0700)]
iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
The V1 ADD_VLAN opcode had no success handler; filters sent via V1
stayed in ADDING state permanently. Add a fallthrough case so V1
filters also transition ADDING -> ACTIVE on PF confirmation.
Critically, add an `if (v_retval) break` guard: the error switch in
iavf_virtchnl_completion() does NOT return after handling errors,
it falls through to the success switch. Without this guard, a
PF-rejected ADD would incorrectly mark ADDING filters as ACTIVE,
creating a driver/HW mismatch where the driver believes the filter
is installed but the PF never accepted it.
For V2, this is harmless: iavf_vlan_add_reject() in the error
block already kfree'd all ADDING filters, so the success handler
finds nothing to transition.
Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-4-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:15 +0000 (22:22 -0700)]
iavf: wait for PF confirmation before removing VLAN filters
The VLAN filter DELETE path was asymmetric with the ADD path: ADD
waits for PF confirmation (ADD -> ADDING -> ACTIVE), but DELETE
immediately frees the filter struct after sending the DEL message
without waiting for the PF response.
This is problematic because:
- If the PF rejects the DEL, the filter remains in HW but the driver
has already freed the tracking structure, losing sync.
- Race conditions between DEL pending and other operations
(add, reset) cannot be properly resolved if the filter struct
is already gone.
Add IAVF_VLAN_REMOVING state to make the DELETE path symmetric:
In iavf_del_vlans(), transition filters from REMOVE to REMOVING
instead of immediately freeing them. The new DEL completion handler
in iavf_virtchnl_completion() frees filters on success or reverts
them to ACTIVE on error.
Update iavf_add_vlan() to handle the REMOVING state: if a DEL is
pending and the user re-adds the same VLAN, queue it for ADD so
it gets re-programmed after the PF processes the DEL.
The !VLAN_FILTERING_ALLOWED early-exit path still frees filters
directly since no PF message is sent in that case.
Also update iavf_del_vlan() to skip filters already in REMOVING
state: DEL has been sent to PF and the completion handler will
free the filter when PF confirms. Without this guard, the sequence
DEL(pending) -> user-del -> second DEL could cause the PF to return
an error for the second DEL (filter already gone), causing the
completion handler to incorrectly revert a deleted filter back to
ACTIVE.
Fixes: 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-3-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:14 +0000 (22:22 -0700)]
iavf: stop removing VLAN filters from PF on interface down
When a VF goes down, the driver currently sends DEL_VLAN to the PF for
every VLAN filter (ACTIVE -> DISABLE -> send DEL -> INACTIVE), then
re-adds them all on UP (INACTIVE -> ADD -> send ADD -> ADDING ->
ACTIVE). This round-trip is unnecessary because:
1. The PF disables the VF's queues via VIRTCHNL_OP_DISABLE_QUEUES,
which already prevents all RX/TX traffic regardless of VLAN filter
state.
2. The VLAN filters remaining in PF HW while the VF is down is
harmless - packets matching those filters have nowhere to go with
queues disabled.
3. The DEL+ADD cycle during down/up creates race windows where the
VLAN filter list is incomplete. With spoofcheck enabled, the PF
enables TX VLAN filtering on the first non-zero VLAN add, blocking
traffic for any VLANs not yet re-added.
Remove the entire DISABLE/INACTIVE state machinery:
- Remove IAVF_VLAN_DISABLE and IAVF_VLAN_INACTIVE enum values
- Remove iavf_restore_filters() and its call from iavf_open()
- Remove VLAN filter handling from iavf_clear_mac_vlan_filters(),
rename it to iavf_clear_mac_filters()
- Remove DEL_VLAN_FILTER scheduling from iavf_down()
- Remove all DISABLE/INACTIVE handling from iavf_del_vlans()
VLAN filters now stay ACTIVE across down/up cycles. Only explicit
user removal (ndo_vlan_rx_kill_vid) or PF/VF reset triggers VLAN
filter deletion/re-addition.
Fixes: ed1f5b58ea01 ("i40evf: remove VLAN filters on close") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-2-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Oros [Tue, 28 Apr 2026 05:22:13 +0000 (22:22 -0700)]
iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
Rename the IAVF_VLAN_IS_NEW state to IAVF_VLAN_ADDING to better
describe what the state represents: an ADD request has been sent to
the PF and is waiting for a response.
This is a pure rename with no behavioral change, preparing for a
cleanup of the VLAN filter state machine.
Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-1-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set
The CONFIG_PA11 option can not be used as a reliable check if we build a
32-bit kernel which needs the 32-bit VDSO.
Instead depend on CONFIG_64BIT and CONFIG_COMPAT only.
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: Helge Deller <deller@gmx.de>
Petr Tesarik [Fri, 10 Apr 2026 11:35:06 +0000 (13:35 +0200)]
dma-direct: fix use of max_pfn
Calculate the correct physical address of the last byte of memory. Since
max_pfn is in fact "the PFN of the first page after the highest system RAM
in physical address space", the highest address that might be used for a
DMA buffer is one byte below max_pfn << PAGE_SHIFT.
This fix is unlikely to make any difference in practice. It's just that the
current formula is slightly confusing.
netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
sashiko says:
could the related code in __nf_tables_abort() leak the struct nft_hook objects when the table is dormant?
In __nf_tables_abort(), when rolling back a NEWCHAIN transaction that
updates hooks, the code conditionally unregisters and frees the hooks only
if the table is not dormant [..]
if (!(table->flags & NFT_TABLE_F_DORMANT)) {
nft_netdev_unregister_hooks(net,
&nft_trans_chain_hooks(trans),
true);
}
...
nft_trans_destroy(trans);
Unfortunately netdev family mixes hook registration and allocation.
Push table struct down and only check for the flag to unregister.
Fixes: 216e7bf7402c ("netfilter: nf_tables: skip netdev hook unregistration if table is dormant") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: xt_CT: fix usersize for v1 and v2 revision
While resurrecting the conntrack-tool test cases I found following bug:
In:
iptables -I OUTPUT -t raw -p 13 -j CT --timeout test-generic
Out:
[0:0] -A OUTPUT -p 13 -j CT --timeout test
Data after first four bytes of the timeout policy name is never
copied to userspace because its treated as kernel-only.
Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Merge tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix inverted check of registering the stats for branch tracing
When calling register_stat_tracer() which returns zero on success and
negative on error, the callers were checking the return of zero as an
error and printing a warning message. Because this was just a normal
printk() message and not a WARN(), it wasn't caught in any testing.
Fix the check to print the warning message when an error actually
happens.
- Fix a typo in a comment in tracepoint.h
- Limit the size of event probes to 3K in size
It is possible to create a dynamic event probe via the tracefs system
that is greater than the max size of an event that the ring buffer
can hold. This basically causes the event to become useless.
Limit the size of an event probe to be 3K as that should be large
enough to handle any dynamic events being created, and fits within
the PAGE_SIZE sub-buffers of the ring buffer.
* tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Limit size of event probe to 3K
tracepoint: Fix typo in tracepoint.h comment
tracing: branch: Fix inverted check on stat tracer registration
regulator: rpi-panel-attiny: add back GPIOLIB dependency
This driver provides a gpio chip, which is only possible when GPIOLIB
is enabled, which was previously guaranteed by the CONFIG_OF_GPIO
dependency that is now gone:
riscv: Define __riscv_copy_{,vec_}{words,bytes}_unaligned() using SYM_TYPED_FUNC_START
After commit 67bdd7b01387 ("riscv: Split out measure_cycles() for
reuse") and commit c03ad15f7cf6 ("riscv: Reuse measure_cycles() in
check_vector_unaligned_access()"), there are CFI failure when booting
kernels with CONFIG_CFI=y:
CFI failure at measure_cycles+0x38/0xe0 (target: __riscv_copy_words_unaligned+0x0/0x50; expected type: ...)
CFI failure at measure_cycles+0x38/0xe0 (target: __riscv_copy_vec_words_unaligned+0x0/0x24; expected type: ...)
The __riscv_copy_*_unaligned() functions are now called indirectly but
they are not defined with SYM_TYPED_FUNC_START, which is required for
assembly functions called indirectly from C to pass CFI checking. Switch
to SYM_TYPED_FUNC_START to clear up the CFI failures.
Fixes: 67bdd7b01387 ("riscv: Split out measure_cycles() for reuse") Fixes: c03ad15f7cf6 ("riscv: Reuse measure_cycles() in check_vector_unaligned_access()") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://patch.msgid.link/20260406-measure_cycles-cfi-failure-v1-1-03e0234ae02f@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>
Hasan Basbunar [Tue, 28 Apr 2026 17:07:39 +0000 (19:07 +0200)]
page_pool: fix memory-provider leak in page_pool_create_percpu() error path
When page_pool_create_percpu() fails on page_pool_list(), it falls
through to its err_uninit: label, which calls page_pool_uninit().
At that point page_pool_init() has already taken two references
when the user requested PP_FLAG_ALLOW_UNREADABLE_NETMEM:
Neither is undone by page_pool_uninit(); both are only undone by
__page_pool_destroy() (success-side teardown). The error path
therefore leaks the per-provider reference taken by mp_ops->init
(io_zcrx_ifq->refs in the io_uring zcrx provider, the dmabuf
binding refcount in the devmem provider) plus one increment of
the page_pool_mem_providers static branch on every failure of
xa_alloc_cyclic() inside page_pool_list().
The leaked io_zcrx_ifq->refs in turn pins everything
io_zcrx_ifq_free() would release on cleanup: ifq->user (uid),
ifq->mm_account (mmdrop), ifq->dev (device refcount),
ifq->netdev_tracker (netdev refcount), and the rbuf region.
The leaked static branch increment forces all subsequent
page_pool_alloc_netmems() and page_pool_return_page() callers to
take the slow mp_ops branch for the lifetime of the kernel.
The same shape applies to the devmem dmabuf provider via
mp_dmabuf_devmem_init()/mp_dmabuf_devmem_destroy().
Restore the cleanup symmetry by moving the mp_ops->destroy() and
static_branch_dec() calls out of __page_pool_destroy() and into
page_pool_uninit(), so page_pool_uninit() is again the strict
inverse of page_pool_init(). page_pool_uninit() has only two
callers (the err_uninit: path and __page_pool_destroy()), so this
preserves the single-call invariant on the success path while
fixing the err path. The error path of page_pool_init() itself
still skips the mp_ops cleanup correctly: mp_ops->init is the
last action that takes a reference before page_pool_init() returns
0, so when it returns an error neither the refcount nor the static
branch has been touched.
Triggering the bug requires xa_alloc_cyclic() to fail with -ENOMEM,
which under normal GFP_KERNEL retry behaviour is rare. It is
deterministic under CONFIG_FAULT_INJECTION with fail_page_alloc /
xa fault injection, or under sustained memory pressure. The leak
is silent: there is no warning, and the released kernel build
continues running with a permanently-incremented static branch.
value changed: 0x0000000000000000 -> 0xffff88813cf5c400
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 22063 Comm: syz.0.31122 Tainted: G W syzkaller #0 PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Fixes: 47e91f56008b ("bonding: use RCU protection for 3ad xmit path") Reported-by: syzbot+9bb2ff2a4ab9e17307e1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69f0a82f.050a0220.3aadc4.0000.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Link: https://patch.msgid.link/20260428123207.3809211-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi [Tue, 28 Apr 2026 06:53:16 +0000 (08:53 +0200)]
net: airoha: Do not return err in ndo_stop() callback
Always complete the airoha_dev_stop() routine regardless of the
airoha_set_vip_for_gdm_port() return value, since errors from
ndo_stop() are ignored by the networking stack and the interface is
always considered down after the call.
Hamza Mahfooz [Tue, 28 Apr 2026 12:53:39 +0000 (08:53 -0400)]
hv_sock: fix ARM64 support
VMBUS ring buffers must be page aligned. Therefore, the current value of
24K presents a challenge on ARM64 kernels (with 64K pages). So, use
VMBUS_RING_SIZE() to ensure they are always aligned and large enough to
hold all of the relevant data.
Cc: stable@vger.kernel.org Fixes: 77ffe33363c0 ("hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication") Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260428125339.13963-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 28 Apr 2026 20:39:24 +0000 (13:39 -0700)]
MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
The IPv4/IPv6 and routing code is not very well separated from
the TCP/UDP code. Scope it down properly by providing a more
accurate file list, instead of net/ipv4/ and net/ipv6/
Now that the entry is more accurately representing layer 3
and routing merge in the nexthop entry into it.
Add Ido Schimmel as a co-maintainer, Ido's git history speaks
for itself.
Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260428203924.1229169-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 28 Apr 2026 20:33:57 +0000 (13:33 -0700)]
selftests: drv-net: clarify linters and frameworks in README
Minor clarifications in the README:
- call out what linters we expect to be clean
- make it clear that by "frameworks" we mean code under lib/
not just factoring code out in the same file
Jakub Kicinski [Tue, 28 Apr 2026 02:53:20 +0000 (19:53 -0700)]
net: add net_iov_init() and use it to initialize ->page_type
Commit db359fccf212 ("mm: introduce a new page type for page pool in
page type") added a page_type field to struct net_iov at the same
offset as struct page::page_type, so that page_pool_set_pp_info() can
call __SetPageNetpp() uniformly on both pages and net_iovs.
The page-type API requires the field to hold the UINT_MAX "no type"
sentinel before a type can be set; for real struct page that invariant
is established by the page allocator on free. struct net_iov is not
allocated through the page allocator, so the field is left as zero
(io_uring zcrx, which uses __GFP_ZERO) or as slab garbage (devmem,
which uses kvmalloc_objs() without zeroing). When the page pool then
calls page_pool_set_pp_info() on a freshly-bound niov,
__SetPageNetpp()'s VM_BUG_ON_PAGE(page->page_type != UINT_MAX) fires
and the kernel BUGs. Triggered in selftests by io_uring zcrx setup
through the fbnic queue restart path:
The same path is reachable through devmem dmabuf binding via
netdev_nl_bind_rx_doit() -> net_devmem_bind_dmabuf_to_queue().
Add a net_iov_init() helper that stamps ->owner, ->type and the
->page_type sentinel, and use it from both the devmem and io_uring
zcrx niov init loops.
Fixes: db359fccf212 ("mm: introduce a new page type for page pool in page type") Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/20260428025320.853452-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Weiming Shi [Mon, 27 Apr 2026 12:34:50 +0000 (14:34 +0200)]
netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
nft_fwd_neigh can be used in egress chains (NF_NETDEV_EGRESS). When the
forwarding rule targets the same device or two devices forward to each
other, neigh_xmit() triggers dev_queue_xmit() which re-enters
nf_hook_egress(), causing infinite recursion and stack overflow.
Move the nf_get_nf_dup_skb_recursion() accessor and NF_RECURSION_LIMIT
to the shared header nf_dup_netdev.h as a static inline, so that
nft_fwd_netdev can use the recursion counter directly without exported
function call overhead. Guard neigh_xmit() with the same recursion
limit already used in nf_do_netdev_egress().
[ Updated to cache the nf_get_nf_dup_skb_recursion pointer. --pablo ]
Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
The ttl field has been decremented already and evaluation of this rule
would proceed, just drop this packet instead if there is no destination
device to forwards this packet. This is exactly what nf_dup already does
in this case.
Moreover, check for headroom and call skb_expand_head() like in the IP
output path to ensure there is sufficient headroom when forwarding this
via neigh_xmit().
Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: replace skb_try_make_writable() by skb_ensure_writable()
skb_try_make_writable() only works on clones and uncloned packets might
have their network header in paged fragments.
nft_fwd needs to work for the ingress and egress hooks, but the egress
hook where skb->data points to the mac header, use skb_network_offset()
to include the mac header. The flowtable is fine since it already uses
the transport offset.
Fixes: d32de98ea70f ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer") Fixes: 7d2086871762 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On L1VH, debugfs stats pages are overlay pages: the kernel allocates
them and registers the GPAs with the hypervisor via
HVCALL_MAP_STATS_PAGE2. These overlay mappings persist in the
hypervisor across kexec. If the kexec'd kernel reuses those physical
pages, the hypervisor's overlay semantics cause a machine check
exception.
Fix this by calling mshv_debugfs_exit() from the reboot notifier,
which issues HVCALL_UNMAP_STATS_PAGE for each mapped stats page before
kexec. This releases the overlay bindings so the physical pages can be
safely reused. Guard mshv_debugfs_exit() against being called when
init failed.
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
The reboot notifier that tears down the SynIC cpuhp state guards the
cleanup with hv_root_partition(), so on L1VH (where
hv_root_partition() is false) SINT0, SINT5, and SIRBP are never
cleaned up before kexec. The kexec'd kernel then inherits stale
unmasked SINTs and an enabled SIRBP pointing to freed memory.
Remove the hv_root_partition() guard so the cleanup runs for all
parent partitions.
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
mshv: limit SynIC management to MSHV-owned resources
The SynIC is shared between VMBus and MSHV. VMBus owns the message
page (SIMP), event flags page (SIEFP), global enable (SCONTROL),
and SINT2. MSHV adds SINT0, SINT5, and the event ring page (SIRBP).
Currently mshv_synic_cpu_init() redundantly enables SIMP, SIEFP, and
SCONTROL that VMBus already configured, and mshv_synic_cpu_exit()
disables all of them. This is wrong because MSHV can be torn down
while VMBus is still active. In particular, a kexec reboot notifier
tears down MSHV first. Disabling SCONTROL, SIMP, and SIEFP out
from under VMBus causes its later cleanup to write SynIC MSRs while
SynIC is disabled, which the hypervisor does not tolerate.
Restrict MSHV to managing only the resources it owns:
- SINT0, SINT5: mask on cleanup, unmask on init
- SIRBP: enable/disable as before
- SIMP, SIEFP, SCONTROL: leave to VMBus when it is active (L1VH
and nested root partition); on a non-nested root partition VMBus
does not run, so MSHV must enable/disable them
While here, fix the SIEFP and SIRBP memremap() and virt_to_phys()
calls to use HV_HYP_PAGE_SHIFT/HV_HYP_PAGE_SIZE instead of
PAGE_SHIFT/PAGE_SIZE. The hypervisor always uses 4K pages for SynIC
register GPAs regardless of the kernel page size, so using PAGE_SHIFT
produces wrong addresses on ARM64 with 64K pages.
Note that initialization order matters - VMBUS first, MSHV second,
and the reverse on de-init. Ideally, we would want a dedicated SYNIC
driver that replaces the cross-dependencies with a clear API and
dynamic tracking. Such refactor should go into its own dedicated
series, outside of this kexec fix series.
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
Shyam Prasad N [Mon, 30 Mar 2026 10:49:59 +0000 (16:19 +0530)]
cifs: change_conf needs to be called for session setup
Today we skip calling change_conf for negotiates and session setup
requests. This can be a problem for mchan as the immediate next call
after session setup could be due to an I/O that is made on the
mount point. For single channel, this is not a problem as
there will be several calls after setting up session.
This change enforces calling change_conf when the total credits contain
enough for reservations for echoes and oplocks. We expect this to happen
during the last session setup response. This way, echoes and oplocks are
not disabled before the first request to the server. So if that first
request is an open, it does not need to disable requesting leases.
Cc: <stable@vger.kernel.org> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: change allocation requirements in smb2_compound_op
Currently, smb2_compound_op() allocates
struct smb2_compound_vars *vars using GFP_ATOMIC, although
smb2_compound_op() can sleep when it calls compound_send_recv()
before vars is freed.
Allocate vars using GFP_KERNEL.
Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
hv: utils: replace deprecated strcpy with strscpy in kvp_register
strcpy() has been deprecated [1] because it performs no bounds checking
on the destination buffer, which can lead to buffer overflows. While the
current code works correctly, replace strcpy() with the safer strscpy()
to follow secure coding best practices. Use ->body.kvp_register.version
directly as the destination buffer and remove the local variable.
Hyunchul Lee [Tue, 28 Apr 2026 01:34:10 +0000 (10:34 +0900)]
ntfs: drop nlink once for WIN32/DOS aliases
NTFS could store a filename as paired WIN32 and DOS $FILE_NAME attributes
for directories. But ntfs_delete() deleted both attributes for unlinking
a directory, but it also called drop_nlink() for each attributes.
This could trigger warnings when unlinking directories.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
hv: utils: handle and propagate errors in kvp_register
Make kvp_register() return an error code instead of silently ignoring
failures, and propagate the error from kvp_handle_handshake() instead of
returning success.
This propagates both kzalloc_obj() and hvutil_transport_send() failures
to kvp_handle_handshake() and thus to kvp_on_msg().
Fixes: 245ba56a52a3 ("Staging: hv: Implement key/value pair (KVP)") Cc: stable@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
Steven Rostedt [Tue, 28 Apr 2026 16:23:02 +0000 (12:23 -0400)]
tracing/probes: Limit size of event probe to 3K
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.
A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 93ccae7a22274 ("tracing/kprobes: Support basic types on dynamic events") Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
alloc_workqueue_va() forwards its va_list to __alloc_workqueue() which
ultimately feeds vsnprintf(). __alloc_workqueue() already carries
__printf(1, 0); the new wrapper needs the same annotation so format
string checking propagates through the forwarding.
Michael Guralnik [Mon, 27 Apr 2026 11:02:36 +0000 (14:02 +0300)]
RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
Raw Packet QPs are unique in that they support separate send and receive
queues, using 2 different user-provided buffers.
They can also be created with one of the queues having size 0, allowing
a send-only or receive-only QP.
The Raw Packet RQ umem is created in the common user QP creation path,
which allows zero-length queues. Add a later validation of the RQ umem
in Raw Packet QP creation path when an RQ was requested.
This prevents possible null-ptr dereference crashes, as seen in the
below trace:
Fixes: 0fb2ed66a14c ("IB/mlx5: Add create and destroy functionality for Raw Packet QP") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-5-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Michael Guralnik [Mon, 27 Apr 2026 11:02:35 +0000 (14:02 +0300)]
RDMA/core: Fix rereg_mr use-after-free race
When a driver creates a new MR during rereg_user_mr, a race window
exists between rdma_alloc_commit_uobject() for the new MR and the point
where the code reads that MR to populate the response keys.
A concurrent rereg_mr or destroy_mr could destroy the MR in this window
and cause UAF in the first thread.
Racing flow between two rereg_mr calls:
CPU0 CPU1
---- ----
rereg_user_mr(mr_handle)
uobj_get_write(mr_handle) -> mr0
mr1 = driver→rereg()
rdma_alloc_commit_uobject(mr1)
// mr1 replaced mr0 and is unlocked
uobj_put_destroy(mr0)
rereg_user_mr(mr_handle)
uobj_get_write(mr_handle) -> mr1
mr2 = driver→rereg()
rdma_alloc_commit_uobject(mr2)
// mr2 replaced mr1 and is unlocked
uobj_put_destroy(mr1)
// Destroys mr1!
resp.lkey = mr1->lkey; // UAF - mr1 was freed!
resp.rkey = mr1->rkey; // UAF - mr1 was freed!
Fix by storing lkey/rkey in local variables before the new MR is
unlocked and using the local variables to set the user response.
Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-4-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a
netlink request to the userspace daemon to perform IP-to-GID
resolution in certain cases. The function allocates the netlink message
buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the
size of size_t) instead of 16 bytes (the size of an IPv6 address).
This results in an 8-byte under-allocation.
This is currently masked by nlmsg_new() over-allocation of the skb
in its internal logic. However, the code remains incorrect.
Fix the issue by supplying the proper IPv6 address length to
nla_total_size().
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-3-4621fa52de0e@nvidia.com Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Edward Srouji [Mon, 27 Apr 2026 11:02:33 +0000 (14:02 +0300)]
RDMA/mlx5: Fix UAF in DCT destroy due to race with create
A potential race condition exists between mlx5_core_destroy_dct() and
mlx5_core_create_dct() that can lead to a use-after-free.
After _mlx5_core_destroy_dct() releases the DCT to firmware, the DCTN
can be immediately reallocated for a new DCT being created concurrently.
If the create path stores the new DCT in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new DCT's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created DCT.
Fixes: afff24899846 ("RDMA/mlx5: Handle DCT QP logic separately from low level QP interface") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-2-4621fa52de0e@nvidia.com Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Edward Srouji [Mon, 27 Apr 2026 11:02:32 +0000 (14:02 +0300)]
RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
A race condition exists between mlx5_cmd_destroy_srq() and
mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1].
After destroy_srq_split() releases the SRQ to firmware, the SRQN can be
immediately reallocated for a new SRQ being created concurrently. If the
create path stores the new SRQ in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new SRQ's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created SRQ.
Input: elan_i2c - validate firmware size before use
Ensure that the firmware file is large enough to contain the expected
number of pages and the signature (which resides at the end of the
firmware blob) before accessing them to prevent potential out-of-bounds
reads.
Jia Yao [Fri, 17 Apr 2026 05:59:17 +0000 (05:59 +0000)]
drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR
Add validation in xe_vm_bind_ioctl() to reject PAT indices
with XE_COH_NONE coherency mode when used with
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR.
CPU address mirror mappings use system memory that is CPU
cached, which makes them incompatible with COH_NONE PAT
indices. Allowing COH_NONE with CPU cached buffers is a
security risk, as the GPU may bypass CPU caches and read
stale sensitive data from DRAM.
Although CPU_ADDR_MIRROR does not create an immediate
mapping, the backing system memory is still CPU cached.
Apply the same PAT coherency restrictions as
DRM_XE_VM_BIND_OP_MAP_USERPTR.
v2:
- Correct fix tag
v6:
- No change
v7:
- Correct fix tag
v8:
- Rebase
v9:
- Limit the restrictions to iGPU
v10:
- Just add the iGPU logic but keep dGPU logic
Fixes: b43e864af0d4 ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR") Cc: <stable@vger.kernel.org> # v6.15+ Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260417055917.2027459-3-jia.yao@intel.com
(cherry picked from commit 4d58d7535e826a3175527b6174502f0db319d7f6) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Jia Yao [Fri, 17 Apr 2026 05:59:16 +0000 (05:59 +0000)]
drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise
Add validation in xe_vm_madvise_ioctl() to reject PAT indices with
XE_COH_NONE coherency mode when applied to CPU cached memory.
Using coh_none with CPU cached buffers is a security issue. When the
kernel clears pages before reallocation, the clear operation stays in
CPU cache (dirty). GPU with coh_none can bypass CPU caches and read
stale sensitive data directly from DRAM, potentially leaking data from
previously freed pages of other processes.
This aligns with the existing validation in vm_bind path
(xe_vm_bind_ioctl_validate_bo).
v2(Matthew brost)
- Add fixes
- Move one debug print to better place
v3(Matthew Auld)
- Should be drm/xe/uapi
- More Cc
v4(Shuicheng Lin)
- Fix kmem leak issues by the way
v5
- Remove kmem leak because it has been merged by another patch
v6
- Remove the fix which is not related to current fix
v7
- No change
v8
- Rebase
v9
- Limit the restrictions to iGPU
v10
- No change
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Cc: <stable@vger.kernel.org> # v6.18+ Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260417055917.2027459-2-jia.yao@intel.com
(cherry picked from commit 016ccdb674b8c899940b3944952c96a6a490d10a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Shuicheng Lin [Fri, 17 Apr 2026 16:33:08 +0000 (16:33 +0000)]
drm/xe/gsc: Fix BO leak on error in query_compatibility_version()
When xe_gsc_read_out_header() fails, query_compatibility_version()
returns directly instead of jumping to the out_bo label. This skips
the xe_bo_unpin_map_no_vm() call, leaving the BO pinned and mapped
with no remaining reference to free it.
Fix by using goto out_bo so the error path properly cleans up the BO,
consistent with the other error handling in the same function.
Shuicheng Lin [Wed, 15 Apr 2026 22:54:28 +0000 (22:54 +0000)]
drm/xe/eustall: Fix drm_dev_put called before stream disable in close
In xe_eu_stall_stream_close(), drm_dev_put() is called before the
stream is disabled and its resources are freed. If this drops the
last reference, the device structures could be freed while the
subsequent cleanup code still accesses them, leading to a
use-after-free.
Fix this by moving drm_dev_put() after all device accesses are
complete. This matches the ordering in xe_oa_release().
Fixes: 9a0b11d4cf3b ("drm/xe/eustall: Add support to init, enable and disable EU stall sampling") Cc: Harish Chegondi <harish.chegondi@intel.com> Assisted-by: Claude:claude-opus-4.6 Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20260415225428.3399934-1-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 35aff528f7297e949e5e19c9cd7fd748cf1cf21c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Shuicheng Lin [Wed, 8 Apr 2026 02:06:47 +0000 (02:06 +0000)]
drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()
Two error handling issues exist in xe_exec_queue_create_ioctl():
1. When xe_hw_engine_group_add_exec_queue() fails, the error path jumps
to put_exec_queue which skips xe_exec_queue_kill(). If the VM is in
preempt fence mode, xe_vm_add_compute_exec_queue() has already added
the queue to the VM's compute exec queue list. Skipping the kill
leaves the queue on that list, leading to a dangling pointer after
the queue is freed.
2. When xa_alloc() fails after xe_hw_engine_group_add_exec_queue() has
succeeded, the error path does not call
xe_hw_engine_group_del_exec_queue() to remove the queue from the hw
engine group list. The queue is then freed while still linked into
the hw engine group, causing a use-after-free.
Fix both by:
- Changing the xe_hw_engine_group_add_exec_queue() failure path to jump
to kill_exec_queue so that xe_exec_queue_kill() properly removes the
queue from the VM's compute list.
- Adding a del_hw_engine_group label before kill_exec_queue for the
xa_alloc() failure path, which removes the queue from the hw engine
group before proceeding with the rest of the cleanup.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:55 +0000 (17:52 +0000)]
drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()
When xe_dma_buf_init_obj() fails, the attachment from
dma_buf_dynamic_attach() is not detached. Add dma_buf_detach() before
returning the error. Note: we cannot use goto out_err here because
xe_dma_buf_init_obj() already frees bo on failure, and out_err would
double-free it.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:54 +0000 (17:52 +0000)]
drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure
When drm_gpuvm_resv_object_alloc() fails, the pre-allocated storage bo
is not freed. Add xe_bo_free(storage) before returning the error.
xe_dma_buf_init_obj() calls xe_bo_init_locked(), which frees the bo on
error. Therefore, xe_dma_buf_init_obj() must also free the bo on its own
error paths. Otherwise, since xe_gem_prime_import() cannot distinguish
whether the failure originated from xe_dma_buf_init_obj() or from
xe_bo_init_locked(), it cannot safely decide whether the bo should be
freed.
Add comments documenting the ownership semantics: on success, ownership
of storage is transferred to the returned drm_gem_object; on failure,
storage is freed before returning.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:53 +0000 (17:52 +0000)]
drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()
When XE_BO_FLAG_GGTT_ALL is set without XE_BO_FLAG_GGTT, the function
returns an error without freeing a caller-provided bo, violating the
documented contract that bo is freed on failure.
Shuicheng Lin [Wed, 8 Apr 2026 17:52:52 +0000 (17:52 +0000)]
drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()
When type is ttm_bo_type_device and aligned_size != size, the function
returns an error without freeing a caller-provided bo, violating the
documented contract that bo is freed on failure.
Shuicheng Lin [Thu, 9 Apr 2026 00:34:49 +0000 (00:34 +0000)]
drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked
xe_exec_queue_tlb_inval_last_fence_put_unlocked() uses q->vm->xe as the
first argument to xe_assert(). This function is called unconditionally
from xe_exec_queue_destroy() for all queues, including kernel queues
that have q->vm == NULL (e.g., queues created during GT init in
xe_gt_record_default_lrcs() with vm=NULL).
While current compilers optimize away the q->vm->xe dereference (even
in CONFIG_DRM_XE_DEBUG=y builds, the compiler pushes the dereference
into the WARN branch that is only taken when the assert condition is
false), the code is semantically incorrect and constitutes undefined
behavior in the C abstract machine for the NULL pointer case.
Use gt_to_xe(q->gt) instead, which is always valid for any exec queue.
This is consistent with how xe_exec_queue_destroy() itself obtains the
xe_device pointer in its own xe_assert at the top of the function.
drm/xe/vf: Use drm mm instead of drm sa for CCS read/write
The suballocator algorithm tracks a hole cursor at the last allocation
and tries to allocate after it. This is optimized for fence-ordered
progress, where older allocations are expected to become reusable first.
In fence-enabled mode, that ordering assumption holds. In fence-disabled
mode, allocations may be freed in arbitrary order, so limiting allocation
to the current hole window can miss valid free space and fail allocations
despite sufficient total space.
Use DRM memory manager instead of sub-allocator to get rid of this issue
as CCS read/write operations do not use fences.
Fixes: 864690cf4dd6 ("drm/xe/vf: Attach and detach CCS copy commands with BO") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-6-satyanarayana.k.v.p@intel.com
(cherry picked from commit 6c84b493012aeb05dec29c709377bf0e17ac6815) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add a memory pool to allocate sub-ranges from a BO-backed pool
using drm_mm.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-5-satyanarayana.k.v.p@intel.com
(cherry picked from commit 1ce3229f8f269a245ff3b8c65ffae36b4d6afb93) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
firmware: arm_ffa: Unregister bus notifier on teardown for FF-A v1.0
For FF-A v1.0 the driver registers a bus notifier to backfill UUID
matching, but the notifier was never unregistered on cleanup paths.
Track the registration state and unregister it during teardown and early
partition-setup failure.
firmware: arm_ffa: Fix per-vcpu self notifications handling in workqueue
Per-vcpu notification handling already runs from a per-cpu work item on
the target cpu. Routing that path back through smp_call_function_single()
re-enters the call-function IPI path and executes the notification
handler with interrupts disabled. That makes the framework path unsafe,
since it takes a mutex, allocates memory with GFP_KERNEL, and invokes
client callbacks.
Handle per-vcpu self notifications directly from the existing per-cpu
work item instead. This keeps the per-vcpu path in task context and
avoids the extra IPI hop entirely.
firmware: arm_ffa: Avoid collapsing NPI work from different CPUs
Notification pending interrupts are registered as per-CPU IRQs, but the
driver queues all NPI handling through a single shared work_struct.
That allows queue_work_on() calls from different CPUs to collapse onto a
single pending work item even though the work function uses the CPU it
runs on to fetch and handle per-CPU notifications.
Move notif_pcpu_work into the per-CPU ffa_pcpu_irq state and initialize
one work item per CPU. This keeps NPI handling independent per CPU and
avoids losing notifications when multiple CPUs queue work concurrently.
firmware: arm_ffa: Skip free_pages on RX buffer alloc failure
If the RX buffer allocation fails in ffa_init(), the error path jumps to
free_pages even though no buffer has been allocated yet. Route that case
directly to free_drv_info so the cleanup path is only used after at
least one RX/TX buffer allocation has succeeded.
firmware: arm_ffa: Check for NULL FF-A ID table while driver registration
The bus match callback assumes that every FF-A driver provides an
id_table and dereferences it unconditionally. Enforce that contract at
registration time so a buggy client driver cannot crash the bus during
match.
Matt Roper [Wed, 8 Apr 2026 22:27:44 +0000 (15:27 -0700)]
drm/xe/debugfs: Correct printing of register whitelist ranges
The register-save-restore debugfs prints whitelist entries as offset
ranges. E.g.,
REG[0x39319c-0x39319f]: allow read access
for a single dword-sized register. However the GENMASK value used to
set the lower bits to '1' for the upper bound of the whitelist range
incorrectly included one more bit than it should have, causing the
whitelist ranges to sometimes appear twice as large as they really were.
For example,
REG[0x6210-0x6217]: allow rw access
was also intended to be a single dword-sized register whitelist (with a
range 0x6210-0x6213) but was printed incorrectly as a qword-sized range
because one too many bits was flipped on. Similar 'off by one' logic
was applied when printing 4-dword register ranges and 64-dword register
ranges as well.
Correct the GENMASK logic to print these ranges in debugfs correctly.
No impact outside of correcting the misleading debugfs output.
Matt Roper [Fri, 10 Apr 2026 22:50:30 +0000 (15:50 -0700)]
drm/xe: Mark ROW_CHICKEN5 as a masked register
ROW_CHICKEN5 is a masked register (i.e., to adjust the value of any of
the lower 16 bits, the corresponding bit in the upper 16 bits must also
be set). Add the XE_REG_OPTION_MASKED to its definition; failure to do
so will cause workaround updates of this register to not apply properly.
Matt Roper [Fri, 10 Apr 2026 22:50:29 +0000 (15:50 -0700)]
drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL
From Xe2 onward (i.e., all platforms officially supported by the Xe
driver), the GAMSTLB_CTRL register is located at offset 0x477C and
represented by the macro "GAMSTLB_CTRL" in code. However the register
formerly resided at offset 0xCF4C on Xe1-era platforms, and we also have
macro XEHP_GAMSTLB_CTRL that represents this old offset in the
unofficial/developer-only Xe1 code. When tuning for the register was
added for Xe3p_LPG, the old Xe1-era macro was accidentally used instead
of the proper macro for Xe2 and beyond, causing the tuning to not be
applied properly. Use the proper definition so that the correct offset
is written to.
drm/xe/xe3p_lpg: Add missing indirect ring state feature flag
Even though commit 8fcb7dfb8bbf ("drm/xe/xe3p_lpg: Add support for
graphics IP 35.10") mentions that the support for Indirect Ring State
exists for Xe3p_LPG, it missed actually setting the feature flag in
graphics_xe3p_lpg. Fix that by adding the missing member.
Matt Roper [Wed, 1 Apr 2026 20:12:44 +0000 (13:12 -0700)]
drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138
There appears to have been a silent merge conflict between some commits
updating the workaround tables on Xe's -fixes and -next branches:
- Commit bc6387a2e0c1 ("drm/xe/xe2_hpg: Fix handling of Wa_14019988906
& Wa_14019877138") from the fixes branch moved the Xe2_HPG instance
of two workarounds touching the PSS_CHICKEN register from the
engine_was[] table to the lrc_was[] table; the equivalent
implementation for all other platforms/IPs were already properly
located on lrc_was[]. This commit on the fixes branch is a
cherry-pick of commit e04c609eedf4 ("drm/xe/xe2_hpg: Fix handling of
Wa_14019988906 & Wa_14019877138") that already existed on the next
branch.
- Commit 55b19abb6c44 ("drm/xe: Consolidate workaround entries for
Wa_14019877138") and commit c2142a1a8415 ("drm/xe: Consolidate
workaround entries for Wa_14019988906") consolidated the individual
entries per IP generation for each workaround into single, larger
range-based entries.
During merge conflict resolution the Xe2_HPG-specific entries (i.e.,
those with rule "GRAPHICS_VERSION_RANGE(2001, 2002)") were accidentally
resurrected, even though the table already contains the consolidated
entries that match a superset of thse ranges. These redundant entries
don't cause any build failures but do trigger a dmesg error during probe
on BMG-G21 devices:
Matthew Brost [Thu, 26 Mar 2026 21:01:16 +0000 (14:01 -0700)]
drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
xe_guc_submit_wedge() runs in the DMA-fence signaling path, where
GFP_KERNEL memory allocations are not permitted. However, registering
guc_submit_wedged_fini via drmm_add_action_or_reset() triggers such an
allocation.
Avoid this by moving the logic from guc_submit_wedged_fini() into
guc_submit_fini(), where wedged exec queue references are dropped during
normal teardown.
DaeMyung Kang [Sat, 25 Apr 2026 09:38:28 +0000 (18:38 +0900)]
ksmbd: rewrite stop_sessions() with restartable iteration
stop_sessions() walks conn_list with hash_for_each() and, for every
entry, drops conn_list_lock across the transport ->shutdown() call
before re-acquiring the read lock to continue the loop. The hash
walk relies on cross-iteration state (the current bucket and the
hlist position), which is not preserved across unlock/relock: if
another thread performs a list mutation during the unlocked window,
the ongoing iteration becomes unreliable and can re-visit
connections that have already been handled or skip connections that
have not. The outer `if (!hash_empty(conn_list)) goto again;` retry
masks the symptom in the common case but does not address the
unsafe iteration itself.
Reframe the loop so it never relies on iterator state across
unlock/relock. Under conn_list_lock held for read, pick the first
connection whose ->shutdown() has not yet been issued by this path,
pin it by taking an extra reference, record that fact on the
connection and mark it EXITING while still inside the locked walk,
then drop the lock. Then call ->shutdown() outside the lock, drop
the pin (freeing the connection if the handler already released its
reference), and restart from the top.
Use a new per-connection flag, conn->stop_called, as the "shutdown
issued from stop_sessions()" marker rather than reusing the status
state. ksmbd_conn_set_exiting() is also invoked by
ksmbd_sessions_deregister() on sibling channels of a multichannel
session without issuing a transport shutdown, so treating
KSMBD_SESS_EXITING as "already handled here" would skip connections
that still need shutdown() to wake their handler out of recv(),
leaving the outer retry waiting indefinitely for the hash to drain.
stop_sessions() is serialised by init_lock in
ksmbd_conn_transport_destroy(), so writing stop_called under the
read lock has no other writer.
Set EXITING inside the locked walk so the selection, the stop_called
marker, and the status transition all happen together, and guard
against regressing a connection that has already advanced to
KSMBD_SESS_RELEASING on its own (for example, if the handler exited
its receive loop for an unrelated reason between teardown steps).
When the pin drop is the last put, release the transport and pair
ida_destroy(&target->async_ida) with the ida_init() done in
ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its
own does not leak the xarray backing of the embedded async_ida.
The outer retry with msleep() is kept to wait for handler threads to
reach ksmbd_conn_free() and drain the hash.
Observed with an instrumented build that logs one line per visit and
widens the unlocked window before ->shutdown() by 200 ms, under
five concurrent cifs mounts (nosharesock, one connection each):
* Current code: the same connection address is revisited many
times during a single stop_sessions() call and ->shutdown() is
invoked well beyond the number of live connections before the
hash finally drains.
* Rewritten code: each live connection produces exactly one
->shutdown() call; the function returns as soon as the hash is
empty.
Functional teardown via `ksmbd.control --shutdown` with the same
five mounts completes cleanly on the rewritten path.
Performance is observably unchanged. Tearing down N concurrent
nosharesock cifs connections with `ksmbd.control --shutdown` +
`rmmod ksmbd` takes essentially the same wall time before and after
the rewrite:
N before after
10 4.93s 5.34s
30 7.34s 7.03s
50 7.31s 7.01s (3-run avg: 7.04s vs 7.25s)
100 6.98s 6.78s
200 6.77s 6.89s
and the number of ->shutdown() calls equals the number of live
connections on both paths when the race is not widened. The
teardown is dominated by the msleep(100)-based outer retry waiting
for handler threads to run ksmbd_conn_free(), not by the iteration
itself; the restartable loop's worst-case O(N^2) visit cost is in
the microseconds even at N=200 and sits far below the msleep(100)
granularity.
Applied alone on top of ksmbd-for-next-next, this patch does not
introduce a new leak site. Under the same reproducer (10x
concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the
tree still shows the pre-existing per-connection transport leak
count that arises when the last refcount drop lands in one of
ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() -
all of which end with a bare kfree() today. kmemleak backtraces
for the unreferenced objects point into the TCP accept path
(sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none
involve stop_sessions(). Plugging those bare-kfree sites is the
responsibility of the follow-up patch.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"On top of a lot of Arm fixes, this includes a massive rename of types
and variables in tools/testing/selftests/kvm - these were
unnecessarily different from what the kernel uses, so they're being
made consistent.
arm64:
- Allow tracing for non-pKVM, which was accidentally disabled when
the series was merged
- Rationalise the way the pKVM hypercall ranges are defined by using
the same mechanism as already used for the vcpu_sysreg enum
- Enforce that SMCCC function numbers relayed by the pKVM proxy are
actually compliant with the specification
- Fix a couple of feature to idreg mappings which resulted in the
wrong sanitisation being applied
- Fix the GICD_IIDR revision number field that could never been
written correctly by userspace
- Make kvm_vcpu_initialized() correctly use its parameter instead of
relying on the surrounding context
- Enforce correct ordering in __pkvm_init_vcpu(), plugging a
potential pin leak at the same time
- Move __pkvm_init_finalise() to a less dangerous spot, avoiding
future problems
- Restore functional userspace irqchip support after a four year
breakage (last functional kernel was 5.18...)
- Spelling fixes
Selftests:
- Rename types across all KVM selftests to more closely align with
types used in the kernel:
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits)
KVM: selftests: Add check_steal_time_uapi() implementation for LoongArch
KVM: arm64: Wake-up from WFI when iqrchip is in userspace
KVM: arm64: Fix initialisation order in __pkvm_init_finalise()
KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu()
KVM: arm64: Fix kvm_vcpu_initialized() macro parameter
KVM: arm64: Fix FEAT_SPE_FnE to use PMSIDR_EL1.FnE, not PMSVer
KVM: arm64: Fix typo in feature check comments
KVM: arm64: Fix FEAT_Debugv8p9 to check DebugVer, not PMUVer
KVM: arm64: Reject non compliant SMCCC function calls in pKVM
KVM: arm64: vgic: Fix IIDR revision field extracted from wrong value
KVM: selftests: Replace "paddr" with "gpa" throughout
KVM: selftests: Replace "u64 nested_paddr" with "gpa_t l2_gpa"
KVM: selftests: Replace "u64 gpa" with "gpa_t" throughout
KVM: selftests: Replace "vaddr" with "gva" throughout
KVM: selftests: Clarify that arm64's inject_uer() takes a host PA, not a guest PA
KVM: selftests: Rename translate_to_host_paddr() => translate_hva_to_hpa()
KVM: selftests: Rename vm_vaddr_populate_bitmap() => vm_populate_gva_bitmap()
KVM: selftests: Rename vm_vaddr_unused_gap() => vm_unused_gva_gap()
KVM: selftests: Drop "vaddr_" from APIs that allocate memory for a given VM
KVM: selftests: Use u8 instead of uint8_t
...
Michal Kosiorek [Wed, 29 Apr 2026 08:54:51 +0000 (10:54 +0200)]
xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete
KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s
hlist_del_rcu calls under syzkaller load on linux-6.12.y stable
(reproduced on 6.12.47, also reachable via the same code path on
torvalds/master and on the ipsec tree). Nine unique signatures cluster
in the xfrm_state lifecycle, the load-bearing one being:
BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline]
BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c
Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435
The other observed signatures hit the same slab object from
__xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB
write variant of __xfrm_state_delete, all on the byseq/byspi
hash chains.
__xfrm_state_delete() guards its byseq and byspi unhashes with
value-based predicates:
if (x->km.seq)
hlist_del_rcu(&x->byseq);
if (x->id.spi)
hlist_del_rcu(&x->byspi);
while everywhere else in the file (e.g. state_cache, state_cache_input)
the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets
x->id.spi = newspi inside xfrm_state_lock and then immediately inserts
into byspi, but a path that observes x->id.spi != 0 outside of
xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently
with whether x is actually on the list. The same holds for x->km.seq
versus byseq, and the bydst/bysrc unhashes have no predicate at all,
so a second __xfrm_state_delete() on the same object writes through
LIST_POISON pprev.
The defensive change here:
- Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst,
bysrc, byseq and byspi so a second deletion is a no-op rather
than a write through LIST_POISON pprev. The byseq/byspi nodes
are already initialised in xfrm_state_alloc().
- Test hlist_unhashed() rather than the value predicate for
byseq/byspi, so the unhash decision tracks list state rather than
mutable scalar fields.
Empirical verification: applied this patch on top of v6.12.47, rebuilt,
and re-ran the same syzkaller harness for 1h16m on a previously-crashy
configuration that produced ~100 hits each of slab-use-after-free
Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in
__xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at
~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo
confirms the xfrm_state slab is actively allocated and freed during
the run (~143 KiB resident), so the fuzzer is still exercising those
code paths -- they just no longer crash.
Reproduction:
- Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV
- syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db
- 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal
- 9 unique signatures collected in ~9h, all within xfrm_state
lifecycle
Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") Fixes: 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI is zero to the SPI hash.") Reported-by: Michal Kosiorek <mkosiorek121@gmail.com> Tested-by: Michal Kosiorek <mkosiorek121@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Michal Kosiorek <mkosiorek121@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Task B acquires both hb locks and attempts to acquire the PI-lock of the
top most waiter (task B). Task A is leaving early due to a signal/
timeout and started removing itself from the queue. It updates its
requeue_state but can not remove it from the list because this requires
the hb lock which is owned by task B.
Usually task A is able to swoop the lock after task B unlocked it.
However if task B is of higher priority then task A may not be able to
wake up in time and acquire the lock before task B gets it again.
Especially on a UP system where A is never scheduled.
As a result task A blocks on the lock and task B busy loops, trying to
make progress but live locks the system instead. Tragic.
This can be fixed by removing the top most waiter from the list in this
case. This allows task B to grab the next top waiter (if any) in the
next iteration and make progress.
Remove the top most waiter if futex_requeue_pi_prepare() fails.
Let the waiter conditionally remove itself from the list in
handle_early_requeue_pi_wakeup().
Fixes: 07d91ef510fb1 ("futex: Prevent requeue_pi() lock nesting issue on RT") Reported-by: Moritz Klammler <Moritz.Klammler@ferchau.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260428103425.dywXyPd3@linutronix.de Closes: https://lore.kernel.org/all/VE1PR06MB6894BE61C173D802365BE19DFF4CA@VE1PR06MB6894.eurprd06.prod.outlook.com
WANG Rui [Mon, 27 Apr 2026 08:47:21 +0000 (16:47 +0800)]
efi/libstub: Synchronize instruction cache after kernel relocation
The relocated kernel image is copied to its new location using memcpy().
On architectures with separate instruction and data caches, the copied
instructions may remain stale in the instruction cache, leading to the
execution of outdated contents.
Call efi_cache_sync_image() after the relocation copy to ensure the
instruction cache is synchronized with the updated memory contents before
control is transferred to the relocated kernel.
efi/libstub: Move efi_relocate_kernel() into its only remaining user
LoongArch is the only arch that still uses efi_relocate_kernel(), so
before making changes to it that LoongArch needs, turn it into a private
function. Move efi_low_alloc_above() into mem.c while at it, and drop
the relocate.c source file altogether.
Tested-by: WANG Rui <wangrui@loongson.cn> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
ALSA: hda/tas2781: Fix incorrect bit update for non-book-zero or book 0 pages >1
In TAS2781 SPI mode, when accessing non-book-zero or page numbers greater
than 1 in book 0, an additional byte must be read. The first byte in such
cases is a dummy byte and should be ignored.
ALSA: hda: cs35l56: Fix uninitialized value in cs35l56_hda_read_acpi()
Eliminate the uninitialized 'nval' in cs35l56_hda_read_acpi() if a
system-specific quirk overrides processing of the dev-index property.
The value is now stored in a new 'num_amps' member of struct cs35l56_hda
so that the quirk handler can set the value.
The quirk for the Lenovo Yoga Book 9i GenX replaces the values from the
dev-index property with hardcoded indexes. So cs35l56_hda_read_acpi() would
then skip reading the property. But this left the 'nval' local variable
uninitialized when it is later passed to cirrus_scodec_get_speaker_id().
Fixes: 40b1c2f9b299 ("ALSA: hda/cs35l56: Workaround bad dev-index on Lenovo Yoga Book 9i GenX") Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/linux-sound/aenFesLAStjrVNy8@stanley.mountain/T/#u Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20260428130531.169600-1-rf@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
ALSA: hda/conexant: Fix missing error check for jack detection
In cx_probe(), the return value of snd_hda_jack_detect_enable_callback()
is ignored. This function returns a pointer, and if it fails (e.g., due
to memory allocation failure), it returns an error pointer which must
be checked using IS_ERR().
If the registration fails, the driver continues to probe, but the jack
detection callback will not be registered. This can lead to a kernel
crash later when the driver attempts to handle jack events or accesses
the uninitialized structure.
Check the return value using IS_ERR() and propagate the error via
PTR_ERR() to the probe caller.
ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks
At parsing the channel mapping for HDMI, the current code may spew
WARN_ON() unnecessarily for the case where only invalid (zero) channel
maps are given from the hardware. Drop WARN_ON() and reorganize the
code a bit for avoiding the hdmi_slot over the array size.
Johan Hovold [Tue, 7 Apr 2026 09:50:27 +0000 (11:50 +0200)]
clk: rk808: fix OF node reference imbalance
The driver reuses the OF node of the parent multi-function device but
fails to take another reference to balance the one dropped by the
platform bus code when unbinding the MFD and deregistering the child
devices.
Fix this by using the intended helper for reusing OF nodes.
Fixes: 2dc51ca822e4 ("clk: RK808: Reduce 'struct rk808' usage") Cc: stable@vger.kernel.org # 6.5 Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Brian Masney <bmasney@redhat.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Brian Masney [Wed, 15 Apr 2026 20:30:49 +0000 (16:30 -0400)]
MAINTAINERS: add myself as a reviewer for the clk subsystem
I've reviewed a lot clk patches for parts of the subsystem that
typically doesn't get much review. Add myself as a reviewer so that I
don't miss anything.
ASoC: spacemit: adjust FIFO trigger threshold to half FIFO size
Set both TX and RX FIFO trigger thresholds (TFT/RFT) to 0xF (half of
the 32-entry FIFO) instead of 5. This provides better DMA efficiency
by allowing more data to accumulate before triggering a DMA request,
reducing the number of DMA transactions needed.
ASoC: spacemit: move hw constraints from hw_params to startup
Hardware constraints should be applied in the startup callback rather
than hw_params, as hw_params may be called too late for the constraints
to take effect properly.
Move the channel count and format constraints for I2S and DSP_A/DSP_B
modes into a new startup callback. This also tightens the I2S mode
channel constraint from 1-2 to exactly 2, matching the actual hardware
behavior.
Théo Lebrun [Wed, 25 Feb 2026 16:55:07 +0000 (17:55 +0100)]
reset: eyeq: drop device_set_of_node_from_dev() done by parent
Our parent driver (clk-eyeq) now does the
device_set_of_node_from_dev(dev, dev->parent)
call through the newly introduced devm_auxiliary_device_create() helper.
Doing it again in the reset-eyeq probe would be redundant.
Drop both the WARN_ON() and the device_set_of_node_from_dev() call.
Also fix the following comment that talks about "our newfound OF node".
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
clk: spacemit: k3: mark top_dclk as CLK_IS_CRITICAL
top_dclk is the DDR bus clock. If it is gated by clk_disable_unused,
all memory-mapped bus transactions cease to function, causing DMA
engines to hang and general system instability.
Mark it CLK_IS_CRITICAL so the CCF never gates it during the
unused clock sweep.
Fixes: e371a77255b8 ("clk: spacemit: k3: add the clock tree") Reviewed-by: Brian Masney <bmasney@redhat.com> Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
mptcp: pm: kernel: reset fullmesh counter after flush
This variable counts how many MPTCP endpoints have a 'fullmesh' flag
set. After having flushed all MPTCP endpoints, it is then needed to
reset this counter.
Without this reset, this counter exposed to the userspace is wrong, but
also non-fullmesh endpoints added after the flush will not be taken into
account to create subflows in reaction to ADD_ADDRs.