git.ipfire.org Git - thirdparty/kernel/linux.git/log

macsec: don't read an unset MAC header in macsec_encrypt()

macsec_encrypt() reads the Ethernet header via eth_hdr(skb)
(skb->head + skb->mac_header) to memmove() the 12 source/destination MAC
bytes forward and make room for the SecTAG.

On the AF_PACKET SOCK_RAW + PACKET_QDISC_BYPASS transmit path the skb
reaches the macsec ndo_start_xmit() with the MAC header unset, so
eth_hdr(skb) resolves to skb->head + (u16)~0 and the read is out of
bounds: a 12-byte heap over-read that is also emitted on the wire as the
frame's outer source/destination MAC. KASAN reports a slab-out-of-bounds
read in macsec_start_xmit() on 6.0; on current mainline a CONFIG_DEBUG_NET
build flags it as an unset mac header in skb_mac_header().

On the TX path the L2 header is at skb->data, so use skb_eth_hdr(), added
by commit 96cc4b69581d ("macvlan: do not assume mac_header is set in
macvlan_broadcast()") for exactly this purpose.
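
The difference between the two accessors can be sketched in user space.
This is a minimal model, not kernel code: struct toy_skb and every
toy_* name are hypothetical, and offsets stand in for pointers so the
out-of-bounds case can be tested without actually performing it.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_ADDRS_LEN    12u              /* destination + source MAC */
#define TOY_MAC_HEADER_UNSET ((uint16_t)~0u)  /* the "unset" marker, 0xffff */

/* Hypothetical model of the few sk_buff fields involved. */
struct toy_skb {
	size_t   buf_len;     /* bytes allocated at skb->head */
	size_t   data_off;    /* skb->data - skb->head */
	uint16_t mac_header;  /* offset from skb->head, or the unset marker */
};

/* Models eth_hdr(): skb->head + skb->mac_header, expressed as an offset. */
static size_t toy_eth_hdr_off(const struct toy_skb *skb)
{
	return skb->mac_header;
}

/* Models skb_eth_hdr(): on TX the L2 header starts at skb->data. */
static size_t toy_skb_eth_hdr_off(const struct toy_skb *skb)
{
	return skb->data_off;
}

/* Returns 1 if the 12 address bytes at off lie inside the buffer. */
static int toy_addr_read_ok(const struct toy_skb *skb, size_t off)
{
	return off <= skb->buf_len && TOY_ETH_ADDRS_LEN <= skb->buf_len - off;
}
```

With an unset mac_header the eth_hdr()-style offset is 65535, which is
outside any reasonably sized buffer, while the skb->data based offset
stays in bounds.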

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Cc: stable@vger.kernel.org
Signed-off-by: Daehyeon Ko <4ncienth@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260703083634.2035145-1-4ncienth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dibs: loopback: validate offset and size in move_data()

The loopback move_data() performs a memcpy into the registered DMB
without checking whether offset + size exceeds the DMB length. Unlike
real ISM hardware, which enforces memory region bounds natively, the
software loopback has no such protection.

A peer-supplied out-of-bounds offset or oversized write would result in
an OOB write past the allocated kernel buffer. Add an explicit bounds
check before the memcpy to reject such requests with -EINVAL.
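
A bounds check of this kind might look like the sketch below. It is a
user-space illustration only: struct toy_dmb and toy_move_data() are
hypothetical names, and the comparison is written so that offset + size
cannot wrap around, which is an assumption about how one would want to
phrase the check rather than a copy of the actual patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered DMB. */
struct toy_dmb {
	unsigned char *cpu_addr;  /* start of the buffer */
	size_t dmb_len;           /* registered length in bytes */
};

/* Copy size bytes to offset within the DMB, or reject with -EINVAL. */
static int toy_move_data(struct toy_dmb *dmb, size_t offset,
			 const void *data, size_t size)
{
	/* Equivalent to offset + size > dmb_len, but immune to overflow. */
	if (offset > dmb->dmb_len || size > dmb->dmb_len - offset)
		return -EINVAL;

	memcpy(dmb->cpu_addr + offset, data, size);
	return 0;
}
```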

Fixes: f7a22071dbf3 ("net/smc: implement DMB-related operations of loopback-ism")
Cc: stable@vger.kernel.org
Reported-by: Federico Kirschbaum <federico.kirschbaum@xbow.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reported-by: Baul Lee <baul.lee@xbow.com>
Link: https://patch.msgid.link/20260707074318.1448662-1-dust.li@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/xe/userptr: Stub notifier_lock helpers when DRM_GPUSVM=n

When CONFIG_DRM_GPUSVM=n (e.g. um-allyesconfig), the only caller of
xe_pt_svm_userptr_notifier_lock() is compiled out, triggering:

  drivers/gpu/drm/xe/xe_pt.c:1418:13: warning:
    'xe_pt_svm_userptr_notifier_lock' defined but not used
    [-Wunused-function]

The helpers cannot simply be removed in this case: the matching
xe_pt_svm_userptr_notifier_unlock() is also referenced from
xe_pt_update_ops_run(), which lives outside any DRM_GPUSVM ifdef and is
gated only at runtime by pt_update_ops->needs_svm_lock. The symbol must
exist in all builds.

Provide empty static inline stubs for !DRM_GPUSVM, matching the pattern
used by xe_svm_notifier_lock()/_unlock() in xe_svm.h.
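
The stub pattern can be illustrated outside the kernel as follows.
TOY_HAVE_GPUSVM is a hypothetical macro standing in for the
DRM_GPUSVM config check, and the toy_* helpers only count lock depth;
none of these names come from the xe driver.

```c
/* Hypothetical VM object; the counter stands in for a real lock. */
struct toy_vm {
	int notifier_lock_depth;
};

#ifdef TOY_HAVE_GPUSVM
static inline void toy_notifier_lock(struct toy_vm *vm)
{
	vm->notifier_lock_depth++;
}

static inline void toy_notifier_unlock(struct toy_vm *vm)
{
	vm->notifier_lock_depth--;
}
#else
/* Empty stubs: the symbols exist in every build, and an unused
 * static inline function does not trigger -Wunused-function. */
static inline void toy_notifier_lock(struct toy_vm *vm)
{
	(void)vm;
}

static inline void toy_notifier_unlock(struct toy_vm *vm)
{
	(void)vm;
}
#endif

/* Always compiled; whether the lock is taken is decided at run time. */
static int toy_update_ops_run(struct toy_vm *vm, int needs_svm_lock)
{
	if (needs_svm_lock)
		toy_notifier_lock(vm);

	/* ... page-table update work would go here ... */

	if (needs_svm_lock)
		toy_notifier_unlock(vm);

	return vm->notifier_lock_depth;  /* 0 when lock/unlock are balanced */
}
```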

Fixes: dca6e08c923a ("drm/xe/userptr: Hold notifier_lock for write on inject test path")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606302210.QqcLbOEN-lkp@intel.com/
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260630192221.2998168-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 3359422bf0a1140e96d783a19a397686e580a3ca)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: free madvise VMA array on L2 flush failure

xe_vm_madvise_ioctl() allocates madvise_range.vmas in get_vmas().
After get_vmas() succeeds with at least one VMA, error paths must go
through free_vmas so the array is released before the madvise details are
destroyed.

The L2 flush validation path added for PAT madvise rejects some
SVM/userptr ranges after get_vmas() has succeeded, but jumps directly to
madv_fini. This skips kfree(madvise_range.vmas), leaking the VMA array on
each failed ioctl.

Jump to free_vmas instead, matching the other validation failure paths
after get_vmas() has succeeded.
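
The unwind ordering described above follows the usual goto-cleanup
idiom. A minimal user-space sketch, assuming a hypothetical
toy_madvise() with a counter that tracks live allocations (the label
names mirror the ones in the text; everything else is invented for
illustration):

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

static int toy_live_allocs;  /* number of arrays not yet freed */

static void *toy_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		toy_live_allocs++;
	return p;
}

static void toy_free(void *p)
{
	if (p) {
		toy_live_allocs--;
		free(p);
	}
}

/* num_vmas must be > 0; range_is_valid models the late validation step. */
static int toy_madvise(int num_vmas, int range_is_valid)
{
	void **vmas;
	int err = 0;

	vmas = toy_alloc(sizeof(*vmas) * (size_t)num_vmas);
	if (!vmas) {
		err = -ENOMEM;
		goto madv_fini;  /* nothing to free yet */
	}

	if (!range_is_valid) {
		err = -EINVAL;
		goto free_vmas;  /* jumping to madv_fini here would leak vmas */
	}

	/* ... apply the advice to each VMA ... */

free_vmas:
	toy_free(vmas);
madv_fini:
	return err;
}
```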

Fixes: 4f39a194d41e ("drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260708073422.725186-1-lgs201920130244@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit c3a1c3579b1250060da73507a4acef712974c78a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: remove duplicate <kunit/test-bug.h> include

xe_pci.c includes <kunit/test-bug.h> twice, separated only by the
<kunit/test.h> include. Drop the redundant second include; this is a
non-functional cleanup flagged by scripts/checkincludes.pl.

Fixes: 6cad22853cb8 ("drm/xe/kunit: Add stub to read_gmdid")
Signed-off-by: Anas Khan <anxkhn28@gmail.com>
Link: https://patch.msgid.link/20260702112820.34675-1-anxkhn28@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 84ed5b0a925721aaf069d36e18a99db966ff4e80)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Wait on external BO kernel fences in exec IOCTL

Before arming a user job, xe_exec_ioctl() only added the VM's
dma-resv KERNEL slot as a dependency. That slot covers rebinds and
the kernel operations of the VM's private BOs, but not external BOs
(bo->vm == NULL), which carry their kernel operations (evictions,
moves, ...) in their own dma-resv KERNEL slot.

The DMA_RESV_USAGE_KERNEL slot is the cross-driver contract for
memory management operations that must complete before the BO or its
backing store may be used: any accessor is required to wait on the
KERNEL fences before touching the resv. By skipping the external BOs'
KERNEL slots, the exec path violated that contract and could schedule
a user job while a kernel operation on an external BO mapped by the VM
was still in flight, racing against it and potentially reading or
writing memory that was being moved.

Replace the VM-only dependency with an iteration over every object
locked by the exec, adding each object's KERNEL slot as a job
dependency. This covers the VM resv (rebinds and private BOs) as well
as every external BO, mirroring the drm_gpuvm_resv_add_fence() call
that later publishes the job fence to the same set of objects.
Long-running mode continues to skip this, as before.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260702215805.4011228-1-matthew.brost@intel.com
(cherry picked from commit a6b842acf3ddd1efc53a56de9260cfa718fb35e7)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Fix PTE index in xe_vm_populate_pgtable() for chunked binds

xe_vm_populate_pgtable() indexed the source PTE array (update->pt_entries)
by the per-call loop counter, assuming each call starts at the first entry
of the update. That holds for the CPU bind path
(xe_migrate_update_pgtables_cpu), which populates a whole update in a single
call, but not for the GPU bind path: write_pgtable() splits an update into
MAX_PTE_PER_SDI (510) sized MI_STORE_DATA_IMM chunks, invoking the populate
callback once per chunk with an advancing qword_ofs but a fresh command-
buffer destination pointer.

As a result, every chunk after the first re-read pt_entries from index 0
instead of from its true offset, so PTEs beyond the first 510 entries of a
single update were programmed with the wrong physical pages, shifting the
mapping by exactly MAX_PTE_PER_SDI pages.

This stayed latent because a single update only exceeds 510 qwords when a
large (e.g. 2M) region is bound as individual 4K PTEs rather than a single
huge-page entry, which happens when the backing store is sufficiently
fragmented. It was surfaced by the BO defrag path, which deliberately
rebinds such fragmented ranges via the GPU bind path, producing
deterministic data corruption offset by 510 pages.

Index pt_entries by the chunk's absolute offset relative to update->ofs so
both the CPU and GPU paths pick the correct entries.
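
The indexing problem can be reproduced with a small model. The sketch
below is hypothetical user-space code: struct toy_update, the toy_*
functions and TOY_MAX_PTE_PER_CHUNK (set to 510 to match
MAX_PTE_PER_SDI from the text) are illustrative names, and the chunked
writer only imitates the "fresh destination pointer per chunk"
behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PTE_PER_CHUNK 510u

struct toy_update {
	uint32_t ofs;                /* first qword offset of the update */
	const uint64_t *pt_entries;  /* one source entry per qword */
};

typedef void (*toy_populate_fn)(uint64_t *dst, const struct toy_update *u,
				uint32_t qword_ofs, uint32_t num);

/* Buggy variant: indexes the source by the per-call loop counter. */
static void toy_populate_buggy(uint64_t *dst, const struct toy_update *u,
			       uint32_t qword_ofs, uint32_t num)
{
	(void)qword_ofs;
	for (uint32_t i = 0; i < num; i++)
		dst[i] = u->pt_entries[i];
}

/* Fixed variant: indexes by the chunk's offset relative to u->ofs. */
static void toy_populate_fixed(uint64_t *dst, const struct toy_update *u,
			       uint32_t qword_ofs, uint32_t num)
{
	for (uint32_t i = 0; i < num; i++)
		dst[i] = u->pt_entries[qword_ofs - u->ofs + i];
}

/* Models the chunked path: one populate call per chunk, each with an
 * advancing qword_ofs but a destination that starts at index 0. */
static void toy_write_chunked(uint64_t *out, const struct toy_update *u,
			      uint32_t total, toy_populate_fn populate)
{
	uint32_t done = 0;

	while (done < total) {
		uint64_t cmd[TOY_MAX_PTE_PER_CHUNK];
		uint32_t chunk = total - done;

		if (chunk > TOY_MAX_PTE_PER_CHUNK)
			chunk = TOY_MAX_PTE_PER_CHUNK;

		populate(cmd, u, u->ofs + done, chunk);
		for (uint32_t j = 0; j < chunk; j++)
			out[done + j] = cmd[j];
		done += chunk;
	}
}
```

In this model the buggy variant writes source entry 90 into slot 600 of
a 1024-entry update (600 - 510 = 90), while the fixed variant writes
entry 600.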

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260702012434.3861171-1-matthew.brost@intel.com
(cherry picked from commit e6f2d0b757c4fb577a513c577140109d1d292a9a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

octeontx2-af: fix VF bringup affecting PF promiscuous state

Mbox handling of nix_set_rx_mode for a VF with promiscuous and
all_multi flags set to false causes deletion of the PF's promiscuous
and allmulti MCAM rules. This occurs because the APIs that
enable/disable these rules operate only on the PF, even when the
mbox request is made via a VF interface.

Guard both rvu_npc_enable_allmulti_entry() and
rvu_npc_enable_promisc_entry() disable paths with an is_vf() check so
that a VF bringing up or tearing down its interface cannot inadvertently
clear the PF's MCAM rules.

Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature")
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
Link: https://patch.msgid.link/20260702045616.3002773-2-nshettyj@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-26-07-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*.

Most of these are LLM fixes for old issues flagged by sashiko/LLMs.

Many of these trigger drive-by-findings in sashiko. In particular:

- many load/store tearing and missing memory barriers, races
  etc. in ipset, esp. with GC and resizing.
  Keeping the proposed patches spinning for yet-another-iteration
  keeps legit fixes back, so I prefer to add these now and follow
  up with other reports later.
- flowtable work queue still has possible races with teardown,
  but same rationale as with ipset: drive-by findings, not
  problems coming with the flowtable IPIP changeset in this PR.
- ever since unreadable frag skb support was added in 6.12, we can no
  longer do: BUG_ON(skb_copy_bits( ...): it will fire with such skbs.
  Mina Almasry is looking at similar patterns elsewhere in the stack.

1) Guard skb->mac_header adjustment after IPv6 defragmentation in
nf_conntrack_reasm.  From Xiang Mei.

2) NUL-terminate ebtables table names before calling find_table_lock() to
prevent stack-out-of-bounds reads.  Also from Xiang Mei.

3) Zero the ebtables chainstack array, else error unwind may free bogus
pointer when CPU mask is sparse.  All three issues date from 2.6 days.

4) Ensure ebtables module names are c-strings, same bug pattern as 2).
Bug added in 4.6.

5) Fix catchall element handling for inverted lookups in nft_lookup. Fold the
catchall lookup into ext before computing the match status.  Was like
this ever since catchall elements got introduced in 5.13.
From Tamaki Yanagawa.

6-9) ipset updates from Jozsef Kadlecsik:
- mark rcu protected areas correctly
- address gc and resize clash in the comment extension
- add/del backlog cleanup in the error path
- allocate right size for the generic hash structure

10-12): IPIP flowtable updates from Pablo Neira Ayuso:
- Use the current direction's route when pushing IPIP headers.
   Fix incorrect headroom and fragmentation offset calculations.
- Avoid hardware offload for IPIP tunnels due to lack of driver support.
- Support IPIP tunnels with direct xmit in netfilter flowtable.
   dst_cache and dst_cookie are moved outside the union to share route
   state across flows.  This is a followup to work done in 6.19 cycle.

13) Don't BUG() on skb_copy_bits error. Handle unreadable fragments by
either returning an error or restricting the copy operations to linear area.
This became an issue when unreadable frag support was merged in 6.12.

14-16): IPVS updates from Yizhou Zhao:
- Pass parsed transport offset to IPVS state handlers.
   Update callback signatures.
- Use correct transport header offset on state lookup in TCP.
   As-is it was possible for ipv6 extension header data to be
   treated as L4 header.
- same for SCTP.  This was also broken since 2.6 days.

17) Ensure inner IP headers in ICMP errors are in the skb headroom after
stripping outer headers. Add more checks for the length of inner headers.
This was broken since 3.7 days.
From Julian Anastasov.

netfilter pull request nf-26-07-08

* tag 'nf-26-07-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  ipvs: ensure inner headers in ICMP errors are in headroom
  ipvs: use parsed transport offset in SCTP state lookup
  ipvs: use parsed transport offset in TCP state lookup
  ipvs: pass parsed transport offset to state handlers
  netfilter: handle unreadable frags
  netfilter: flowtable: support IPIP tunnel with direct xmit
  netfilter: flowtable: IPIP tunnel hardware offload is not yet support
  netfilter: flowtable: use dst in this direction when pushing IPIP header
  netfilter: ipset: allocate the proper memory for the generic hash structure
  netfilter: ipset: cleanup the add/del backlog when resize failed
  netfilter: ipset: exclude gc when resize is in progress
  netfilter: ipset: mark the rcu locked areas properly
  netfilter: nft_lookup: fix catchall element handling with inverted lookups
  netfilter: ebtables: module names must be null-terminated
  netfilter: ebtables: zero chainstack array
  netfilter: ebtables: terminate table name before find_table_lock()
  netfilter: nf_conntrack_reasm: guard mac_header adjustment after IPv6 defrag
====================

Link: https://patch.msgid.link/20260708140309.19633-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethtool: rss: Fix hfunc and input_xfrm parsing on big endian

ETHTOOL_A_RSS_HFUNC and ETHTOOL_A_RSS_INPUT_XFRM are NLA_U32 attributes,
but ethnl_rss_set() and ethnl_rss_create_doit() parse them with
ethnl_update_u8(), which reads a single byte.

On little endian this happens to read the least significant byte and
works as long as the value fits in a byte. On big endian it reads the
most significant byte, so the requested value is parsed incorrectly.

The destination fields in struct ethtool_rxfh_param are u8, so the
attribute can't be read directly with ethnl_update_u32().
Cap the hfunc policy at U8_MAX so an out of range value is rejected
instead of being silently truncated into the u8 field, and add
ethnl_update_u8_u32() to read the full u32 and narrow it into the u8
destination.
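
The endianness effect is easy to show with explicit byte layouts. In
the sketch below all toy_* names are hypothetical; the range check is
folded into the narrowing helper for brevity, whereas the text above
describes it as a policy limit, so treat this as an illustration of the
idea rather than of the actual helper.

```c
#include <errno.h>
#include <stdint.h>

/* Lay out val the way a big-endian host stores a u32 attribute. */
static void toy_put_be32(unsigned char out[4], uint32_t val)
{
	out[0] = (unsigned char)(val >> 24);
	out[1] = (unsigned char)(val >> 16);
	out[2] = (unsigned char)(val >> 8);
	out[3] = (unsigned char)val;
}

/* Lay out val the way a little-endian host stores a u32 attribute. */
static void toy_put_le32(unsigned char out[4], uint32_t val)
{
	out[0] = (unsigned char)val;
	out[1] = (unsigned char)(val >> 8);
	out[2] = (unsigned char)(val >> 16);
	out[3] = (unsigned char)(val >> 24);
}

/* Models parsing the attribute as a u8: only the first byte is read. */
static uint8_t toy_read_as_u8(const unsigned char payload[4])
{
	return payload[0];
}

/* Models the fix for a big-endian payload: decode the full u32,
 * reject values that do not fit, then narrow into the u8 field. */
static int toy_narrow_be32_to_u8(uint8_t *dst, const unsigned char payload[4])
{
	uint32_t val = ((uint32_t)payload[0] << 24) |
		       ((uint32_t)payload[1] << 16) |
		       ((uint32_t)payload[2] << 8) |
		       (uint32_t)payload[3];

	if (val > UINT8_MAX)
		return -ERANGE;

	*dst = (uint8_t)val;
	return 0;
}
```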

Fixes: 82ae67cbc423 ("ethtool: rss: support setting hfunc via Netlink")
Fixes: d3e2c7bab124 ("ethtool: rss: support setting input-xfrm via Netlink")
Fixes: a166ab7816c5 ("ethtool: rss: support creating contexts via Netlink")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260706055017.3355806-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Fix L3 tunnel entropy refcount leak

mlx5_tun_entropy_refcount_inc() counts both VXLAN and L2-to-L3
tunnel reformat entries as entropy-enabling users. The matching
decrement path only handled VXLAN, leaving L2-to-L3 tunnel entries
counted after release.

Handle MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL in
mlx5_tun_entropy_refcount_dec() as well so the enabling entry
refcount remains balanced.

Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260703141423.1723-1-lirongqing@baidu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ALSA: hda/realtek: Add quirk for TongFang X6xx45xU

Fix microphone detection on the built-in headphone jack for some devices.

Signed-off-by: Eckhart Mohr <e.mohr@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Link: https://patch.msgid.link/20260708132135.102680-1-wse@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/realtek - Fixed Headphone noise issue for Dell QCM1255

On this platform, running Ubuntu 24.04 with the PipeWire audio server,
the headphone output produces pop noise; with the PulseAudio server it
does not. As a workaround, connect the headphones to DAC 0x2, which
makes the popping sound disappear.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/34b990cb56914148ba02fa8e9d176479@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'batadv-net-pullrequest-20260708' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes, all by Sven Eckelmann:

- ensure minimal ethernet header on TX

- fix VLAN priority offset

- clean untagged VLAN on netdev registration failure

- tt: avoid request storms during pending request

- tt: prevent TVLV OOB check overflow

- frag: free unfragmentable packet

- frag: fix primary_if leak on failed linearization

- mcast: avoid OOB read of num_dests header

- dat: fix tie-break for candidate selection

* tag 'batadv-net-pullrequest-20260708' of https://git.open-mesh.org/batadv:
  batman-adv: dat: fix tie-break for candidate selection
  batman-adv: mcast: avoid OOB read of num_dests header
  batman-adv: frag: fix primary_if leak on failed linearization
  batman-adv: frag: free unfragmentable packet
  batman-adv: tt: prevent TVLV OOB check overflow
  batman-adv: tt: avoid request storms during pending request
  batman-adv: clean untagged VLAN on netdev registration failure
  batman-adv: fix VLAN priority offset
  batman-adv: ensure minimal ethernet header on TX
====================

Link: https://patch.msgid.link/20260708091821.314516-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: drop in-flight Tx SKBs on close

The MACB driver has always leaked the outgoing SKBs that have not yet
been marked as completed. They live in queue->tx_skb, which gets freed
without any check for remaining entries.

macb_free_consistent() gets called in a few codepaths, but only close will
trigger the added expressions. In macb_open() and macb_alloc_consistent()
failure cases, queues' tx_skb just got allocated and are empty.

Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Cc: stable@vger.kernel.org
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20260702-macb-drop-tx-v4-1-1c833eebdbc8@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'fix-mana-rx-with-bounce-buffering'

Dexuan Cui says:

====================
Fix MANA RX with bounce buffering

With swiotlb=force, the MANA NIC fails to work properly due to commit
730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead
of full pages to improve memory efficiency.").

This happens because, with the standard MTU=1500, the aforementioned
commit uses page pool frags with PP_FLAG_DMA_MAP, but fails to call
page_pool_dma_sync_for_cpu() to sync the received packet for CPU access
before handing the RX buffer to the stack.

Here patch #2 adds the required page_pool_dma_sync_for_cpu().

Patch #1 validates the packet length reported by the NIC. With patch #2,
page_pool_dma_sync_for_cpu() uses the packet length, so we don't want
to blindly trust the packet length, just in case.

There is no change between v2 and v3.
v3 just swaps the order of the 2 patches in v2, as suggested by Simon [3].

References:
[1] v1: https://lore.kernel.org/netdev/20260618035029.249361-1-decui@microsoft.com/
[2] v2: https://lore.kernel.org/netdev/20260624222605.1794719-1-decui@microsoft.com/
[3] https://lore.kernel.org/netdev/20260626145048.GB1310988@horms.kernel.org/
====================

Link: https://patch.msgid.link/20260702041237.617719-1-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Sync page pool RX frags for CPU

MANA allocates RX buffers from page pool fragments when frag_count is
greater than 1. In that case the buffers remain DMA mapped by page pool
and the RX completion path does not call dma_unmap_single(). As a result,
the implicit sync-for-CPU normally performed by dma_unmap_single() is
missing before the packet data is passed to the networking stack.

This breaks RX on configurations which require explicit DMA syncing, for
example when booted with swiotlb=force.

Fix this by recording the page pool page and DMA sync offset when the RX
buffer is allocated, and syncing the received packet range for CPU access
before handing the RX buffer to the stack.

Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.")
Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Link: https://patch.msgid.link/20260702041237.617719-3-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Validate the packet length reported by the NIC

Validate the packet length reported in the RX CQE before passing it
to skb processing. The CQE is supplied by the NIC device and should
not be blindly trusted.

Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Link: https://patch.msgid.link/20260702041237.617719-2-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/net: fix EVP_MD_CTX leak in tcp_mmap

In tcp_mmap.c, both child_thread() and main() allocate an EVP_MD_CTX
via EVP_MD_CTX_new() when integrity checking is enabled, but neither
function releases the context. child_thread() misses the free in its
common cleanup block, and main() returns without freeing the context.

This results in a SHA256 context leak on every run that uses the
-i (integrity) option. Add the missing EVP_MD_CTX_free() calls to
the appropriate cleanup paths to fix the leak.

Fixes: 5c5945dc695c ("selftests/net: Add SHA256 computation over data sent in tcp_mmap")
Signed-off-by: Wang Yan <wangyan01@kylinos.cn>
Link: https://patch.msgid.link/20260702025949.442523-1-wangyan01@kylinos.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

KVM: s390: Improve kvm_s390_vm_stop_migration()

There is no need to clear cmma-dirty state if the VM is not using CMMA.

Skip the CMMA-related code if CMMA is not in use.

Fixes: 6cfd47f91f6a ("KVM: s390: Fix cmma dirty tracking")
Fixes: 190df4a212a7 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

KVM: s390: Fix dat_crste_walk_range() early return

If a walk entry handler for a lower level returns a non-zero value,
dat_crste_walk_range() will not return immediately, but instead loop
again and move to the next entry.

This means that some entries are potentially skipped, and early return
is ignored. Skipped entries might lead to all kinds of issues, given
that the caller expects them to not be skipped. Early return is often
used to interrupt a walk when a rescheduling is needed; if it is
ignored it can lead to stalls.

Fix by breaking from the loop immediately if the walk to a lower level
returned non-zero.
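
The effect of ignoring a lower-level return value can be modelled with
a two-level walk. This is a hypothetical sketch, not the s390 code: the
toy_* names, the 4x4 table shape and the propagate flag (1 models the
fixed behaviour, 0 the old one) are all assumptions made for
illustration.

```c
#define TOY_ENTRIES 4

typedef int (*toy_leaf_fn)(int index, void *priv);

struct toy_stats {
	int visited;  /* number of handler invocations */
	int stop_at;  /* entry index at which the handler asks to stop */
};

/* Handler that requests an early return at one specific entry. */
static int toy_stop_handler(int index, void *priv)
{
	struct toy_stats *s = priv;

	s->visited++;
	return index == s->stop_at ? 1 : 0;
}

/* Lower-level walk: stops at the first non-zero handler result. */
static int toy_walk_lower(int base, toy_leaf_fn fn, void *priv)
{
	for (int i = 0; i < TOY_ENTRIES; i++) {
		int rc = fn(base + i, priv);

		if (rc)
			return rc;
	}
	return 0;
}

/* Upper-level walk over TOY_ENTRIES lower-level tables. */
static int toy_walk_upper(toy_leaf_fn fn, void *priv, int propagate)
{
	int rc = 0;

	for (int i = 0; i < TOY_ENTRIES; i++) {
		rc = toy_walk_lower(i * TOY_ENTRIES, fn, priv);
		if (rc && propagate)
			break;  /* fixed: leave the loop immediately */
		/* old behaviour: fall through and move to the next table */
	}
	return rc;
}
```

Without propagation, a stop request at entry 1 skips entries 2 and 3,
the walk continues through the remaining tables, and the caller never
sees the non-zero result.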

Fixes: 2db149a0a6c5 ("KVM: s390: KVM page table management functions: walks")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

KVM: s390: vsie: Avoid potential deadlock with real spaces

The natural lock ordering is mmu_lock -> children_lock, but in
gmap_create_shadow() the reverse order is used when handling shadowing
of real address spaces.

Convert the inner locking of kvm->mmu_lock to a trylock; return -EAGAIN
if the lock is busy, and let the caller try again.

This path is not expected to happen in real-life scenarios, so its
performance is not important.

Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

KVM: s390: pci: Fix GISC refcount leak on AIF enable failure

kvm_s390_gisc_register() registers the guest ISC before pinning
the guest interrupt forwarding pages and allocating the AISB bit.
If any of the later setup steps fails, the function unwinds the
pinned pages and other local state, but does not unregister the
GISC reference. Add the missing kvm_s390_gisc_unregister() to the
error unwind path.

Fixes: 3c5a1b6f0a18 ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260624061910.2794734-1-haoxiang_li2024@163.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

drm/fb-helper: Only consider active CRTCs for vblank sync

Only synchronize fbdev output to the vblank of an active CRTC. Go over
the list of CRTCs and pick the first that matches. This fixes warnings
such as the one shown below:

[ 77.201354] WARNING: drivers/gpu/drm/drm_vblank.c:1320 at drm_crtc_wait_one_vblank+0x194/0x1cc [drm], CPU#1: kworker/1:7/1867
[ 77.201354] omapdrm omapdrm.0: [drm] vblank wait timed out on crtc 0

This currently happens if the fbdev output is not on CRTC 0.

Atomic and non-atomic drivers require distinct code paths. As for other
fbdev operations, implement both and select the correct one at runtime.

Not finding an active CRTC is not a bug. Do not wait in this case, but
flush the display update as before.

v4:
- avoid possible deadlocks with locking context (Sashiko)
v3:
- drop excessive state validation (Jani)
- acquire plane and CRTC mutexes (Sashiko)
v2:
- move look-up code into separate helper
- support drivers with legacy modesetting
v1:
- see https://lore.kernel.org/dri-devel/1c9e0e24-9c4a-4259-8700-cf9e5fd60ca3@suse.de/

Co-authored-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: d8c4bddcd8bcb ("drm/fb-helper: Synchronize dirty worker with vblank")
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Closes: https://bugs.debian.org/1138033
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260702145021.226932-1-tzimmermann@suse.de

Merge tag 'iio-fixes-for-7.2a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus

Jonathan writes:

IIO: 1st set of fixes for the 7.2 cycle

Usual mixed bag of recently introduced issues and much older ones.

core
- Ensure kfifo is reset before fd is allocated avoiding concurrent use of
  fifo with reset.
multiple drivers
- Fix up missing Kconfig dependencies.
hid-sensors
- Add support for multibyte read as necessary precursor to...
- Fix stale or zero output when reading raw values for quaternions.
adi,adis
- Add IRQF_NO_THREAD to ensure interrupt is not pushed to the software
  interrupt chip used for trigger demux in the IIO core from a thread.
bosch,bmc150
- Hardening against device returning a reserved out of range value for
  how many entries are in the FIFO.
bosch,bmi160
- Add IRQF_NO_THREAD to ensure interrupt is not pushed to the software
  interrupt chip used for trigger demux in the IIO core from a thread.
dynaimage,al3010
- Fix wrong scale for highest gain_range due to too many digits in the
  micro part (val2).
freescale,mpl3115
- Fix unbalanced runtime pm on error in read_raw().
invensense,icm42600
- Avoid wrong divisor for fifo timestamps when using the watermark
  interrupt.
- Fix timestamp accuracy loss due to excessive divisor for calculations.
kionix,kxsd9
- Fix unbalanced runtime pm on an error in write_raw().
microchip,mcp37feb02
- Fix an uninitialized reference voltage value for particular DT config.
melexis,mlx90635
- Build on basis of right Kconfig symbol.
nxp,lpc32xx
- Ensure completion initialized before requesting irq. Hardening against
  spurious IRQ.
nxp,saradc
- Fix a delay calculation.
sharp,gp2ap0002
- Fix unbalanced runtime pm on error in read_raw().
st,lsm6dsx
- Fix an issue seen in the wild where an unplanned CPU reset can leave the
  device on the wrong register page, thus leaving the driver wedged.
st,st_sensors library
- Make sure to handle a device that provides data as big endian correctly.
st,spear
- Ensure completion initialized before requesting irq. Hardening against
  spurious IRQ.
taos,tsl2591
- Don't eat return from devm_request_threaded_irq() as that breaks
  deferred probing.
ti,ads1119
- Fix a pm reference count leak in an error path.
ti,ads124s08
- Handle gpio look up errors correctly.

* tag 'iio-fixes-for-7.2a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (28 commits)
  iio: event: Fix event FIFO reset race
  iio: imu: inv_icm42600: fix timestamp clock period by using lower value
  iio: light: al3010: fix incorrect scale for the highest gain range
  iio: adc: nxp-sar-adc: Fix the delay calculation in nxp_sar_adc_wait_for()
  iio: light: tsl2591: return actual error from probe IRQ failure
  iio: imu: inv_icm42600: fix timestamping by limiting FIFO reading
  iio: imu: st_lsm6dsx: deselect shub page before reading whoami
  iio: adc: ad7779: add missing 'select IIO_TRIGGERED_BUFFER' to Kconfig
  iio: adc: ad4130: add missing `select IIO_TRIGGERED_BUFFER` to Kconfig
  iio: adc: ti-ads124s08: Return reset GPIO lookup errors
  iio: temperature: Build mlx90635 with CONFIG_MLX90635
  iio: light: al3320a: add missing REGMAP_I2C to Kconfig
  iio: light: al3010: add missing REGMAP_I2C to Kconfig
  iio: light: al3000a: add missing REGMAP_I2C to Kconfig
  iio: common: st_sensors: honour channel endianness in read_axis_data
  iio: imu: bmi160: add IRQF_NO_THREAD to data-ready trigger IRQ
  iio: imu: adis: add IRQF_NO_THREAD to non-FIFO trigger IRQ
  iio: hid-sensor-rotation: Fix stale or zero output when reading raw values
  HID: sensor-hub: Add sensor_hub_input_attr_read_values() for multi-byte reads
  iio: adc: spear: Initialize completion before requesting IRQ
  ...

scsi: lpfc: Fix memory leak in lpfc_sli4_driver_resource_setup()

The memory allocated for mboxq using mempool_alloc() is not freed in
some of the early exit error paths. Fix that by moving the
mempool_free() call to an earlier point after last use.

Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20260707065304.949135-1-nihaal@cse.iitm.ac.in
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sg: Report request-table problems when any status is set

SG_GET_REQUEST_TABLE reports per-request diagnostic state through
sg_req_info::problem. The field is meant to indicate whether there is an
error to report for a completed request.

sg_fill_request_table() currently combines masked_status, host_status
and driver_status with bitwise AND. This only reports a problem when all
three status fields are non-zero at the same time. A normal target check
condition, for example, has masked_status set while host_status and
driver_status may both be zero, so the request is incorrectly reported
as clean.

Use the same condition as sg_new_read(), which sets SG_INFO_CHECK when
any of the three status fields is non-zero.
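
The difference between the two conditions can be shown with a small userspace model. This is only a sketch: struct sg_status_model and both helper names are made up for illustration, and only the three field names come from the text above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three status fields named above. */
struct sg_status_model {
	unsigned char masked_status;
	unsigned short host_status;
	unsigned short driver_status;
};

/*
 * Old shape: the bitwise AND is non-zero only when all three fields
 * share at least one set bit, so all three must be non-zero.
 */
static bool problem_bitwise_and(const struct sg_status_model *s)
{
	return (s->masked_status & s->host_status & s->driver_status) != 0;
}

/* Intended shape: report a problem when any field is non-zero. */
static bool problem_any_set(const struct sg_status_model *s)
{
	return s->masked_status || s->host_status || s->driver_status;
}
```

With only masked_status set, as in a plain check condition, the first helper reports no problem and the second one does.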

Cc: stable@vger.kernel.org
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/54B60C19F7DB8889+20260707030845.970018-1-raoxu@uniontech.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: tracing: Do not dereference pointers in TP_printk()

The trace events in drivers/ufs/core/ufs_trace.h were converted to take
a pointer to the hba structure as an argument for the tracepoint and
then in TP_printk() the printing of the dev_name from the ring buffer
was converted to using the dev dereferenced pointer from the hba saved
pointer.

This is not allowed as the TP_printk() is executed at the time the trace
event is read from /sys/kernel/tracing/trace file. That can happen
literally seconds, minutes, hours, days, weeks, or even months later!
There is no guarantee that the hba pointer will still exist by the time
it is dereferenced when the "trace" file is read.

Instead, save the device name from the hba pointer at the time the
tracepoint is called and place it into the ring buffer event. Then the
TP_printk() can read the name directly from the ring buffer and remove
the possibility that it will read a freed pointer and crash the kernel.
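
The idea of copying the name at record time can be sketched in plain userspace C. The struct and function below are hypothetical and use snprintf() rather than the kernel's own trace event helpers; they only illustrate why a stored copy stays valid after the source goes away.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a ring buffer entry that owns its device name. */
struct trace_event_model {
	char dev_name[32];
};

/* Copy the name while the source object is still alive. */
static void record_event(struct trace_event_model *ev, const char *live_name)
{
	snprintf(ev->dev_name, sizeof(ev->dev_name), "%s", live_name);
}
```

Reading ev->dev_name later never touches the original object, so its lifetime no longer matters to the reader.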

This was detected when testing the trace event code that looks for
TP_printk() parameters doing illegal dereferences [1].

[1] https://lore.kernel.org/all/20260630184836.74d477b6@gandalf.local.home/

Cc: stable@vger.kernel.org
Fixes: 583e518e7100 ("scsi: ufs: core: Add hba parameter to trace events")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260630185412.283c26c5@gandalf.local.home
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Merge branch 7.2/scsi-queue into 7.2/scsi-fixes

Pull in outstanding commits from 7.2/scsi-queue.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

regulator: core: regulator_lock_two() should test for EDEADLK not EDEADLOCK

Compare against -EDEADLK, which is what ww_mutex_lock() actually
returns and what every other deadlock check in this file already uses.

Function regulator_lock_two() acquires two regulators via
regulator_lock_nested() -> ww_mutex_lock().  On contention,
ww_mutex_lock() returns -EDEADLK, which is the caller's signal to drop
the lock it holds and retry the acquisition in the canonical order.

However, regulator_lock_two() tests the return value against -EDEADLOCK
rather than -EDEADLK.  On most architectures, EDEADLK and EDEADLOCK are
the same value, so the comparison happens to be correct and the bug is
invisible.  But on MIPS, SPARC, and PowerPC, those two errors have
different values.  The test is wrong: a genuine -EDEADLK backoff no
longer matches -EDEADLOCK, so instead of unlocking and retrying, the
code falls into WARN_ON(ret) and returns with only one of the two
regulators locked.

In practice, this is a bug only on MIPS, because the regulator core is
not built or used on the other two platforms.

In general, EDEADLK is preferred over EDEADLOCK for new code.
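
As a minimal sketch of the corrected test, assuming a hypothetical helper name (needs_backoff() is not part of the regulator core):

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the caller should drop the lock
 * it holds and retry. The comparison uses EDEADLK, the value the text
 * above says ww_mutex_lock() returns.
 */
static bool needs_backoff(int ret)
{
	return ret == -EDEADLK;
}
```

On architectures where EDEADLOCK has a different value, comparing against it would make this helper return false for a genuine backoff request.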

Fixes: cba6cfdc7c3f ("regulator: core: Avoid lockdep reports when resolving supplies")
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Link: https://patch.msgid.link/20260708235722.2953579-1-ttabi@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>

smb: client: fix busy dentry warning on unmount after DIO

Commit c68337442f03 ("cifs: Fix busy dentry used after unmounting") fixed
an issue in cifs where the deferred close of a file left a dentry reference
count unreleased at umount. It did so by flushing deferredclose_wq in
cifs_kill_sb().

However, the cifs DIO path suffers from the same busy-dentry problem caused
by a delayed dentry reference-count release:

[dio] [cifsd] [close + umount]
netfs_unbuffered_write_iter_locked
...
cifs_demultiplex_thread
netfs_unbuffered_write
  cifs_issue_write
  netfs_wait_for_in_progress_stream [1]
...
netfs_write_subrequest_terminated
  netfs_subreq_clear_in_progress
   netfs_wake_collector // wake [1]
  netfs_put_subrequest
netfs_put_request
  queue_work(system_dfl_wq, xxx) [2]
// dio write return cifs_close
_cifsFileInfo_put
  // cfile->count 2->1
  --cfile->count [3]

// umount
cifs_kill_sb
kill_anon_super
  // warning triggered!
  shrink_dcache_for_umount [4]
[system_dfl_wq] [5]
netfs_free_request
...
_cifsFileInfo_put
  // cfile->count 1->0
  --cfile->count
  queue_work(fileinfo_put_wq, xxx)

[fileinfo_put_wq] [6]
cifsFileInfo_put_work
cifsFileInfo_put_final
  dput

If the umount path is triggered before [5], it results in the warning:
BUG: Dentry 00000000eab1f070{i=9a917b66ae404fec,n=test}  still in use (1)
[unmount of cifs cifs]

The existing per-inode ictx->io_count wait in cifs_evict_inode() does not
help: it lives in the inode eviction path, which runs after
shrink_dcache_for_umount() has already warned about the busy dentries.

Fix it by adding a per-superblock outstanding-rreq counter that is
incremented in cifs_init_request() and decremented in cifs_free_request().
In cifs_kill_sb(), before kill_anon_super(), wait for this counter to reach
0 - which guarantees that all cleanup_work for this sb have run and thus
all relevant cfile puts are queued on fileinfo_put_wq or serverclose_wq.
Then drain the workqueue so the dentry refs are dropped.

This is a targeted wait, not a flush of the system-wide system_dfl_wq.

Fixes: 340cea84f691c ("cifs: open files should not hold ref on superblock")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: Fix support for creating SFU fifo

SFU fifos are natively supported (created and recognized) at least by:
- Microsoft POSIX subsystem
- OpenNT/Interix subsystem
- Microsoft SFU (Windows Services for UNIX)
- Microsoft SUA (Subsystem for UNIX-based Applications)
- Windows NFS server (up to the Windows Server 2008 R2)

Windows NFS server since Windows Server 2012 uses new reparse point format
for storing new fifos, but still can recognize this old format (also in the
latest Windows Server 2022 version).

An SFU-style fifo is an empty regular file which has the system attribute set.

These SFU-style fifos are already recognized by Linux SMB client.

But Linux SMB client is currently creating new SFU fifos in different
format which is not compatible with all those SFU-style consumers. Fix this
by creating new fifos in correct SFU format which would be recognized by
all those applications and also by existing Linux SMB clients.

This change affects only creating new fifos when mount option -o sfu is used.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: Fix support for creating SFU socket

SFU sockets are natively supported by Interix 3.0 subsystem and also by
later versions. It is part of Microsoft SFU (Windows Services for UNIX) and
Microsoft SUA (Subsystem for UNIX-based Applications). Such sockets can be
created, and existing ones (stored on local disk or on a remote SMB share)
can be recognized.

SFU sockets are recognized also by NFS server included in Windows Server.
Windows NFS server versions since Windows Server 2012 uses new reparse
point format for storing new sockets, but still can recognize this old
format (also in the latest Windows Server 2022 version).

An SFU-style socket is a regular file which has the system attribute set
and whose content is a single zero byte.

These SFU-style sockets are already recognized by Linux SMB client.

But Linux SMB client is currently creating new SFU socket in different
format which is not compatible with all those SFU applications. Fix this by
creating new sockets in correct SFU format which would be recognized by all
SFU, SUA, NFS and existing Linux SMB clients.

This change affects only creating new sockets when mount option -o sfu is used.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix atime clamp check in read completion

cifs_rreq_done() updates the inode atime to current_time(inode) after a
netfs read.  It then preserves the CIFS rule that atime should not be
older than mtime, because some applications break if atime is less than
mtime.  That rule only requires clamping when atime < mtime.

The current check uses the raw non-zero result of timespec64_compare().
It therefore takes the clamp path for both atime < mtime and
atime > mtime.  The latter is the normal case when reading an older file:
the newly recorded atime is newer than the file mtime.  The completion
handler then immediately moves atime back to mtime, losing the access
time that was just recorded.  Userspace tools that rely on atime, such as
stat, find -atime, backup tools or cold-data classifiers, can therefore
see a recently read CIFS file as not recently accessed.

This is easy to miss because the bug is silent: read I/O still succeeds,
no error is reported, and many systems either do not check atime after
reads or mount with policies such as relatime/noatime.  It becomes
visible when a CIFS file has an mtime older than the current time, the
file is read, and the local inode atime is inspected before a later
revalidation replaces the cached timestamps.

Clamp only when atime is actually older than mtime.  This matches the
same atime/mtime rule used when applying CIFS inode attributes.
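
The corrected rule can be modelled in userspace as follows. The type and function names are hypothetical; ts_compare() only mimics the negative/zero/positive convention that timespec64_compare() is described as having above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a kernel timestamp. */
struct ts_model {
	int64_t sec;
	long nsec;
};

/* Returns <0, 0 or >0 when a is older than, equal to, or newer than b. */
static int ts_compare(const struct ts_model *a, const struct ts_model *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* Clamp atime up to mtime only when atime is strictly older. */
static struct ts_model clamp_atime(struct ts_model atime, struct ts_model mtime)
{
	if (ts_compare(&atime, &mtime) < 0)
		return mtime;
	return atime;
}
```

Testing the raw non-zero result instead of "< 0" would also replace a newer atime with mtime, which is the behaviour described as the bug.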

Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

ASoC: tas2562: fix deprecated 'shut-down' GPIO always cleared after lookup

In tas2562_parse_dt(), the fallback lookup for the deprecated
"shut-down" GPIO property is broken due to a missing pair of braces.

The code intends to reset sdz_gpio to NULL only when the lookup
returns an error that is not -EPROBE_DEFER (so the driver gracefully
continues without a GPIO). However, without braces the statement:

tas2562->sdz_gpio = NULL;

falls outside the IS_ERR() check and is executed unconditionally
for every path through the if block, including a successful GPIO
lookup.

This means any device using the deprecated 'shut-down' DT property
will always have sdz_gpio == NULL after probe, making the GPIO
completely non-functional.

Fix this by adding the missing braces to scope the NULL assignment
inside the IS_ERR() branch, matching the pattern already used for
the primary 'shutdown' GPIO lookup above.
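
The effect of the missing braces can be reproduced in a small model. Everything here is hypothetical: struct gpio_lookup stands in for the lookup result, err for IS_ERR() being true, and defer for the -EPROBE_DEFER case.

```c
#include <stddef.h>

/* Hypothetical model of a GPIO lookup result. */
struct gpio_lookup {
	int *gpio;   /* descriptor on success */
	int err;     /* non-zero models IS_ERR() */
	int defer;   /* non-zero models -EPROBE_DEFER */
};

/* Buggy shape: the reset is outside the error check and always runs. */
static int *resolve_buggy(struct gpio_lookup r, int *status)
{
	int *gpio = r.gpio;

	*status = 0;
	if (r.err)
		if (r.defer) {
			*status = -1;
			return NULL;
		}
	gpio = NULL;
	return gpio;
}

/* Fixed shape: braces scope the reset to the error branch. */
static int *resolve_fixed(struct gpio_lookup r, int *status)
{
	int *gpio = r.gpio;

	*status = 0;
	if (r.err) {
		if (r.defer) {
			*status = -1;
			return NULL;
		}
		gpio = NULL;
	}
	return gpio;
}
```

For a successful lookup, resolve_buggy() still returns NULL while resolve_fixed() keeps the descriptor.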

Fixes: f78a97003b8b ("ASoC: tas2562: Update shutdown GPIO property")
Signed-off-by: Uday Khare <udaykhare77@gmail.com>
Link: https://patch.msgid.link/20260706153109.10953-1-udaykhare77@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

drm/amdkfd: Check bounds on CRIU restore queue type and mqd size

We weren't checking whether the values provided in the private
data in kfd CRIU restore were within bounds.

For queue type, add a KFD_QUEUE_TYPE_MAX and ensure the provided
type is less than it.

For mqd_size, add new function mqd_size_from_queue_type and confirm
that the provided mqd_size matches expectations.

Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f19d8086f6644083c913d70bfdeee20e1b6f46a5)
Cc: stable@vger.kernel.org

drm/amd/pm: fix smu14 power limit range calculation

SMU14 derives the default PPT limit from SocketPowerLimitAc/Dc, but
MsgLimits.Power may expose a different firmware limit for the same PPT0
throttler. Using those values independently as fixed min/max bases can
report an incorrect configurable power range.

Keep the socket power limit as the default value and as the fallback for
current-limit queries. Calculate the reported range from both firmware
values instead, using the lower value as the minimum base and the higher
value as the maximum base before applying OD percentages.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c936b8126b444401318fcbeb1828488cc5312dee)
Cc: stable@vger.kernel.org

drm/amdkfd: Check bounds in allocate_event_notification_slot

The valid event ids go from 0 to KFD_SIGNAL_EVENT_LIMIT

allocate_event_notification_slot has an option to specify
an event id to allocate at, used by CRIU. We weren't checking
the bounds on that value.

Check them.

v2: Lower bounds check is unnecessary because idr_alloc
already rejects negative numbers. Upper bounds check should
be KFD_SIGNAL_EVENT_LIMIT since the signal mode mappings might
not yet exist

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6853f1f6cbbeb3f53ebbbd7286536aeb2c5d5f50)
Cc: stable@vger.kernel.org

amdkfd: properly free secondary context id

Function kfd_process_free_id() should skip over
the primary kfd process because its context id
is a fixed assignment, not allocated through the ida table.
This function should only work on secondary contexts.

Fixes: fac682a1d1af ("amdkfd: identify a secondary kfd process by its id")
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8799ba6fb6a48438aea20c82e74c2f2a3d2b2e7a)
Cc: stable@vger.kernel.org

drm/amdkfd: Don't acquire buffers during CRIU queue restore.

kfd_criu_restore_queue's call of kfd_queue_acquire_buffers was
failing for multiple reasons
- The ctl_stack_size set by the CRIU plugin doesn't match
what is expected by acquire_buffers
- The svm buffer cannot be acquired at this point because
CRIU may not have restored it, or may have restored it
to a different address.

The only reason acquire_buffers was necessary here was to
avoid a null ptr dereference in init_user_queue.

Just put in a check for that dereference; it doesn't appear to
come up in real use cases right now. That is, there is no
usage of CRIU with shared MES.

This is a partial revert of
commit 20a5e7ffdfec ("drm/amdkfd: Properly acquire queue buffers in CRIU restore")

Fixes: 20a5e7ffdfec ("drm/amdkfd: Properly acquire queue buffers in CRIU restore")
Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1cafa8b29e029eac3ddf64604f891b35dbf6262b)
Cc: stable@vger.kernel.org

drm/amdkfd: Check bounds on CRIU restore event id

The valid amdkfd event ids go from 0 to KFD_SIGNAL_EVENT_LIMIT - 1.

During CRIU restore, ensure that the provided event ids are
in that range.

v2: No need for lower bound check since idr_alloc rejects negative
inputs

v3: Also change error message to reflect new error condition

Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5c6c247992d4d9200e073b83f4ec6c703c096845)

drm/gfx10: Program DB_RING_CONTROL

This is needed to allocate occlusion counters across
both gfx pipes.

Fixes: b7a1a0ef12b8 ("drm/amd/amdgpu: add pipe1 hardware support")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6807352cbabb74b61ba42888769283af72191f66)
Cc: stable@vger.kernel.org

dm era: fix error code propagation in era_ctr()

era_ctr() replaces the actual error codes returned by dm_get_device()
and dm_set_target_max_io_len() with hardcoded -EINVAL, discarding
the real reason for the failure (e.g. -ENODEV, -ENOMEM). This makes
it harder for users to diagnose problems and is inconsistent with
other dm targets (dm-thin, dm-verity, dm-flakey, dm-ebs) which
propagate the original error.

Fix all three sites to return 'r' instead of -EINVAL.

Signed-off-by: Cao Guanghui <caoguanghui@kylinos.cn>
Reviewed-by: Su Yue <glass.su@suse.com>
Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

dm era: fix NULL pointer dereference in metadata_open()

metadata_open() returns NULL when kzalloc_obj() fails, but the
caller era_ctr() only checks IS_ERR(md). Since IS_ERR(NULL)
returns false, the NULL pointer is treated as a valid result
and later assigned to era->md, leading to a NULL pointer
dereference when the metadata is accessed.

Fix this by returning ERR_PTR(-ENOMEM) on allocation failure,
consistent with dm-cache-metadata.c, dm-thin-metadata.c, and
dm-clone-metadata.c which all use ERR_PTR(-ENOMEM) for the
same pattern.
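
Why a NULL return slips past the IS_ERR() check can be seen in a userspace re-creation of the error-pointer idea. The model_* names and MODEL_MAX_ERRNO are assumptions made for this sketch; they imitate the kernel helpers but are not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top address range. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True only for pointers in the last MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}
```

In this model NULL is not an error pointer, so a caller that only checks model_is_err() would go on to use it. Returning model_err_ptr(-ENOMEM) makes the failure visible to that same check.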

Fixes: eec40579d848 ("dm: add era target")
Signed-off-by: Cao Guanghui <caoguanghui@kylinos.cn>
Reviewed-by: Su Yue <glass.su@suse.com>
Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

dm: avoid leaking the caller's thread keyring via the table device file

The refactoring in commit a28d893eb327 ("md: port block device access to file")
accidentally causes the caller's thread keyring to be kept alive long
beyond the caller's lifetime.

As a result, "cryptsetup luksSuspend" silently fails to wipe the
LUKS volume key from memory.

In detail: "cryptsetup luksOpen" uses its supposedly ephemeral thread
keyring to pass the volume key to the kernel. dm-crypt's
crypt_set_keyring_key() copies the key material into its own
crypt_config structure and then drops its own reference to the key in
the keyring with key_put().

With this fix, restoring pre-v6.9 behavior, the copy in the thread
keyring is then promptly garbage collected, such that exactly one copy
of the volume key remains. This single copy is correctly wiped from
memory on "cryptsetup luksSuspend".

Without this fix, the thread keyring and the volume key in it remain.
This second copy is only freed on "luksClose". "luksSuspend" neither
knows about this copy nor has any way to remove it, so the key remains
recoverable from RAM after a suspend that is documented to have wiped it.

This fix should not introduce new security problems, as the code is
anyway gated by CAP_SYS_ADMIN. The device-mapper core, not the calling
task, is the legitimate owner of this long-lived file.

Fixes: a28d893eb327 ("md: port block device access to file")
Closes: https://gitlab.com/cryptsetup/cryptsetup/-/work_items/993
Link: https://www.speicherleck.de/iblech/cryptsetup-luksSuspend-issue-reproduction/
Signed-off-by: Ingo Blechschmidt <iblech@speicherleck.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Ondrej Kozina <okozina@redhat.com>

dm-inlinecrypt: Fix an error handling path in inlinecrypt_ctr()

All error handling paths except this one branch to the 'bad' label in
the error handling path.

If not done, there is a memory leak and some sensitive data may be kept
around.

So, fix this error path and also do the needed clean-up.

Also, fix missing goto in the "Wrong alignment of iv_offset sector" path.

Fixes: e7f57d2c47e2 ("dm-inlinecrypt: add target for inline block device encryption")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

dm-pcache: reject option groups without values

The pcache target parses optional arguments as name/value pairs.  A
table that advertises one optional argument and supplies only a
recognized option name, for example "cache_mode", reaches
parse_cache_opts() with argc == 1.  The parser consumes the name,
decrements argc to zero, then calls dm_shift_arg() again for the value.
dm_shift_arg() returns NULL when no arguments remain, and the following
strcmp() dereferences that NULL pointer.

Check that each recognized option has a value before consuming it.  This
keeps valid "cache_mode writeback" and "data_crc true/false" tables
unchanged while making malformed tables fail during target construction
with a precise missing-value error.
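
A minimal model of the check, assuming a simplified parser that knows only the "cache_mode writeback" pair mentioned above. struct arg_set, shift_arg() and parse_opts() are illustrative names, and shift_arg() only imitates the dm_shift_arg() behaviour described in the text.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument cursor. */
struct arg_set {
	int argc;
	const char **argv;
};

/* Returns the next argument, or NULL when none remain. */
static const char *shift_arg(struct arg_set *as)
{
	if (as->argc == 0)
		return NULL;
	as->argc--;
	return *as->argv++;
}

/* Returns 0 on success, -EINVAL for unknown names or missing values. */
static int parse_opts(struct arg_set *as, int *writeback)
{
	while (as->argc) {
		const char *name = shift_arg(as);
		const char *value;

		if (strcmp(name, "cache_mode"))
			return -EINVAL;	/* unrecognized option name */
		if (as->argc == 0)
			return -EINVAL;	/* option name without a value */
		value = shift_arg(as);
		if (strcmp(value, "writeback"))
			return -EINVAL;	/* this model accepts one value only */
		*writeback = 1;
	}
	return 0;
}
```

Because argc is checked before the second shift_arg() call, strcmp() never receives a NULL value.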

Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
Reviewed-by: Zheng Gu <cengku@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 1d57628ff95b ("dm-pcache: add persistent cache target in device-mapper")
Cc: stable@vger.kernel.org

dm thin metadata: fix metadata snapshot consistency on commit failure

__reserve_metadata_snap() and __release_metadata_snap() modify the
superblock's held_root directly in the block_manager's buffer. If the
subsequent metadata commit fails, the held_root gets flushed to disk
through the abort_transaction path, resulting in inconsistent metadata.

Reproducer 1: __reserve_metadata_snap()

1. Create a 2 MiB metadata device and make the region after the 14th
   block inaccessible, to trigger metadata commit failure in the
   subsequent reserve_metadata_snap operation. The 14th block will be
   the shadow destination for the index block.

dmsetup create tmeta --table "0 112 linear /dev/sdc 0
112 3984 error"

2. Create a 16 MiB thin-pool

dmsetup create tdata --table "0 32768 zero"
dd if=/dev/zero of=/dev/mapper/tmeta bs=4k count=1
dmsetup create tpool --table "0 32768 thin-pool /dev/mapper/tmeta \
/dev/mapper/tdata 128 0 1 skip_block_zeroing"

3. Take a metadata snapshot to trigger metadata commit failure and
   transaction abort. However, the held_root is written to disk,
   breaking metadata consistency.

dmsetup message tpool 0 "reserve_metadata_snap"

thin_check v1.2.2 result:

Bad reference count for metadata block 6.  Expected 2, but space map contains 1.
Bad reference count for metadata block 7.  Expected 2, but space map contains 1.
Bad reference count for metadata block 13.  Expected 1, but space map contains 0.

Reproducer 2: __release_metadata_snap()

1. Create a 2 MiB metadata device and make the region after the 16th
   block inaccessible, to trigger metadata commit failure in the
   subsequent release_metadata_snap operation. The 16th block will be
   the shadow destination for the index block.

dmsetup create tmeta --table "0 128 linear /dev/sdc 0
128 3968 error"

2. Create a 16 MiB thin-pool

dmsetup create tdata --table "0 32768 zero"
dd if=/dev/zero of=/dev/mapper/tmeta bs=4k count=1
dmsetup create tpool --table "0 32768 thin-pool /dev/mapper/tmeta \
/dev/mapper/tdata 128 0 1 skip_block_zeroing"

3. Reserve then release the metadata snapshot, to trigger metadata
   commit failure and transaction abort. The held_root gets removed
   from the on-disk superblock, causing inconsistent metadata.

dmsetup message tpool 0 "reserve_metadata_snap"
dmsetup message tpool 0 "release_metadata_snap"

thin_check v1.2.2 result:

Bad reference count for metadata block 6.  Expected 1, but space map contains 2.
Bad reference count for metadata block 7.  Expected 1, but space map contains 2.
1 metadata blocks have leaked.

Fix by deferring the held_root update to commit time.

Additionally, move the existing-snapshot check in __reserve_metadata_snap
before the shadow operation to avoid unnecessary work. In
__release_metadata_snap, clear pmd->held_root before btree deletion so
partial failure leaks blocks rather than leaving a stale reference, and
unlock the snapshot block before decrementing its refcount.

Fixes: 991d9fa02da0 ("dm: add thin provisioning target")
Cc: stable@vger.kernel.org
Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

dm-verity: fix buffer overflow in FEC calculation

There's a buffer overflow in dm-verity-fec:

	if (neras && *neras <= v->fec->roots)
		fio->erasures[(*neras)++] = i;

This allows *neras to reach roots + 1 (the post-increment pushes it past
roots). This value is then passed as no_eras to decode_rs8(). Inside the
RS decoder (lib/reed_solomon/decode_rs.c:113-121), the erasure locator
polynomial loop writes lambda[j] where j can reach nroots + 1 — one
element past the end of lambda[] (which is sized nroots + 1, valid
indices 0..nroots). The out-of-bounds write lands on syn[0], corrupting
the syndrome buffer.
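
The off-by-one can be illustrated with a strict bound, assuming that is the intended limit. MODEL_ROOTS and note_erasure() are hypothetical names, and the sketch is not the dm-verity code itself.

```c
#include <stddef.h>

#define MODEL_ROOTS 4	/* hypothetical number of parity roots */

/*
 * Record an erasure position only while the count is strictly below
 * MODEL_ROOTS, so the count can never exceed MODEL_ROOTS.
 */
static void note_erasure(int *erasures, int *neras, int pos)
{
	if (neras && *neras < MODEL_ROOTS)
		erasures[(*neras)++] = pos;
}
```

With "<=" in place of "<", one more position would be accepted and the count would end at MODEL_ROOTS + 1, which is the situation described above.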

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Cc: stable@vger.kernel.org
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

dm era: fix out-of-bounds memory access for non-zero start sector

dm-era tracks writes in target-relative blocks, but era_map() calculates
the writeset block before applying the target offset. Tables with a
non-zero start sector can therefore pass an absolute mapped-device block
to metadata_current_marked().

If the absolute block is beyond the current writeset size,
writeset_marked() tests past the end of the in-core bitset. KASAN reports
this as a vmalloc-out-of-bounds access.

Apply the target offset before calculating the era block so writeset
lookups use the target-relative block number.
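
The arithmetic can be sketched as below. Both function names and the parameters begin and block_sectors are assumptions for illustration; the sketch only shows why the offset has to be removed before dividing.

```c
#include <stdint.h>

/* Buggy shape: divides the absolute sector, ignoring the target start. */
static uint64_t era_block_absolute(uint64_t sector, uint64_t block_sectors)
{
	return sector / block_sectors;
}

/* Fixed shape: subtract the target start first, then divide. */
static uint64_t era_block_relative(uint64_t sector, uint64_t begin,
				   uint64_t block_sectors)
{
	return (sector - begin) / block_sectors;
}
```

For a target starting at sector 1000 with 8-sector blocks, the first write to the target lands in block 0 with the relative calculation and in block 125 with the absolute one, which may lie past the end of the writeset.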

Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: eec40579d848 ("dm: add era target")

dm-log: fix a bitset_size overflow on 32bit machines

Commit c20e36b7631d ("dm log: fix out-of-bounds write due to
region_count overflow") made sure that region_count could fit in an
unsigned int. But the bitmap memory isn't allocated based on
region_count. It uses bitset_size (a size_t variable). The first step of
calculating bitset_size is to set it to region_count, rounded up to a
multiple of BITS_PER_LONG. If region_count is within BITS_PER_LONG of
UINT_MAX, it will get rounded up to 2^32. On a 32bit
architecture, this will make bitset_size wrap around to 0 and fail,
despite region_count being valid.

Since bitset_size gets divided by 8, it can hold any valid region_count.
It just needs a special case to handle the rollover. If it is 0, the
value rolled over, and bitset size should be set to the number of bytes
needed to hold 2^32 bits.
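
The rollover and its special case can be modelled with fixed-width arithmetic, using uint32_t to stand in for size_t on a 32bit machine. bitset_bytes() and MODEL_BITS_PER_LONG are hypothetical names, and the extra region_count != 0 test is an addition of this sketch.

```c
#include <stdint.h>

#define MODEL_BITS_PER_LONG 32u

/* Bytes needed for a bitmap of region_count bits, in 32-bit arithmetic. */
static uint32_t bitset_bytes(uint32_t region_count)
{
	/* Round up to a multiple of MODEL_BITS_PER_LONG; this may wrap to 0. */
	uint32_t bits = (region_count + MODEL_BITS_PER_LONG - 1) &
			~(MODEL_BITS_PER_LONG - 1);

	/* A zero result for a non-zero count means the rounding rolled over. */
	if (bits == 0 && region_count != 0)
		return UINT32_C(1) << 29;	/* 2^32 bits / 8 = 2^29 bytes */

	return bits / 8;
}
```

A region_count of UINT32_MAX rounds up to 2^32 bits, wraps to 0, and is then mapped to 2^29 bytes instead of being treated as an empty bitmap.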

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: c20e36b7631d ("dm log: fix out-of-bounds write due to region_count overflow")
Cc: stable@vger.kernel.org

drm/amdgpu: fix lifetime issue of amdgpu_vm_get_task_info_pasid()

The vm pointer returned from amdgpu_vm_get_vm_from_pasid() is only
valid while the lock is still being held. Once xa_unlock_irqrestore() has
been called and has returned, the pointer is no longer protected by the
lock and is subject to modification. Since the caller still dereferences
vm->task_info in amdgpu_vm_get_task_info_vm() after the lock is released,
this causes a use-after-unlock problem.

Remove the lifetime issue present in amdgpu_vm_get_task_info_pasid()
through removing the amdgpu_vm_get_vm_from_pasid() function from
amdgpu_vm.c and making the relevant code inline to hold the lock while
it is still in use.

Signed-off-by: Shahyan Soltani <shahyan.soltani@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d01579f3f868b333acc901815972685989092c7)
Cc: stable@vger.kernel.org

drm/amdgpu: trigger GPU recovery when userq destroy fails to unmap a hung queue

Destroying a hung user queue issues a MES REMOVE_QUEUE that times out.
The destroy path only logged the error and freed the queue, so the next
userq submission failed and forced a GPU reset attributed to an innocent
workload.

Kick the userq reset work when unmap fails so the GPU is recovered at
destroy time.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8396b9de4198a54ec4760a94a179347540a9764d)
Cc: stable@vger.kernel.org

drm/amd/amdgpu: disable ASPM on VI if pcie dpm is disabled

Disable ASPM on VI if PCIE dpm is disabled.

Fixes: bb00bf17328d ("drm/amd/amdgpu: decouple ASPM with pcie dpm")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5370
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 873a8d6b3c0a386408c891e4ff1c684fa11783e1)
Cc: stable@vger.kernel.org

drm/amdgpu: Disable JDPG on VCN5_3

JDPG is not supported on VCN5

This patch will disable JDPG, because DPG is not correctly
copying the JRBC Read/Write Pointers (R/WPTR) from the PG
(Power Gating) block to JRBC.

Signed-off-by: Suresh Guttula <suresh.guttula@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ea3fdd1eda088030d8925f023613728969f55955)

drm/amdgpu: add support for SMU version 15.0.9

Initialize SMU Version 15_0_9

Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com>
Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1dfd4e84b5beec353a81d61af9eaf4e5a56e0c57)

drm/amdgpu: add support for PSP version 15.0.9

Initialize PSP Version 15_0_9

Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com>
Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ef71f00173228904763552b7405169023f8034a8)

NFS: Charge unstable writes by request size, not folio size

nfs_folio_mark_unstable() and nfs_folio_clear_commit() charge and
uncharge NR_WRITEBACK/WB_WRITEBACK by folio_nr_pages(folio) once per
*request* added to or removed from a commit list. This is correct only
when a folio has a single associated request. When pg_test splits a
folio into N sub-folio requests (e.g. pNFS flexfiles striping with a
stripe unit smaller than the folio size, or plain wsize-limited
splitting), each of the N requests independently charges the whole
folio's page count, inflating the accounting by a factor of N per
folio. With large folios and small stripe units this reaches multiple
orders of magnitude: a 2 MiB folio split into 512 4 KiB requests can
charge up to 512x its real size, pushing global dirty+writeback
accounting past the system's dirty threshold and forcing every
buffered writer on the host into the hard-throttle path, including
unrelated in-kernel NFS server threads sharing the box.

Charge each request only for the pages it actually covers.
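
One way to count the pages a request covers is sketched below. It assumes a 4 KiB page and a request described by a byte offset and length inside its folio; request_nr_pages() and MODEL_PAGE_SIZE are hypothetical, and the actual patch may compute the charge differently.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Pages touched by a request covering [offset, offset + len) in a folio. */
static size_t request_nr_pages(size_t offset, size_t len)
{
	size_t first, last;

	if (len == 0)
		return 0;
	first = offset / MODEL_PAGE_SIZE;
	last = (offset + len - 1) / MODEL_PAGE_SIZE;
	return last - first + 1;
}
```

Summed over the 512 4 KiB requests of a 2 MiB folio, this yields 512 pages in total, whereas charging the full folio per request would yield 512 * 512.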

Fixes: 0c493b5cf16e ("NFS: Convert buffered writes to use folios")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
Assisted-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>

NFSv4: include MAY_WRITE in open permission mask for O_TRUNC

POSIX requires write permission to truncate a file, so an open() that
specifies O_TRUNC must be authorized for write access regardless of the
O_ACCMODE access mode.

nfs_open_permission_mask() builds the access mask passed to
nfs_may_open(), which is the local authorization gate for OPENs the
client serves itself from a cached write delegation via the
can_open_delegated() path in nfs4_try_open_cached(). The mask is
derived from O_ACCMODE alone, so an open(O_RDONLY | O_TRUNC) against a
file the caller cannot write requests only MAY_READ and passes the
local check. The OPEN is then satisfied locally and the truncation is
issued to the server as a SETATTR(size=0) over the delegation stateid,
which the server accepts under standard write-delegation semantics.
POSIX requires that this open fail with EACCES.

Include MAY_WRITE in the mask whenever O_TRUNC is set so the local
check matches the access the server would have enforced.
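
A simplified userspace sketch of the resulting mask logic follows. It is not the kernel function: the SKETCH_MAY_* values are local stand-ins, exec handling is omitted, and the function name is hypothetical.

```c
#include <fcntl.h>

/* Local stand-ins for the kernel's MAY_* bits; values are illustrative. */
#define SKETCH_MAY_WRITE 0x2
#define SKETCH_MAY_READ  0x4

/* Build the access mask an open() with these flags must be granted. */
static int sketch_open_permission_mask(int openflags)
{
	int mask = 0;

	switch (openflags & O_ACCMODE) {
	case O_WRONLY:
		mask = SKETCH_MAY_WRITE;
		break;
	case O_RDWR:
		mask = SKETCH_MAY_READ | SKETCH_MAY_WRITE;
		break;
	default: /* O_RDONLY */
		mask = SKETCH_MAY_READ;
		break;
	}

	/* Truncation modifies the file, so it needs write permission too. */
	if (openflags & O_TRUNC)
		mask |= SKETCH_MAY_WRITE;

	return mask;
}
```

With this shape, O_RDONLY | O_TRUNC asks for both read and write, so a caller without write permission fails the local check instead of reaching the delegated open.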

Suggested-by: Trond Myklebust <trondmy@kernel.org>
Fixes: af22f94ae02a ("NFSv4: Simplify _nfs4_do_access()")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>

regulator: mt6363: add missing MODULE_DEVICE_TABLE()

The driver has an OF match table wired to .of_match_table, but does
not export the table with MODULE_DEVICE_TABLE().

Add the missing MODULE_DEVICE_TABLE(of, ...) entry so module alias
information is generated for OF based module autoloading.

This is a source-level fix. It does not claim dynamic hardware
reproduction; the evidence is the driver-owned match table, its use by
the platform driver, and the missing module alias publication.

Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Link: https://patch.msgid.link/20260704124352.7981-1-pengpeng@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: mt6316: add missing MODULE_DEVICE_TABLE()

The driver has an OF match table wired to .of_match_table, but does
not export the table with MODULE_DEVICE_TABLE().

Add the missing MODULE_DEVICE_TABLE(of, ...) entry so module alias
information is generated for OF based module autoloading.

This is a source-level fix. It does not claim dynamic hardware
reproduction; the evidence is the driver-owned match table, its use by
the platform driver, and the missing module alias publication.

Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Link: https://patch.msgid.link/20260704122926.21586-1-pengpeng@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

selftests/sched_ext: Verify nohz_full tick behavior

Finite-slice EXT tasks need the periodic scheduler tick to expire their
slices even when nohz_full is enabled.

Add a regression test that selects a nohz_full CPU and exercises both
infinite-to-finite and finite-to-finite slice transitions across an idle
interval. For each finite task, verify that its ops.tick() callback is
invoked.

Skip the test when an allowed nohz_full CPU and a separate housekeeping
CPU are not available.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: Enable tick for finite slices on nohz_full

set_next_task_scx() updates the tick dependency before __schedule()
updates rq->curr. When switching from a non-EXT task, such as idle, to
an EXT task with a finite slice, sched_update_tick_dependency() checks
the outgoing task and can allow the tick to remain stopped.

The dependency can also be lost without a slice-type transition. After a
finite-slice task leaves the CPU idle, the enqueue path can clear the
dependency against the idle rq->curr. SCX_RQ_CAN_STOP_TICK still records
a finite slice, so another finite task skips the transition block and
can run without the ticks needed to expire its slice.

The reverse mismatch can also happen when the last finite-slice EXT task
is dequeued: sub_nr_running() updates the dependency before rq->curr
changes, so the outgoing task state can keep the dependency set after
the CPU goes idle.

Fix this by unconditionally enabling the scheduler tick whenever a
finite-slice EXT task is selected on a nohz_full CPU. Moreover, when the
last runnable EXT task leaves, ignore the outgoing EXT slice state so
the generic scheduler can correctly re-evaluate and clear the tick
dependency.

Fixes: 22a920209ab6 ("sched_ext: Implement tickless support")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

audit: fix potential integer overflow in audit_log_n_hex()

The function calculates new_len as len << 1 for hex encoding. This
has two overflow risks: the shift itself can overflow when len is
large, and the result can be truncated when assigned to new_len
(declared as int) from the size_t calculation.

Fix by using check_shl_overflow() to catch shift overflow and
changing new_len and loop counter i to size_t to prevent truncation.
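
Both hazards can be shown in a userspace sketch. check_shl_overflow() is kernel-only, so this stand-in uses a plain range check instead; the helper names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the hex-encoded length (two characters per input byte).
 * Returns false if doubling len would overflow size_t.
 */
static bool sketch_hex_len(size_t len, size_t *new_len)
{
	if (len > SIZE_MAX / 2)
		return false;
	*new_len = len << 1;
	return true;
}

/* True if a size_t value survives assignment to an int unchanged. */
static bool sketch_fits_int(size_t v)
{
	return v <= (size_t)INT_MAX;
}
```

A length of INT_MAX doubles without overflowing size_t, yet the result no longer fits in an int, which is the truncation case that keeping new_len as size_t avoids.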

Cc: stable@vger.kernel.org
Fixes: 168b7173959f ("AUDIT: Clean up logging of untrusted strings")
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
[PM: remove vertical whitespace noise]
Signed-off-by: Paul Moore <paul@paul-moore.com>

sched_ext: Preserve rq tracking across local DSQ dispatch

dispatch_to_local_dsq() can run from scx_bpf_dsq_move_to_local() while
ops.dispatch() has recorded the current rq. Moving a task to a local DSQ
may switch to the source or destination rq before synchronously invoking
ops.dequeue() through the following path:

  SCX_CALL_OP(dispatch, rq)
    ops.dispatch()
      scx_bpf_dsq_move_to_local()
        scx_flush_dispatch_buf()
          finish_dispatch()
            dispatch_to_local_dsq()
              scx_dispatch_enqueue()
                local_dsq_post_enq()
                  call_task_dequeue()
                    SCX_CALL_OP_TASK(dequeue, locked_rq, ...)

The nested callback saves the recorded rq and restores it on return. If
the rq tracking does not follow the lock switch, update_locked_rq() can
trigger the following lockdep assertion while restoring an rq which is
no longer held:

  WARNING: kernel/sched/sched.h:1641 at call_task_dequeue+0x160/0x170
  Call Trace:
    scx_dispatch_enqueue+0x2b0/0x460
    dispatch_to_local_dsq+0x138/0x230
    scx_flush_dispatch_buf+0x1af/0x220
    scx_bpf_dsq_move_to_local___v2+0xe2/0x1c0
    bpf__sched_ext_ops_dispatch+0x4b/0xa7
    do_pick_task_scx+0x3b6/0x910
    __pick_next_task+0x105/0x1f0
    __schedule+0x3e7/0x1980

Introduce switch_rq_lock() to update the tracking state together with
each rq lock handoff. Use it in dispatch_to_local_dsq(),
move_remote_task_to_local_dsq() and the in-balance paths of
scx_dsq_move(), ensuring that scx_locked_rq() consistently refers to the
rq whose lock is actually held throughout the lock dance.

Fixes: 7fb39e4eb4c3 ("sched_ext: Save and restore scx_locked_rq across SCX_CALL_OP")
Cc: stable@vger.kernel.org # 7.1+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

SUNRPC: pin upper rpc_clnt across the TLS connect_worker

The TLS connect path has a use-after-free: nothing pins the
upper rpc_clnt across the delayed connect_worker. xs_connect()
stores task->tk_client in sock_xprt::clnt as a raw pointer
and queues the worker; for TLS-secured transports that worker
is xs_tcp_tls_setup_socket(), which reads several fields out
of the saved pointer (cl_timeout, cl_program, cl_prog,
cl_vers, cl_cred, cl_stats) to construct the args for the
inner handshake rpc_clnt.

The xprt does not reference the rpc_clnt; the rpc_clnt
references the xprt. xs_destroy() does cancel the
connect_worker, but it runs only when the xprt's refcount
drops to zero, which cannot happen until the rpc_clnt
releases its cl_xprt reference in rpc_free_client_work().
When a TLS handshake fails fatally (for example, an mTLS
mount whose client cert does not match the server), the
connecting task is woken with -EACCES and exits, the mount
caller invokes rpc_shutdown_client(), and the upper rpc_clnt
is freed before the queued connect_worker fires.
xs_tcp_tls_setup_socket() then dereferences the freed clnt,
producing the refcount_t underflow Michael Nemanov reported.

Take a reference on the upper rpc_clnt in xs_connect() for
TLS transports via a new rpc_hold_client() helper, and drop
it in the connect_worker's exit path with rpc_release_client().
The xprt_lock_connect() / xprt_unlock_connect() pairing
already serialises xs_connect() with xs_tcp_tls_setup_socket(),
so the take and release are balanced one-for-one.

The non-TLS connect worker (xs_tcp_setup_socket) never reads
sock_xprt::clnt, so leave that path alone and avoid the
clnt-holds-xprt-holds-clnt cycle that would otherwise prevent
xprt destruction.

Reported-by: Michael Nemanov <michael.nemanov@vastdata.com>
Closes: https://lore.kernel.org/linux-nfs/40e3d522-dfcf-4fc1-9c55-b5e81f1536d5@vastdata.com/
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Michael Nemanov <michael.nemanov@vastdata.com>
Reviewed-by: Michael Nemanov <michael.nemanov@vastdata.com>
Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>

SUNRPC: release lower rpc_clnt if killed waiting for XPRT_LOCKED

xs_tcp_tls_setup_socket() creates a temporary "lower" rpc_clnt with
rpc_create() to drive the inner TLS handshake, then waits for
XPRT_LOCKED on its xprt with TASK_KILLABLE so a stuck handshake can
be aborted by signal. When the wait is interrupted, the function
jumps to out_unlock without releasing lower_clnt. The success path
and the out_close error path both call
rpc_shutdown_client(lower_clnt); only the killed-wait path skips it,
leaking the clnt and its underlying xprt.

Call rpc_shutdown_client() on this path before joining out_unlock.
xprt_release_write() is not needed here because XPRT_LOCKED was
never acquired.

Fixes: 26e8bfa30dac ("SUNRPC/TLS: Lock the lower_xprt during the tls handshake")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Michael Nemanov <michael.nemanov@vastdata.com>
Reviewed-by: Michael Nemanov <michael.nemanov@vastdata.com>
Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>

KVM: nVMX: Don't use vmcs01.GUEST_CR3 to snapshot L1's CR3 when EPT is disabled

Add a dedicated field in "struct nested_vmx" to track L1's pre-VM-Enter CR3
instead of using vmcs01.GUEST_CR3, which isn't anywhere near as safe as the
comment purports it to be. E.g. in addition to the warn_on_missed_cc bug
(that was fixed by relocating the consistency check), if getting vmcs12
pages (during actual nested VM-Entry) fails and EPT is disabled (in KVM),
KVM will return control to userspace with vmcs01.GUEST_CR3 holding a guest-
controlled value.

Alternatively, KVM could force a reload of vmcs01.GUEST_CR3 by resetting
the MMU context in the error path, but as above, the safety of the vmcs01
approach is extremely questionable, e.g. it took all of ~4 months for the
code to break.

Fixes: 671ddc700fd0 ("KVM: nVMX: Don't leak L1 MMIO regions to L2")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20260612145642.452392-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: nVMX: Move vTPR vs. TPR Threshold consistency check into "normal" checks

Move the off-by-default consistency check for vmcs12.tpr_threshold vs.
the virtual APIC vTPR into the "normal" controls checks, as waiting until
KVM has loaded some amount of state is unnecessary and actively dangerous.
Specifically, failure to unwind vmcs01.GUEST_CR3 to KVM's value when EPT
is disabled results in KVM running L1 with an L1-controlled CR3, not with
KVM's CR3!

Alternatively, KVM could simply reset the MMU to force a reload of
vmcs01.GUEST_CR3, but the _only_ reason the check was shoved into a "late"
flow was to wait until the vmcs12 pages were retrieved.  Rather than build
up more crusty code, simply access vTPR using a regular guest memory access
(performance isn't a concern).  To circumvent the restrictions that led to
KVM deferring nested_get_vmcs12_pages(), (a) use a VM-scoped API to read
guest memory so that it always hits non-SMM memslots (for RSM), and (b)
skip the check (since it's off-by-default anyways) when the vCPU doesn't
want to run, i.e. when userspace is restoring/stuffing state.

If reading guest memory fails, simply skip the consistency check, as KVM's
de facto ABI is that VMX instruction accesses to non-existent memory get
PCI Bus Error semantics, where reads return 0xFFs.  And if vTPR=0xFF, then
the vTPR is guaranteed to be greater than or equal to TPR_THRESHOLD.
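
As a sketch of why a failed read is harmless, assume the check compares bits 3:0 of the TPR threshold against bits 7:4 of the vTPR byte (this message does not spell out the comparison, and the helper name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed form of the consistency check: it passes when
 * TPR threshold[3:0] <= vTPR[7:4].
 */
static bool sketch_vtpr_check_ok(uint32_t tpr_threshold, uint8_t vtpr)
{
	return (tpr_threshold & 0xf) <= (uint32_t)((vtpr >> 4) & 0xf);
}
```

Under that assumption a vTPR of 0xFF has 0xF in bits 7:4, which no 4-bit threshold can exceed, so skipping the check and failing the read give the same outcome.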

Fixes: 1100e4910ad2 ("KVM: nVMX: Add an off-by-default module param to WARN on missed consistency checks")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260612145642.452392-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs

Ignore KVM's internal "service pending PV EOI" request if the vCPU has
disabled PV EOIs since the request was made.  Asserting that PV EOIs are
enabled can fail if reading guest memory in pv_eoi_get_user() fails, i.e.
if pv_eoi_test_and_clr_pending() bails early, *and* the vCPU also disables
PV EOIs.

  kernel BUG at arch/x86/kvm/lapic.c:3338!
  Oops: invalid opcode: 0000 [#1] SMP
  CPU: 4 UID: 1000 PID: 890 Comm: pv_eoi_test Not tainted 7.0.0-d585aa5894d8-vm #337 PREEMPT
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_lapic_sync_from_vapic+0x12b/0x140 [kvm]
  Call Trace:
   <TASK>
   kvm_arch_vcpu_ioctl_run+0x1075/0x1c30 [kvm]
   kvm_vcpu_ioctl+0x2d5/0x980 [kvm]
   __x64_sys_ioctl+0x8a/0xd0
   do_syscall_64+0xb5/0xb40
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
   </TASK>
  Modules linked in: kvm_intel kvm irqbypass
  ---[ end trace 0000000000000000 ]---

Fixes: ae7a2a3fb6f8 ("KVM: host side for eoi optimization")
Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20260624220516.3033391-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

selftests/rseq: Fix a building error for riscv arch

RISC-V rseq selftests include asm/fence.h from tools/arch/riscv, but
the rseq Makefile only adds tools/include to CFLAGS. This results in a
build failure for both native and cross builds:

    In file included from rseq.h:131,
                     from rseq.c:37:
    rseq-riscv.h:11:10: fatal error: asm/fence.h: No such file or directory

To fix it, add the matching tools/arch/$(ARCH)/include path to CFLAGS
and derive ARCH from SUBARCH for standalone native builds where ARCH is
not set.

Fixes: c92786e179e0 ("KVM: riscv: selftests: Use the existing RISCV_FENCE macro in `rseq-riscv.h`")
Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://patch.msgid.link/20260707082348.36896-1-hui.wang@canonical.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>

KVM: x86: Nullify irqfd->producer if updating IRTE for bypass fails

Nullify irqfd->producer if updating the IRTE for bypass fails, as leaving a
dangling pointer will result in a use-after-free if the irqfd is reachable
through KVM's routing, but the producer is freed separately. E.g. for VFIO
PCI, the producer is embedded in struct "vfio_pci_irq_ctx" and freed when
the vector is disabled, which can happen independent of routing updates.

Fixes: 77e1b8332d1d ("KVM: x86: Decouple device assignment from IRQ bypass")
Cc: stable@vger.kernel.org
Signed-off-by: leixiang <leixiang@kylinos.cn>
Link: https://patch.msgid.link/1782119051448443.14545.seg@mailgw.kylinos.cn
[sean: drop PPC change, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>

riscv: defconfig: enable ARCH_ULTRARISC

Enable `ARCH_ULTRARISC` in the default RISC-V defconfig.

Signed-off-by: Jia Wang <wangjia@ultrarisc.com>
Link: https://patch.msgid.link/20260515-ultrarisc-pinctrl-v1-9-bf559589ea8a@ultrarisc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>

riscv: add UltraRISC SoC family Kconfig support

The first SoC in the UltraRISC series is the UR-DP1000, which contains
eight UltraRISC CP100 cores.

Signed-off-by: Jia Wang <wangjia@ultrarisc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260427-ultrarisc-pcie-v4-1-98935f6cdfb5@ultrarisc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>

ASoC: cs42l43: Correct report for forced microphone jack

Currently if the jack is forced to the microphone mode, it will report
as line in. Correct the report to microphone.

Fixes: fc918cbe874e ("ASoC: cs42l43: Add support for the cs42l43")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260708103430.1395207-1-ckeepax@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'hid-for-linus-2026070801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- OOB, UAF, NULL-deref fixes in core and picolcd, logitech, letsketch,
   appleir and multitouch drivers (Georgiy Osokin, HyeongJun An, Lee
   Jones, Manish Khadka, Maoyi Xie and Trung Nguyen)

- fix for integer wraparound (and corresponding regression selftest) in
   hid-bpf (Yiyang Chen)

* tag 'hid-for-linus-2026070801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  selftests/hid: multitouch: test a large ContactCountMaximum
  HID: multitouch: fix out-of-bounds bit access on mt_io_flags
  selftests/hid: Cover hid_bpf_get_data() size overflow
  selftests/hid: Load only requested struct_ops maps
  HID: bpf: Fix hid_bpf_get_data() range check
  HID: lg-g15: cancel pending work on remove to fix a use-after-free
  HID: logitech-dj: Fix maxfield check in DJ short report validation
  HID: core: Fix OOB read in hid_get_report for numbered reports
  HID: picolcd: prevent NULL pointer dereference in picolcd_send_and_wait()
  HID: appleir: fix UAF on pending key_up_timer in remove()
  HID: letsketch: fix UAF on inrange_timer at driver unbind

USB: core: ratelimit cabling message

If a cable is bad, it stays bad. There is no need to flood the log with
messages about it. So go for a ratelimited version.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20260605090110.1514785-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

usb: misc: usbio: fix disconnect UAF in client teardown

usbio_disconnect() walks usbio->cli_list in reverse and uninitializes each
auxiliary device. auxiliary_device_uninit() drops the device reference, and
for an unbound child that can run usbio_auxdev_release() and free the
containing struct usbio_client.

list_for_each_entry_reverse() advances after the loop body by reading
client->link.prev. If the current client is freed by
auxiliary_device_uninit(), the iterator dereferences freed memory.

Use list_for_each_entry_safe_reverse() so the previous client is
cached before the body can drop the final reference. This preserves
reverse teardown order while keeping the next iterator cursor independent
of the current client's lifetime.
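
The iteration pattern can be sketched in userspace with a minimal circular list. This is not the kernel's list.h; struct sketch_client and the helpers are hypothetical stand-ins.

```c
#include <stdlib.h>

struct sketch_client {
	struct sketch_client *prev, *next;
};

/* Link c in front of the sentinel head, i.e. at the tail of the list. */
static void sketch_add_tail(struct sketch_client *head,
			    struct sketch_client *c)
{
	c->prev = head->prev;
	c->next = head;
	head->prev->next = c;
	head->prev = c;
}

/*
 * Walk from tail to head and free every entry. The predecessor is
 * cached before the body runs, so freeing cur never feeds the cursor.
 * Returns the number of entries released.
 */
static int sketch_teardown_reverse_safe(struct sketch_client *head)
{
	struct sketch_client *cur, *prev;
	int freed = 0;

	for (cur = head->prev, prev = cur->prev; cur != head;
	     cur = prev, prev = cur->prev) {
		cur->prev->next = cur->next;
		cur->next->prev = cur->prev;
		free(cur);
		freed++;
	}
	return freed;
}

/* Build count entries, tear them down, return how many were freed. */
static int sketch_demo_teardown(int count)
{
	struct sketch_client head = { &head, &head };
	int i, freed;

	for (i = 0; i < count; i++) {
		struct sketch_client *c = malloc(sizeof(*c));

		if (!c) {
			sketch_teardown_reverse_safe(&head);
			return -1;
		}
		sketch_add_tail(&head, c);
	}
	freed = sketch_teardown_reverse_safe(&head);
	return (head.next == &head && head.prev == &head) ? freed : -1;
}
```

A non-safe walk would instead read cur->prev after free(cur), which is the access KASAN flags in the report.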

Validation reproduced this kernel report:
BUG: KASAN: slab-use-after-free in usbio_disconnect+0x12e/0x150

Call Trace:
<TASK>
dump_stack_lvl+0x66/0xa0
print_report+0xce/0x630
? usbio_disconnect+0x12e/0x150
? srso_alias_return_thunk+0x5/0xfbef5
? __virt_addr_valid+0x188/0x320
? usbio_disconnect+0x12e/0x150
kasan_report+0xe0/0x110
? usbio_disconnect+0x12e/0x150
usbio_disconnect+0x12e/0x150
usb_unbind_interface+0xf3/0x400
really_probe+0x316/0x660
__driver_probe_device+0x106/0x240
driver_probe_device+0x4a/0x110
__device_attach_driver+0xf1/0x1a0
? __pfx___device_attach_driver+0x10/0x10
bus_for_each_drv+0xf9/0x160
? __pfx_bus_for_each_drv+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? trace_hardirqs_on+0x18/0x130
? srso_alias_return_thunk+0x5/0xfbef5
? _raw_spin_unlock_irqrestore+0x44/0x60
__device_attach+0x133/0x2a0
? __pfx___device_attach+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? do_raw_spin_unlock+0x9a/0x100
? srso_alias_return_thunk+0x5/0xfbef5
device_initial_probe+0x55/0x70
bus_probe_device+0x4a/0xd0
device_add+0x9b9/0xc10
? __pfx_device_add+0x10/0x10
? _raw_spin_unlock_irqrestore+0x44/0x60
? srso_alias_return_thunk+0x5/0xfbef5
? lockdep_hardirqs_on_prepare+0xea/0x1a0
? srso_alias_return_thunk+0x5/0xfbef5
? usb_enable_lpm+0x3c/0x260
usb_set_configuration+0xb64/0xf20
usb_generic_driver_probe+0x5f/0x90
usb_probe_device+0x71/0x1b0
really_probe+0x46b/0x660
__driver_probe_device+0x106/0x240
driver_probe_device+0x4a/0x110
__device_attach_driver+0xf1/0x1a0
? __pfx___device_attach_driver+0x10/0x10
bus_for_each_drv+0xf9/0x160
? __pfx_bus_for_each_drv+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? trace_hardirqs_on+0x18/0x130
? srso_alias_return_thunk+0x5/0xfbef5
? _raw_spin_unlock_irqrestore+0x44/0x60
__device_attach+0x133/0x2a0
? __pfx___device_attach+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? do_raw_spin_unlock+0x9a/0x100
? srso_alias_return_thunk+0x5/0xfbef5
device_initial_probe+0x55/0x70
bus_probe_device+0x4a/0xd0
device_add+0x9b9/0xc10
? __pfx_device_add+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? add_device_randomness+0xb7/0xf0
usb_new_device+0x492/0x870
hub_event+0x1b10/0x29c0
? __pfx_hub_event+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? lock_acquire+0x187/0x300
? process_one_work+0x475/0xb90
? srso_alias_return_thunk+0x5/0xfbef5
? lock_release+0xc8/0x290
? srso_alias_return_thunk+0x5/0xfbef5
process_one_work+0x4d7/0xb90
? __pfx_process_one_work+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? __list_add_valid_or_report+0x37/0xf0
? __pfx_hub_event+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
worker_thread+0x2d8/0x570
? __pfx_worker_thread+0x10/0x10
kthread+0x1ad/0x1f0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x3c9/0x540
? __pfx_ret_from_fork+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? __switch_to+0x2e9/0x730
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>

Fixes: 121a0f839dbb ("usb: misc: Add Intel USBIO bridge driver")
Cc: stable <stable@kernel.org>
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://patch.msgid.link/20260618124029.3704089-1-zzzccc427@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "usb: typec: mux: avoid duplicated mux switches"

This reverts commit b145c3f29d62f71cc9d2d714e2d4ae4c8d3f863d.

The deduplication logic appears to cause issues with separate
SBU muxes. The mode-switch call on these (like gpio-sbu-mux)
never appeared, so no successful mode-switch happened. The more
high-end Parade PS883X redrivers are not affected due to being
retimer-switch. The revert fixes dp altmode mode-switch for both.

Tested on:
  Lenovo Thinkbook 16 G7 QOY
  Lenovo Ideapad 5 2in1 14Q8X9
  Microsoft Windows Dev Kit 2023 (Blackrock)
  Lenovo Thinkpad T14s G6

Fixes: b145c3f29d62 ("usb: typec: mux: avoid duplicated mux switches")
Cc: stable <stable@kernel.org>
Signed-off-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260530-typc-mux-modeset-v1-1-64b0281e2cd6@oldschoolsolutions.biz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pmdomain: imx: Fix i.MX8MP VC8000E power up sequence

Per errata[1]:
ERR050531: VPU_NOC power down handshake may hang during VC8000E/VPUMIX
power up/down cycling.
Description: VC8000E reset de-assertion edge and AXI clock may have a
timing issue.
Workaround: Set bit2 (vc8000e_clk_en) of BLK_CLK_EN_CSR to 0 to gate off
both AXI clock and VC8000E clock sent to VC8000E and AXI clock sent to
VPU_NOC m_v_2 interface during VC8000E power up(VC8000E reset is
de-asserted by HW)

Add a bool variable is_errata_err050531 in
'struct imx8m_blk_ctrl_domain_data' to represent whether the workaround
is needed. If is_errata_err050531 is true, first clear the clk before
powering up gpc, then enable the clk after powering up gpc.

[1] https://www.nxp.com/webapp/Download?colCode=IMX8MP_1P33A

Fixes: a1a5f15f7f6cb ("soc: imx: imx8m-blk-ctrl: add i.MX8MP VPU blk ctrl")
Cc: stable@vger.kernel.org
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

pmdomain: imx: Fix i.MX8MP power notifier

Using imx8mm_vpu_power_notifier() for i.MX8MP is wrong, as it ungates
the VPU clocks to provide the ADB clock, which is necessary on i.MX8MM,
but on i.MX8MP there is a separate gate (bit 3) for the NoC. So add
imx8mp_vpu_power_notifier() for i.MX8MP.

Fixes: a1a5f15f7f6cb ("soc: imx: imx8m-blk-ctrl: add i.MX8MP VPU blk ctrl")
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

cifs: validate DFS referral string offsets

parse_dfs_referrals() validates that the response header and referral
array fit in the received buffer, but each referral also contains string
offsets supplied by the server.

Those offsets are used to compute the DfsPath and NetworkAddress string
pointers without checking whether they still point inside the response
buffer. A malformed referral can therefore make the computed pointer
exceed the end of the buffer. The resulting negative max_len is then
passed to cifs_strndup_from_utf16(), and the non-Unicode path forwards it
to kstrndup() as a size_t, allowing strnlen() to read out of bounds.

Validate each string offset before deriving the string pointer.
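
A minimal sketch of such a check, in userspace C with hypothetical names and parameters (the actual patch may structure it differently):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * A referral entry starting at byte ref_pos of a buf_len-byte response
 * places a string at ref_pos + offset. Return the number of bytes
 * available for that string, or 0 if the string would start at or past
 * the end of the buffer.
 */
static size_t sketch_referral_string_room(size_t buf_len, size_t ref_pos,
					  uint16_t offset)
{
	if (ref_pos >= buf_len)
		return 0;
	/* Compare against the remaining space so nothing can wrap. */
	if (offset >= buf_len - ref_pos)
		return 0;
	return buf_len - ref_pos - offset;
}
```

Because the result is computed only after both comparisons pass, it can never go negative and later be reinterpreted as a huge size_t.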

Fixes: 4ecce920e13a ("CIFS: move DFS response parsing out of SMB1 code")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: use GFP_KERNEL for DFS cache allocations

In dfs_cache.c, the helper functions alloc_target(), setup_referral(),
and update_cache_entry_locked() currently utilize GFP_ATOMIC to
allocate memory.

However, all of these functions are executed in sleepable conditions.
Use GFP_KERNEL instead, to reduce the risk of allocation failure and
stop putting unnecessary pressure on emergency memory pools in
low-memory scenarios.

Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

pmdomain: imx93-blk-ctrl: Extract PHY as shared domain for DSI/CSI

The MIPI DSI and CSI domains share control bits for clock and reset, which
can lead to incorrect behavior if one domain disables the shared resource
while the other is still active.

To fix the issue, introduce a shared MIPI PHY power domain to own the
common resources and make DSI and CSI its subdomains. This ensures the
shared bits are properly managed and not disabled while still in use.

Fixes: e9aa77d413c9 ("soc: imx: add i.MX93 media blk ctrl driver")
Cc: stable@vger.kernel.org
Signed-off-by: Guoniu Zhou <guoniu.zhou@oss.nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

dt-bindings: power: imx93: Add MIPI PHY power domain

Add MIPI PHY power domain for shared PHY resources used by both
MIPI DSI and CSI blocks.

Signed-off-by: Guoniu Zhou <guoniu.zhou@oss.nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Fixes: e9aa77d413c9 ("soc: imx: add i.MX93 media blk ctrl driver")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulfh@kernel.org>

s390: Add build salt to the vDSO

The vDSO needs to have a unique build id in a similar manner
to the kernel and modules. Use the build salt macro.

Signed-off-by: Bastian Blank <waldi@debian.org>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

s390/zcrypt: Remove the empty file

The files have no real users because the CEX2 and CEX3 device drivers
were removed, so remove these empty files as well.

Fixes: 5ac8c72462cd ("s390/zcrypt: remove CEX2 and CEX3 device drivers")
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

s390/mm: Fix type mismatch in get_align_mask()

Commit 86f48f922ba79 ("s390/mmap: disable mmap alignment when
randomize_va_space = 0") introduced get_align_mask() with return type of
'int', while the target field 'info.align_mask' in struct
vm_unmapped_area_info is 'unsigned long'.

With currently used masks, this should not cause truncation issues, but
fix it and return 'unsigned long' to avoid future problems.

Fixes: 86f48f922ba79 ("s390/mmap: disable mmap alignment when randomize_va_space = 0")
Cc: stable@vger.kernel.org # v6.9+
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

s390/diag: Add missing array_index_nospec() call to memtop_get_page_count()

'level' is user space controlled and used to read from an array. Add the
missing array_index_nospec() call to prevent speculative execution.

Cc: stable@vger.kernel.org
Fixes: 0d30871739ab ("s390/diag: Add memory topology information via diag310")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

iio: event: Fix event FIFO reset race

`iio_event_getfd()` creates the event file descriptor with
`anon_inode_getfd()`, which allocates a new fd, creates the anonymous
file and installs it in the process fd table before returning to the
caller.

The IIO code resets the event FIFO after `anon_inode_getfd()` has returned,
but before `IIO_GET_EVENT_FD_IOCTL` has copied the fd number to userspace.
But since fd tables are shared between threads, another thread can guess
the newly allocated fd number and issue a `read()` on it as soon as the fd
has been installed.

This means the `kfifo_to_user()` in `iio_event_chrdev_read()` can run in
parallel with the `kfifo_reset_out()` in `iio_event_getfd()`.

The kfifo documentation says that `kfifo_reset_out()` is only safe when it
is called from the reader thread and there is only one concurrent reader.
Otherwise it is dangerous and must be handled in the same way as
`kfifo_reset()`.

If that happens, `kfifo_to_user()` can advance the FIFO `out` index based
on state from before the reset, after the reset has already moved the `out`
index to the current `in` index. That can leave the FIFO with an `out`
index past the `in` index. A later `read()` can then see an underflowed
FIFO length and copy more data than the event FIFO buffer contains. This
can result in an out-of-bounds read and leak adjacent kernel memory to
userspace.
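
The index arithmetic can be replayed in a small userspace sketch. It assumes free-running unsigned `in`/`out` indices with the length computed as `in - out`; the struct and function names are hypothetical, and the interleaving is serialized by hand rather than run on real threads.

```c
/* Free-running FIFO indices; length is in - out with unsigned wraparound. */
struct sketch_fifo {
	unsigned int in;
	unsigned int out;
};

static unsigned int sketch_fifo_len(const struct sketch_fifo *f)
{
	return f->in - f->out;
}

/*
 * Replay the racy interleaving:
 *   1. the reader samples how many elements it will copy,
 *   2. the reset moves out up to in,
 *   3. the reader advances out by its stale count.
 * Returns the length a later read() would observe.
 */
static unsigned int sketch_replay_race(unsigned int in, unsigned int out)
{
	struct sketch_fifo f = { in, out };
	unsigned int stale = sketch_fifo_len(&f);

	f.out = f.in;		/* reset: discard everything queued */
	f.out += stale;		/* reader: commit the stale copy count */

	return sketch_fifo_len(&f);
}
```

With in = 10 and out = 6, the reader's stale count of 4 leaves out at 14, and the next length computation wraps to (unsigned int)-4 instead of 0.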

Move the FIFO reset before `anon_inode_getfd()`. At that point the event fd is
marked busy, but the new fd has not been installed yet, so userspace cannot
access it while the FIFO is reset.

Fixes: b91accafbb10 ("iio:event: Fix and cleanup locking")
Reported-by: Codex:gpt-5.5
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Reviewed-by: Nuno Sá <nuno.sa@analog.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>

USB: chaoskey: Fix slab-use-after-free in chaoskey_release()

The chaoskey driver has a use-after-free bug in its release routine.
If the user closes the device file after the USB device has been
unplugged, a debugging log statement will try to access the
usb_interface structure after it has been deallocated:

BUG: KASAN: slab-use-after-free in dev_driver_string (drivers/base/core.c:2406)
Read of size 8 at addr ffff888168e8a0b8 by task chaoskey_raw_re/10106

Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
kasan_report (mm/kasan/report.c:595)
dev_driver_string (drivers/base/core.c:2406)
__dynamic_dev_dbg (lib/dynamic_debug.c:906)
chaoskey_release (drivers/usb/misc/chaoskey.c:323)
__fput (fs/file_table.c:510)
fput_close_sync (fs/file_table.c:615)
__x64_sys_close (fs/open.c:1507 fs/open.c:1492 fs/open.c:1492)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

The driver's last reference to the interface structure is dropped in
the chaoskey_free() routine, so the code must not use the interface --
even in a debugging statement -- after that routine returns.
(Exception: If we know that another reference is held by someone else,
such as the device core while the disconnect routine runs, there's no
problem. Thanks to Johan Hovold for pointing this out.)

Since the bad access is part of an unimportant debugging statement,
we can fix the problem simply by removing the whole statement.

Reported-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Closes: https://lore.kernel.org/linux-usb/20EC9664-054E-438B-B411-2145D347F97B@gmail.com/
Tested-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: 66e3e591891d ("usb: Add driver for Altus Metrum ChaosKey device (v2)")
Cc: stable <stable@kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/bb5b1dc6-eb59-43e1-8d26-51e658e88bbe@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

watchdog: airoha: Prevent division by zero when clock frequency is zero

clk_get_rate() can return 0 when the clock provider is not properly
configured or the clock is unmanaged. The driver uses wdt_freq as a
divisor directly in airoha_wdt_probe() to compute max_timeout and in
airoha_wdt_get_timeleft() to compute the remaining time, which results
in a division by zero.

Add a check for wdt_freq == 0 in probe and return -EINVAL with
dev_err_probe() to prevent the division by zero and provide a
diagnostic message.
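
A minimal userspace sketch of the guard follows. The helper name sketch_max_timeout() and the UINT32_MAX / wdt_freq formula are assumptions made for illustration; the driver's actual computation may differ.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the maximum timeout in seconds from the
 * counter clock rate. The 32-bit counter formula is an assumption.
 */
static int sketch_max_timeout(unsigned long wdt_freq, unsigned int *max_timeout)
{
	/* Reject a zero rate before it is ever used as a divisor. */
	if (wdt_freq == 0)
		return -EINVAL;

	*max_timeout = (unsigned int)(UINT32_MAX / wdt_freq);
	return 0;
}
```

Rejecting the rate once at probe time means later users of wdt_freq, such as the timeleft computation, never see a zero divisor.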

Fixes: 3cf67f3769b8 ("watchdog: Add support for Airoha EN7851 watchdog")
Signed-off-by: Wayen Yan <win847@gmail.com>
Link: https://lore.kernel.org/r/178347932594.81327.4834644880399144119@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

watchdog: pretimeout: Fix UAF in watchdog_unregister_governor()

When a watchdog governor is unregistered, it updates existing watchdog
devices that were using this governor by falling back to `default_gov`.

If the governor being unregistered is currently set as `default_gov`,
the `default_gov` is never cleared. This leads to two use-after-free
issues:
1. New watchdog devices registered after this point will inherit the
   dangling `default_gov`.
2. Existing watchdog devices using the unregistered governor will have
   their `wdd->gov` reassigned to the dangling `default_gov`.

Fix the UAF by clearing `default_gov` if it matches the governor being
unregistered.
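
The ordering can be sketched in userspace as follows. All `sketch_*` names are hypothetical, and locking and list handling are omitted; this only illustrates clearing the default before the fallback assignment.

```c
#include <stddef.h>

/* Hypothetical simplified types; the real structures carry more state. */
struct sketch_gov { const char *name; };
struct sketch_wdd { struct sketch_gov *gov; };

static struct sketch_gov *sketch_default_gov;

static void sketch_unregister_governor(struct sketch_gov *gov,
				       struct sketch_wdd *devs, size_t n)
{
	size_t i;

	/* Clear the default first so the fallback below cannot reinstall gov. */
	if (sketch_default_gov == gov)
		sketch_default_gov = NULL;

	/* Devices that used gov fall back to the (possibly NULL) default. */
	for (i = 0; i < n; i++)
		if (devs[i].gov == gov)
			devs[i].gov = sketch_default_gov;
}
```

With the clear in place, neither existing devices nor devices registered later can pick up a pointer to the governor that is going away.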

Fixes: da0d12ff2b82 ("watchdog: pretimeout: add panic pretimeout governor")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://lore.kernel.org/r/20260707101803.3598173-1-tzungbi@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

ipvs: ensure inner headers in ICMP errors are in headroom

Sashiko points out that after stripping the outer headers
with pskb_pull() we should ensure the inner IP headers
in ICMP errors from tunnels are present in the skb headroom
for functions like ipv4_update_pmtu(), icmp_send() and
IP_VS_DBG().

Also, add more checks for the length of the inner headers.

Fixes: f2edb9f7706d ("ipvs: implement passive PMTUD for IPIP packets")
Link: https://sashiko.dev/#/patchset/20260702073430.67680-1-zhaoyz24%40mails.tsinghua.edu.cn
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: use parsed transport offset in SCTP state lookup

set_sctp_state() reads the SCTP chunk header again in order to drive the
IPVS SCTP state table. For IPv6 it computes the offset with
sizeof(struct ipv6hdr), while the surrounding IPVS code uses iph.len from
ip_vs_fill_iph_skb(), where ipv6_find_hdr() has already skipped
extension headers and found the real transport header.

This makes the state machine read from the wrong offset for IPv6 SCTP
packets that carry extension headers. For example, an INIT packet with an
8-byte destination options header can be scheduled correctly by
sctp_conn_schedule(), but set_sctp_state() reads the first byte of the
SCTP verification tag as a DATA chunk type. The connection then moves
from NONE to ESTABLISHED instead of INIT1, gets the longer established
timeout, and updates the active/inactive destination counters
incorrectly. This happens even though the SCTP handshake has not
completed.

Use the parsed transport offset passed down from ip_vs_set_state() for
the SCTP chunk-header lookup. For IPv4 and IPv6 packets without
extension headers this preserves the existing offset.
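
The offset arithmetic from the INIT example can be reproduced with a small userspace sketch. The `sketch_*` names and the flat byte-buffer packet are illustrative assumptions; only the header sizes (40-byte IPv6 header, 12-byte SCTP common header) and the chunk type values (DATA = 0, INIT = 1) come from the protocols.

```c
#include <stdint.h>
#include <string.h>

enum { SKETCH_IPV6_HDR = 40, SKETCH_DSTOPTS = 8, SKETCH_SCTP_HDR = 12 };
enum { SKETCH_CID_DATA = 0, SKETCH_CID_INIT = 1 };

/* Read the first chunk type, given the assumed transport header offset. */
static uint8_t sketch_chunk_type(const uint8_t *pkt, unsigned int transport_off)
{
	return pkt[transport_off + SKETCH_SCTP_HDR];
}

/* Build a fake IPv6 + 8-byte destination options + SCTP INIT packet. */
static void sketch_build_init(uint8_t *pkt, size_t len)
{
	unsigned int sctp = SKETCH_IPV6_HDR + SKETCH_DSTOPTS;

	memset(pkt, 0xff, len);       /* poison bytes the sketch does not model */
	memset(pkt + sctp + 4, 0, 4); /* verification tag is 0 in an INIT packet */
	pkt[sctp + SKETCH_SCTP_HDR] = SKETCH_CID_INIT;
}
```

For a buffer of at least 61 bytes, reading with the fixed offset 40 lands on the first verification tag byte and yields 0 (DATA), while reading with the parsed offset 48 yields 1 (INIT).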

Fixes: 2906f66a5682 ("ipvs: SCTP Trasport Loadbalancing Support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/20260705123040.35755-1-zhaoyz24@mails.tsinghua.edu.cn/
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: Claude Code:GLM-5.2
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: use parsed transport offset in TCP state lookup

TCP state handling reparses the skb to find the TCP header. For IPv6 it
uses sizeof(struct ipv6hdr), while the surrounding IPVS code already
parsed the packet with ip_vs_fill_iph_skb() and has the real
transport-header offset in iph.len.

This makes TCP state handling look at the wrong bytes when an IPv6
packet carries extension headers. Use the parsed transport offset passed
down from ip_vs_set_state() when reading the TCP header.

For IPv4 and for IPv6 packets without extension headers, the passed
offset matches the previous value.

Fixes: 0bbdd42b7efa6 ("IPVS: Extend protocol DNAT/SNAT and state handlers")
Link: https://lore.kernel.org/netdev/20260705125659.37744-1-zhaoyz24@mails.tsinghua.edu.cn/
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Assisted-by: Claude Code:GLM-5.2
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

ipvs: pass parsed transport offset to state handlers

IPVS callers already parse the packet into struct ip_vs_iphdr before
updating connection state. For IPv6 this records the real
transport-header offset after extension headers in iph.len.

Pass this parsed transport offset through ip_vs_set_state() and the
protocol state_transition() callback so protocol handlers can use the
same packet context as scheduling and NAT handling. This patch only
changes the common callback plumbing and adapts the protocol callback
signatures; TCP and SCTP start using the value in follow-up patches.

Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: handle unreadable frags

sashiko reports:
When an skb with unreadable fragments (such as from devmem TCP, where
skb_frags_readable(skb) returns false) is processed by the u32 module,
skb_copy_bits() will safely return a negative error code [..]

xt_u32: bail out with hotdrop in this case.
gather_frags: return -1, just as if we had no fragment header.
nfnetlink_queue: restrict to the linear part.
nfnetlink_log: restrict to the linear part.

v2:
- skb_zerocopy helpers don't copy the readable flag, i.e. nfnetlink_queue
  is broken too
- xt_u32 shouldn't return true if hotdrop was set

Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags")
Cc: stable@vger.kernel.org
Acked-by: Mina Almasry <almasrymina@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: flowtable: support IPIP tunnel with direct xmit

The combination of an IPIP tunnel with direct xmit, e.g. a bridge device,
breaks because no dst_entry is provided to check the skb headroom and to
set the iph->frag_off field. This leads to invalid dst usage and can
trigger a crash in the tunnel transmit path.

Fix this by moving dst_cache and dst_cookie out of the runtime union so
that they can be shared by neighbour, xfrm, and direct tunnel flows.
For FLOW_OFFLOAD_XMIT_DIRECT tuples carrying tunnel metadata, preserve
route state in these shared fields and release it through the common
dst release path.

Since dst_entry is now available to the three supported xmit modes and
dst_release() already deals with NULL dst, remove the xmit type check
in nft_flow_dst_release(). Moreover, make nf_flow_dst_check() skip the
check when the dst entry is NULL, which can now happen in the direct
xmit case.

Based on a patch from Ren Wei <n05ec@lzu.edu.cn>.

Fixes: d30301ba4b07 ("netfilter: flowtable: Add IPIP tx sw acceleration")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Reported-by: Zhengyang Chen <chzhengyang2023@lzu.edu.cn>
Reported-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>

netfilter: flowtable: IPIP tunnel hardware offload is not yet supported

No driver supports IPIP tunnels yet, so give up early on setting up the
hardware offload for this scenario.

This patch adds a stub that can be extended to cover more configurations
that are currently not supported. As of now, the offload work is
enqueued to the worker, then ignored if the hardware offload
configuration is not supported.

Check the NF_FLOW_HW flag to tell whether offloading this entry has
already been attempted, so that it is not retried on refresh when
unsupported. Move the NF_FLOW_HW flag check to nf_flow_offload_add(). If
the NF_FLOW_HW flag is unset, the _del and _stats variants are never
called.

This can be updated later to skip queueing the hardware offload work
when the hardware does not support it.

Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration")
Fixes: ab427db17885 ("netfilter: flowtable: Add IPIP rx sw acceleration")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Reported-by: Zhengyang Chen <chzhengyang2023@lzu.edu.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>