git.ipfire.org Git - thirdparty/kernel/linux.git/log

dm-integrity: fix a bug if the bio is out of limits

If dm_integrity_check_limits fails, the code would exit with
DM_MAPIO_KILL. However, the range would already be locked at this point,
and it wouldn't be unlocked, resulting in a deadlock. Let's move the
limit check up, so that when it exits, no resources are leaked.
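
The ordering argument can be sketched in plain C. This is a minimal
userspace model, not the dm-integrity code: the names map_request,
complete_request, range_lock and within_limits are hypothetical, and a
counter stands in for the range lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from dm-integrity. */
enum map_result { MAP_SUBMITTED, MAP_KILL };

struct range_lock {
	int held;	/* number of ranges currently locked */
};

static bool within_limits(uint64_t sector, uint64_t n_sectors,
			  uint64_t dev_sectors)
{
	/* Written so that sector + n_sectors cannot overflow. */
	return n_sectors <= dev_sectors && sector <= dev_sectors - n_sectors;
}

/* Validate first, lock second: the early return holds nothing. */
static enum map_result map_request(struct range_lock *lock, uint64_t sector,
				   uint64_t n_sectors, uint64_t dev_sectors)
{
	if (!within_limits(sector, n_sectors, dev_sectors))
		return MAP_KILL;

	lock->held++;	/* range stays locked until completion */
	return MAP_SUBMITTED;
}

/* Completion path: the only place where the range is unlocked. */
static void complete_request(struct range_lock *lock)
{
	lock->held--;
}
```

With the check placed after the lock instead, the MAP_KILL return would
leave held at 1 with no completion to drop it.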

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Assisted-by: Claude:claude-opus-4.6
Fixes: fb0987682c62 ("dm-integrity: introduce the Inline mode")
Cc: stable@vger.kernel.org

dm-integrity: don't increment hash_offset twice

hash_offset is already incremented in the loop "for (i = 0; i < to_copy;
i++, ts--)". Do not increment it again.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Assisted-by: Claude:claude-opus-4.6
Fixes: 84597a44a9d8 ("dm-integrity: dm integrity: add optional discard support")
Cc: stable@vger.kernel.org

dm-integrity: fix leaking uninitialized kernel memory

If the hash size is less than the device's tuple size, dm-integrity is
supposed to zero the remaining space. Due to a bug in the code, the
zeroing didn't work. This commit fixes it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Assisted-by: Claude:claude-opus-4.6
Fixes: fb0987682c62 ("dm-integrity: introduce the Inline mode")
Cc: stable@vger.kernel.org

dm-integrity: fix the 'fix_hmac' option

When the "fix_hmac" argument is used, dm-integrity is supposed to check
the superblock with the journal_mac. However, there was a logic bug in
the code - the code only checked the superblock mac if the bit
SB_FLAG_FIXED_HMAC was set in the superblock. So, the attacker could
clear this bit and bypass the checking trivially.

This commit changes dm-integrity so that when the user specified the
"fix_hmac" flag and the superblock doesn't have the bit
SB_FLAG_FIXED_HMAC set, the activation is aborted with an error.

Unfortunately, there's a bug in the integritysetup tool: when using
the 'open' command, it passes the "fix_hmac" argument to the kernel even
if the user specified --integrity-legacy-hmac. The bug will be fixed in
the upcoming 2.8.7 release.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Shukai Ni <shukai.ni@kuleuven.be>

Merge tag 'reset-fixes-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pza/linux into arm/fixes

Reset controller fixes for v7.2

* Fix the SpacemiT K3 USB2 AHB reset bit location.
* Add missing COMBOPHY_RESET definition for Altera Agilex5.
* Fix the reset-sunxi initialization error path to release the
  requested memory region.
* Correct polarity of MIPI CSI resets on NXP i.MX8MQ. The corresponding
  fix in the CSI2 driver, 6d79bb8fd2aa, is already contained in v7.2-rc1.

* tag 'reset-fixes-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pza/linux:
  reset: imx7: Correct polarity of MIPI CSI resets on i.MX8MQ
  reset: sunxi: fix memory region leak on ioremap failure
  dt-bindings: reset: altr: add COMBOPHY_RESET for Agilex5
  reset: spacemit: k3: fix USB2 ahb reset

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

MAINTAINERS: Update SpacemiT SoC git tree repository

Due to security concerns, switch the SpacemiT kernel SoC tree's
repository from github.com to kernel.org.

Signed-off-by: Yixun Lan <dlan@kernel.org>
Link: https://lore.kernel.org/r/20260707-07-spacemit-git-repo-url-v1-1-137697316a4c@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'asoc-fix-v7.2-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v7.2

A fairly standard set of device specific fixes and quirks for new
devices, nothing too remarkable here.

powerpc/pseries/Kconfig: Enable CONFIG_VPA_PMU to be used with KVM

Currently, CONFIG_VPA_PMU is not enabled by default, and consequently
cannot be used for KVM guests at all, unless explicitly enabled on
the host kernel.

Mark CONFIG_VPA_PMU as "default m" to ensure it is available when KVM is
being used.

Cc: stable@vger.kernel.org # v6.13+
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
[Maddy: Changed tag order]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260615091120.84169-1-gautam@linux.ibm.com

ppp: defer channel free to an RCU grace period to fix pppol2tp RX UAF

pppol2tp_recv() runs in the L2TP UDP-encap softirq RX path:

l2tp_udp_encap_recv() -> l2tp_recv_common() -> pppol2tp_recv()
   -> ppp_input(&po->chan)

It runs under rcu_read_lock() holding only an l2tp_session reference and
takes NO reference on the internal PPP channel (struct channel,
chan->ppp) that ppp_input() dereferences.

The pppox socket is SOCK_RCU_FREE, so 'po' and the embedded ppp_channel
are RCU-safe.  But the internal struct channel is a separate allocation
that ppp_release_channel() frees with a plain kfree():

close(data socket) -> pppol2tp_release() -> pppox_unbind_sock()
   -> ppp_unregister_channel() -> ppp_release_channel() -> kfree(pch)

For a channel that is bound (PPPIOCGCHAN) but not attached to a ppp unit
(no PPPIOCCONNECT, pch->ppp == NULL) and not bridged, teardown skips
both ppp_disconnect_channel()'s synchronize_net() and
ppp_unbridge_channels()'s synchronize_rcu(), so the kfree() has no grace
period.  rcu_read_lock() in pppol2tp_recv() does not protect against a
plain kfree(), so an in-flight ppp_input() on one CPU can dereference
the channel just freed by close() on another CPU.

The bug is reachable by an unprivileged user.

Defer the channel free to an RCU callback via call_rcu() so the grace
period fences any in-flight ppp_input(). The disconnect and unbridge
teardown paths already fence with synchronize_net()/synchronize_rcu();
call_rcu() does the same here without stalling the close() path.

Fixes: ee40fb2e1eb5 ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Reviewed-by: Qingfang Deng <qingfang.deng@linux.dev>
Link: https://patch.msgid.link/E793FCF2-58DE-4387-A983-C7B4BC3158BD@doyensec.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/landlock: Skip scoped_signal subtest with MSG_OOB if not available

MSG_OOB might be disabled in the kernel for unix sockets (by not
selecting CONFIG_AF_UNIX_OOB), and in this case the related tests
of the scoped_signal_test are currently failing. Add a runtime
probe using socketpair() to detect MSG_OOB support and skip the
test gracefully if it is unavailable.
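
A runtime probe of this kind could look roughly like the sketch here.
The function name unix_oob_supported is hypothetical and the actual
selftest helper may differ; the sketch treats any failed MSG_OOB send
on a fresh socketpair() as "not available".

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if sending MSG_OOB on an AF_UNIX stream socket works,
 * 0 if the send is rejected, and -1 if the probe could not be set up.
 */
static int unix_oob_supported(void)
{
	int sv[2];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* One out-of-band byte is enough to exercise the code path. */
	n = send(sv[0], "x", 1, MSG_OOB);

	close(sv[0]);
	close(sv[1]);
	return n == 1 ? 1 : 0;
}
```

A test would call the probe once during setup and skip the MSG_OOB
subtests unless it returns 1.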

Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://patch.msgid.link/20260710081642.405916-1-thuth@redhat.com
Cc: stable@vger.kernel.org
Fixes: f34e9ce5f479 ("selftests/landlock: Test signal created by out-of-bound message")
Signed-off-by: Mickaël Salaün <mic@digikod.net>

selftests/landlock: Fix screwed up pointers in the scoped_signal_test

The scoped_signal_test uses pthread_join(..., (void **)&ret) in
a couple of places, i.e. the return value of the thread is stored
in the shape of a "void *" into the memory location of &ret.
Pointers are 64-bit on modern computers, but the ret variable is
declared as a simple "enum thread_return" which is only 32 bits.
So the pthread_join() will overflow the ret variable by 4 bytes.

The problem is very visible on big endian systems like s390x
where the test is failing: The least significant byte that carries
the return code of the thread is not written into the ret variable
here, but somewhere else in the stack frame, so the comparison
for the right return code is failing here.

Fix it by getting rid of the enum and defining the THREAD_* constants
and "ret" variables as proper "void *" pointers. This way we can
also get rid of some ugly (void *) castings in a couple of spots.
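
The endianness effect can be modelled without threads or undefined
behaviour. This is a sketch under the assumption of 64-bit pointers and
a 32-bit enum; the helper names are made up for illustration. It stores
a 64-bit value into an 8-byte slot and then reads only the first 4
bytes, which is what the old test effectively did.

```c
#include <stdint.h>

/* Store a 64-bit value into buf in the given byte order. */
static void store_u64(unsigned char buf[8], uint64_t v, int big_endian)
{
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;

		buf[i] = (unsigned char)(v >> shift);
	}
}

/* Read the first four bytes of buf as a 32-bit value, same byte order. */
static uint32_t load_u32(const unsigned char buf[8], int big_endian)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		v |= (uint32_t)buf[i] << shift;
	}
	return v;
}

/*
 * Model of "pthread_join() writes a 64-bit pointer, the caller reads a
 * 32-bit enum at the same address".
 */
static uint32_t enum_seen_after_join(uint64_t thread_ret, int big_endian)
{
	unsigned char slot[8];

	store_u64(slot, thread_ret, big_endian);
	return load_u32(slot, big_endian);
}
```

For a thread return value of 2, the little-endian model reads back 2,
while the big-endian model reads back 0 because the significant byte
sits in the upper half of the slot. Declaring the variable as "void *"
and passing its address directly to pthread_join() removes the size
mismatch altogether.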

Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://patch.msgid.link/20260709164340.339656-1-thuth@redhat.com
Cc: stable@vger.kernel.org
Fixes: c8994965013e ("selftests/landlock: Test signal scoping for threads")
[mic: Add clang-format markups]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Update formatting

Following commit 99df2a8eba34 ("clang-format: fix formatting of guard()
and scoped_guard() statements"), update scoped_guard() formatting.
Also, see the related fix [1].

Cc: Günther Noack <gnoack@google.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20260708105713.2073335-1-mic@digikod.net
Link: https://patch.msgid.link/20260708110635.2083515-1-mic@digikod.net
Reviewed-by: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Fix kernel-doc for the nested quiet layer flag

kernel-doc emits "Excess struct member 'quiet' description in
'landlock_layer'" because "quiet" is a bitfield inside the named nested
struct "flags", but its inline comment used the bare member name
"@quiet:", which kernel-doc attributes to the enclosing landlock_layer.

Use the canonical dotted notation "@flags.quiet:" so kernel-doc resolves
the nested member, and include it in the generated documentation.
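
As an illustration of the dotted notation, here is a hypothetical struct
(not the real landlock_layer definition; the field names other than
flags.quiet and all types are assumptions) with in-line kernel-doc
comments.

```c
/**
 * struct layer_sketch - Hypothetical stand-in for a layer with nested flags
 */
struct layer_sketch {
	/**
	 * @level: Position of this layer (illustrative field).
	 */
	unsigned short level;
	/**
	 * @flags: Special flags attached to this layer.
	 */
	struct {
		/**
		 * @flags.quiet: Dotted name, so kernel-doc attributes the
		 * description to the member inside the nested "flags" struct.
		 */
		unsigned int quiet : 1;
	} flags;
};
```

Writing the inner comment as "@quiet:" instead would make kernel-doc
look for a member named quiet directly in the outer struct.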

Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Tingmao Wang <m@maowtm.org>
Fixes: a260c0055665 ("landlock: Add a place for flags to layer rules")
Link: https://patch.msgid.link/20260703141711.2016964-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>

selftests/landlock: Add test for TCP fast open

Enforce that TCP Fast Open is controlled by
LANDLOCK_ACCESS_NET_CONNECT_TCP. Semantics of connect() and
sendmsg(MSG_FASTOPEN) should be identical from Landlock's perspective.
Also enforce error code consistency, since UDP sockets ignore the
MSG_FASTOPEN flag while Unix sockets reject it.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://patch.msgid.link/20260701214628.33319-2-matthieu@buffet.re
Cc: stable@vger.kernel.org
[mic: Fix formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Fix TCP Fast Open connection bypass

The documentation of the socket_connect() LSM hook states that it
controls connecting a socket to a remote address. It has not been the
case since the addition of TCP Fast Open (RFC 7413) support, which
allows opening a TCP connection (thus, setting a socket's destination
address) via the MSG_FASTOPEN flag passed to
sendto()/sendmsg()/sendmmsg(). The problem then got duplicated into
MPTCP.

Landlock did not take it into account when its TCP support was added,
leaving a bypass of TCP connect policy.

Ideally a call to the LSM hook would be added in the fastopen code path,
in order to fix this generically. But connect() hooks are designed to
run with the socket locked, unlike sendmsg() hooks.

Closes: https://github.com/landlock-lsm/linux/issues/41
Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://patch.msgid.link/20260701214628.33319-1-matthieu@buffet.re
Cc: stable@vger.kernel.org
[mic: Wrap commit message]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

perf/aux: Fix page UAF in map_range()

map_range() reads rb->aux_pages[], rb->aux_nr_pages and rb->aux_pgoff via
perf_mmap_to_page() while holding only event->mmap_mutex. Those fields are
serialized by rb->aux_mutex, and mmap_mutex is per event.

Thus, two events sharing one rb via PERF_EVENT_IOC_SET_OUTPUT can race
rb_alloc_aux() with map_range(), leading to a page-UAF scenario as follows:

  CPU 0                           CPU 1
  =====                           =====
  rb_alloc_aux()                  map_range()
  [1]: allocate rb->aux_pages[0]
  [2]: rb->aux_nr_pages++
                                  [3]: perf_mmap_to_page()
                                         returns rb->aux_pages[0]
                                  [4]: map it as VM_PFNMAP
  [5]: rb->aux_pgoff = 1

  munmap the page
  [6]: free rb->aux_pages[0]

Pages mapped as VM_PFNMAP have no refcount protection, so CPU 1 holds a
mapping to a freed physical frame.

Fix this by taking rb->aux_mutex across the page walk in map_range().

Fixes: b709eb872e19 ("perf: map pages in advance")
Signed-off-by: Lee Jia Jie <jiajie.lee@starlabs.sg>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>

nouveau/vmm: fix another SPT/LPT race

We've had an unknown Turing issue for a while with page faults since
large pages and compression.

I've got a patch series that syncs all our L2 handling with ogkm and it
made this fault happen more.

After writing a bunch of debugging patches, I spotted an invalid LPT
entry where there should have been a valid one.

A 64K MAP succeeds on a range, but a subsequent SPT put drops SPT refs
across multiple ranges.

We shouldn't assume all ranges where SPTEs go away will have the same
sparse/invalid/valid state, just iterate over each instead and do the
right thing.

Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: d19512f5abb1 ("nouveau/vmm: start tracking if the LPT PTE is valid. (v6)")
Link: https://patch.msgid.link/20260615044737.3419585-1-airlied@gmail.com
[ Properly format commit message. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
(cherry picked from commit d008141ed4ce924167a03d46fbce9ad1fe4efa29)
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'drm-xe-fixes-2026-07-09' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix PTE index in xe_vm_populate_pgtable for chunked binds (Matt Brost)
- Wait on external BO kernel fences in exec IOCTL (Matt Brost)
- Remove duplicate include (Anas Khan)
- Free madvise VMA array on L2 flush failure (Guangshuo Li)
- Stub notifier_lock helpers when DRM_GPUSVM=n (Shuicheng Lin)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/alASIbW318Rl-HTv@fedora

Merge tag 'amd-drm-fixes-7.2-2026-07-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-7.2-2026-07-09:

amdgpu:
- PSP 15.0.9 update
- SMU 15.0.9 update
- VCN 5.3 fix
- VI ASPM fix
- Userq fix
- lifetime fix for amdgpu_vm_get_task_info_pasid()
- Gfx10 fix
- SMU 14 fix

amdkfd:
- CRIU bounds checking fixes
- secondary context id fix
- Event bounds checking fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260709212303.15913-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2026-07-09' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v7.2-rc3:
- Fix uaf in amdxdna mmap failure path.
- A lot of deadlocks, access races and return value fixes in amdxdna.
- Fix analogix_dp bitshifts during link training.
- Use direct label in drm_exec.
- Fix absent indirect bo handling in v3d.
- Sync on first active crtc in fb_dirty, rather than first crtc.
- Rework try_harder in the buddy allocator.
- Make imagination function static to solve compiler warning.
- Fix imagination error checking.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/71e5b48b-307f-47f5-8fd5-b60ea43e4196@linux.intel.com

Merge tag 'drm-intel-fixes-2026-07-09' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

Fix underrun regressions on Panther Lake by reverting the recent
SCL=0 enablement for always-on VRR timing. It also includes a fix
for display LT PHY SSC programming and a small set of i915 fixes
addressing NULL pointer dereferences, memory leaks and bound checks.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/ak-xZPqluaXVJGtP@intel.com

Merge tag 'v7.2-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
"This contains a set of SMB server fixes mostly around session setup,
  multichannel/session binding, and protocol-compatible error reporting:

   - Fix SID-to-id mapping so only SIDs with a valid local Unix
     representation are translated, while preserving other Windows
     SIDs in NT ACL xattrs

   - Fix SMB3 multichannel binding across multi-round authentication,
     keep the derived channel key separate from the established session
     key, and enforce the 32-channel session limit

   - Match Windows-compatible close timestamp behavior by coalescing
     automatic write time updates smaller than 15ms

   - Return STATUS_DISK_FULL for SET_INFO allocation failures caused
     by ENOSPC or EFBIG

   - Fix several signed SESSION_SETUP error paths so clients see the
     intended server status instead of replacing it with
     STATUS_ACCESS_DENIED

   - Fix reauthentication on bound channels and reject different-user
     channel binding with STATUS_ACCESS_DENIED

   - Use the referenced session dialect/signing algorithm when
     validating and signing rejected cross-dialect binding requests"

* tag 'v7.2-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: use the session dialect for rejected binding signatures
  ksmbd: mark rejected cross-dialect bindings as signed
  ksmbd: sign rejected SMB2.1 session binding responses
  ksmbd: handle channel binding with a different user
  ksmbd: find bound sessions during reauthentication
  ksmbd: mark invalid session responses as signed
  smb/server: map SET_INFO ENOSPC to disk full
  ksmbd: coalesce sub-15ms write time updates on close
  ksmbd: fix multichannel binding and enforce channel limit
  ksmbd: validate SID namespace before mapping IDs

cifs: Remove CIFSSMBSetPathInfoFB() fallback function

This fallback function CIFSSMBSetPathInfoFB() is called only from
CIFSSMBSetPathInfo() function. CIFSSMBSetPathInfo() is used in
smb_set_file_info() which contains all required fallback code, including
fallback via filehandle, since commit f122121796f9 ("cifs: Fix changing
times and read-only attr over SMB1 smb_set_file_info() function") and
commit 92210ccd877b ("cifs: Add fallback code path for cifs_mkdir_setinfo()").

So the CIFSSMBSetPathInfoFB() is just code duplication, which is not needed
anymore. Therefore remove it.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: Fix and improve cifs_is_path_accessible() function

Do not call SMBQueryInformation() command for path with SMB wildcard
characters on non-UNICODE connection because server expands wildcards.
Function cifs_is_path_accessible() needs to check if the real path exists
and must not expand wildcard characters.

Do not dynamically allocate memory for small FILE_ALL_INFO structure and
instead allocate it on the stack. This structure is allocated on stack by
all other functions.

When CAP_NT_SMBS was not negotiated then do not issue CIFSSMBQPathInfo()
command. This command fails on non-NT Win9x SMB servers, so there
is no need to try it. The purpose of cifs_is_path_accessible() function is
just to check if the path is accessible, so SMBQueryInformation() for old
servers is enough.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

mm/memory-failure: trace: change memory_failure_event to ras subsystem

Commit 97f0b1345219 ("tracing: add trace event for memory-failure")
introduced memory_failure_event in ras subsystem. Commit 31807483d395
("mm/memory-failure: remove the selection of RAS") changed
memory_failure_event to memory_failure subsystem. This breaks
backward compatibility, which some user programs rely on.

Change memory_failure_event to ras subsystem to keep backward
compatibility.

Link: https://lore.kernel.org/20260605081213.154660-1-xieyuanbin1@huawei.com
Fixes: 31807483d395 ("mm/memory-failure: remove the selection of RAS")
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Reported-by: Yi Lai <yi1.lai@intel.com>
Reported-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Closes: https://lore.kernel.org/linux-mm/CY8PR11MB7134346A3E4BB28ECA28D6E989132@CY8PR11MB7134.namprd11.prod.outlook.com
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_reporting: allow driver to set batch capacity

At the moment, if a virtio balloon device has a page reporting vq but its
size is < PAGE_REPORTING_CAPACITY (32), the balloon driver fails probe.

But, there's no way for the host to know this value, so it can easily
create a smaller vq, and suddenly adding the reporting capability to the
device makes the whole driver fail.  Not pretty.

Add a capacity field to page_reporting_dev_info so drivers can control the
maximum number of pages per report batch.

In virtio-balloon, set the capacity to the reporting virtqueue size,
letting page_reporting adapt to whatever the device provides.

Capacity need not be a power of two.  Code previously called out division
by PAGE_REPORTING_CAPACITY as cheap since it was a power of 2, but no
performance difference was observed with non-power-of-2 values.

If capacity is 0 or exceeds PAGE_REPORTING_CAPACITY, it defaults to
PAGE_REPORTING_CAPACITY.  The 0 check and the clamping is done in
page_reporting_register(), before the reporting work is scheduled, so we
never get division by 0.
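
The defaulting rule can be sketched as follows. The helper names
effective_capacity and batches_needed are hypothetical, and the value 32
is taken from the description above; this is not the code in
page_reporting_register().

```c
#define PAGE_REPORTING_CAPACITY 32U	/* value stated in the description */

/* Hypothetical helper: pick the effective batch capacity at registration. */
static unsigned int effective_capacity(unsigned int requested)
{
	if (requested == 0 || requested > PAGE_REPORTING_CAPACITY)
		return PAGE_REPORTING_CAPACITY;
	return requested;
}

/* Batches needed for nr_pages; safe because the capacity is never 0. */
static unsigned int batches_needed(unsigned int nr_pages,
				   unsigned int requested)
{
	unsigned int cap = effective_capacity(requested);

	return (nr_pages + cap - 1) / cap;
}
```

Because the clamp runs once before any division, a capacity such as 24
(not a power of two) works the same way as 32.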

Link: https://lore.kernel.org/444c24cf39f3f3620fc90ef4695bd6b0979f4c4b.1783232420.git.mst@redhat.com
Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page reports to host")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Gregory Price <gourry@gourry.net>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/kmemleak: fix checksum computation for per-cpu objects

The per-cpu object checksum folds each CPU's CRC together with XOR and
seeds every CRC with 0.  Both choices make update_checksum() miss content
changes:

  - XOR is self-cancelling, so equal contents on two CPUs cancel out and
    simultaneous identical changes leave the checksum unchanged.
  - crc32(0, ...) over all-zero content is 0, so a freshly allocated,
    zeroed per-cpu area checksums to 0, matching the initial value, and
    the object is never seen to change.

See discussions at [0].

When update_checksum() wrongly reports an actively modified object as
unchanged, kmemleak stops greying it for an extra scan and can report a
live per-cpu object as a leak.

Fold the per-cpu CRC as a single rolling checksum across all CPUs and
initialise the object checksum to ~0 so the first computed value always
registers as a change, even for content that hashes to 0.
reset_checksum() is seeded the same way.
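
Both failure modes can be reproduced in a small userspace model. It is
only a sketch: crc32_sketch is a bitwise CRC-32 without pre- or
post-inversion, assumed here to behave like the kernel's crc32() for
this purpose, and the two-CPU, two-byte layout is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no inversion. */
static uint32_t crc32_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320U : 0);
	}
	return crc;
}

#define NR_CPUS_SKETCH 2
#define AREA_SIZE 2

/* Old scheme: seed each per-CPU CRC with 0 and XOR the results together. */
static uint32_t xor_fold(unsigned char areas[NR_CPUS_SKETCH][AREA_SIZE])
{
	uint32_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum ^= crc32_sketch(0, areas[cpu], AREA_SIZE);
	return sum;
}

/* New scheme: each CPU's CRC becomes the seed for the next CPU. */
static uint32_t rolling_fold(unsigned char areas[NR_CPUS_SKETCH][AREA_SIZE])
{
	uint32_t crc = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		crc = crc32_sketch(crc, areas[cpu], AREA_SIZE);
	return crc;
}
```

With identical non-zero contents on both CPUs, xor_fold() returns 0
while rolling_fold() does not. All-zero contents still compute to 0 in
both schemes, which is why the stored checksum has to start at ~0
rather than 0 for the first comparison to register a change.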

Link: https://lore.kernel.org/all/akfYImSNDh3OjIfR@gmail.com
Link: https://lore.kernel.org/20260703-kmemleak_checksum-v1-1-5e0ab7d6966f@debian.org
Fixes: 6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")
Signed-off-by: Breno Leitao <leitao@debian.org>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: disallow overlapping input ranges for damon_set_regions()

damon_set_regions() assumes the input ranges are sorted by the address and
don't overlap each other.  The assumption was initially validated
explicitly, but commit 97d482f4592f ("mm/damon/sysfs: reuse
damon_set_regions() for regions setting") mistakenly removed the
validation.

This can make DAMON behave in unexpected ways.  At best, the
monitoring results snapshot will just look weird since there will be
overlapping regions.  DAMOS will also work weirdly, applying the same
action multiple times for overlapping regions, and making DAMOS quota weird.
More seriously, depending on the setup and regions updates sequence,
negative size regions can be made.  It will trigger WARN_ONCE() if the
kernel is built with CONFIG_DAMON_DEBUG_SANITY=y.  Depending on the
monitoring results, the negative size region can further trigger division
by zero in damon_merge_two_regions().

Note that some of the consequences including the WARN_ONCE() and the
divide by zero depend on commits that were introduced after the root cause
commit 97d482f4592f ("mm/damon/sysfs: reuse damon_set_regions() for
regions setting").

Fix the problems by checking the assumption and returning an error if
the input ranges don't meet the assumption.
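
A check for the stated assumption might look like this sketch. The
struct and function names are hypothetical, the ranges are taken to be
half-open [start, end), and the real damon_set_regions() returns an
error code rather than a bool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of an address range: [start, end). */
struct addr_range_sketch {
	unsigned long start;
	unsigned long end;
};

/* True if no range starts before the previous one has ended. */
static bool ranges_sorted_disjoint(const struct addr_range_sketch *r,
				   size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		if (r[i - 1].end > r[i].start)
			return false;
	}
	return true;
}
```

The single comparison rejects both overlapping and out-of-order input,
provided each individual range is itself well-formed.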

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260703165610.92894-1-sj@kernel.org
Link: https://lore.kernel.org/20260630041806.151124-1-sj@kernel.org
Fixes: 97d482f4592f ("mm/damon/sysfs: reuse damon_set_regions() for regions setting")
Signed-off-by: SJ Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add Usama as a THP reviewer

Usama has been active around THP, really enjoys working on THPs, and tries
to review all the patches that come in.

Let's invite him to the THP party as a reviewer to help with the ongoing
THP review load.

Link: https://lore.kernel.org/20260702140257.44780-1-lance.yang@linux.dev
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Barry Song <baohua@kernel.org>
Acked-by: SJ Park <sj@kernel.org>
Acked-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fat: avoid stack overflow warning

Building the fat kunit tests with -fsanitize=alignment reveals some
rather excessive stack usage:

fs/fat/fat_test.c: In function 'fat_clus_to_blknr_test':
fs/fat/fat_test.c:33:1: error: the frame size of 4736 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
   33 | }
      | ^
fs/fat/fat_test.c: In function 'fat_get_blknr_offset_test':
fs/fat/fat_test.c:52:1: error: the frame size of 4800 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

The problem is clearly related to the on-stack copy of a local
msdos_sb_info structure.  Avoid this by making that copy 'static const'
and changing the called functions to accept a constant input.

Link: https://lore.kernel.org/20260515204456.2692208-1-arnd@kernel.org
Fixes: 410002f8139c ("kunit: fat: test cluster and directory i_pos layout helpers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Adi Nata <adinata.softwareengineer@gmail.com>
Cc: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: validate ranges in damon_set_regions()

DAMON core logic assumes zero length regions don't exist.  However, a few
DAMON API callers including DAMON_SYSFS, DAMON_RECLAIM and DAMON_LRU_SORT
allow users to set empty monitoring target regions.  This could result in
WARN_ONCE() on CONFIG_DAMON_DEBUG_SANITY enabled kernel, and
divide-by-zero from damon_merge_two_regions().

For example, the WARN_ONCE() can be triggered like below.

    # grep DAMON_DEBUG_SANITY /boot/config-$(uname -r)
    # CONFIG_DAMON_DEBUG_SANITY=y
    # damo start
    # cd /sys/kernel/mm/damon/admin/kdamonds/0
    # echo 0 > contexts/0/targets/0/regions/0/start
    # echo 0 > contexts/0/targets/0/regions/0/end
    # echo commit > state
    # dmesg
    [....]
    [   73.705780] ------------[ cut here ]------------
    [   73.707552] start 0 >= end 0
    [   73.708452] WARNING: mm/damon/core.c:359 at damon_new_region+0x6e/0x80, CPU#1: kdamond.0/758
    [...]

All DAMON API callers eventually use damon_set_regions() to setup the
regions.  Add the validation logic in the function.
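
The validation amounts to rejecting any range whose start is not below
its end, matching the "start 0 >= end 0" warning text. A minimal sketch,
with hypothetical names and a bool result instead of a kernel error
code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input range as a user might configure it. */
struct region_input_sketch {
	unsigned long start;
	unsigned long end;
};

/* True only if every range covers at least one address. */
static bool ranges_non_empty(const struct region_input_sketch *r, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (r[i].start >= r[i].end)
			return false;
	}
	return true;
}
```

The sysfs sequence above, which sets start and end to 0, corresponds to
the input { 0, 0 } and would be refused by such a check.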

Link: https://lore.kernel.org/20260630035221.146458-1-sj@kernel.org
Fixes: 43b0536cb471 ("mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)")
Signed-off-by: SJ Park <sj@kernel.org>
Cc: Yang yingliang <yangyingliang@huawei.com>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

m68k: avoid -Wunused-but-set-parameter in clear_user_page()

The loop in clear_user_pages() iterates over all pages and calls
clear_user_page() for each of them.  During the loop "vaddr" is modified.
However on m68k clear_user_page() is a macro which does not use "vaddr".
The compiler sees a variable which is modified but never used and emits a
warning for that:

include/linux/highmem.h: In function 'clear_user_pages':
include/linux/highmem.h:234:63: warning: parameter 'vaddr' set but not used [-Wunused-but-set-parameter=]
    static inline void clear_user_pages(void *addr, unsigned long vaddr,

Other architectures use an inline function for clear_user_page() which
avoids the warning.  This is not possible on m68k, as flush_dcache_page()
is another macro which is not yet defined where clear_user_page() is
defined.  Including cacheflush_mm.h will trigger recursive includes and
lots of other issues.

So hide the warning with a cast to (void) instead.

While we are here, do the same for copy_user_page().
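
The pattern can be shown in userspace. This is a sketch with made-up
names and a tiny "page" size, not the m68k header: the macro ignores
vaddr, and the (void) cast counts as a use of the parameter in the
caller.

```c
#include <string.h>

#define SKETCH_PAGE_SIZE 64	/* tiny "page" so the example stays small */

/*
 * Macro in the style described above: it has no use for vaddr, so the
 * (void) cast marks the argument as deliberately unused.
 */
#define clear_user_page_sketch(page, vaddr)		\
	do {						\
		(void)(vaddr);				\
		memset((page), 0, SKETCH_PAGE_SIZE);	\
	} while (0)

static void clear_user_pages_sketch(void *addr, unsigned long vaddr,
				    unsigned int npages)
{
	unsigned char *p = addr;

	while (npages--) {
		clear_user_page_sketch(p, vaddr);
		p += SKETCH_PAGE_SIZE;
		vaddr += SKETCH_PAGE_SIZE;	/* modified and also read */
	}
}
```

Without the cast, vaddr would only ever be written in the loop, which is
the situation -Wunused-but-set-parameter reports.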

Link: https://lore.kernel.org/20260525-m68k-clear_user_page-v2-1-0c8981c6eca1@weissschuh.net
Fixes: 62a9f5a85b98 ("mm: introduce clear_pages() and clear_user_pages()")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/huge_memory: set PG_has_hwpoisoned only after new folio head is established

__split_folio_to_order() copies the hwpoison state onto each new sub-folio
while splitting a folio to a non-zero order.  It does so via

	if (handle_hwpoison && page_range_has_hwpoisoned(new_head, new_nr_pages))
		folio_set_has_hwpoisoned(new_folio);

*before* clear_compound_head(new_head)/prep_compound_page(new_head, ...)
turns @new_head from a tail page into a proper folio head.

PG_has_hwpoisoned is a FOLIO_SECOND_PAGE flag, so
folio_set_has_hwpoisoned() resolves to folio_flags(folio, 1).  With the
new compound_info-based page-flags layout, folio_flags() asserts the page
is not a tail:

	VM_BUG_ON_PGFLAGS(page->compound_info & 1, page);
	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags.f), page);

At the current call site @new_head still has the tail marker
(compound_info bit 0 set, PG_head clear), so on CONFIG_DEBUG_VM kernels
this hits:

  kernel BUG at include/linux/page-flags.h:354
  folio_flags+0x82
  folio_set_has_hwpoisoned
  __split_folio_to_order
  __split_unmapped_folio
  __folio_split
  truncate_inode_partial_folio  (shmem hole-punch / MADV_REMOVE)

Reproduced by syzkaller: hwpoison-inject a few subpages of a large shmem
folio, then MADV_REMOVE (fallocate punch hole) on the same range, which
splits the partial folio to a non-zero order.

memory_failure() tries to split the poisoned folio to order 0 first, but
that split is best-effort; when it fails the folio is left large with
PG_has_hwpoisoned set, the case fa5a06170036 added this hwpoison copying
for.

Move the folio_set_has_hwpoisoned() call to after
clear_compound_head()/prep_compound_page(), where @new_folio is a real
order-new_order head folio (handle_hwpoison implies new_order != 0, so a
second page always exists).  The flag still lands on the same struct page
(page[1] of the new folio); only the ordering relative to compound-head
setup changes, satisfying the FOLIO_SECOND_PAGE precondition.

Link: https://lore.kernel.org/20260701174235.3173401-1-riel@surriel.com
Fixes: fa5a06170036 ("mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order")
Signed-off-by: Rik van Riel <riel@surriel.com>
Assisted-by: Claude:claude-opus-4-8
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Tested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/page_vma_mapped: fix device-private PMD handling

Commit 65edfda6f3f2 ("mm/rmap: extend rmap and migration support
device-private entries") introduced the concept of device-private PMD
entries, but did not correctly update the rmap walk code to account for
them.

As a result, when page_vma_mapped_walk() encounters device-private PMD
entries, it takes no action other than to acquire the PMD lock and exit.

However this is highly problematic for two reasons - firstly, device
private entries possess a PFN so check_pmd() needs to be called to ensure
an overlapping PFN range.

Secondly, and more importantly, if PVMW_MIGRATION is set the caller
assumes the returned entry is a migration entry, resulting in memory
corruption when the caller tries to interpret the device private entry as
such.

In addition, commit 146287290023 ("mm/huge_memory: implement
device-private THP splitting") allowed device private PMDs to be split
like THP mappings, but again did not update this code path.

As a result, we might race a PMD split prior to acquiring the PMD lock.

This patch addresses all of these issues by invoking check_pmd(), ensuring
PVMW_MIGRATION is not set, and checking whether a split raced us, as we do
for PMD THP and migration entries.

Instead of checking for a subset of the cases after taking the pmd_lock(),
put device-private along with pmd_trans_huge() and
pmd_is_migration_entry(). Also remove thp_migration_supported() as it is
already guarded by pmd_is_migration_entry().

[akpm@linux-foundation.org: fix Raspberry Pi 1 build, per David]
Link: https://lore.kernel.org/20260630021540.17297-1-richard.weiyang@gmail.com
Fixes: 65edfda6f3f2 ("mm/rmap: extend rmap and migration support device-private entries")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: David Hildenbrand <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Tested-by: Klara Modin <klarasmodin@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: s/SeongJae/SJ/

My legal and preferred first names are SeongJae and SJ, respectively.  I
was using the legal name for commits and tags, while using the preferred
name for conversations.  It sometimes confuses people including myself.
Consistently use the preferred name.

Also remove the copyright notes on files.  Those are only confusing for
people who are not familiar with the law.  Meanwhile, we can infer the
information in a better way from git logs and public information.

Link: https://lore.kernel.org/20260630013820.143366-1-sj@kernel.org
Signed-off-by: SJ Park <sj@kernel.org>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

userfaultfd: prevent registration of special VMAs

Vova Tokarev says:

  userfaultfd allows registration on shadow stack VMAs.  With userfaultfd
  access, you can register on the shadow stack, discard a page ... and
  inject a page with chosen return addresses via UFFDIO_COPY.

Update vma_can_userfault() to reject VM_SHADOW_STACK.

While on it, also reject VM_SPECIAL so that if a driver would implement
vm_uffd_ops, it wouldn't be possible to register special VMAs with
userfaultfd.

Since VM_SPECIAL includes VM_DONTEXPAND which is set by hugetlb, exclude
hugetlb VMAs from the check for VM_SPECIAL.

Link: https://lore.kernel.org/20260618095017.2553004-1-rppt@kernel.org
Fixes: 54007f818206 ("mm: Introduce VM_SHADOW_STACK for shadow stack memory")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: vova tokarev <vladimirelitokarev@gmail.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

xen-blkfront: fix double completion of split requests on resume

When a block request is too large for a single ring entry and the
backend does not support indirect descriptors, blkfront splits it across
two ring requests. This only happens when the frontend runs on a
64K-page kernel (e.g. arm64): there, even a single-page request may not
fit in one ring slot and must be split. blkif_ring_get_request() is
called twice and both shadow slots (shadow[id] and shadow[extra_id])
point at the *same* struct request, linked through associated_id.

blkif_completion() collapses the pair on the normal completion path,
recycling the second slot and completing the request once. The
suspend/resume walk in blkfront_resume() does not: it visits every
shadow slot with ->request set and calls blk_mq_end_request() or
re-queues ->request. For an in-flight split request it therefore
processes the shared struct request twice on resume/migration -- a
double completion.

Skip the secondary slot of a split request in the resume walk so each
logical request is processed exactly once. The secondary slot is the
linked one (associated_id != NO_ASSOCIATED_ID) that carries no
scatter-gather list (num_sg == 0); the first slot always keeps the sg
list. The bug is only reachable on suspend/resume or live migration of
such a guest, so it has no local reproducer.
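
The skip rule can be illustrated with a small user-space model. This is only a sketch: struct shadow_slot and both helpers are hypothetical stand-ins, not the driver's types, and the value of NO_ASSOCIATED_ID is assumed. It shows that a walk which skips the linked slot without a scatter-gather list counts each logical request once.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_ASSOCIATED_ID (~0UL)	/* assumed sentinel for "not part of a pair" */

/* Hypothetical, reduced view of one shadow slot. */
struct shadow_slot {
	void *request;			/* NULL when the slot is free */
	unsigned long associated_id;	/* partner slot of a split request */
	unsigned int num_sg;		/* only the first slot keeps the sg list */
};

/* A secondary slot is linked to a partner and carries no sg list. */
static bool is_secondary_slot(const struct shadow_slot *s)
{
	return s->associated_id != NO_ASSOCIATED_ID && s->num_sg == 0;
}

/* Count the logical requests a resume-style walk would process. */
static size_t count_requests_to_process(const struct shadow_slot *shadow,
					size_t nr_slots)
{
	size_t i, count = 0;

	for (i = 0; i < nr_slots; i++) {
		if (!shadow[i].request)
			continue;	/* free slot */
		if (is_secondary_slot(&shadow[i]))
			continue;	/* partner slot handles the request */
		count++;
	}
	return count;
}
```

With one split request (two slots) and one ordinary request in flight, the walk in this model reports two requests rather than three.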

Fixes: 6cc568339047 ("xen/blkfront: Handle non-indirect grant with 64KB pages")
Assisted-by: 0sec:claude-opus-4-8
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://patch.msgid.link/20260709100853.7489-1-doruk@0sec.ai
Signed-off-by: Jens Axboe <axboe@kernel.dk>

tools/sched_ext: scx - Fix cmask_subset(), cmask_equal() and cmask_weight()

cmask_equal(), cmask_weight() and cmask_subset() bounded their word walks
with CMASK_NR_WORDS(nr_cids), which pads by one word and can't tell the last
word in use without @base. The walks could thus cover a slack word past the
active range, which cmask_reframe() leaves non-zero: a stale bit there gave
cmask_equal() a spurious mismatch, cmask_weight() an inflated count, and
cmask_subset() a spurious violation. cmask_subset() could also read
@b->bits[] one word past its allocation (within the arena's fault-recovered
range, so harmless), and deviated from the kernel scx_cmask_subset() by
failing any @a range that doesn't nest inside @b's even when the overhanging
bits are all clear.

Bound the cmask_equal() and cmask_weight() walks by the words the range
actually spans, with early returns for empty ranges. Rewrite cmask_subset()
to match the kernel semantics: scan @a's overhangs for set bits with
cmask_next_set() and walk the words of the range intersection.
cmask_subset() moves below cmask_next_set(), which it now uses. Padding bits
don't need masking as every cmask helper keeps them clear.
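
The difference between a padded word bound and the words a range really touches can be sketched as follows. Both helpers are hypothetical, and the formula in padded_words() is an assumption about what "pads by one word" means, not the actual CMASK_NR_WORDS() definition. Words are assumed to be 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64u

/* Assumed shape of a padded bound: enough words for nr_bits at any base. */
static size_t padded_words(uint32_t nr_bits)
{
	return (nr_bits + BITS_PER_WORD - 1) / BITS_PER_WORD + 1;
}

/* Words that the bit range [base, base + nr_bits) actually touches. */
static size_t words_spanned(uint32_t base, uint32_t nr_bits)
{
	if (nr_bits == 0)
		return 0;	/* empty range: nothing to walk */
	return (base + nr_bits - 1) / BITS_PER_WORD - base / BITS_PER_WORD + 1;
}
```

For a word-aligned range of 64 bits the padded bound is two words while only one is in use, which is the slack word a walk bounded by the padded value could cover.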

Fixes: a58e6b79b432 ("sched_ext: Add cmask, a base-windowed bitmap over cid space")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

sched_ext: Fix premature ops->priv publication in scx_alloc_and_add_sched()

scx_alloc_and_add_sched() publishes @sch through ops->priv before allocating
the cgroup path. If that allocation fails, the unwind path clears ops->priv
and frees @sch immediately. scx_prog_sched() callers can dereference
ops->priv from RCU context the moment it is set, so freeing without a grace
period can use-after-free a concurrent kfunc caller.

Move the publication below the cgroup path allocation so that every failure
path after publication frees @sch through kobject_put(), whose release path
defers the freeing by a grace period.

Fixes: 105dcd005be2 ("sched_ext: Introduce scx_prog_sched()")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

sched_ext: Record an error on errno-only sub-enable failure

scx_sub_enable_workfn() has several failure paths that only return an errno
(e.g. -ENOMEM from an allocation) and jump to err_disable without calling
scx_error(). scx_flush_disable_work() runs the disable, and thus ops.exit(),
only when an error has been recorded, so an errno-only failure leaves the
half-initialized sub-scheduler linked.

Record an error at the err_disable sink so every errno-only failure runs the
disable path.

Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

cpufreq: Make cpufreq_update_pressure() fall back to cpuinfo.max_freq

If arch_scale_freq_ref() is not defined for a given arch (like x86, for
example), cpufreq_update_pressure() will always set cpufreq_pressure to
zero for all CPUs in the system, which is generally problematic on
systems with asymmetric capacity [1].

However, in the absence of arch_scale_freq_ref(), it is reasonable
to assume that cpuinfo.max_freq is the maximum sustainable frequency
for the given cpufreq policy. Moreover, there are cases in which
arch_scale_freq_ref() would need to be defined to return essentially
the cpuinfo.max_freq value anyway (for example, intel_pstate on
hybrid platforms).

For the above reasons, update cpufreq_update_pressure() to fall back to
using cpuinfo.max_freq as the reference frequency if zero is returned by
arch_scale_freq_ref().
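
A minimal user-space sketch of the fallback, assuming the pressure is the share of capacity lost when the policy's maximum is capped below the reference frequency. The function name and parameters are hypothetical and do not reproduce the kernel code.

```c
#include <stdint.h>

/*
 * ref_freq:         value an arch hook would return, 0 if not provided
 * cpuinfo_max_freq: fallback reference frequency
 * capped_freq:      current upper limit of the policy
 * max_capacity:     capacity of the CPU at the reference frequency
 */
static uint64_t freq_pressure(uint64_t ref_freq, uint64_t cpuinfo_max_freq,
			      uint64_t capped_freq, uint64_t max_capacity)
{
	if (ref_freq == 0)
		ref_freq = cpuinfo_max_freq;	/* the fallback described above */

	if (ref_freq <= capped_freq)
		return 0;			/* no capping, no pressure */

	return max_capacity - max_capacity * capped_freq / ref_freq;
}
```

Without the fallback, a zero reference frequency always takes the "no pressure" branch, which matches the behaviour the commit describes for architectures lacking arch_scale_freq_ref().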

Fixes: 75d659317bb1 ("cpufreq: Add a cpufreq pressure feedback for the scheduler")
Link: https://lore.kernel.org/lkml/CAKfTPtBuRLfYNnR4w--cFZYZy-R8gaPEgVwCcaMmbCcJ2H-muQ@mail.gmail.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # cluster scheduling
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://patch.msgid.link/5086499.GXAFRqVoOG@rafael.j.wysocki

cpufreq: intel_pstate: Set non-turbo capacity to HWP_GUARANTEED_PERF()

Setting cpu->capacity_perf to cpu->pstate.max_pstate_physical in the
"no turbo" case is inconsistent with what happens elsewhere in the
driver and causes arch_scale_cpu_capacity() to be incorrect. It also
skews arch_scale_freq_capacity() which ends up differing from 1024 for
the guaranteed P-state.

Address that by setting capacity_perf to HWP_GUARANTEED_PERF() in the
"no turbo" case.

Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/12928972.O9o76ZdvQC@rafael.j.wysocki

Revert "io_uring: grab RCU read lock marking task run"

This reverts commit ed64f5c546b3d5e3a4840f6c055448ce90edf56c.

Since commit:

648790e09527 ("io_uring: restore RCU read section in io_req_local_work_add()")

io_ctx_mark_taskrun() is only ever called with the RCU read lock
already held, like previously. Hence there's no need for this commit
anymore, which grabbed the RCU read lock inside io_ctx_mark_taskrun().

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: restore RCU read section in io_req_local_work_add()

The task-work refactor that moved io_req_local_work_add() out of
io_uring.c into the new io_uring/tw.c dropped the whole-body
guard(rcu)() that used to cover the function body.

For DEFER_TASKRUN rings the ring teardown still relies on that RCU read
section pairing with its grace period:

	/* pairs with RCU read section in io_req_local_work_add() */
	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
		synchronize_rcu();
	io_ring_ctx_free(ctx);

io_req_local_work_add() keeps dereferencing ctx after mpscq_push() has
published the request to the work list (ctx->cq_wait_nr, and
ctx->submitter_task in the final wake_up_state()), without holding a ctx
reference across that window. The RCU read section was the only thing
guaranteeing an in-flight adder had finished touching ctx before
io_ring_ctx_free() ran; synchronize_rcu() only waits for readers that
are actually inside an RCU read-side critical section. With the guard
gone the grace period no longer pairs with anything on the add side, so
ctx can be freed and reused while io_req_local_work_add() is still using
it.

Fixes: d46ab2c98aba ("io_uring: switch local task_work to a mpscq")
Signed-off-by: Woraphat Khiaodaeng <worapat.kd2@gmail.com>
Link: https://patch.msgid.link/20260709035100.2269-1-worapat.kd2@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

smb: client: mask server-provided mode to 07777 in modefromsid

When modefromsid is active, parse_dacl() applies the server-provided
sub_auth[2] value from the NFS mode SID to cf_mode without masking to
07777. Apply the correct masking, same as in the read path.
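
The effect of the mask can be shown with a small stand-alone sketch. apply_server_mode() and MODE_PERM_MASK are hypothetical names used for illustration; only the 07777 mask itself comes from the description above.

```c
#include <stdint.h>

#define MODE_PERM_MASK 07777u	/* permission, setuid, setgid and sticky bits */

/*
 * Merge a server-supplied mode value into an existing mode, keeping the
 * file-type bits of cf_mode and discarding any server bits above 07777.
 */
static uint32_t apply_server_mode(uint32_t cf_mode, uint32_t server_mode)
{
	cf_mode &= ~MODE_PERM_MASK;
	cf_mode |= server_mode & MODE_PERM_MASK;
	return cf_mode;
}
```

In this model a server value of 0140777 applied to a regular file (0100000) yields 0100777, so the server cannot change the file type bits.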

Fixes: e2f8fbfb8d09c ("cifs: get mode bits from special sid on stat")
Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Assisted-by: Kiro:claude-opus-4.6
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

bpf: Fix UAF in sock clone early bailouts

Similar to recent commit 9b51a6155d14 ("bpf,fork: wipe ->bpf_storage
before bailouts that access it"), sk_clone() performs an initial
shallow copy of the socket field ->sk_bpf_storage via sock_copy()
for the cloned socket newsk.

If sk_clone() bails out early (e.g. if sk_filter_charge() fails) prior
to calling bpf_sk_storage_clone(), newsk->sk_bpf_storage still points
to the parent socket's BPF local storage. When newsk is subsequently
freed via sk_free(), the deallocation path (__sk_destruct() ->
bpf_sk_storage_free()) destroys the parent socket's BPF local storage,
leading to a use-after-free (UAF) on the parent socket.

Fix this by resetting newsk->sk_bpf_storage to NULL immediately after
sock_copy() in sk_clone(), and remove the now redundant initialization
from bpf_sk_storage_clone().
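
The hazard and the fix can be modelled in user space. This is a sketch under stated assumptions: struct sock_model, struct bpf_storage_model and both functions are hypothetical, and setting a flag stands in for actually freeing the storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct bpf_storage_model {
	bool destroyed;			/* set instead of really freeing */
};

struct sock_model {
	struct bpf_storage_model *storage;
};

/* Stand-in for the destructor path that frees the socket's storage. */
static void sock_destruct(struct sock_model *sk)
{
	if (sk->storage)
		sk->storage->destroyed = true;
}

/* Returns false when the clone bails out early and destroys newsk. */
static bool clone_sock(const struct sock_model *parent,
		       struct sock_model *newsk,
		       bool reset_storage, bool fail_early)
{
	memcpy(newsk, parent, sizeof(*newsk));	/* shallow copy */

	if (reset_storage)
		newsk->storage = NULL;		/* the fix: drop the alias */

	if (fail_early) {
		sock_destruct(newsk);
		return false;
	}
	return true;
}
```

With reset_storage set, an early bailout leaves the parent's storage untouched. Without it, the bailout destroys storage the parent still points to.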

Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Fixes: f12dd75959b0 ("bpf: net: Set sk_bpf_storage back to NULL for cloned sk")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260709025316.999913-1-mattbobrowski@google.com

ALSA: hda: MAINTAINERS: Fix missing cirrus* file reference

When the HDA source was reorganized some of the cirrus* files were moved
into a new 'side-codecs' subdirectory. But MAINTAINERS wasn't updated to
add a cirrus* file reference to cover these moved files.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260709122211.615785-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/cirrus_scodec: Make Kconfig visible if KUNIT

Make the Kconfig item for cirrus_scodec visible if CONFIG_KUNIT is
enabled. This is so that its KUnit test can be enabled by KUnit
scripts without requiring a large amount of irrelevant additional
components.

The general rule for KUNIT_ALL_TESTS is that it should only enable
tests for components that are already selected. However, the UML
environment does not support ACPI, which means the HDA codec drivers
that use cirrus_scodec cannot be selected. But cirrus_scodec does not
need ACPI.

By making the Kconfig option visible if CONFIG_KUNIT, the KUnit test
can be enabled with only the minimal set of functionality that is
required for cirrus_scodec.

This is still compliant with the KUNIT_ALL_TESTS rule "only tests
for enabled modules" because by default cirrus_scodec will only be
enabled if the drivers that use it are enabled. It must be
intentionally enabled to force it to be included for testing.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260709121224.614350-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'net-7.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter, Bluetooth and batman-adv.

  Current release - regressions:

   - bluetooth: fix using chan->conn as indication to no remote netdev

  Current release - new code bugs:

   - netfilter: cap to maximum number of expectation per master on
     updates

  Previous releases - regressions:

   - bluetooth:
      - fix UAF of hci_conn_params in add_device_complete
      - fix null ptr deref in hci_abort_conn()

   - igmp: remove multicast group from hash table on device destruction

   - batman-adv: prevent TVLV OOB check overflow

   - eth: mlx5/mlx5e:
      - fix off-by-one in single-FDB error rollback
      - skip peer flow cleanup when LAG seq is unavailable
      - fix crashes in dynamic per-channel stats and HV VHCA agent

   - eth: mana: Sync page pool RX frags for CPU

  Previous releases - always broken:

   - netfilter:
      - mark malformed IPv6 extension headers for hotdrop
      - terminate table name before find_table_lock()
      - ipvs: use parsed transport offset in TCP state lookup

   - sched: act_pedit: fix TOCTOU heap OOB write in tc offload

   - ethtool: rss: fix hfunc and input_xfrm parsing on big endian

   - ipv4/ipv6: fix UAF and memory leak in IGMP/MLD

   - tls: consume empty data records in tls_sw_read_sock()

   - eth:
      - octeontx2-af: fix VF bringup affecting PF promiscuous state
      - gue: validate REMCSUM private option length"

* tag 'net-7.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  macsec: don't read an unset MAC header in macsec_encrypt()
  dibs: loopback: validate offset and size in move_data()
  octeontx2-af: fix VF bringup affecting PF promiscuous state
  ethtool: rss: Fix hfunc and input_xfrm parsing on big endian
  net/mlx5: Fix L3 tunnel entropy refcount leak
  net: macb: drop in-flight Tx SKBs on close
  net: mana: Sync page pool RX frags for CPU
  net: mana: Validate the packet length reported by the NIC
  selftests/net: fix EVP_MD_CTX leak in tcp_mmap
  ipvs: ensure inner headers in ICMP errors are in headroom
  ipvs: use parsed transport offset in SCTP state lookup
  ipvs: use parsed transport offset in TCP state lookup
  ipvs: pass parsed transport offset to state handlers
  netfilter: handle unreadable frags
  netfilter: flowtable: support IPIP tunnel with direct xmit
  netfilter: flowtable: IPIP tunnel hardware offload is not yet support
  netfilter: flowtable: use dst in this direction when pushing IPIP header
  netfilter: ipset: allocate the proper memory for the generic hash structure
  netfilter: ipset: cleanup the add/del backlog when resize failed
  netfilter: ipset: exclude gc when resize is in progress
  ...

crypto: aes - Fix conditions for selecting MAC dependencies

Starting in commit 7137cbf2b5c9 ("crypto: aes - Add cmac, xcbc, and
cbcmac algorithms using library"), the aes module (CRYPTO_AES) supports
CBC based MACs using the corresponding library functions.

To avoid including unneeded functionality, that support honors the
existing CRYPTO_CMAC, CRYPTO_XCBC, and CRYPTO_CCM kconfig options.  The
dependencies are selected if at least one of those is enabled.

However, the select statements don't correctly handle the case where
CRYPTO_AES=y and (for example) CRYPTO_CMAC=m.  In that case the
dependencies get selected at level 'm', due to how the kconfig language
works.  That causes a linker error.

Fix this by changing the selection conditions to use '!= n'.
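
The kconfig behaviour can be modelled in a few lines of C. This is an illustrative sketch of the tristate rules only, with hypothetical helper names: "select DEP if COND" forces DEP to at least the minimum of the selecting symbol and COND, while a comparison such as "COND != n" evaluates to y or n and never to m.

```c
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Tristate AND is the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* "x != n" is a comparison, so the result is y or n. */
static enum tristate tri_not_n(enum tristate x)
{
	return x != TRI_N ? TRI_Y : TRI_N;
}

/* Lower bound that "select DEP if COND" places on DEP. */
static enum tristate select_floor(enum tristate self, enum tristate cond)
{
	return tri_and(self, cond);
}
```

With the selecting symbol at y and the condition symbol at m, the plain condition only forces the dependency to m, while the "!= n" form forces it to y.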

A similar issue also exists for CRYPTO_LIB_AES's conditional selection
of CRYPTO_LIB_UTILS.  The same '!= n' would work, but instead just make
CRYPTO_LIB_AES always select CRYPTO_LIB_UTILS.  CRYPTO_LIB_UTILS is
lightweight, and it's needed by most AES modes and many other things.

Fixes: 7137cbf2b5c9 ("crypto: aes - Add cmac, xcbc, and cbcmac algorithms using library")
Fixes: 309a7e514da7 ("lib/crypto: aes: Add support for CBC-based MACs")
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260709022954.45113-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: docs: Improve introduction sentence

Make it clear that lib/crypto/ is a kernel-internal library. It's easy
for people to come across this page, especially the HTML version online,
without that context.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://patch.msgid.link/20260709022747.44635-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

lib/crypto: docs: Fix some sentence fragments

Currently, the section about the library API for each algorithm begins
with a noun phrase that was intended to serve as an elaboration on the
title. It's better to use complete sentences.

Suggested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Link: https://patch.msgid.link/20260709022651.44216-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

KVM: s390: pci: Fix handling of AIF enable without AISB

When a guest seeks to register IRQs without a summary bit specified,
ensure that the associated GAITE then stores 0 for the guest AISB
location instead of virt_to_phys(page_address(NULL)).

Fixes: 3c5a1b6f0a18 ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding")
Cc: stable@vger.kernel.org
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

drm/imagination: fix error checking of pvr_vm_context_lookup()

pvr_vm_context_lookup() returns either NULL or a valid pointer, never an
error pointer, so stop using IS_ERR() to check its return value.
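
Why an IS_ERR()-style check lets NULL through can be seen in a user-space sketch. is_err() below mirrors the idea of the kernel macro (error pointers occupy the top MAX_ERRNO addresses) but is a hypothetical re-implementation, as is lookup_failed().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* True only for pointers that encode a negative errno value. */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The right check for a lookup that reports failure with NULL. */
static bool lookup_failed(const void *ctx)
{
	return ctx == NULL;
}
```

NULL is address zero, far below the error-pointer range, so is_err(NULL) is false and a failed lookup would be treated as success.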

Using IS_ERR() leads to the kernel oops reported below. It can be
reproduced by passing an invalid VM context handle from userspace to the
DRM_IOCTL_PVR_CREATE_CONTEXT ioctl.

[   92.733119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000148
[   92.742042] Mem abort info:
[   92.744890]   ESR = 0x0000000096000004
[   92.748686]   EC = 0x25: DABT (current EL), IL = 32 bits
[   92.754020]   SET = 0, FnV = 0
[   92.757154]   EA = 0, S1PTW = 0
[   92.760337]   FSC = 0x04: level 0 translation fault
[   92.765243] Data abort info:
[   92.768129]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[   92.773626]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   92.778763]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   92.784098] user pgtable: 4k pages, 48-bit VAs, pgdp=000000088ed23000
[   92.790550] [0000000000000148] pgd=0000000000000000, p4d=0000000000000000
[   92.797381] Internal error: Oops: 0000000096000004 [#1]  SMP
[   92.803027] Modules linked in: powervr
[   92.852533] CPU: 0 UID: 0 PID: 409 Comm: triangle Not tainted 7.1.0-rc5-g98b46e693b91 #1 PREEMPT
[   92.861385] Hardware name: Texas Instruments AM68 SK (DT)
[   92.866766] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   92.873709] pc : pvr_vm_get_fw_mem_context+0x0/0xc [powervr]
[   92.879376] lr : pvr_queue_create+0x26c/0x440 [powervr]
[   92.884595] sp : ffff8000837fbb00
[   92.887895] x29: ffff8000837fbb60 x28: 0000000000000000 x27: ffff8000837fbce8
[   92.895015] x26: ffff000807f61a40 x25: ffff000807f61a00 x24: ffff000807f64400
[   92.902135] x23: ffff00080a5ab000 x22: ffff800079b24730 x21: ffff000807f61800
[   92.909254] x20: ffff00080999e680 x19: 0000000000000000 x18: 0000000000000000
[   92.916373] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001
[   92.923492] x14: 0000000000000000 x13: 0000000000000002 x12: ffff80008145b298
[   92.930611] x11: ffff8000844e5000 x10: ffff80008165a130 x9 : 0000000000000100
[   92.937730] x8 : 0000000000000001 x7 : ffff0008076b27e0 x6 : ffff00080ec43b7c
[   92.944850] x5 : ffff00080ec43b78 x4 : 0000000000000000 x3 : ffff00080999e680
[   92.951968] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[   92.959088] Call trace:
[   92.961521]  pvr_vm_get_fw_mem_context+0x0/0xc [powervr] (P)
[   92.967173]  pvr_context_create+0x190/0x410 [powervr]
[   92.972218]  pvr_ioctl_create_context+0x44/0x8c [powervr]
[   92.977608]  drm_ioctl_kernel+0xbc/0x124 [drm]
[   92.982127]  drm_ioctl+0x1f8/0x4dc [drm]
[   92.986098]  __arm64_sys_ioctl+0xac/0x104
[   92.990102]  invoke_syscall+0x54/0x10c
[   92.993842]  el0_svc_common.constprop.0+0x40/0xe0
[   92.998532]  do_el0_svc+0x1c/0x28
[   93.001835]  el0_svc+0x38/0x11c
[   93.004969]  el0t_64_sync_handler+0xa0/0xe4
[   93.009139]  el0t_64_sync+0x198/0x19c
[   93.012792] Code: aa1703e0 d2800014 95cb0ba4 17ffffe8 (f940a400)
[   93.018869] ---[ end trace 0000000000000000 ]---

Fixes: d2d79d29bb98 ("drm/imagination: Implement context creation/destruction ioctls")
Cc: stable@vger.kernel.org
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Alessio Belle <alessio.belle@imgtec.com>
Link: https://patch.msgid.link/20260707-staging-ddkopsrc-2435-v1-1-24e160d44476@imgtec.com
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>

drm/imagination: make pvr_fw_trace_init_mask_ops static

The pvr_fw_trace_init_mask_ops is not used outside pvr_fw_trace.c
so make it static to avoid the following sparse warning:

drivers/gpu/drm/imagination/pvr_fw_trace.c:74:31: warning: symbol 'pvr_fw_trace_init_mask_ops' was not declared. Should it be static?

Fixes: c6978643ea1c ("drm/imagination: Validate fw trace group_mask")
Reviewed-by: Alessio Belle <alessio.belle@imgtec.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://patch.msgid.link/20260703162338.2848039-1-ben.dooks@codethink.co.uk
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>

firmware: arm_scmi: Rate-limit queue-full warnings in IRQ context

The scmi_notify() function is called from interrupt context to queue
received notification events onto a per-protocol kfifo. When the kfifo
is full, it logs a warning via dev_warn() for every dropped event.

Under conditions where the platform sends a burst of SCMI notifications
faster than the deferred worker can drain the queue, this results in a
flood of dev_warn() calls from IRQ context. Each call acquires the
console lock and may execute blocking console writes, causing the CPU
to be held in interrupt context for an extended period and leading to
observable system stalls.

Fix this by switching to dev_warn_ratelimited() to limit the frequency
of log messages when the notification queue is full. This reduces
console overhead in interrupt context and prevents CPU stalls caused by
excessive logging, while still preserving diagnostic visibility.

Fixes: bd31b249692e ("firmware: arm_scmi: Add notification dispatch and delivery")
Signed-off-by: Pushpendra Singh <pushpendra.singh@oss.qualcomm.com>
Link: https://patch.msgid.link/20260708072339.3021140-1-pushpendra.singh@oss.qualcomm.com
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>

firmware: arm_scmi: Use 64-bit division for clock rate rounding

SCMI clock range descriptors report rates as 64-bit values. When handling
a range clock, scmi_clock_determine_rate() rounds the requested rate up to
the next supported step using the SCMI RATE_STEP value.

The current code uses div64_ul() for this calculation. Since div64_ul()
takes an unsigned long divisor, the 64-bit RATE_STEP value can be truncated
on 32-bit builds. In the worst case, a non-zero 64-bit step can be narrowed
to zero before the division.

Store RATE_STEP in a u64, reject a malformed zero step, and use
DIV64_U64_ROUND_UP() so the divisor is handled as a 64-bit value.
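
The truncation can be reproduced in user space. This sketch assumes a 32-bit unsigned long, modelled with uint32_t, and assumes rate >= min. Both helper names are hypothetical, and the second one only mirrors the arithmetic of a 64-bit round-up division.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old behaviour on a 32-bit build: the 64-bit step is narrowed before the
 * division. Returns false when the narrowed step is zero.
 */
static bool round_up_truncating(uint64_t rate, uint64_t min, uint64_t step,
				uint64_t *out)
{
	uint32_t narrow = (uint32_t)step;	/* upper 32 bits are lost */

	if (narrow == 0)
		return false;			/* would divide by zero */
	*out = min + (rate - min + narrow - 1) / narrow * narrow;
	return true;
}

/* Fixed behaviour: keep the step 64-bit and reject a zero step. */
static bool round_up_u64(uint64_t rate, uint64_t min, uint64_t step,
			 uint64_t *out)
{
	if (step == 0)
		return false;			/* malformed descriptor */
	*out = min + (rate - min + step - 1) / step * step;
	return true;
}
```

A step of exactly 2^32 narrows to zero in the first helper, and a step of 2^32 + 1000 silently becomes 1000, while the 64-bit helper rounds to the real step in both cases.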

This does not change behavior for valid firmware reporting a non-zero step
that fits in unsigned long.

Tested on Xunlong Orange Pi 5 Plus / RK3588 with SCMI over SMC. SCMI
clocks probed successfully before and after the change. SCMI-backed CPU
clocks were exercised through cpufreq-dt by switching each CPU policy
between its lowest and highest available OPP.

Fixes: ecde921eb460 ("firmware: arm_scmi: Add clock determine_rate operation")
Signed-off-by: Steve Dunnagan <sdunnaga@redhat.com>
Link: https://patch.msgid.link/20260701195923.444270-1-sdunnaga@redhat.com
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>

gpu/buddy: bail out of try_harder when alignment cannot be honoured

The try_harder contiguous fallback could return a range whose start
offset did not match the caller's min_block_size. When a candidate's
start is misaligned, realign it: free the misaligned run and reallocate
exactly @size at the next lower min_block_size boundary. This keeps the
returned size unchanged with no surplus to trim, and rejects the request
only when no aligned candidate fits.
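
The realignment step amounts to rounding a start offset down to a multiple of min_block_size. A minimal sketch with hypothetical helper names follows. It uses a modulo so that it does not assume min_block_size is a power of two, and it leaves out the check that the realigned range is still free.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when start already sits on a min_block_size boundary. */
static bool is_block_aligned(uint64_t start, uint64_t min_block_size)
{
	return start % min_block_size == 0;
}

/* Next lower (or equal) min_block_size boundary for start. */
static uint64_t align_start_down(uint64_t start, uint64_t min_block_size)
{
	return start - start % min_block_size;
}
```

A candidate starting at 0x3000 with an 0x2000 minimum block size would be moved down to 0x2000, and the same @size would then be requested from there.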

v2: align misaligned candidates down to min_block_size instead of
bailing out, for both the RHS and LHS paths (Matthew).

Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Timur Kristóf <timur.kristof@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Tested-by: John Olender <john.olender@gmail.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260709131050.1022759-1-Arunpravin.PaneerSelvam@amd.com

cifs: Show reason why autodisabling serverino support

Extend the cifs_autodisable_serverino() function to also print a text
message explaining why the function was called.

The text message is printed just once per mount, when serverino support is
autodisabled. Once serverino support is disabled for a mount it will not be
re-enabled, so these messages do not flood the logs.

This change makes it possible to debug why cifs.ko decided to turn off server
inode number support and hence disabled detection of hardlinks.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: fix incorrect nlink returned by fstat()

Reproducer:

  1. mount -t cifs //${server_ip}/export /mnt
  2. touch /mnt/file1; ln /mnt/file1 /mnt/file2; ln /mnt/file1 /mnt/file3
  3. C program: int fd = open("/mnt/file1", O_RDONLY);
  4. C program: struct stat stbuf; fstat(fd, &stbuf);
                stbuf.st_nlink is always 1, should be 3
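
Steps 3 and 4 can be written out as a small helper. This is a sketch only:
demo_nlink_via_fstat() is a hypothetical name, and the path is whatever
file the caller wants to inspect (for the reproducer, /mnt/file1).

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the link count seen through an open descriptor, or -1 on error. */
static long demo_nlink_via_fstat(const char *path)
{
	struct stat stbuf;
	long nlink = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &stbuf) == 0)
		nlink = (long)stbuf.st_nlink;
	close(fd);
	return nlink;
}
```

After the three names from step 2 exist, calling the helper on
/mnt/file1 is expected to return 3.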

Setting `unknown_nlink` to true in `SMB2_open()` triggers the
`CIFS_FATTR_UNKNOWN_NLINK` flag in `cifs_open_info_to_fattr()`,
which safely preserves the existing i_nlink in
`cifs_nlink_fattr_to_inode()`.

See the detailed procedure below:

  path_openat
    open_last_lookups
      lookup_open
        atomic_open
          cifs_atomic_open // dir->i_op->atomic_open
            cifs_lookup
              cifs_get_inode_info
                cifs_get_fattr
                  smb2_query_path_info // server->ops->query_path_info
                    smb2_compound_op
                      SMB2_open_init
                      case SMB2_OP_QUERY_INFO
                      SMB2_query_info_init(FILE_ALL_INFORMATION,)
                  cifs_open_info_to_fattr
                    fattr->cf_nlink = le32_to_cpu(info->NumberOfLinks)
                update_inode_info
                  cifs_iget
                    cifs_fattr_to_inode
                      cifs_nlink_fattr_to_inode
                        set_nlink(inode, fattr->cf_nlink)
    do_open
      vfs_open
        do_dentry_open
          cifs_open
            cifs_nt_open
              smb2_open_file // server->ops->open
                SMB2_open
                  buf->unknown_nlink = true
              cifs_get_inode_info
                cifs_get_fattr
                  cifs_open_info_to_fattr
                    if (data->unknown_nlink) // true
                    fattr->cf_flags |= CIFS_FATTR_UNKNOWN_NLINK
                update_inode_info
                  cifs_fattr_to_inode
                    cifs_nlink_fattr_to_inode
                      if (fattr->cf_flags & CIFS_FATTR_UNKNOWN_NLINK) // true
                      return // do not modify nlink

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: zero-initialize stack-allocated cifs_open_info_data

Stack-allocated cifs_open_info_data may contain random data.
Fields that are not set later can therefore end up with wrong values.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: pass cifs_open_info_data to SMB2_open()

Let SMB2_open() fill the smb2_file_all_info embedded in cifs_open_info_data
directly. This removes the temporary smb2_file_all_info copy in
smb2_open_file().

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: use stack-allocated smb2_file_all_info in smb3_query_mf_symlink()

SMB2_open() only fills the fixed fields, so a stack-allocated
smb2_file_all_info is sufficient here.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix overflow in passthrough ioctl bounds check

smb2_ioctl_query_info() validates the PASSTHRU_FSCTL response payload
before copying it to userspace.

The payload offset and length both come from 32-bit fields. The bounds
check currently adds OutputOffset and qi.input_buffer_length directly, so
the addition can wrap in 32-bit arithmetic before the result is compared
against the response buffer length.

A malicious server can use a large OutputOffset and a small OutputCount
to make the wrapped sum pass the bounds check. The later copy_to_user()
then reads from io_rsp + OutputOffset, outside the response buffer.

Use size_add() for the offset plus length check so overflow is treated as
out of bounds.
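
The difference between the wrapping and the saturating check can be seen
in a user-space sketch. demo_sat_add() is a hypothetical stand-in modelled
on the saturating behaviour of the kernel's size_add(); the function and
parameter names are illustrative and not taken from smb2_ioctl_query_info().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy pattern: the sum is formed in 32 bits and may wrap to a small value. */
static bool demo_in_bounds_wrapping(uint32_t offset, uint32_t len,
				    size_t buf_len)
{
	return (uint32_t)(offset + len) <= buf_len;
}

/* Saturating add: an overflowing sum becomes SIZE_MAX instead of wrapping. */
static size_t demo_sat_add(size_t a, size_t b)
{
	size_t sum = a + b;

	return sum < a ? SIZE_MAX : sum;
}

/* Fixed pattern: overflow can never look like a small, valid end offset. */
static bool demo_in_bounds_saturating(uint32_t offset, uint32_t len,
				      size_t buf_len)
{
	return demo_sat_add(offset, len) <= buf_len;
}
```

With offset 0xfffffff0, length 0x20 and a 64-byte buffer, the wrapping
check sees a sum of 0x10 and accepts the request, while the saturating
check rejects it.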

Fixes: 2b1116bbe898 ("CIFS: Use common error handling code in smb2_ioctl_query_info()")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

macsec: don't read an unset MAC header in macsec_encrypt()

macsec_encrypt() reads the Ethernet header via eth_hdr(skb)
(skb->head + skb->mac_header) to memmove() the 12 source/destination MAC
bytes forward and make room for the SecTAG.

On the AF_PACKET SOCK_RAW + PACKET_QDISC_BYPASS transmit path the skb
reaches the macsec ndo_start_xmit() with the MAC header unset, so
eth_hdr(skb) resolves to skb->head + (u16)~0 and the read is out of
bounds: a 12-byte heap over-read that is also emitted on the wire as the
frame's outer source/destination MAC. KASAN reports a slab-out-of-bounds
read in macsec_start_xmit() on 6.0; on current mainline a CONFIG_DEBUG_NET
build flags it as an unset mac header in skb_mac_header().

On the TX path the L2 header is at skb->data, so use skb_eth_hdr(), added
by commit 96cc4b69581d ("macvlan: do not assume mac_header is set in
macvlan_broadcast()") for exactly this purpose.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Cc: stable@vger.kernel.org
Signed-off-by: Daehyeon Ko <4ncienth@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260703083634.2035145-1-4ncienth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dibs: loopback: validate offset and size in move_data()

The loopback move_data() performs a memcpy into the registered DMB
without checking whether offset + size exceeds the DMB length. Unlike
real ISM hardware, which enforces memory region bounds natively, the
software loopback has no such protection.

A peer-supplied out-of-bounds offset or oversized write would result in
an OOB write past the allocated kernel buffer. Add an explicit bounds
check before the memcpy to reject such requests with -EINVAL.
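
A bounds check of this kind might look like the sketch below. struct
demo_dmb and demo_move_data() are hypothetical simplifications, not the
dibs loopback code; the check is written so that offset + size is never
computed and therefore cannot overflow.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a registered buffer. */
struct demo_dmb {
	unsigned char *cpu_addr;
	size_t len;
};

static int demo_move_data(struct demo_dmb *dmb, size_t offset,
			  const void *data, size_t size)
{
	/* Reject anything that would end past the buffer. */
	if (offset > dmb->len || size > dmb->len - offset)
		return -EINVAL;

	memcpy(dmb->cpu_addr + offset, data, size);
	return 0;
}
```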

Fixes: f7a22071dbf3 ("net/smc: implement DMB-related operations of loopback-ism")
Cc: stable@vger.kernel.org
Reported-by: Federico Kirschbaum <federico.kirschbaum@xbow.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reported-by: Baul Lee <baul.lee@xbow.com>
Link: https://patch.msgid.link/20260707074318.1448662-1-dust.li@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/xe/userptr: Stub notifier_lock helpers when DRM_GPUSVM=n

When CONFIG_DRM_GPUSVM=n (e.g. um-allyesconfig), the only caller of
xe_pt_svm_userptr_notifier_lock() is compiled out, triggering:

  drivers/gpu/drm/xe/xe_pt.c:1418:13: warning:
    'xe_pt_svm_userptr_notifier_lock' defined but not used
    [-Wunused-function]

The helpers cannot simply be removed in this case: the matching
xe_pt_svm_userptr_notifier_unlock() is also referenced from
xe_pt_update_ops_run(), which lives outside any DRM_GPUSVM ifdef and is
gated only at runtime by pt_update_ops->needs_svm_lock. The symbol must
exist in all builds.

Provide empty static inline stubs for !DRM_GPUSVM, matching the pattern
used by xe_svm_notifier_lock()/_unlock() in xe_svm.h.
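
The stub pattern, reduced to a self-contained sketch: DEMO_GPUSVM is a
hypothetical stand-in for CONFIG_DRM_GPUSVM, and the demo_* functions are
illustrative names rather than the xe helpers.

```c
/* DEMO_GPUSVM is a hypothetical stand-in for CONFIG_DRM_GPUSVM. */
#ifdef DEMO_GPUSVM
static int demo_lock_depth;
static inline void demo_notifier_lock(void)   { demo_lock_depth++; }
static inline void demo_notifier_unlock(void) { demo_lock_depth--; }
#else
/* Empty stubs: callers compile and link unchanged, the calls do nothing. */
static inline void demo_notifier_lock(void)   { }
static inline void demo_notifier_unlock(void) { }
#endif

/* Lives outside any #ifdef; the lock is gated only by a runtime flag. */
static int demo_update_ops_run(int needs_svm_lock)
{
	int pages_updated = 0;

	if (needs_svm_lock)
		demo_notifier_lock();
	pages_updated++;	/* stand-in for the real page-table update */
	if (needs_svm_lock)
		demo_notifier_unlock();
	return pages_updated;
}
```

The caller stays free of preprocessor conditionals, which is why the
symbols have to exist in every configuration.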

Fixes: dca6e08c923a ("drm/xe/userptr: Hold notifier_lock for write on inject test path")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606302210.QqcLbOEN-lkp@intel.com/
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260630192221.2998168-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 3359422bf0a1140e96d783a19a397686e580a3ca)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: free madvise VMA array on L2 flush failure

xe_vm_madvise_ioctl() allocates madvise_range.vmas in get_vmas().
After get_vmas() succeeds with at least one VMA, error paths must go
through free_vmas so the array is released before the madvise details are
destroyed.

The L2 flush validation path added for PAT madvise rejects some
SVM/userptr ranges after get_vmas() has succeeded, but jumps directly to
madv_fini. This skips kfree(madvise_range.vmas), leaking the VMA array on
each failed ioctl.

Jump to free_vmas instead, matching the other validation failure paths
after get_vmas() has succeeded.
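
The label ordering matters because each label releases only what was
acquired before it. A user-space sketch, reusing the label names from the
description above but with a hypothetical demo_ioctl() body and an
allocation counter added purely for illustration:

```c
#include <errno.h>
#include <stdlib.h>

static int demo_live_allocs;	/* outstanding VMA arrays, for illustration */

static int demo_ioctl(int fail_validation)
{
	int err = 0;
	int *vmas;

	vmas = malloc(4 * sizeof(*vmas));	/* stand-in for get_vmas() */
	if (!vmas) {
		err = -ENOMEM;
		goto madv_fini;			/* nothing to free yet */
	}
	demo_live_allocs++;

	if (fail_validation) {
		err = -EINVAL;
		goto free_vmas;	/* jumping to madv_fini here would leak vmas */
	}

	/* ... apply the madvise operation ... */

free_vmas:
	free(vmas);
	demo_live_allocs--;
madv_fini:
	return err;
}
```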

Fixes: 4f39a194d41e ("drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260708073422.725186-1-lgs201920130244@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit c3a1c3579b1250060da73507a4acef712974c78a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: remove duplicate <kunit/test-bug.h> include

xe_pci.c includes <kunit/test-bug.h> twice, separated only by the
<kunit/test.h> include. Drop the redundant second include; this is a
non-functional cleanup flagged by scripts/checkincludes.pl.

Fixes: 6cad22853cb8 ("drm/xe/kunit: Add stub to read_gmdid")
Signed-off-by: Anas Khan <anxkhn28@gmail.com>
Link: https://patch.msgid.link/20260702112820.34675-1-anxkhn28@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 84ed5b0a925721aaf069d36e18a99db966ff4e80)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Wait on external BO kernel fences in exec IOCTL

Before arming a user job, xe_exec_ioctl() only added the VM's
dma-resv KERNEL slot as a dependency. That slot covers rebinds and
the kernel operations of the VM's private BOs, but not external BOs
(bo->vm == NULL), which carry their kernel operations (evictions,
moves, ...) in their own dma-resv KERNEL slot.

The DMA_RESV_USAGE_KERNEL slot is the cross-driver contract for
memory management operations that must complete before the BO or its
backing store may be used: any accessor is required to wait on the
KERNEL fences before touching the resv. By skipping the external BOs'
KERNEL slots, the exec path violated that contract and could schedule
a user job while a kernel operation on an external BO mapped by the VM
was still in flight, racing against it and potentially reading or
writing memory that was being moved.

Replace the VM-only dependency with an iteration over every object
locked by the exec, adding each object's KERNEL slot as a job
dependency. This covers the VM resv (rebinds and private BOs) as well
as every external BO, mirroring the drm_gpuvm_resv_add_fence() call
that later publishes the job fence to the same set of objects.
Long-running mode continues to skip this, as before.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260702215805.4011228-1-matthew.brost@intel.com
(cherry picked from commit a6b842acf3ddd1efc53a56de9260cfa718fb35e7)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Fix PTE index in xe_vm_populate_pgtable() for chunked binds

xe_vm_populate_pgtable() indexed the source PTE array (update->pt_entries)
by the per-call loop counter, assuming each call starts at the first entry
of the update. That holds for the CPU bind path
(xe_migrate_update_pgtables_cpu), which populates a whole update in a single
call, but not for the GPU bind path: write_pgtable() splits an update into
MAX_PTE_PER_SDI (510) sized MI_STORE_DATA_IMM chunks, invoking the populate
callback once per chunk with an advancing qword_ofs but a fresh command-
buffer destination pointer.

As a result, every chunk after the first re-read pt_entries from index 0
instead of from its true offset, so PTEs beyond the first 510 entries of a
single update were programmed with the wrong physical pages, shifting the
mapping by exactly MAX_PTE_PER_SDI pages.

This stayed latent because a single update only exceeds 510 qwords when a
large (e.g. 2M) region is bound as individual 4K PTEs rather than a single
huge-page entry, which happens when the backing store is sufficiently
fragmented. It was surfaced by the BO defrag path, which deliberately
rebinds such fragmented ranges via the GPU bind path, producing
deterministic data corruption offset by 510 pages.

Index pt_entries by the chunk's absolute offset relative to update->ofs so
both the CPU and GPU paths pick the correct entries.
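
The indexing fix can be illustrated with a simplified model. The chunk
size of 510 comes from the description above; struct demo_update,
demo_populate() and demo_write_pgtable() are hypothetical and much simpler
than the xe code (in particular, the output here is one flat array rather
than per-chunk command-buffer space).

```c
#include <stdint.h>

#define DEMO_MAX_PER_CHUNK 510	/* mirrors MAX_PTE_PER_SDI */

struct demo_update {
	uint32_t ofs;			/* first qword covered by this update */
	uint32_t qwords;		/* number of entries in pt_entries */
	const uint64_t *pt_entries;	/* source PTE values */
};

/* Called once per chunk; dst points at that chunk's destination. */
static void demo_populate(uint64_t *dst, uint32_t qword_ofs, uint32_t num,
			  const struct demo_update *u)
{
	for (uint32_t i = 0; i < num; i++)
		/* Index by absolute offset relative to u->ofs, not by i alone. */
		dst[i] = u->pt_entries[qword_ofs - u->ofs + i];
}

/* Splits one update into chunks, as the GPU bind path is described to do. */
static void demo_write_pgtable(uint64_t *out, const struct demo_update *u)
{
	uint32_t ofs = u->ofs;
	uint32_t left = u->qwords;

	while (left) {
		uint32_t chunk = left < DEMO_MAX_PER_CHUNK ?
				 left : DEMO_MAX_PER_CHUNK;

		demo_populate(out, ofs, chunk, u);
		out += chunk;
		ofs += chunk;
		left -= chunk;
	}
}
```

Indexing with i alone would make the second chunk copy entries 0..89
again instead of 510..599 for a 600-entry update.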

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260702012434.3861171-1-matthew.brost@intel.com
(cherry picked from commit e6f2d0b757c4fb577a513c577140109d1d292a9a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

octeontx2-af: fix VF bringup affecting PF promiscuous state

Mbox handling of nix_set_rx_mode for a VF with promiscuous and
all_multi flags set to false causes deletion of the PF's promiscuous
and allmulti MCAM rules. This occurs because the APIs that
enable/disable these rules operate only on the PF, even when the
mbox request is made via a VF interface.

Guard both rvu_npc_enable_allmulti_entry() and
rvu_npc_enable_promisc_entry() disable paths with an is_vf() check so
that a VF bringing up or tearing down its interface cannot inadvertently
clear the PF's MCAM rules.

Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature")
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Nitin Shetty J <nshettyj@marvell.com>
Link: https://patch.msgid.link/20260702045616.3002773-2-nshettyj@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-26-07-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*.

Most of these are LLM fixes for old issues flagged by sashiko/LLMs.

Many of these trigger drive-by-findings in sashiko. In particular:

- many load/store tearing and missing memory barriers, races
  etc. in ipset, esp. with GC and resizing.
  Keeping the proposed patches spinning for yet-another-iteration
  keeps legit fixes back, so I prefer to add these now and follow
  up with other reports later.
- flowtable work queue still has possible races with teardown,
  but same rationale as with ipset: drive-by findings, not
  problems coming with the flowtable IPIP changeset in this PR.
- ever since unreadable frag skb support was added in 6.12, we can no
  longer do: BUG_ON(skb_copy_bits( ...): it will fire with such skbs.
  Mina Almasry is looking at similar patterns elsewhere in the stack.

1) Guard skb->mac_header adjustment after IPv6 defragmentation in
nf_conntrack_reasm.  From Xiang Mei.

2) NUL-terminate ebtables table names before calling find_table_lock() to
prevent stack-out-of-bounds reads.  Also from Xiang Mei.

3) Zero the ebtables chainstack array, else error unwind may free bogus
pointer when CPU mask is sparse.  All three issues date from 2.6 days.

4) Ensure ebtables module names are c-strings, same bug pattern as 2).
Bug added in 4.6.

5) Fix catchall element handling for inverted lookups in nft_lookup. Fold the
catchall lookup into ext before computing the match status.  Was like
this ever since catchall elements got introduced in 5.13.
From Tamaki Yanagawa.

6-9) ipset updates from Jozsef Kadlecsik:
- mark rcu protected areas correctly
- address gc and resize clash in the comment extension
- add/del backlog cleanup in the error path
- allocate right size for the generic hash structure

10-12): IPIP flowtable updates from Pablo Neira Ayuso:
- Use the current direction's route when pushing IPIP headers
   Fix incorrect headroom and fragmentation offset calculations.
- Avoid hardware offload for IPIP tunnels due to lack of driver support.
- Support IPIP tunnels with direct xmit in netfilter flowtable.
   dst_cache and dst_cookie are moved outside the union to share route
   state across flows.  This is a followup to work done in 6.19 cycle.

13) Don't BUG() on skb_copy_bits error. Handle unreadable fragments by
either returning an error or restricting the copy operations to linear area.
This became an issue when unreadable frag support was merged in 6.12.

14-16): IPVS updates from Yizhou Zhao:
- Pass parsed transport offset to IPVS state handlers.
   Update callback signatures.
- Use correct transport header offset on state lookup in TCP.
   As-is it was possible for ipv6 extension header data to be
   treated as L4 header.
- same for SCTP.  This was also broken since 2.6 days.

17) Ensure inner IP headers in ICMP errors are in the skb headroom after
stripping outer headers. Add more checks for the length of inner headers.
This was broken since 3.7 days.
From Julian Anastasov.

netfilter pull request nf-26-07-08

* tag 'nf-26-07-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  ipvs: ensure inner headers in ICMP errors are in headroom
  ipvs: use parsed transport offset in SCTP state lookup
  ipvs: use parsed transport offset in TCP state lookup
  ipvs: pass parsed transport offset to state handlers
  netfilter: handle unreadable frags
  netfilter: flowtable: support IPIP tunnel with direct xmit
  netfilter: flowtable: IPIP tunnel hardware offload is not yet support
  netfilter: flowtable: use dst in this direction when pushing IPIP header
  netfilter: ipset: allocate the proper memory for the generic hash structure
  netfilter: ipset: cleanup the add/del backlog when resize failed
  netfilter: ipset: exclude gc when resize is in progress
  netfilter: ipset: mark the rcu locked areas properly
  netfilter: nft_lookup: fix catchall element handling with inverted lookups
  netfilter: ebtables: module names must be null-terminated
  netfilter: ebtables: zero chainstack array
  netfilter: ebtables: terminate table name before find_table_lock()
  netfilter: nf_conntrack_reasm: guard mac_header adjustment after IPv6 defrag
====================

Link: https://patch.msgid.link/20260708140309.19633-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ethtool: rss: Fix hfunc and input_xfrm parsing on big endian

ETHTOOL_A_RSS_HFUNC and ETHTOOL_A_RSS_INPUT_XFRM are NLA_U32 attributes,
but ethnl_rss_set() and ethnl_rss_create_doit() parse them with
ethnl_update_u8(), which reads a single byte.

On little endian this happens to read the least significant byte and
works as long as the value fits in a byte. On big endian it reads the
most significant byte, so the requested value is parsed incorrectly.

The destination fields in struct ethtool_rxfh_param are u8, so the
attribute can't be read directly with ethnl_update_u32().
Cap the hfunc policy at U8_MAX so an out of range value is rejected
instead of being silently truncated into the u8 field, and add
ethnl_update_u8_u32() to read the full u32 and narrow it into the u8
destination.
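
The byte-order effect can be reproduced in user space. The sketch below is
an assumption-laden model: demo_read_first_byte() imitates a one-byte read
of a four-byte payload, and demo_read_u32_as_u8() is a hypothetical helper
that folds the range check and the narrowing into one function, whereas the
patch described above rejects out-of-range values through the policy.

```c
#include <stdint.h>
#include <string.h>

/* What a u8 read of a 4-byte attribute payload effectively returns. */
static uint8_t demo_read_first_byte(const unsigned char payload[4])
{
	return payload[0];
}

/* Read the whole u32 in host byte order, then narrow it with a range check. */
static int demo_read_u32_as_u8(const unsigned char payload[4], uint8_t *dst)
{
	uint32_t v;

	memcpy(&v, payload, sizeof(v));
	if (v > UINT8_MAX)
		return -1;	/* reject instead of truncating silently */
	*dst = (uint8_t)v;
	return 0;
}
```

For the value 2, a little-endian payload is 02 00 00 00 and the first byte
is 2; a big-endian payload is 00 00 00 02 and the first byte is 0.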

Fixes: 82ae67cbc423 ("ethtool: rss: support setting hfunc via Netlink")
Fixes: d3e2c7bab124 ("ethtool: rss: support setting input-xfrm via Netlink")
Fixes: a166ab7816c5 ("ethtool: rss: support creating contexts via Netlink")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260706055017.3355806-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/mlx5: Fix L3 tunnel entropy refcount leak

mlx5_tun_entropy_refcount_inc() counts both VXLAN and L2-to-L3
tunnel reformat entries as entropy-enabling users. The matching
decrement path only handled VXLAN, leaving L2-to-L3 tunnel entries
counted after release.

Handle MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL in
mlx5_tun_entropy_refcount_dec() as well so the enabling entry
refcount remains balanced.

Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260703141423.1723-1-lirongqing@baidu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ALSA: hda/realtek: Add quirk for TongFang X6xx45xU

Fix microphone detection on the built-in headphone jack for some devices.

Signed-off-by: Eckhart Mohr <e.mohr@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Link: https://patch.msgid.link/20260708132135.102680-1-wse@tuxedocomputers.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/realtek - Fixed Headphone noise issue for Dell QCM1255

On this platform, booted with Ubuntu 24.04 and the PipeWire audio server,
the headphone output has a pop noise; with the PulseAudio server it is fine.
As a workaround, connect the headphones to DAC 0x2, which makes the popping
sound disappear.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/34b990cb56914148ba02fa8e9d176479@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'batadv-net-pullrequest-20260708' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes, all by Sven Eckelmann:

- ensure minimal ethernet header on TX

- fix VLAN priority offset

- clean untagged VLAN on netdev registration failure

- tt: avoid request storms during pending request

- tt: prevent TVLV OOB check overflow

- frag: free unfragmentable packet

- frag: fix primary_if leak on failed linearization

- mcast: avoid OOB read of num_dests header

- dat: fix tie-break for candidate selection

* tag 'batadv-net-pullrequest-20260708' of https://git.open-mesh.org/batadv:
  batman-adv: dat: fix tie-break for candidate selection
  batman-adv: mcast: avoid OOB read of num_dests header
  batman-adv: frag: fix primary_if leak on failed linearization
  batman-adv: frag: free unfragmentable packet
  batman-adv: tt: prevent TVLV OOB check overflow
  batman-adv: tt: avoid request storms during pending request
  batman-adv: clean untagged VLAN on netdev registration failure
  batman-adv: fix VLAN priority offset
  batman-adv: ensure minimal ethernet header on TX
====================

Link: https://patch.msgid.link/20260708091821.314516-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: drop in-flight Tx SKBs on close

The MACB driver has since forever leaked the outgoing SKBs that
have not yet been marked as completed. They live in queue->tx_skb
which gets freed without remorse nor checking.

macb_free_consistent() gets called in a few codepaths, but only close will
trigger the added expressions. In macb_open() and macb_alloc_consistent()
failure cases, queues' tx_skb just got allocated and are empty.

Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
Cc: stable@vger.kernel.org
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/20260702-macb-drop-tx-v4-1-1c833eebdbc8@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'fix-mana-rx-with-bounce-buffering'

Dexuan Cui says:

====================
Fix MANA RX with bounce buffering

With swiotlb=force, the MANA NIC fails to work properly due to commit
730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead
of full pages to improve memory efficiency.").

This happens because, with the standard MTU=1500, the aforementioned
commit uses page pool frags with PP_FLAG_DMA_MAP, but fails to call
page_pool_dma_sync_for_cpu() to sync the received packet for CPU access
before handing the RX buffer to the stack.

Here patch #2 adds the required page_pool_dma_sync_for_cpu().

Patch #1 validates the packet length reported by the NIC. With patch #2,
page_pool_dma_sync_for_cpu() uses the packet length, so we don't want
to blindly trust the packet length, just in case.

There is no change between v2 and v3.
v3 just swaps the order of the 2 patches in v2, as suggested by Simon [3].

References:
[1] v1: https://lore.kernel.org/netdev/20260618035029.249361-1-decui@microsoft.com/
[2] v2: https://lore.kernel.org/netdev/20260624222605.1794719-1-decui@microsoft.com/
[3] https://lore.kernel.org/netdev/20260626145048.GB1310988@horms.kernel.org/
====================

Link: https://patch.msgid.link/20260702041237.617719-1-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Sync page pool RX frags for CPU

MANA allocates RX buffers from page pool fragments when frag_count is
greater than 1. In that case the buffers remain DMA mapped by page pool
and the RX completion path does not call dma_unmap_single(). As a result,
the implicit sync-for-CPU normally performed by dma_unmap_single() is
missing before the packet data is passed to the networking stack.

This breaks RX on configurations which require explicit DMA syncing, for
example when booted with swiotlb=force.

Fix this by recording the page pool page and DMA sync offset when the RX
buffer is allocated, and syncing the received packet range for CPU access
before handing the RX buffer to the stack.

Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.")
Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Link: https://patch.msgid.link/20260702041237.617719-3-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mana: Validate the packet length reported by the NIC

Validate the packet length reported in the RX CQE before passing it
to skb processing. The CQE is supplied by the NIC device and should
not be blindly trusted.

Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Link: https://patch.msgid.link/20260702041237.617719-2-decui@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests/net: fix EVP_MD_CTX leak in tcp_mmap

In tcp_mmap.c, both child_thread() and main() allocate an EVP_MD_CTX
via EVP_MD_CTX_new() when integrity checking is enabled, but neither
function releases the context. child_thread() misses the free in its
common cleanup block, and main() returns without freeing the context.

This results in a SHA256 context leak on every run that uses the
-i (integrity) option. Add the missing EVP_MD_CTX_free() calls to
the appropriate cleanup paths to fix the leak.

Fixes: 5c5945dc695c ("selftests/net: Add SHA256 computation over data sent in tcp_mmap")
Signed-off-by: Wang Yan <wangyan01@kylinos.cn>
Link: https://patch.msgid.link/20260702025949.442523-1-wangyan01@kylinos.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

KVM: s390: Improve kvm_s390_vm_stop_migration()

There is no need to clear cmma-dirty state if the VM is not using CMMA.

Skip the CMMA-related code if CMMA is not in use.

Fixes: 6cfd47f91f6a ("KVM: s390: Fix cmma dirty tracking")
Fixes: 190df4a212a7 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

KVM: s390: Fix dat_crste_walk_range() early return

If a walk entry handler for a lower level returns a value,
dat_crste_walk_range() will not return immediately, but instead loop
again and move to the next entry.

This means that some entries are potentially skipped, and early return
is ignored. Skipped entries might lead to all kinds of issues, given
that the caller expects them to not be skipped. Early return is often
used to interrupt a walk when a rescheduling is needed; if it is
ignored it can lead to stalls.

Fix by breaking from the loop immediately if the walk to a lower level
returned non-zero.
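
The control flow can be sketched with a two-level walk over plain arrays.
All demo_* names and the table layout are hypothetical; only the idea of
breaking out of the outer loop on a non-zero lower-level result comes from
the description above.

```c
#include <stddef.h>

#define DEMO_ENTRIES 4

struct demo_walk_state {
	int stop_at;	/* entry value at which the handler asks to stop */
	int visited;	/* number of entries the handler has seen */
};

/* Entry handler: non-zero means "interrupt the walk now". */
static int demo_entry_handler(int entry, struct demo_walk_state *st)
{
	st->visited++;
	return entry == st->stop_at;
}

/* Lower level: walk one table, stop at the first non-zero result. */
static int demo_walk_lower(const int *entries, struct demo_walk_state *st)
{
	for (size_t i = 0; i < DEMO_ENTRIES; i++) {
		int rc = demo_entry_handler(entries[i], st);

		if (rc)
			return rc;
	}
	return 0;
}

/* Upper level: propagate a non-zero result instead of moving on. */
static int demo_walk_range(const int (*tables)[DEMO_ENTRIES], size_t ntables,
			   struct demo_walk_state *st)
{
	int rc = 0;

	for (size_t t = 0; t < ntables; t++) {
		rc = demo_walk_lower(tables[t], st);
		if (rc)
			break;	/* without this, the next table is walked too */
	}
	return rc;
}
```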

Fixes: 2db149a0a6c5 ("KVM: s390: KVM page table management functions: walks")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

KVM: s390: vsie: Avoid potential deadlock with real spaces

The natural lock ordering is mmu_lock -> children_lock, but in
gmap_create_shadow() the reverse order is used when handling shadowing
of real address spaces.

Convert the inner locking of kvm->mmu_lock to a trylock; return -EAGAIN
if the lock is busy, and let the caller try again.

This path is not expected to happen in real-life scenarios, so its
performance is not important.
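
The trylock approach avoids the inversion because the inner lock is never
waited for. A user-space sketch with a C11 atomic_flag standing in for the
lock; struct demo_lock and the demo_* functions are hypothetical and do not
model the kernel's lock types.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_lock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Assumed to run with the "children" lock already held, i.e. in the
 * reverse of the natural order, so the "mmu" lock must not be waited for.
 */
static int demo_shadow_real_space(struct demo_lock *mmu_lock)
{
	if (!demo_trylock(mmu_lock))
		return -EAGAIN;		/* caller backs off and retries */

	/* ... work that needs mmu_lock ... */

	demo_unlock(mmu_lock);
	return 0;
}
```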

Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

KVM: s390: pci: Fix GISC refcount leak on AIF enable failure

kvm_s390_gisc_register() registers the guest ISC before pinning
the guest interrupt forwarding pages and allocating the AISB bit.
If any of the later setup steps fails, the function unwinds the
pinned pages and other local state, but does not unregister the
GISC reference. Add the missing kvm_s390_gisc_unregister() to the
error unwind path.

Fixes: 3c5a1b6f0a18 ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260624061910.2794734-1-haoxiang_li2024@163.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>

drm/fb-helper: Only consider active CRTCs for vblank sync

Only synchronize fbdev output to the vblank of an active CRTC. Go over
the list of CRTCs and pick the first that matches. This fixes warnings
such as the one shown below:

[ 77.201354] WARNING: drivers/gpu/drm/drm_vblank.c:1320 at drm_crtc_wait_one_vblank+0x194/0x1cc [drm], CPU#1: kworker/1:7/1867
[ 77.201354] omapdrm omapdrm.0: [drm] vblank wait timed out on crtc 0

This currently happens if the fbdev output is not on CRTC 0.

Atomic and non-atomic drivers require distinct code paths. As for other
fbdev operations, implement both and select the correct one at runtime.

Not finding an active CRTC is not a bug. Do not wait in this case, but
flush the display update as before.

v4:
- avoid possible deadlocks with locking context (Sashiko)
v3:
- drop excessive state validation (Jani)
- acquire plane and CRTC mutexes (Sashiko)
v2:
- move look-up code into separate helper
- support drivers with legacy modesetting
v1:
- see https://lore.kernel.org/dri-devel/1c9e0e24-9c4a-4259-8700-cf9e5fd60ca3@suse.de/

Co-authored-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: d8c4bddcd8bcb ("drm/fb-helper: Synchronize dirty worker with vblank")
Tested-by: Icenowy Zheng <zhengxingda@iscas.ac.cn>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Closes: https://bugs.debian.org/1138033
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260702145021.226932-1-tzimmermann@suse.de

Merge tag 'iio-fixes-for-7.2a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus

Jonathan writes:

IIO: 1st set of fixes for the 7.2 cycle

Usual mixed bag of recently introduced issues and much older ones.

core
- Ensure kfifo is reset before fd is allocated avoiding concurrent use of
  fifo with reset.
multiple drivers
- Fix up missing Kconfig dependencies.
hid-sensors
- Add support for multibyte read as necessary precursor to...
- Fix stale or zero output when reading raw values for quaternions.
adi,adis
- Add IRQF_NO_THREAD to ensure interrupt is not pushed to the software
  interrupt chip used for trigger demux in the IIO core from a thread.
bosch,bmc150
- Hardening against device returning a reserved out of range value for
  how many entries are in the FIFO.
bosch,bmi160
- Add IRQF_NO_THREAD to ensure interrupt is not pushed to the software
  interrupt chip used for trigger demux in the IIO core from a thread.
dynaimage,al3010
- Fix wrong scale for highest gain_range due to too many digits in the
  micro part (val2).
freescale,mpl3115
- Fix unbalanced runtime pm on error in read_raw().
invensense,icm42600
- Avoid wrong divisor for fifo timestamps when using the watermark
  interrupt.
- Fix timestamp accuracy loss due to excessive divisor for calculations.
kionix,kxsd9
- Fix unbalanced runtime pm on an error in write_raw().
microchip,mcp37feb02
- Fix an uninitialized reference voltage value for particular DT config.
melexis,mlx90635
- Build on basis of right Kconfig symbol.
nxp,lpc32xx
- Ensure completion initialized before requesting irq. Hardening against
  spurious IRQ.
nxp,saradc
- Fix a delay calculation.
sharp,gp2ap0002
- Fix unbalanced runtime pm on error in read_raw().
st,lsm6dsx
- Fix an issue seen in the wild where an unplanned CPU reset can leave the
  device on the wrong register page, thus leaving the driver wedged.
st,st_sensors library
- Make sure to handle a device that provides data as big endian correctly.
st,spear
- Ensure completion initialized before requesting irq. Hardening against
  spurious IRQ.
taos,tsl2591
- Don't eat return from devm_request_threaded_irq() as that breaks
  deferred probing.
ti,ads1119
- Fix a pm reference count leak in an error path.
ti,ads124s08
- Handle gpio look up errors correctly.

* tag 'iio-fixes-for-7.2a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (28 commits)
  iio: event: Fix event FIFO reset race
  iio: imu: inv_icm42600: fix timestamp clock period by using lower value
  iio: light: al3010: fix incorrect scale for the highest gain range
  iio: adc: nxp-sar-adc: Fix the delay calculation in nxp_sar_adc_wait_for()
  iio: light: tsl2591: return actual error from probe IRQ failure
  iio: imu: inv_icm42600: fix timestamping by limiting FIFO reading
  iio: imu: st_lsm6dsx: deselect shub page before reading whoami
  iio: adc: ad7779: add missing 'select IIO_TRIGGERED_BUFFER' to Kconfig
  iio: adc: ad4130: add missing `select IIO_TRIGGERED_BUFFER` to Kconfig
  iio: adc: ti-ads124s08: Return reset GPIO lookup errors
  iio: temperature: Build mlx90635 with CONFIG_MLX90635
  iio: light: al3320a: add missing REGMAP_I2C to Kconfig
  iio: light: al3010: add missing REGMAP_I2C to Kconfig
  iio: light: al3000a: add missing REGMAP_I2C to Kconfig
  iio: common: st_sensors: honour channel endianness in read_axis_data
  iio: imu: bmi160: add IRQF_NO_THREAD to data-ready trigger IRQ
  iio: imu: adis: add IRQF_NO_THREAD to non-FIFO trigger IRQ
  iio: hid-sensor-rotation: Fix stale or zero output when reading raw values
  HID: sensor-hub: Add sensor_hub_input_attr_read_values() for multi-byte reads
  iio: adc: spear: Initialize completion before requesting IRQ
  ...

scsi: lpfc: Fix memory leak in lpfc_sli4_driver_resource_setup()

The memory allocated for mboxq using mempool_alloc() is not freed in
some of the early exit error paths. Fix that by moving the
mempool_free() call to an earlier point after last use.

Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20260707065304.949135-1-nihaal@cse.iitm.ac.in
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: sg: Report request-table problems when any status is set

SG_GET_REQUEST_TABLE reports per-request diagnostic state through
sg_req_info::problem. The field is meant to indicate whether there is an
error to report for a completed request.

sg_fill_request_table() currently combines masked_status, host_status
and driver_status with bitwise AND. This only reports a problem when all
three status fields are non-zero at the same time. A normal target check
condition, for example, has masked_status set while host_status and
driver_status may both be zero, so the request is incorrectly reported
as clean.

Use the same condition as sg_new_read(), which sets SG_INFO_CHECK when
any of the three status fields is non-zero.
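
The difference between the two conditions can be sketched in plain C.
This is a minimal userspace illustration, not the driver code: the
struct req_status type, its field widths and both helper names are
assumptions made up for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three status fields of a completed request. */
struct req_status {
	unsigned char masked_status;
	unsigned short host_status;
	unsigned short driver_status;
};

/* Old shape: a bitwise AND is non-zero only if all fields share a set bit. */
bool problem_and(const struct req_status *s)
{
	return (s->masked_status & s->host_status & s->driver_status) != 0;
}

/* Fixed shape: report a problem as soon as any field is non-zero. */
bool problem_any(const struct req_status *s)
{
	return s->masked_status || s->host_status || s->driver_status;
}
```

With only masked_status set, problem_and() reports the request as clean
while problem_any() flags it, which is the case described above.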

Cc: stable@vger.kernel.org
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/54B60C19F7DB8889+20260707030845.970018-1-raoxu@uniontech.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: core: tracing: Do not dereference pointers in TP_printk()

The trace events in drivers/ufs/core/ufs_trace.h were converted to take
a pointer to the hba structure as a tracepoint argument. TP_printk() was
then changed from printing the dev_name stored in the ring buffer to
dereferencing the dev pointer through the saved hba pointer.

This is not allowed because TP_printk() is executed when the trace event
is read from the /sys/kernel/tracing/trace file. That can happen seconds,
minutes, hours, days, weeks, or even months later! There is no guarantee
that the hba pointer will still exist by the time it is dereferenced when
the "trace" file is read.

Instead, save the device name from the hba pointer at the time the
tracepoint is called and place it into the ring buffer event. Then the
TP_printk() can read the name directly from the ring buffer and remove
the possibility that it will read a freed pointer and crash the kernel.
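
The idea of copying the name at record time can be sketched in
userspace C. This is only an illustration under stated assumptions:
struct fake_dev, struct trace_record, record_event() and format_event()
are hypothetical names and do not correspond to the kernel's trace
macros or to the UFS code.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32

/* Hypothetical device object; assume it may be freed at any later time. */
struct fake_dev {
	char name[NAME_LEN];
};

/* Record as it would sit in a ring buffer: it owns a copy of the name. */
struct trace_record {
	char dev_name[NAME_LEN];
	int value;
};

/* "Tracepoint time": copy the string while the device is known to be alive. */
void record_event(struct trace_record *rec, const struct fake_dev *dev, int value)
{
	snprintf(rec->dev_name, sizeof(rec->dev_name), "%s", dev->name);
	rec->value = value;
}

/* "Read time": format using only data stored in the record itself. */
int format_event(const struct trace_record *rec, char *buf, size_t len)
{
	return snprintf(buf, len, "%s: value=%d", rec->dev_name, rec->value);
}
```

Because format_event() never touches the device object, the record can
still be printed after the device has gone away.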

This was detected when testing the trace event code that looks for
TP_printk() parameters doing illegal dereferences [1].

[1] https://lore.kernel.org/all/20260630184836.74d477b6@gandalf.local.home/

Cc: stable@vger.kernel.org
Fixes: 583e518e7100 ("scsi: ufs: core: Add hba parameter to trace events")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260630185412.283c26c5@gandalf.local.home
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Merge branch 7.2/scsi-queue into 7.2/scsi-fixes

Pull in outstanding commits from 7.2/scsi-queue.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

regulator: core: regulator_lock_two() should test for EDEADLK not EDEADLOCK

Compare against -EDEADLK, which is what ww_mutex_lock() actually
returns and what every other deadlock check in this file already uses.

Function regulator_lock_two() acquires two regulators via
regulator_lock_nested() -> ww_mutex_lock().  On contention,
ww_mutex_lock() returns -EDEADLK, which is the caller's signal to drop
the lock it holds and retry the acquisition in the canonical order.

However, regulator_lock_two() tests the return value against -EDEADLOCK
rather than -EDEADLK.  On most architectures, EDEADLK and EDEADLOCK are
the same value, so the comparison happens to be correct and the bug is
invisible.  But on MIPS, SPARC, and PowerPC, those two errors have
different values.  The test is wrong: a genuine -EDEADLK backoff no
longer matches -EDEADLOCK, so instead of unlocking and retrying, the
code falls into WARN_ON(ret) and returns with only one of the two
regulators locked.

In practice, this is a bug only on MIPS, because the regulator core is
not built or used on the other two platforms.

In general, EDEADLK is preferred over EDEADLOCK for new code.
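
The back-off pattern that depends on this comparison can be sketched in
userspace C. This is a simplified model, not the regulator code:
struct fake_lock and the fake_lock_*() helpers are hypothetical
stand-ins for the ww_mutex calls, and lock_two() only imitates the
"drop, wait, swap, retry" shape described above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock stand-in: reports a deadlock a set number of times. */
struct fake_lock {
	int contended;	/* remaining attempts that return -EDEADLK */
	bool held;
};

int fake_lock_acquire(struct fake_lock *l)
{
	if (l->contended > 0) {
		l->contended--;
		return -EDEADLK;
	}
	l->held = true;
	return 0;
}

/* Stand-in for a slow path that waits until the lock is obtained. */
void fake_lock_acquire_slow(struct fake_lock *l)
{
	l->contended = 0;
	l->held = true;
}

void fake_lock_release(struct fake_lock *l)
{
	l->held = false;
}

/* Take both locks; on -EDEADLK drop the held one and retry in swapped order. */
int lock_two(struct fake_lock *a, struct fake_lock *b)
{
	struct fake_lock *held = a, *contended = b, *tmp;
	int ret;

	ret = fake_lock_acquire(held);
	if (ret)
		return ret;

	for (;;) {
		ret = fake_lock_acquire(contended);
		if (ret != -EDEADLK)	/* must match the value actually returned */
			return ret;

		fake_lock_release(held);
		fake_lock_acquire_slow(contended);
		tmp = held;
		held = contended;
		contended = tmp;
	}
}
```

If the comparison used a constant with a different numeric value, the
loop would return on the first contention with only one lock held.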

Fixes: cba6cfdc7c3f ("regulator: core: Avoid lockdep reports when resolving supplies")
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Link: https://patch.msgid.link/20260708235722.2953579-1-ttabi@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>

smb: client: fix busy dentry warning on unmount after DIO

Commit c68337442f03 ("cifs: Fix busy dentry used after unmounting") fixed
an issue in cifs where a deferred file close left a dentry reference
count unreleased at umount. It did so by flushing deferredclose_wq in
cifs_kill_sb().

However, the cifs DIO path suffers from the same busy-dentry problem caused
by a delayed dentry reference-count release:

[dio] [cifsd] [close + umount]
netfs_unbuffered_write_iter_locked
...
cifs_demultiplex_thread
netfs_unbuffered_write
  cifs_issue_write
  netfs_wait_for_in_progress_stream [1]
...
netfs_write_subrequest_terminated
  netfs_subreq_clear_in_progress
   netfs_wake_collector // wake [1]
  netfs_put_subrequest
netfs_put_request
  queue_work(system_dfl_wq, xxx) [2]
// dio write return cifs_close
_cifsFileInfo_put
  // cfile->count 2->1
  --cfile->count [3]

// umount
cifs_kill_sb
kill_anon_super
  // warning triggered!
  shrink_dcache_for_umount [4]
[system_dfl_wq] [5]
netfs_free_request
...
_cifsFileInfo_put
  // cfile->count 1->0
  --cfile->count
  queue_work(fileinfo_put_wq, xxx)

[fileinfo_put_wq] [6]
cifsFileInfo_put_work
cifsFileInfo_put_final
  dput

If the umount path is triggered before [5], it results in a warning:
BUG: Dentry 00000000eab1f070{i=9a917b66ae404fec,n=test}  still in use (1)
[unmount of cifs cifs]

The existing per-inode ictx->io_count wait in cifs_evict_inode() does not
help: it lives in the inode eviction path, which runs after
shrink_dcache_for_umount() has already warned about the busy dentries.

Fix it by adding a per-superblock outstanding-rreq counter that is
incremented in cifs_init_request() and decremented in cifs_free_request().
In cifs_kill_sb(), before kill_anon_super(), wait for this counter to reach
0 - which guarantees that all cleanup_work for this sb have run and thus
all relevant cfile puts are queued on fileinfo_put_wq or serverclose_wq.
Then drain the workqueue so the dentry refs are dropped.

This is a targeted wait, not a flush of the system-wide system_dfl_wq.

Fixes: 340cea84f691c ("cifs: open files should not hold ref on superblock")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: Fix support for creating SFU fifo

SFU fifos are natively supported (created and recognized) at least by:
- Microsoft POSIX subsystem
- OpenNT/Interix subsystem
- Microsoft SFU (Windows Services for UNIX)
- Microsoft SUA (Subsystem for UNIX-based Applications)
- Windows NFS server (up to the Windows Server 2008 R2)

Windows NFS server since Windows Server 2012 uses a new reparse point
format for storing new fifos, but can still recognize this old format
(also in the latest Windows Server 2022 version).

An SFU-style fifo is an empty regular file which has the system attribute
set.

These SFU-style fifos are already recognized by the Linux SMB client.

But the Linux SMB client currently creates new SFU fifos in a different
format which is not compatible with all those SFU-style consumers. Fix this
by creating new fifos in the correct SFU format, which is recognized by
all those applications and also by existing Linux SMB clients.

This change only affects the creation of new fifos when the mount option
-o sfu is used.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: Fix support for creating SFU socket

SFU sockets are natively supported by the Interix 3.0 subsystem and by
later versions, which are part of Microsoft SFU (Windows Services for UNIX)
and Microsoft SUA (Subsystem for UNIX-based Applications). These can create
new sockets and recognize existing ones, whether stored on a local disk or
on a remote SMB share.

SFU sockets are also recognized by the NFS server included in Windows
Server. Windows NFS server versions since Windows Server 2012 use a new
reparse point format for storing new sockets, but can still recognize this
old format (also in the latest Windows Server 2022 version).

An SFU-style socket is a regular file which has the system attribute set
and whose content is a single zero byte.

These SFU-style sockets are already recognized by the Linux SMB client.

But the Linux SMB client currently creates new SFU sockets in a different
format which is not compatible with all those SFU applications. Fix this by
creating new sockets in the correct SFU format, which is recognized by
SFU, SUA, NFS and existing Linux SMB clients.

This change only affects the creation of new sockets when the mount option
-o sfu is used.

Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix atime clamp check in read completion

cifs_rreq_done() updates the inode atime to current_time(inode) after a
netfs read.  It then preserves the CIFS rule that atime should not be
older than mtime, because some applications break if atime is less than
mtime.  That rule only requires clamping when atime < mtime.

The current check uses the raw non-zero result of timespec64_compare().
It therefore takes the clamp path for both atime < mtime and
atime > mtime.  The latter is the normal case when reading an older file:
the newly recorded atime is newer than the file mtime.  The completion
handler then immediately moves atime back to mtime, losing the access
time that was just recorded.  Userspace tools that rely on atime, such as
stat, find -atime, backup tools or cold-data classifiers, can therefore
see a recently read CIFS file as not recently accessed.

This is easy to miss because the bug is silent: read I/O still succeeds,
no error is reported, and many systems either do not check atime after
reads or mount with policies such as relatime/noatime.  It becomes
visible when a CIFS file has an mtime older than the current time, the
file is read, and the local inode atime is inspected before a later
revalidation replaces the cached timestamps.

Clamp only when atime is actually older than mtime.  This matches the
same atime/mtime rule used when applying CIFS inode attributes.
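
The two forms of the check can be compared in a small userspace sketch.
It assumes a local struct ts and a ts_compare() helper that returns a
negative, zero or positive value in the style of timespec64_compare();
these names and both clamp functions are illustrative, not the cifs
code.

```c
/* Local timestamp type, used here instead of the kernel's timespec64. */
struct ts {
	long long tv_sec;
	long tv_nsec;
};

/* Returns <0 if a is older than b, 0 if equal, >0 if a is newer than b. */
int ts_compare(const struct ts *a, const struct ts *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Buggy shape: any non-zero result clamps, so a newer atime is discarded. */
struct ts clamp_atime_buggy(struct ts atime, struct ts mtime)
{
	if (ts_compare(&atime, &mtime))
		return mtime;
	return atime;
}

/* Fixed shape: clamp only when atime is older than mtime. */
struct ts clamp_atime_fixed(struct ts atime, struct ts mtime)
{
	if (ts_compare(&atime, &mtime) < 0)
		return mtime;
	return atime;
}
```

For an atime newer than mtime, the buggy form returns mtime and the
fixed form keeps the freshly recorded atime.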

Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Cc: stable@vger.kernel.org
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

ASoC: tas2562: fix deprecated 'shut-down' GPIO always cleared after lookup

In tas2562_parse_dt(), the fallback lookup for the deprecated
"shut-down" GPIO property is broken due to a missing pair of braces.

The code intends to reset sdz_gpio to NULL only when the lookup
returns an error that is not -EPROBE_DEFER (so the driver gracefully
continues without a GPIO). However, without braces the statement:

tas2562->sdz_gpio = NULL;

falls outside the IS_ERR() check and is executed unconditionally
for every path through the if block, including a successful GPIO
lookup.

This means any device using the deprecated 'shut-down' DT property
will always have sdz_gpio == NULL after probe, making the GPIO
completely non-functional.

Fix this by adding the missing braces to scope the NULL assignment
inside the IS_ERR() branch, matching the pattern already used for
the primary 'shutdown' GPIO lookup above.
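
The effect of the missing braces can be reproduced in a userspace
sketch. Everything here is an assumption made for illustration:
struct lookup models the result of a GPIO lookup, FAKE_EPROBE_DEFER
reuses the kernel's numeric value 517, and parse_buggy() and
parse_fixed() only mirror the control flow described above.

```c
#include <stddef.h>

#define FAKE_EPROBE_DEFER 517	/* same number as the kernel's EPROBE_DEFER */

/* Hypothetical lookup result: a usable handle or a negative error code. */
struct lookup {
	int err;	/* 0 on success, negative errno-style value on failure */
	int *gpio;	/* valid only when err == 0 */
};

/* Buggy shape: without braces the NULL assignment runs on every path. */
int parse_buggy(struct lookup res, int **sdz_gpio)
{
	*sdz_gpio = res.gpio;
	if (res.err)
		if (res.err == -FAKE_EPROBE_DEFER)
			return -FAKE_EPROBE_DEFER;

	*sdz_gpio = NULL;	/* not guarded by the error check above */
	return 0;
}

/* Fixed shape: braces keep the NULL assignment inside the error branch. */
int parse_fixed(struct lookup res, int **sdz_gpio)
{
	*sdz_gpio = res.gpio;
	if (res.err) {
		if (res.err == -FAKE_EPROBE_DEFER)
			return -FAKE_EPROBE_DEFER;

		*sdz_gpio = NULL;
	}
	return 0;
}
```

After a successful lookup, parse_buggy() leaves the handle NULL while
parse_fixed() keeps it.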

Fixes: f78a97003b8b ("ASoC: tas2562: Update shutdown GPIO property")
Signed-off-by: Uday Khare <udaykhare77@gmail.com>
Link: https://patch.msgid.link/20260706153109.10953-1-udaykhare77@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

drm/amdkfd: Check bounds on CRIU restore queue type and mqd size

We weren't checking whether the values provided in the private
data in kfd CRIU restore were within bounds.

For queue type, add a KFD_QUEUE_TYPE_MAX and ensure the provided
type is less than it.

For mqd_size, add new function mqd_size_from_queue_type and confirm
that the provided mqd_size matches expectations.
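
The two checks can be sketched as follows. This is a hedged
illustration only: the enum values, the byte sizes and every fake_*
name are invented for the example, and only the idea of a trailing MAX
sentinel plus a size lookup by queue type comes from the description
above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue types; the trailing sentinel is one past the last. */
enum fake_queue_type {
	FAKE_QUEUE_TYPE_COMPUTE,
	FAKE_QUEUE_TYPE_SDMA,
	FAKE_QUEUE_TYPE_MAX
};

/* Hypothetical expected sizes in bytes, invented for illustration. */
size_t fake_mqd_size_from_queue_type(enum fake_queue_type type)
{
	switch (type) {
	case FAKE_QUEUE_TYPE_COMPUTE:
		return 512;
	case FAKE_QUEUE_TYPE_SDMA:
		return 128;
	default:
		return 0;
	}
}

/* Validate untrusted restore data before anything else uses it. */
bool restore_data_valid(unsigned int type, size_t mqd_size)
{
	if (type >= FAKE_QUEUE_TYPE_MAX)
		return false;
	return mqd_size ==
	       fake_mqd_size_from_queue_type((enum fake_queue_type)type);
}
```

Checking the type first means the size lookup is only ever called with
a value inside the enum's range.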

Reviewed-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f19d8086f6644083c913d70bfdeee20e1b6f46a5)
Cc: stable@vger.kernel.org