Eric Biggers [Sat, 18 Apr 2026 22:13:09 +0000 (15:13 -0700)]
smb: client: Remove obsolete cmac(aes) allocation
Since the crypto library API is now being used instead of crypto_shash,
the "cmac(aes)" crypto_shash that is being allocated and stored in
'struct cifs_secmech' is no longer used. Remove it.
That makes the kconfig selection of CRYPTO_CMAC and the module softdep
on "cmac" unnecessary. So remove those too.
Finally, since this removes the last use of crypto_shash from the smb
client, also remove the remaining crypto_shash-related helper functions.
Note: cifs_unicode.c was relying on <linux/unaligned.h> being included
transitively via <crypto/internal/hash.h>. Since the latter include is
removed, make cifs_unicode.c include <linux/unaligned.h> explicitly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Eric Biggers [Sat, 18 Apr 2026 22:13:08 +0000 (15:13 -0700)]
smb: client: Use AES-CMAC library for SMB3 signature calculation
Convert smb3_calc_signature() to use the AES-CMAC library instead of a
"cmac(aes)" crypto_shash.
The result is simpler and faster code. With the library there's no need
to allocate memory, no need to handle errors except for key preparation,
and the AES-CMAC code is accessed directly without inefficient indirect
calls and other unnecessary API overhead.
For now a "cmac(aes)" crypto_shash is still being allocated in
'struct cifs_secmech'. Later commits will remove that, simplifying the
code even further.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
This patch implements several micro-optimizations on lz77_compress()
with the goal of reducing the number of instructions per [input]
byte (a.k.a. IPB).
Changes:
- change hashtable to be u32 (instead of u64) -- change the hash
function to reflect that (adds lz77_hash() and lz77_read32() helpers)
- batch-write literals instead of 1 by 1 -- now that we have a well
defined hot path (match finding) and a cold path (encode literals +
match), batch writing makes a significant difference
- implement adaptive skipping of input bytes -- skip input bytes more
aggressively if too few matches are being found
- name some constants for more meaningful context
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: compress: fix counting in LZ77 match finding
- lz77_match_len() increments @cur before checking for equality,
leading to off-by-one match len in some cases.
Fix by moving pointers increment to inside the loop.
Also rename @wnd arg to @match (more accurate name).
- both lz77_match_len() and lz77_compress() checked for
"buf + step < end" when the correct is "<=" for such cases.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: compress: fix buffer overrun in lz77_compress()
@dst buffer is allocated with same size as @src, which, for good
compression cases, works fine.
However, when compression goes bad (e.g. random bytes payloads), the
compressed size can increase significantly, and even by stopping the
main loop at 7/8 of @slen, writing leftover literals could write past
the end of @dst because of LZ77 metadata.
To fix this, add lz77_compressed_alloc_size() helper to compute the
correct allocation size for @dst, accounting for metadata and worst
cast scenario (all literals).
While this is overprovisioning memory, it's not only correct, but also
allows lz77_compress() main loop to run without ever checking @dst
limits (i.e. a perf improvement).
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: scope end_of_dacl to CIFS_DEBUG2 use in parse_dacl
After validate_dacl() was factored out in commit 149822e5541c, the
local end_of_dacl in parse_dacl() is only read by the dump_ace()
call under #ifdef CONFIG_CIFS_DEBUG2. With CIFS_DEBUG2 off the
variable is assigned but never used, which gcc -W=1 flags as
-Wunused-but-set-variable.
Remove the local and compute the end-of-dacl pointer inline at the
single call site inside the existing CIFS_DEBUG2 guard. No
functional change: when CIFS_DEBUG2 is enabled the argument value
is identical to what the removed local carried; when CIFS_DEBUG2
is disabled the code was already dead.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604220046.tGkRxVtS-lkp@intel.com/ Fixes: 149822e5541c ("smb: client: validate the whole DACL before rewriting it in cifsacl") Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Steve French <stfrench@microsoft.com>
Instead of fixing this with some kind of "is module initialized"
approach, this patch instead moves that functionality to procfs,
setting a write op for the existing open_dirs entry, where
writing a 0 to it will drop the cached directory entries.
Also make it available only when CONFIG_CIFS_DEBUG=y.
A small change needed now is to not call flush_delayed_work()
on invalidate_all_cached_dirs() when called from procfs (can't sleep in
that context).
So add a @sync arg to invalidate_all_cached_dirs() to control when to
flush the delayed works.
Fixes: dde6667fa3c8 ("smb: client: add drop_dir_cache module parameter to invalidate cached dirents") Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: require a full NFS mode SID before reading mode bits
parse_dacl() treats an ACE SID matching sid_unix_NFS_mode as an NFS
mode SID and reads sid.sub_auth[2] to recover the mode bits.
That assumes the ACE carries three subauthorities, but compare_sids()
only compares min(a, b) subauthorities. A malicious server can return
an ACE with num_subauth = 2 and sub_auth[] = {88, 3}, which still
matches sid_unix_NFS_mode and then drives the sub_auth[2] read four
bytes past the end of the ACE.
Require num_subauth >= 3 before treating the ACE as an NFS mode SID.
This keeps the fix local to the special-SID mode path without changing
compare_sids() semantics for the rest of cifsacl.
Fixes: e2f8fbfb8d09 ("cifs: get mode bits from special sid on stat") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: validate the whole DACL before rewriting it in cifsacl
build_sec_desc() and id_mode_to_cifs_acl() derive a DACL pointer from a
server-supplied dacloffset and then use the incoming ACL to rebuild the
chmod/chown security descriptor.
The original fix only checked that the struct smb_acl header fits before
reading dacl_ptr->size or dacl_ptr->num_aces. That avoids the immediate
header-field OOB read, but the rewrite helpers still walk ACEs based on
pdacl->num_aces with no structural validation of the incoming DACL body.
A malicious server can return a truncated DACL that still contains a
header, claims one or more ACEs, and then drive
replace_sids_and_copy_aces() or set_chmod_dacl() past the validated
extent while they compare or copy attacker-controlled ACEs.
Factor the DACL structural checks into validate_dacl(), extend them to
validate each ACE against the DACL bounds, and use the shared validator
before the chmod/chown rebuild paths. parse_dacl() reuses the same
validator so the read-side parser and write-side rewrite paths agree on
what constitutes a well-formed incoming DACL.
Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Assisted-by: Codex:gpt-5-4 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
smb: client: fix OOB read in smb2_ioctl_query_info QUERY_INFO path
smb2_ioctl_query_info() has two response-copy branches: PASSTHRU_FSCTL
and the default QUERY_INFO path. The QUERY_INFO branch clamps
qi.input_buffer_length to the server-reported OutputBufferLength and then
copies qi.input_buffer_length bytes from qi_rsp->Buffer to userspace, but
it never verifies that the flexible-array payload actually fits within
rsp_iov[1].iov_len.
A malicious server can return OutputBufferLength larger than the actual
QUERY_INFO response, causing copy_to_user() to walk past the response
buffer and expose adjacent kernel heap to userspace.
Guard the QUERY_INFO copy with a bounds check on the actual Buffer
payload. Use struct_size(qi_rsp, Buffer, qi.input_buffer_length)
rather than an open-coded addition so the guard cannot overflow on
32-bit builds.
Fixes: f5778c398713 ("SMB3: Allow SMB3 FSCTL queries to be sent to server from tools") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Assisted-by: Claude:claude-opus-4-6 Assisted-by: Codex:gpt-5-4 Signed-off-by: Steve French <stfrench@microsoft.com>
Amit Barzilai [Mon, 20 Apr 2026 13:44:23 +0000 (16:44 +0300)]
fbdev: clps711x-fb: Request memory region for MMIO
Use devm_platform_get_and_ioremap_resource() for resource 0 (the MMIO
control register range) instead of open-coding platform_get_resource()
and devm_ioremap() separately. The helper requests the memory region
before mapping it, which registers the range in /proc/iomem and prevents
another driver from mapping the same registers.
This makes resource 0 consistent with resource 1 (the framebuffer),
which already uses devm_platform_get_and_ioremap_resource().
Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Amit Barzilai <amit.barzilai22@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
Amit Barzilai [Mon, 20 Apr 2026 13:44:22 +0000 (16:44 +0300)]
fbdev: cobalt_lcdfb: Request memory region
Use devm_platform_get_and_ioremap_resource() instead of open-coding
platform_get_resource() and devm_ioremap() separately. The helper
requests the memory region before mapping it, which registers the range
in /proc/iomem and prevents another driver from mapping the same
registers.
Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Amit Barzilai <amit.barzilai22@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
Limit the digital gain and PA volumes to a combined -3 dB in the machine
driver to reduce the risk of speaker damage until we have active speaker
protection in place (or higher safe levels have been established).
Based on commit c481016bb4f8 ("ASoC: qcom: sc8280xp: limit speaker
volumes") which addressed the same issue on the sc8280x SoC with some
minor changes as explained below.
The Digital Volume behaves almost identical to sc8280x since both use
the same lpass-wsa-macro, but x1e80100 has two sets of controls prefixed
with WSA and WSA2.
For PA x1e80100 machines use wsa884x amplifiers which expose a linear
scale from -9 dB to 9 dB with a 1.5 dB step size giving us
0 dB = -9 dB + 6 * 1.5 dB.
On x1e80100 there are two different speaker topologies we need to handle:
2-Speakers: SpkrLeft, Spkr Right
4-Speakers: WooferLeft, WooferRight, TweeterLeft, TweeterRight
Johan Hovold [Tue, 21 Apr 2026 14:39:23 +0000 (16:39 +0200)]
spi: axiado: fix runtime pm imbalance on probe failure
Make sure that the controller is active before disabling clocks on late
probe failure and on driver unbind to avoid a clock disable imbalance.
Also make sure that the usage count is balanced on probe failure (e.g.
probe deferral) so that the controller can be suspended when a driver is
later bound.
Note that the runtime PM state can only be set when runtime PM is
disabled.
Fixes: e75a6b00ad79 ("spi: axiado: Add driver for Axiado SPI DB controller") Cc: stable@vger.kernel.org # 7.0 Cc: Vladimir Moravcevic <vmoravcevic@axiado.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260421143925.1551781-2-johan@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
When CONFIG_FB_SAVAGE_I2C is enabled, savagefb_probe() can build both an
EDID-derived monspecs.modedb and a modelist from it before later failing.
The normal success path frees monspecs.modedb after the initial mode selection,
but the probe error path only deletes the I2C busses and misses the
EDID-derived allocations.
Free both the modelist and monspecs.modedb on the failed: unwind path.
Co-developed-by: Myeonghun Pak <mhun512@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Co-developed-by: Taegyu Kim <tmk5904@psu.edu> Signed-off-by: Taegyu Kim <tmk5904@psu.edu> Signed-off-by: Yuho Choi <dbgh9129@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
fbdev: offb: fix PCI device reference leak on probe failure
offb_init_nodriver() gets a referenced PCI device with pci_get_device().
If pci_enable_device() fails, the function returns without dropping that
reference.
Release the PCI device reference before returning from the
pci_enable_device() failure path.
Fixes: 5bda8f7b5468 ("video: fbdev: offb: Call pci_enable_device() before using the PCI VGA device") Co-developed-by: Myeonghun Pak <mhun512@gmail.com> Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Co-developed-by: Ijae Kim <ae878000@gmail.com> Signed-off-by: Ijae Kim <ae878000@gmail.com> Co-developed-by: Taegyu Kim <tmk5904@psu.edu> Signed-off-by: Taegyu Kim <tmk5904@psu.edu> Signed-off-by: Yuho Choi <dbgh9129@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
in smb2_get_info_sec, a dummy security descriptor (SD) is returned if
the requested information is not supported.
the code is currently wrong, as DACL_PROTECTED is set in the type field,
but there is no DACL is present.
instead of faking a security, report a STATUS_NOT_SUPPORTED error.
this seems to fix a "Error 0x80090006: Invalid Signature" on file
transfers with Windows 11 clients (25H2, build 26200.8246).
capturing traffic shows that the client is sending a GET_INFO/SEC_INFO
request, with the additional_info field set to 0x20
(ATTRIBUTE_SECURITY_INFORMATION). Returning an empty SD
(with only SELF_RELATIVE set) does not fix the error.
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Hyunwoo Kim [Mon, 20 Apr 2026 15:31:47 +0000 (00:31 +0900)]
ksmbd: scope conn->binding slowpath to bound sessions only
When the binding SESSION_SETUP sets conn->binding = true, the flag stays
set after the call so that the global session lookup in
ksmbd_session_lookup_all() can find the session, which was not added to
conn->sessions. Because the flag is connection-wide, the global lookup
path will also resolve any other session by id if asked.
Tighten the global lookup so that the returned session must have this
connection registered in its channel xarray (sess->ksmbd_chann_list).
The channel entry is installed by the existing binding_session path in
ntlm_authenticate()/krb5_authenticate() when a SESSION_SETUP completes
successfully, so this condition is a strict equivalent of "this
connection has been accepted as a channel of this session". Connections
that have not bound to a given session cannot reach it via the global
table.
The existing conn->binding gate for entering the slowpath is preserved
so that non-binding connections keep the fast-path-only behavior, and
the session->state check is unchanged.
Fixes: f5a544e3bab7 ("ksmbd: add support for SMB3 multichannel") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Mon, 20 Apr 2026 17:51:25 +0000 (02:51 +0900)]
ksmbd: fix CreateOptions sanitization clobbering the whole field
smb2_open() attempts to clear conflicting CreateOptions bits
(FILE_SEQUENTIAL_ONLY_LE together with FILE_RANDOM_ACCESS_LE, and
FILE_NO_COMPRESSION_LE on a directory open), but uses a plain
assignment of the bitwise negation of the target flag:
This replaces the entire field with 0xFFFFFFFB / 0xFFFFFFEF rather
than clearing a single bit. With the SEQUENTIAL/RANDOM case, the
next check for FILE_OPEN_BY_FILE_ID_LE | CREATE_TREE_CONNECTION |
FILE_RESERVE_OPFILTER_LE then trivially matches and a legitimate
request is rejected with -EOPNOTSUPP. With the NO_COMPRESSION case,
every downstream test (FILE_DELETE_ON_CLOSE, etc.) operates on a
corrupted CreateOptions value.
Use &= ~FLAG to clear only the intended bit in both places.
Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Mon, 20 Apr 2026 18:45:11 +0000 (03:45 +0900)]
ksmbd: fix durable fd leak on ClientGUID mismatch in durable v2 open
ksmbd_lookup_fd_cguid() returns a ksmbd_file with its refcount
incremented via ksmbd_fp_get(). parse_durable_handle_context() in
the DURABLE_REQ_V2 case properly releases this reference on every
path inside the ClientGUID-match branch, either by calling
ksmbd_put_durable_fd() or by transferring ownership to dh_info->fp
for a successful reconnect. However, when an entry exists in the
global file table with the same CreateGuid but a different
ClientGUID, the code simply falls through to the new-open path
without dropping the reference obtained from ksmbd_lookup_fd_cguid().
Per MS-SMB2 section 3.3.5.9.10 ("Handling the
SMB2_CREATE_DURABLE_HANDLE_REQUEST_V2 Create Context"), the server
MUST locate an Open whose Open.CreateGuid matches the request's
CreateGuid AND whose Open.ClientGuid matches the ClientGuid of the
connection that received the request. If no such Open is found, the
server MUST continue with the normal open execution phase. A
CreateGuid hit with a ClientGUID mismatch is therefore the
"Open not found" case: proceeding with a new open is correct, but
the reference obtained purely as a side effect of the lookup must
not be leaked.
Repeated requests that hit this mismatch pin global_ft entries,
prevent __ksmbd_close_fd() from ever running for the corresponding
files, and defeat the durable scavenger, leading to long-lived
resource leaks.
Release the reference in the mismatch path and clear dh_info->fp so
subsequent logic does not mistake a non-matching lookup result for
a reconnect target.
Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Sun, 19 Apr 2026 11:02:55 +0000 (20:02 +0900)]
ksmbd: destroy async_ida in ksmbd_conn_free()
When per-connection async_ida was converted from a dynamically
allocated ksmbd_ida to an embedded struct ida, ksmbd_ida_free() was
removed from the connection teardown path but no matching
ida_destroy() was added. The connection is therefore freed with the
IDA's backing xarray still intact.
The kernel IDA API expects ida_init() and ida_destroy() to be paired
over an object's lifetime, so add the missing cleanup before the
connection is freed.
No leak has been observed in testing; this is a pairing fix to match
the IDA lifetime rules, not a response to a reproduced regression.
Fixes: d40012a83f87 ("cifsd: declare ida statically") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Sun, 19 Apr 2026 11:02:54 +0000 (20:02 +0900)]
ksmbd: destroy tree_conn_ida in ksmbd_session_destroy()
When per-session tree_conn_ida was converted from a dynamically
allocated ksmbd_ida to an embedded struct ida, ksmbd_ida_free() was
removed from ksmbd_session_destroy() but no matching ida_destroy()
was added. The session is therefore freed with the IDA's backing
xarray still intact.
The kernel IDA API expects ida_init() and ida_destroy() to be paired
over an object's lifetime, so add the missing cleanup before the
enclosing session is freed.
Also move ida_init() to right after the session is allocated so that
it is always paired with the destroy call even on the early error
paths of __session_create() (ksmbd_init_file_table() or
__init_smb2_session() failures), both of which jump to the error
label and invoke ksmbd_session_destroy() on a partially initialised
session.
No leak has been observed in testing; this is a pairing fix to match
the IDA lifetime rules, not a response to a reproduced regression.
Fixes: d40012a83f87 ("cifsd: declare ida statically") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Eric Biggers [Sat, 18 Apr 2026 22:17:07 +0000 (15:17 -0700)]
ksmbd: Use AES-CMAC library for SMB3 signature calculation
Now that AES-CMAC has a library API, convert ksmbd_sign_smb3_pdu() to
use it instead of a "cmac(aes)" crypto_shash.
The result is simpler and faster code. With the library there's no need
to dynamically allocate memory, no need to handle errors, and the
AES-CMAC code is accessed directly without inefficient indirect calls
and other unnecessary API overhead.
Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
DaeMyung Kang [Sat, 18 Apr 2026 17:28:44 +0000 (02:28 +0900)]
ksmbd: reset rcount per connection in ksmbd_conn_wait_idle_sess_id()
rcount is intended to be connection-specific: 2 for curr_conn, 1 for
every other connection sharing the same session. However, it is
initialised only once before the hash iteration and is never reset.
After the loop visits curr_conn, later sibling connections are also
checked against rcount == 2, so a sibling with req_running == 1 is
incorrectly treated as idle. This makes the outcome depend on the
hash iteration order: whether a given sibling is checked against the
loose (< 2) or the strict (< 1) threshold is decided by whether it
happens to be visited before or after curr_conn.
The function's contract is "wait until every connection sharing this
session is idle" so that destroy_previous_session() can safely tear
the session down. The latched rcount violates that contract and
reopens the teardown race window the wait logic was meant to close:
destroy_previous_session() may proceed before sibling channels have
actually quiesced, overlapping session teardown with in-flight work
on those connections.
Recompute rcount inside the loop so each connection is compared
against its own threshold regardless of iteration order.
This is a code-inspection fix for an iteration-order-dependent logic
error; a targeted reproducer would require SMB3 multichannel with
in-flight work on a sibling channel landing after curr_conn in hash
order, which is not something that can be triggered reliably.
Fixes: 76e98a158b20 ("ksmbd: fix race condition between destroy_previous_session() and smb2 operations()") Cc: stable@vger.kernel.org Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
regulator: qcom: Unify user-visible "Qualcomm" name
Various names for Qualcomm as a company are used in user-visible config
options: QCOM, Qualcomm and Qualcomm Technologies. Switch to unified
"Qualcomm" so it will be easier for users to identify the options when
for example running menuconfig.
Sean Chang [Sun, 19 Apr 2026 16:31:38 +0000 (00:31 +0800)]
NFS: Fix RCU dereference of cl_xprt in nfs_compare_super_address
The cl_xprt pointer in struct rpc_clnt is marked as __rcu. Accessing
it directly in nfs_compare_super_address() is unsafe and triggers
Sparse warnings.
Fix this by using rcu_dereference() within an RCU read-side critical
section to retrieve the transport pointer. This addresses the sparse
warning and ensures atomic access to the pointer, as the transport
can be updated via transport switching even while the superblock
remains active under sb_lock.
Fixes: 7e3fcf61abde ("nfs: don't share mounts between network namespaces") Signed-off-by: Sean Chang <seanwascoding@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Sean Chang [Sun, 19 Apr 2026 16:31:37 +0000 (00:31 +0800)]
NFS: remove redundant __private attribute from nfs_page_class
The nfs_page_class tracepoint uses a pointer for the 'req' field marked
with the __private attribute. This causes Sparse to complain about
dereferencing a private pointer within the trace ring buffer context,
specifically during the TP_fast_assign() operation.
This fixes a Sparse warning introduced in commit b6ef079fd984 ("nfs:
more in-depth tracing of writepage events") by removing the redundant
__private attribute from the 'req' field.
Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Sean Chang <seanwascoding@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFSv4.2: fix CLONE/COPY attrs in presence of delegated attributes
xfstest generic/407 is failing in 2 ways. It detects that after
doing a clone the client does not update it's mtime and it's ctime.
CLONE always sends a GETATTR operation and then calls
nfs_post_op_update_inode() based on the returned attributes.
Because of the delegated attributes the client ignores updating
the mtime. Then also, when delegated attributes are present, for
the change_attr the server replies with the same values as what
the client cached before and thus the generic/407 would flag that.
Instead, make sure we invalidate the blocks attr.
By adding updating delegated attributes in nfs42_copy_dest_done()
both COPY and CLONE would update mtime appropriately.
Fixes: e12912d94137 ("NFSv4: Add support for delegated atime and mtime attributes") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After running xfstest generic/751, in certain conditions, can have
a writeback IO stuck while experiencing one of the two patterns.
Pattern#1: writeback IO experiences ENOSPC on an offset smaller
than the filesize. Example,
write offset=0 len=4096 how=unstable OK
write offset=8192 len=4096 how=unstable OK
write offset=12288 len=4096 how=unstable ENOSPC
write offset=4096 len=4096 how=unstable ENOSPC
client sends a commit and receives a verifier which is different
from the last successful write. It marks pages dirty and writeback
retries. But it again send writes unstable and gets into the same
pattern, running into the ENOSPC error and sending a commit because
writes were sent at unstable.
Pattern#2: an unstable write followed by a short write and ENOSPC.
write offset=0 len=4096 how=unstable OK
write offset=4096 len=4096 how=unstable returns OK but count=100
write offset=4197 len=3996 how=stable returns ENOSPC
client send a commit and receives a verifier different from
the last unstable write. The same behaviour is retried in a loop.
Instead, this patch proposes to identify those conditions and mark
requests to be done synchronously instead. Previous solution tried
to mark it in the nfs_page, however that's not persistent thus
instead mark it in the nfs_open_context.
Furthermore, the same problem occurs during localio code path so
recognize that IO needs to be done sync in that case as well.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Johan Hovold [Tue, 21 Apr 2026 12:56:32 +0000 (14:56 +0200)]
spi: imx: fix runtime pm leak on probe deferral
Make sure to balance the runtime PM usage count before returning on
probe failure (e.g. probe deferral) so that the controller can be
suspended when a driver is later bound.
Fixes: 43b6bf406cd0 ("spi: imx: fix runtime pm support for !CONFIG_PM") Cc: stable@vger.kernel.org # 5.10 Cc: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260421125632.1537235-1-johan@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
ntfs: use page allocation for resident attribute inline data
The current kmemdup() based allocation for IOMAP_INLINE can result in
inline_data pointer having a non-zero page offset. This causes
iomap_inline_data_valid() to fail the check:
and triggers the kernel BUG at fs/iomap/buffered-io.c:1061.
This particularly affects workloads with frequent small file access
(e.g. Firefox Nightly profile on NTFS with bind mount) when using the
new ntfs. This fix this by allocating a full page with alloc_page() so that
page_address() always returns a page-aligned address.
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
ntfs: fix mmap_prepare writable check for shared mappings
Linus pointed out that checking only VMA_WRITE_BIT is incorrect.
Private writable mappings (MAP_PRIVATE) set VM_WRITE but do not
write back to the filesystem. Also, mappings that can become
writable via mprotect() (VM_MAYWRITE) must be handled.
Use vma_desc_test_all(VMA_SHARED_BIT, VMA_MAYWRITE_BIT) instead,
which matches what other filesystems do.
Tiezhu Yang [Wed, 22 Apr 2026 07:45:34 +0000 (15:45 +0800)]
LoongArch: BPF: Support up to 12 function arguments for trampoline
Currently, LoongArch bpf trampoline supports up to 8 function arguments.
According to the statistics from commit 473e3150e30a ("bpf, x86: allow
function arguments up to 12 for TRACING"), there are over 200 functions
accept 9 to 12 arguments, so add 12 arguments support for trampoline.
With this patch, the following related testcases passed:
sudo ./test_progs -a tracing_struct/struct_many_args
sudo ./test_progs -a fentry_test/fentry_many_args
sudo ./test_progs -a fexit_test/fexit_many_args
Tiezhu Yang [Wed, 22 Apr 2026 07:45:34 +0000 (15:45 +0800)]
LoongArch: BPF: Support small struct arguments for trampoline
In the current BPF code, the struct argument size is at most 16 bytes,
enforced by the verifier. According to the Procedure Call Standard for
LoongArch, the struct argument size below 16 bytes are provided as part
of the 8 argument registers, that is to say, the struct argument may be
passed in a pair of registers if its size is more than 8 bytes and no
more than 16 bytes.
Extend the BPF trampoline JIT to support attachment to functions that
take small structures (up to 16 bytes) as argument, save and restore a
number of "argument registers" rather than a number of arguments.
With this patch, the following related testcases passed:
sudo ./test_progs -a tracing_struct/struct_args
sudo ./test_progs -a tracing_struct/union_args
Tiezhu Yang [Wed, 22 Apr 2026 07:45:34 +0000 (15:45 +0800)]
LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
invoke_bpf_mod_ret() is a small wrapper over invoke_bpf_prog(), it
should check the return value of invoke_bpf_prog() and then return
immediately if invoke_bpf_prog() failed, just open code and remove
it due to it is called only once.
Tiezhu Yang [Wed, 22 Apr 2026 07:45:34 +0000 (15:45 +0800)]
LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
The 8 and 16 bit read-modify-write instructions {amadd/amswap}.{b/h}
were newly added in the latest LoongArch Reference Manual, use them to
avoid the error of unknown opcode if possible.
Tiezhu Yang [Wed, 22 Apr 2026 07:45:34 +0000 (15:45 +0800)]
LoongArch: BPF: Add the default case in emit_atomic() and rename it
Like the other archs such as x86 and riscv, add the default case
in emit_atomic() to print an error message for the invalid opcode
and return -EINVAL, then make its return type as int.
While at it, given that all of the instructions in emit_atomic()
are only read-modify-write instructions, rename emit_atomic() to
emit_atomic_rmw() to make it clear, because there will be a new
function emit_atomic_ld_st() for load-acquire and store-release
instructions in the later patch.
Tiezhu Yang [Wed, 22 Apr 2026 07:45:13 +0000 (15:45 +0800)]
LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
The 8 and 16 bit read-modify-write atomic instructions amadd.{b/h} and
amswap.{b/h} were newly added in the latest LoongArch Reference Manual,
define the instruction format and check whether support via CPUCFG.
Furthermore, define the instruction format for DBAR which will be used
to support BPF load-acquire and store-release instructions.
LoongArch: Add spectre boundry for syscall dispatch table
The LoongArch syscall number is directly controlled by userspace, but
does not have a array_index_nospec() boundry to prevent access past the
syscall function pointer tables.
Most LoongArch processors are vulnerable to Spectre-V1 Proof-of-Concept
(PoC). And the generic mechanism, __user pointer sanitization, can be
used as a mitigation. This means to use array_index_nospec() to prevent
out of boundry access in syscall and other critical paths.
Implement the arch-specific cpu_show_spectre_v1() to show CPU Spectre-V1
vulnerabilites correctly.
LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
After commit 7c405fb3279b3924 ("rcu: Use an intermediate irq_work to
start process_srcu()"), Loongson-2K0300/2K0500 fail to boot. Because
IRQ_WORK need IPI but Loongson-2K0300/2K0500 don't have IPI HW.
So make arch_irq_work_has_interrupt() return true only if IPI HW exist.
Luo Qiu [Wed, 22 Apr 2026 07:45:12 +0000 (15:45 +0800)]
LoongArch: Use get_random_canary() for stack canary init
Like others, replace the custom stack canary initialization with the
get_random_canary() helper, following the pattern established in commit 622754e84b10 ("stackprotector: actually use get_random_canary()").
Signed-off-by: Luo Qiu <luoqiu@kylinsec.com.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Yuqian Yang [Wed, 22 Apr 2026 07:45:11 +0000 (15:45 +0800)]
LoongArch: Improve the logging of disabling KASLR
Whether KASLR is disabled is not handled in nokaslr() which is the early
param "nokaslr" setup function, but in kaslr_disabled(). However, the
logging was previously done in nokaslr() and lack detail. So we move the
logging to the right place and add more specific infomation about why it
is disabled.
Lisa Robinson [Wed, 22 Apr 2026 07:45:11 +0000 (15:45 +0800)]
LoongArch: Align FPU register state to 32 bytes
Move fpr to the beginning of struct loongarch_fpu so it is naturally
aligned to FPU_ALIGN (32 bytes), improving 256-bit SIMD (LASX) context
switch performance.
Also adjust process.c and fpu.S to work well with the new loongarch_fpu
layout.
Signed-off-by: Lisa Robinson <lisa@bytefly.space> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
Add HIGHMEM (High Memory) support for LoongArch, mostly needed by 32BIT
kernel because the size of kernel virtual memory space is only 512MB and
the size of usable physical memory is only 256MB in this case.
HIGHMEM adds permanent kernel mapping (PKMAP) and fixed kernel mapping
(FIX_KMAP), which increase usable physical memory up to 2.25GB (2304MB).
We can just use the generic copy_user_highpage(), so remove the custom
version.
After a kexec the logical processors and virtual processors already
exist in the hypervisor because they were created by the previous
kernel. Attempting to add them again causes either a BUG_ON or
corrupted VP state leading to MCEs in the new kernel.
Add hv_lp_exists() to probe whether an LP is already present by
calling HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME. When it succeeds the
LP exists and we skip the add-LP and create-VP loops entirely.
Also add hv_call_notify_all_processors_started() which informs the
hypervisor that all processors are online. This is required after
adding LPs (fresh boot) and is a no-op on kexec since we skip that
path.
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Co-developed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
x86/hyperv: move stimer cleanup to hv_machine_shutdown()
Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to
hv_machine_shutdown() in the platform code. This ensures stimer cleanup
happens before the vmbus unload, which is required for root partition
kexec to work correctly.
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
vmbus_alloc_synic_and_connect() declares a local 'int
hyperv_cpuhp_online' that shadows the file-scope global of the same
name. The cpuhp state returned by cpuhp_setup_state() is stored in
the local, leaving the global at 0 (CPUHP_OFFLINE). When
hv_kexec_handler() or hv_machine_shutdown() later call
cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the
BUG_ON in __cpuhp_remove_state_cpuslocked().
Remove the local declaration so the cpuhp state is stored in the
file-scope global where hv_kexec_handler() and hv_machine_shutdown()
expect it.
Fixes: 2647c96649ba ("Drivers: hv: Support establishing the confidential VMBus connection") Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
Sangyun Kim [Sun, 19 Apr 2026 08:08:38 +0000 (17:08 +0900)]
pwm: atmel-tcb: Cache clock rates and mark chip as atomic
atmel_tcb_pwm_apply() holds tcbpwmc->lock as a spinlock via
guard(spinlock)() and then calls atmel_tcb_pwm_config(), which calls
clk_get_rate() twice. clk_get_rate() acquires clk_prepare_lock (a
mutex), so this is a sleep-in-atomic-context violation.
On CONFIG_DEBUG_ATOMIC_SLEEP kernels every pwm_apply_state() that
enables or reconfigures the PWM triggers a "BUG: sleeping function
called from invalid context" warning.
Acquire exclusive control over the clock rates with
clk_rate_exclusive_get() at probe time and cache the rates in struct
atmel_tcb_pwm_chip, then read the cached rates from
atmel_tcb_pwm_config(). This keeps the spinlock-based mutual exclusion
introduced in commit 37f7707077f5 ("pwm: atmel-tcb: Fix race condition
and convert to guards") and removes the sleeping calls from the atomic
section.
With no sleeping calls left in .apply() and the regmap-mmio bus already
running with fast_io=true, also mark the chip as atomic so consumers
can use pwm_apply_atomic() from atomic context.
Fixes: 37f7707077f5 ("pwm: atmel-tcb: Fix race condition and convert to guards") Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Link: https://patch.msgid.link/20260419080838.3192357-1-sangyun.kim@snu.ac.kr
[ukleinek: Ensure .clk is enabled before calling clk_get_rate on it.] Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
io_uring: take page references for NOMMU pbuf_ring mmaps
Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.
The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page. Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.
Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them. The user can then write into whatever
the allocator hands out next.
Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close. NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap.
This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed. The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.
Cc: Jens Axboe <axboe@kernel.dk> Reported-by: Anthropic Assisted-by: gkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
[axboe: get rid of region lookup, just iterate pages in vma] Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
"fprobe bug fixes:
- Prevent re-registration
Add an earlier check to reject re-registering an already active
fprobe before its state is modified during the initialization phase
- Robustness in failure paths:
- Ensure fprobes are correctly removed from all internal tables
and properly RCU-freed during registration failure
- Make unregister_fprobe() proceed with unregistration even if
temporary memory allocation fails
- RCU safety in module unloading
Avoid a potential "sleep in RCU" warning by removing a kcalloc()
call in the module notifier path. This also tries to remove
fprobe_hash_node even if memory allocation fails.
- Type-aware unregistration
Fix a bug where unregistering an fprobe did not account for
different types (entry-only vs entry-exit) at the same address,
which previously left "junk" entries in the underlying
ftrace/fgraph ops
- Unregistration of empty ftrace_ops
Avoid unneeded performance overhead due to making registered
ftrace_ops empty - which means 'trace all functions'. This counts
remaining entries and unregister ftrace_ops when it becomes empty.
Two new selftests to check above fixes:
- Module Unloading Test:
Specifically verifies that fprobe events on a module are correctly
cleaned up and do not trigger 'trace-all' behavior when the module
is removed.
- Multiple Fprobe Events Test:
Ensure that having multiple fprobes on the same function correctly
manages the ftrace hash map during removal"
* tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/ftrace: Add a testcase for multiple fprobe events
selftests/ftrace: Add a testcase for fprobe events on module
tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading
tracing/fprobe: Check the same type fprobe on table as the unregistered one
tracing/fprobe: Avoid kcalloc() in rcu_read_lock section
tracing/fprobe: Remove fprobe from hash in failure path
tracing/fprobe: Unregister fprobe even if memory allocation fails
tracing/fprobe: Reject registration of a registered fprobe before init
fixed an issue where poll->events and req->apoll_events weren't
synchronized, but then when the commit referenced in Fixes got added,
it didn't ensure the same thing.
If we mask in EPOLLONESHOT in the regular EPOLL_URING_WAKE path, then
ensure it's done for both. Including a link to the original report
below, even though it's mostly nonsense. But it includes a reproducer
that does show that IORING_CQE_F_MORE is set in the previous CQE,
while no more CQEs will be generated for this request. Just ignore
anything that pretends this is security related in any way, it's just
the typical AI nonsense.
(u16)0 - 1 promotes to (int)-1 which converts to SIZE_MAX as size_t,
causing a massive out-of-bounds write.
Reject ahslength == 0 with ISCSI_REASON_PROTOCOL_ERROR before the kmalloc.
Also reject ahslength values that exceed the actual AHS buffer advertised.
Fixes: 8f1f7d297bce ("scsi: target: iscsi: Add support for extended CDB AHS") Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://patch.msgid.link/20260415040728.187680-1-carlos.bilbao@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Wang Shuaiwei [Tue, 14 Apr 2026 03:37:18 +0000 (11:37 +0800)]
scsi: ufs: core: Fix bRefClkFreq write failure in HS-LSS mode
According to the UFS spec, the bRefClkFreq attribute can only be written
when both sub-links are in LS-MODE. However, in HS LSS mode with
resetmode = HS_MODE, if the UFS device's default bRefClkFreq value
differs from the host controller's dev_ref_clk_freq setting, the write
operation will fail.
To fix this issue, introduce ufshcd_get_op_mode() function to detect the
current link operational mode. Call ufshcd_set_dev_ref_clk() only when
both sub-links are in LS-MODE to ensure the attribute can be written
successfully.
Merge tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel
Pull more drm updates from Dave Airlie:
"This is a followup which is mostly next material with some fixes.
Alex pointed out I missed one of his AMD MRs from last week, so I
added that, then Jani sent the pipe reordering stuff, otherwise it's
just some minor i915 fixes and a dma-buf fix.
drm:
- Add support for AMD VSDB parsing to drm_edid
dma-buf:
- fix documentation formatting
i915:
- add support for reordered pipes to support joined pipes better
- Fix VESA backlight possible check condition
- Verify the correct plane DDB entry
amdgpu:
- Audio regression fix
- Use drm edid parser for AMD VSDB
- Misc cleanups
- VCE cs parse fixes
- VCN cs parse fixes
- RAS fixes
- Clean up and unify vram reservation handling
- GPU Partition updates
- system_wq cleanups
- Add CONFIG_GCOV_PROFILE_AMDGPU kconfig option
- SMU vram copy updates
- SMU 13/14/15 fixes
- UserQ fixes
- Replace pasid idr with an xarray
- Dither handling fix
- Enable amdgpu by default for CIK APUs
- Add IBs to devcoredump
amdkfd:
- system_wq cleanups
radeon:
- system_wq cleanups"
* tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel: (62 commits)
drm/i915/display: change pipe allocation order for discrete platforms
drm/i915/wm: Verify the correct plane DDB entry
drm/i915/backlight: Fix VESA backlight possible check condition
drm/i915: Walk crtcs in pipe order
drm/i915/joiner: Make joiner "nomodeset" state copy independent of pipe order
dma-buf: fix htmldocs error for dma_buf_attach_revocable
drm/amdgpu: dump job ibs in the devcoredump
drm/amdgpu: store ib info for devcoredump
drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault
drm/amdgpu: Use amdgpu by default for CIK APUs too
drm/amd/display: Remove unused NUM_ELEMENTS macros
drm/amd/display: Replace inline NUM_ELEMENTS macro with ARRAY_SIZE
drm/amdgpu: save ring content before resetting the device
drm/amdgpu: make userq fence_drv drop explicit in queue destroy
drm/amdgpu: rework userq fence driver alloc/destroy
drm/amdgpu/userq: use dma_fence_wait_timeout without test for signalled
drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
drm/amdgpu/userq: add the return code too in error condition
drm/amdgpu/userq: fence wait for max time in amdgpu_userq_wait_for_signal
drm/amd/display: Change dither policy for 10 bpc output back to dithering
...
selftests/ftrace: Add a testcase for fprobe events on module
Add a testcase for fprobe events on module, which unloads a kernel
module on which fprobe events are probing and ensure the ftrace
hash map is cleared correctly.
tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading
Fix fprobe to unregister ftrace_ops if corresponding type of fprobe
does not exist on the fprobe_ip_table and it is expected to be empty
when unloading modules.
Since ftrace thinks that the empty hash means everything to be traced,
if we set fprobes only on the unloaded module, all functions are traced
unexpectedly after unloading module.
e.g.
Alex Markuze [Tue, 10 Feb 2026 09:06:26 +0000 (09:06 +0000)]
ceph: add subvolume metrics collection and reporting
Add complete infrastructure for per-subvolume I/O metrics collection
and reporting to the MDS. This enables administrators to monitor I/O
patterns at the subvolume granularity, which is useful for multi-tenant
CephFS deployments.
This patch adds:
- CEPHFS_FEATURE_SUBVOLUME_METRICS feature flag for MDS negotiation
- CEPH_SUBVOLUME_ID_NONE constant (0) for unknown/unset state
- Red-black tree based metrics tracker for efficient per-subvolume
aggregation with kmem_cache for entry allocations
- Wire format encoding matching the MDS C++ AggregatedIOMetrics struct
- Integration with the existing CLIENT_METRICS message
- Recording of I/O operations from file read/write and writeback paths
- Debugfs interfaces for monitoring (metrics/subvolumes, metrics/metric_features)
Metrics tracked per subvolume include:
- Read/write operation counts
- Read/write byte counts
- Read/write latency sums (for average calculation)
The metrics are periodically sent to the MDS as part of the existing
metrics reporting infrastructure when the MDS advertises support for
the SUBVOLUME_METRICS feature.
CEPH_SUBVOLUME_ID_NONE enforces subvolume_id immutability. Following
the FUSE client convention, 0 means unknown/unset. Once an inode has
a valid (non-zero) subvolume_id, it should not change during the
inode's lifetime.
Alex Markuze [Tue, 10 Feb 2026 09:06:25 +0000 (09:06 +0000)]
ceph: parse subvolume_id from InodeStat v9 and store in inode
Add support for parsing the subvolume_id field from InodeStat v9 and
storing it in the inode for later use by subvolume metrics tracking.
The subvolume_id identifies which CephFS subvolume an inode belongs to,
enabling per-subvolume I/O metrics collection and reporting.
This patch:
- Adds subvolume_id field to struct ceph_mds_reply_info_in
- Adds i_subvolume_id field to struct ceph_inode_info
- Parses subvolume_id from v9 InodeStat in parse_reply_info_in()
- Adds ceph_inode_set_subvolume() helper to propagate the ID to inodes
- Initializes i_subvolume_id in inode allocation and clears on destroy
Alex Markuze [Tue, 10 Feb 2026 09:06:24 +0000 (09:06 +0000)]
ceph: handle InodeStat v8 versioned field in reply parsing
Add forward-compatible handling for the new versioned field introduced
in InodeStat v8. This patch only skips the field without using it,
preparing for future protocol extensions.
The v8 encoding adds a versioned sub-structure that needs to be properly
decoded and skipped to maintain compatibility with newer MDS versions.
libceph: Fix slab-out-of-bounds access in auth message processing
If a (potentially corrupted) message of type CEPH_MSG_AUTH_REPLY
contains a positive value in its result field, it is treated as an
error code by ceph_handle_auth_reply() and returned to
handle_auth_reply(). Thereafter, an attempt is made to send the
preallocated message of type CEPH_MSG_AUTH, where the returned value is
interpreted as the size of the front segment to send. If the result
value in the message is greater than the size of the memory buffer
allocated for the front segment, an out-of-bounds access occurs, and
the content of the memory region beyond this buffer is sent out.
This patch fixes the issue by treating only negative values in the
result field as errors. Positive values are therefore treated as success
in the same way as a zero value. Additionally, a BUG_ON is added to
__send_prepared_auth_request() comparing the len parameter to
front_alloc_len to prevent sending the message if it exceeds the bounds
of the allocation and to make it easier to catch any logic flaws leading
to this.
rbd: fix null-ptr-deref when device_add_disk() fails
do_rbd_add() publishes the device with device_add() before calling
device_add_disk(). If device_add_disk() fails after device_add()
succeeds, the error path calls rbd_free_disk() directly and then later
falls through to rbd_dev_device_release(), which calls rbd_free_disk()
again. This double teardown can leave blk-mq cleanup operating on
invalid state and trigger a null-ptr-deref in
__blk_mq_free_map_and_rqs(), reached from blk_mq_free_tag_set().
Fix this by following the normal remove ordering: call device_del()
before rbd_dev_device_release() when device_add_disk() fails after
device_add(). That keeps the teardown sequence consistent and avoids
re-entering disk cleanup through the wrong path.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available.
We reproduced the bug on v7.0 with a real Ceph backend and a QEMU x86_64
guest booted with KASAN and CONFIG_FAILSLAB enabled. The reproducer
confines failslab injections to the __add_disk() range and injects
fail-nth while mapping an RBD image through
/sys/bus/rbd/add_single_major.
On the unpatched kernel, fail-nth=4 reliably triggered the fault:
Commit 41ebcc0907c5 ("crush: remove forcefeed functionality") from
May 7, 2012 (linux-next), leads to the following Smatch static
checker warning:
net/ceph/crush/mapper.c:1015 crush_do_rule()
warn: iterator 'j' not incremented
Before commit 41ebcc0907c5 ("crush: remove forcefeed functionality"),
we had this logic:
j = 0;
if (osize == 0 && force_pos >= 0) {
o[osize] = force_context[force_pos];
if (recurse_to_leaf)
c[osize] = force_context[0];
j++; /* <-- this was the only increment, now gone */
force_pos--;
}
/* then crush_choose_*(..., o+osize, j, ...) */
Now, the variable j is dead code — a variable that is set
and never meaningfully varied. This patch simply removes
the dead code.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Max Kellermann [Mon, 30 Mar 2026 08:43:19 +0000 (10:43 +0200)]
ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails
This MDS reconnect error path leaves s_cap_reconnect set.
send_mds_reconnect() sets the bit at the beginning of the reconnect,
but the first failing operation after that, ceph_pagelist_encode_32(),
can jump to `fail:` without clearing it.
__ceph_remove_cap() consults that flag to decide whether cap releases
should be queued. A reconnect-preparation failure therefore leaves the
session in reconnect mode from the cap-release path's point of view
and can strand release work until some later state transition repairs
it.
Max Kellermann [Fri, 27 Mar 2026 16:23:08 +0000 (17:23 +0100)]
ceph: only d_add() negative dentries when they are unhashed
Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.
In the current VFS that is not safe. d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket. If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:
This is reachable with reused cached negative dentries. A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":
- ceph_finish_lookup():
MDS reply is -ENOENT with no trace
-> d_add(dentry, NULL)
- ceph_lookup():
local ENOENT fast path for a complete directory with shared caps
-> d_add(dentry, NULL)
Both paths can therefore re-add an already-hashed negative dentry.
Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.
Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed. If the negative dentry
is already hashed, leave it in place and reuse it as-is.
This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.
kexinsun [Mon, 23 Feb 2026 13:15:07 +0000 (21:15 +0800)]
libceph: update outdated comment in ceph_sock_write_space()
The function try_write() was renamed to ceph_con_v1_try_write()
in commit 566050e17e53 ("libceph: separate msgr1 protocol
implementation") and subsequently moved to net/ceph/messenger_v1.c
in commit 2f713615ddd9 ("libceph: move msgr1 protocol implementation
to its own file"). Update the comment in ceph_sock_write_space()
accordingly.
[ idryomov: account for msgr2 in the updated comment as well ]
Since the call to crypto_shash_setkey() was replaced with
hmac_sha256_preparekey() which doesn't allocate memory regardless of the
alignment of the input key, remove the session key alignment logic from
process_auth_done(). Also remove the inclusion of crypto/hash.h, which
is no longer needed since crypto_shash is no longer used.
Sam Edwards [Wed, 18 Mar 2026 02:37:33 +0000 (19:37 -0700)]
ceph: fix num_ops off-by-one when crypto allocation fails
move_dirty_folio_in_page_array() may fail if the file is encrypted, the
dirty folio is not the first in the batch, and it fails to allocate a
bounce buffer to hold the ciphertext. When that happens,
ceph_process_folio_batch() simply redirties the folio and flushes the
current batch -- it can retry that folio in a future batch.
However, if this failed folio is not contiguous with the last folio that
did make it into the batch, then ceph_process_folio_batch() has already
incremented `ceph_wbc->num_ops`; because it doesn't follow through and
add the discontiguous folio to the array, ceph_submit_write() -- which
expects that `ceph_wbc->num_ops` accurately reflects the number of
contiguous ranges (and therefore the required number of "write extent"
ops) in the writeback -- will panic the kernel:
BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
This issue can be reproduced on affected kernels by writing to
fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat
pattern (total filesize should not matter) and gradually increasing the
system's memory pressure until a bounce buffer allocation fails.
Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct
value when move_dirty_folio_in_page_array() fails, but the folio already
started counting a new (i.e. still-empty) extent.
The defect corrected by this patch has existed since 2022 (see first
`Fixes:`), but another bug blocked multi-folio encrypted writeback until
recently (see second `Fixes:`). The second commit made it into 6.18.16,
6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch
therefore fixes a regression (panic) introduced by cac190c7674f.
Cc: stable@vger.kernel.org Fixes: d55207717ded ("ceph: add encryption support to writepage and writepages") Fixes: cac190c7674f ("ceph: fix write storm on fscrypted files") Signed-off-by: Sam Edwards <CFSworks@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Raphael Zimmer [Wed, 18 Mar 2026 17:09:03 +0000 (18:09 +0100)]
libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()
If a message of type CEPH_MSG_AUTH_REPLY contains a zero value for both
protocol and result, this is currently not treated as an error. In case
of ac->negotiating == true and ac->protocol > 0, this leads to setting
ac->protocol = 0 and ac->ops = NULL. Thereafter, the check for
ac->protocol != protocol returns false, and init_protocol() is not
called. Subsequently, ac->ops->handle_reply() is called, which leads to
a null pointer dereference, because ac->ops is still NULL.
This patch changes the check for ac->protocol != protocol to
!ac->protocol, as this also includes the case when the protocol was set
to zero in the message. This causes the message to be treated as
containing a bad auth protocol.
Dave Hansen [Tue, 21 Apr 2026 16:31:36 +0000 (09:31 -0700)]
x86/cpu: Disable FRED when PTI is forced on
FRED and PTI were never intended to work together. No FRED hardware is
vulnerable to Meltdown and all of it should have LASS anyway.
Nevertheless, if you boot a system with pti=on and fred=on, the kernel
tries to do what is asked of it and dies a horrible death on the first
attempt to run userspace (since it never switches to the user page
tables).
Disable FRED when PTI is forced on, and print a warning about it.
A quick brain dump about what a FRED+PTI implementation would look like
is below. I'm not sure it would make any sense to do it, but never say
never. All I know is that it's way too complicated to be worth it today.
<brain dump>
The SWITCH_TO_USER/KERNEL_CR3 bits are simple to fix (or at least we
have the assembly tools to do it already), as is sticking the FRED entry
text in .entry.text (it's not in there today).
The nasty part is the stacks. Today, the CPU pops into the kernel on
MSR_IA32_FRED_RSP0 which is normal old kernel memory and not mapped to
userspace. The hardware pushes gunk on to MSR_IA32_FRED_RSP0, which is
currently the task stacks. MSR_IA32_FRED_RSP0 would need to point
elsewhere, probably cpu_entry_stack(). Then, start playing games with
stacks on entry/exit, including copying gunk to and from the task stack.
While I'd *like* to have PTI everywhere, I'm not sure it's worth mucking
up the FRED code with PTI kludges. If a user wants fast entry/exit, they
use FRED. If you want PTI (and sekuritay), you certainly don't care
about fast entry and FRED isn't going to help you *all* that much, so
you can just stay with the IDT.
Plus, FRED hardware should have LASS which gives you a similar security
profile to PTI without the CR3 munging.
</brain dump>
Reported-by: Gayatri Kammela <Gayatri.Kammela@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Cc:stable@vger.kernel.org Link: https://patch.msgid.link/20260421163136.E7C6788A@davehans-spike.ostc.intel.com
Merge tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, the changes primarily focus on resolving race
conditions, memory safety issues (UAF), and improving the robustness
of garbage collection (GC), and folio management.
Enhancements:
- add page-order information for large folio reads in iostat
- add defrag_blocks sysfs node
Bug fixes:
- fix uninitialized kobject put in f2fs_init_sysfs()
- disallow setting an extension to both cold and hot
- fix node_cnt race between extent node destroy and writeback
- preserve previous reserve_{blocks,node} value when remount
- freeze GC and discard threads quickly
- fix false alarm of lockdep on cp_global_sem lock
- fix data loss caused by incorrect use of nat_entry flag
- skip empty sections in f2fs_get_victim
- fix inline data not being written to disk in writeback path
- fix fsck inconsistency caused by FGGC of node block
- fix fsck inconsistency caused by incorrect nat_entry flag usage
- call f2fs_handle_critical_error() to set cp_error flag
- fix fiemap boundary handling when read extent cache is incomplete
- fix use-after-free of sbi in f2fs_compress_write_end_io()
- fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io()
- fix incorrect file address mapping when inline inode is unwritten
- fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled
- avoid memory leak in f2fs_rename()"
* tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits)
f2fs: add page-order information for large folio reads in iostat
f2fs: do not support mmap write for large folio
f2fs: fix uninitialized kobject put in f2fs_init_sysfs()
f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()
f2fs: disallow setting an extension to both cold and hot
f2fs: fix node_cnt race between extent node destroy and writeback
f2fs: allow empty mount string for Opt_usr|grp|projjquota
f2fs: fix to preserve previous reserve_{blocks,node} value when remount
f2fs: invalidate block device page cache on umount
f2fs: fix to freeze GC and discard threads quickly
f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer
f2fs: fix false alarm of lockdep on cp_global_sem lock
f2fs: fix data loss caused by incorrect use of nat_entry flag
f2fs: fix to skip empty sections in f2fs_get_victim
f2fs: fix inline data not being written to disk in writeback path
f2fs: fix fsck inconsistency caused by FGGC of node block
f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally
f2fs: refactor node footer flag setting related code
f2fs: refactor f2fs_move_node_folio function
...
Merge tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax updates from Ira Weiny:
"The series adds DAX support required for the upcoming fuse/famfs file
system.[1] The support here is required because famfs is backed by
devdax rather than pmem. This all lays the groundwork for using shared
memory as a file system"
Link: https://lore.kernel.org/all/0100019d43e5f632-f5862a3e-361c-4b54-a9a6-96c242a8f17a-000000@email.amazonses.com/
* tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax/fsdev: fix uninitialized kaddr in fsdev_dax_zero_page_range()
dax: export dax_dev_get()
dax: Add fs_dax_get() func to prepare dax for fs-dax usage
dax: Add dax_set_ops() for setting dax_operations at bind time
dax: Add dax_operations for use by fs-dax on fsdev dax
dax: Save the kva from memremap
dax: add fsdev.c driver for fs-dax on character dax
dax: Factor out dax_folio_reset_order() helper
dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c
Timur Kristóf [Mon, 20 Apr 2026 23:55:04 +0000 (01:55 +0200)]
drm/amd/display: Disable 10-bit truncation and dithering on DCE 6.x
DCE 6.x doesn't support 10-bit truncation and 10-bit dithering
because the following fields are 1-bit only:
FMT_TEMPORAL_DITHER_DEPTH
FMT_SPATIAL_DITHER_DEPTH
FMT_TRUNCATE_DEPTH
Programming these fields to "2" will program them as if the
dithering option was 6-bit, resulting in sub-par picture
quality and an ugly "color banding" effect.
Note that a recent commit changed the default 10-bit dithering
option to DITHER_OPTION_SPATIAL10 which improves the picture
quality because it happens to look better, but is still not
actually supported by DCE 6.x versions.
When the color depth is 10-bit or more, just disable
any kind of dithering options on DCE 6.x.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5151 Fixes: 529cad0f945c ("drm/amd/display: Add function to set dither option") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6be8ced880dfe29ce38c2d5e74489822da5c250e)
Siwei He [Tue, 14 Apr 2026 18:46:54 +0000 (14:46 -0400)]
drm/amdgpu: OR init_pte_flags into invalid leaf PTE updates
Invalid leaf clears that only set AMDGPU_PTE_EXECUTABLE match the old
GMC9 fault-priority workaround but omit adev->gmc.init_pte_flags.
On GFX12 that includes AMDGPU_PTE_IS_PTE; without it, some cleared
PTEs can fault as no-retry and bypass the SVM/XNACK handler when a
VA is reused after a BO unmap.
Apply init_pte_flags in amdgpu_vm_pte_update_flags() alongside
EXECUTABLE so range-driven clears (e.g. amdgpu_vm_clear_freed) match
amdgpu_vm_pt_clear() for leaf templates.
Signed-off-by: Siwei He <siwei.he@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d47b2c36b9a6c6b844c33cab407a5d7ad102234)