Merge patch series "Exposing case folding behavior"
Chuck Lever <cel@kernel.org> says:
I'm attempting to implement enough support in the Linux VFS to
enable file services like NFSD and ksmbd (and user space
equivalents) to provide the actual status of case folding support
in local file systems. The default behavior for local file systems
not explicitly supported in this series is to reflect the usual
POSIX behaviors:
The case-insensitivity and case-nonpreserving booleans can be
consumed immediately by NFSD. These two attributes have been part of
the NFSv3 and NFSv4 protocols for decades, in order to support NFS
client implementations on non-POSIX systems.
Support for user space file servers is why this series exposes case
folding information via a user-space API. I don't know of any other
category of user-space application that requires access to case
folding info.
The Linux NFS community has a growing interest in supporting NFS
clients on Windows and MacOS platforms, where file name behavior does
not align with traditional POSIX semantics.
One example of a Windows-based NFS client is [1]. This client
implementation explicitly requires servers to report
FATTR4_WORD0_CASE_INSENSITIVE = TRUE for proper operation, a hard
requirement for Windows client interoperability because Windows
applications expect case-insensitive behavior. When an NFS client
knows the server is case-insensitive, it can avoid issuing multiple
LOOKUP/READDIR requests to search for case variants, and applications
like Win32 programs work correctly without manual workarounds or
code changes.
Even the Linux client can take advantage of this information. Trond
merged patches 4 years ago [2] that introduce support for case
insensitivity, in support of the Hammerspace NFS server. In
particular, when a client detects a case-insensitive NFS share,
negative dentry caching must be disabled (a lookup for "FILE.TXT"
failing shouldn't cache a negative entry when "file.txt" exists)
and directory change invalidation must clear all cached case-folded
file name variants.
Hammerspace servers and several other NFS server implementations
operate in multi-protocol environments, where a single file service
instance caters to both NFS and SMB clients. In those cases, things
work more smoothly for everyone when the NFS client can see and adapt
to the case folding behavior that SMB users rely on and expect. NFSD
needs to support the case-insensitivity and case-nonpreserving
booleans properly in order to participate as a first-class citizen
in such environments.
* patches from https://patch.msgid.link/20260507-case-sensitivity-v14-0-e62cc8200435@oracle.com:
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
nfsd: Report export case-folding via NFSv3 PATHCONF
isofs: Implement fileattr_get for case sensitivity
vboxsf: Implement fileattr_get for case sensitivity
nfs: Implement fileattr_get for case sensitivity
cifs: Implement fileattr_get for case sensitivity
xfs: Report case sensitivity in fileattr_get
hfsplus: Report case sensitivity in fileattr_get
hfs: Implement fileattr_get for case sensitivity
ntfs3: Implement fileattr_get for case sensitivity
exfat: Implement fileattr_get for case sensitivity
fat: Implement fileattr_get for case sensitivity
fs: Add case sensitivity flags to file_kattr
fs: Move file_kattr initialization to callers
Chuck Lever [Thu, 7 May 2026 08:53:08 +0000 (04:53 -0400)]
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
FS_ATTRIBUTE_INFORMATION responses have always reported
FILE_CASE_SENSITIVE_SEARCH and FILE_CASE_PRESERVED_NAMES
unconditionally. Case-insensitive filesystems like exFAT, and
casefolded directories on ext4 or f2fs, have no way to signal
their actual semantics to SMB clients.
Now that filesystems expose case behavior through ->fileattr_get,
query it via vfs_fileattr_get() and translate the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags into the corresponding SMB
attributes. Filesystems without ->fileattr_get continue reporting
default POSIX behavior (case-sensitive, case-preserving).
SMB's FS_ATTRIBUTE_INFORMATION reports per-share attributes from
the share root, not per-file. Shares mixing casefold and
non-casefold directories report the root directory's behavior.
Chuck Lever [Thu, 7 May 2026 08:53:07 +0000 (04:53 -0400)]
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
NFSD currently provides NFSv4 clients with hard-coded responses
indicating all exported filesystems are case-sensitive and
case-preserving. This is incorrect for case-insensitive filesystems
and ext4 directories with casefold enabled.
Query the underlying filesystem's actual case sensitivity via
nfsd_get_case_info() and return accurate values to clients. This
supports per-directory settings for filesystems that allow mixing
case-sensitive and case-insensitive directories within an export.
The helper queries the parent dentry for non-directory filehandles
because case-folding is a per-directory property. That resolution
has the same corner cases here as for NFSv3 PATHCONF: single-file
exports query an unexported parent, disconnected dentries report
defaults until reconnected, and hardlinked files track whichever
alias the dcache currently holds.
Chuck Lever [Thu, 7 May 2026 08:53:06 +0000 (04:53 -0400)]
nfsd: Report export case-folding via NFSv3 PATHCONF
The hard-coded MSDOS_SUPER_MAGIC check in nfsd3_proc_pathconf()
only recognizes FAT filesystems as case-insensitive. Modern
filesystems like F2FS, exFAT, and CIFS support case-insensitive
directories, but NFSv3 clients cannot discover this capability.
Query the export's actual case behavior through ->fileattr_get
instead. This allows NFSv3 clients to correctly handle case
sensitivity for any filesystem that implements the fileattr
interface. Filesystems without ->fileattr_get continue to report
the default POSIX behavior (case-sensitive, case-preserving).
This change depends on the earlier "fat: Implement fileattr_get
for case sensitivity" patch in this series, which ensures FAT
filesystems report their case behavior correctly via the
fileattr interface.
Case-folding is a per-directory property, so
nfsd_get_case_info() queries the parent dentry for
non-directory filehandles. Three inherent corner cases follow:
a single-file export's parent lies outside the exported
subtree, so the LSM hook evaluates against an unexported
directory; a disconnected dentry from fh_verify() has
d_parent == itself, so the file's own attributes are reported
until the dentry connects; and a hardlinked file resolves
through the alias the dcache currently holds, so when the
inode is linked into both case-folded and case-sensitive
directories the reported value tracks whichever parent is
active. These limitations are not addressable without
redefining the protocol attribute as per-parent rather than
per-object.
RFC 1813 restricts PATHCONF errors to NFS3ERR_STALE,
NFS3ERR_BADHANDLE, and NFS3ERR_SERVERFAULT. When an LSM hook
denies the case-folding query on the parent, NFS3ERR_STALE is
the only correct mapping: NFS3ERR_SERVERFAULT misrepresents a
working server as broken, and NFS3ERR_BADHANDLE implies a
decoding failure that did not occur. A client purging the
filehandle on receipt is the desired outcome, since the server
has refused to read attributes through it. Substituting POSIX
defaults instead would let the same handle report
casefold=false now and casefold=true once policy permits,
opening a silent name-collision window on case-insensitive
exports.
Chuck Lever [Thu, 7 May 2026 08:53:05 +0000 (04:53 -0400)]
isofs: Implement fileattr_get for case sensitivity
Upper layers such as NFSD need a way to query whether a
filesystem handles filenames in a case-sensitive manner so
they can provide correct semantics to remote clients. Without
this information, NFS exports of ISO 9660 filesystems cannot
advertise their filename case behavior.
Implement isofs_fileattr_get() to report ISO 9660 case handling
behavior. The 'check=r' (relaxed) mount option enables
case-insensitive lookups and is reported via FS_XFLAG_CASEFOLD.
By default, Joliet extensions operate in relaxed mode while
plain ISO 9660 uses strict (case-sensitive) mode.
Plain ISO 9660 names on the medium are uppercase. When neither
Rock Ridge nor Joliet is in effect, the default 'map=n' option
(and 'map=a') routes lookup and readdir through
isofs_name_translate(), which forces A-Z to a-z. The names
visible to userspace then differ in case from the on-disc form,
so report FS_XFLAG_CASENONPRESERVING in that configuration. Rock
Ridge and Joliet both deliver names as authored, and 'map=o'
emits the raw on-disc name unchanged, so those configurations
remain case-preserving.
Casefolding is a directory property, and the in-tree consumers
(NFSD, ksmbd) issue the query against a directory: NFSD walks
to the parent for non-directory dentries before calling
vfs_fileattr_get(), and ksmbd reports per-share attributes from
the share root. Wire .fileattr_get only on
isofs_dir_inode_operations. The CASEFOLD flag is set in both
fa->fsx_xflags and fa->flags so FS_IOC_FSGETXATTR and
FS_IOC_GETFLAGS agree.
Chuck Lever [Thu, 7 May 2026 08:53:04 +0000 (04:53 -0400)]
vboxsf: Implement fileattr_get for case sensitivity
Upper layers such as NFSD need a way to query whether a
filesystem handles filenames in a case-sensitive manner. Report
VirtualBox shared folder case handling behavior via the
FS_XFLAG_CASEFOLD flag.
The case sensitivity property is queried from the VirtualBox host
service at mount time and cached in struct vboxsf_sbi. The host
determines case sensitivity based on the underlying host filesystem
(for example, Windows NTFS is case-insensitive while Linux ext4 is
case-sensitive).
VirtualBox shared folders always preserve filename case exactly
as provided by the guest. The host interface does not expose a
separate case-preserving property; leaving
FS_XFLAG_CASENONPRESERVING unset reports the POSIX-default
case-preserving behavior, which matches vboxsf semantics.
The callback is registered in all three inode_operations
structures (directory, file, and symlink) to ensure consistent
reporting across all inode types.
Chuck Lever [Thu, 7 May 2026 08:53:03 +0000 (04:53 -0400)]
nfs: Implement fileattr_get for case sensitivity
An NFS server re-exporting an NFS mount point needs to report
the case sensitivity behavior of the underlying filesystem to
its clients. NFSD's attribute encoder obtains that information
by calling vfs_fileattr_get() on the lower filesystem, so the
NFS client must implement fileattr_get to surface what it
learned from its own server.
The NFS client already retrieves case sensitivity information
from servers during mount via PATHCONF (NFSv3) or the
FATTR4_CASE_INSENSITIVE/FATTR4_CASE_PRESERVING attributes
(NFSv4). Expose this information through fileattr_get by
reporting the FS_XFLAG_CASEFOLD and FS_XFLAG_CASENONPRESERVING
flags. NFSv2 lacks PATHCONF support, so mounts using that protocol
version default to standard POSIX behavior: case-sensitive and
case-preserving.
PATHCONF is now invoked unconditionally for NFSv2 and NFSv3 mounts
so the case-sensitivity capabilities are established even when the
user pins server->namelen with the namlen= mount option. That option
is orthogonal to case handling, and skipping PATHCONF because
namelen was already known would leave the caps unset.
The two capability bits carry opposite polarity because their POSIX
defaults differ. Most servers are case-sensitive and case-
preserving, matching "neither xflag set." NFS_CAP_CASE_INSENSITIVE
is set only when the server affirms case insensitivity, so "server
said no" and "server did not answer" both collapse to the case-
sensitive default. NFS_CAP_CASE_NONPRESERVING follows the same
pattern in the opposite direction: set only when the server affirms
that it does not preserve case, so that silence or a missing
attribute lands on the case-preserving default. The NFSv4 probe
checks res.attr_bitmask[0] to distinguish "server said false" from
"server omitted the attribute" before setting the bit.
Both capability bits are cleared before each probe so a remount,
an NFSv4 transparent state migration to a server with different
case semantics, or a probe whose reply does not arrive does not
retain stale capabilities from the prior probe.
Chuck Lever [Thu, 7 May 2026 08:53:02 +0000 (04:53 -0400)]
cifs: Implement fileattr_get for case sensitivity
Upper layers such as NFSD need a way to query whether a filesystem
handles filenames in a case-sensitive manner. Report CIFS/SMB case
handling behavior via FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING.
The authoritative source is the server itself: at mount time CIFS
issues QueryFSInfo(FS_ATTRIBUTE_INFORMATION) and caches the reply
on the tcon. That reply carries FILE_CASE_SENSITIVE_SEARCH and
FILE_CASE_PRESERVED_NAMES, which reflect whatever case handling
the share actually implements after SMB3.1.1 POSIX extensions
negotiation. Translating those two bits into the VFS flags lets
cifs_fileattr_get report what the server advertises rather than
what the client was asked to pretend.
QueryFSInfo is best-effort; the mount completes even if the server
does not answer. MaxPathNameComponentLength is zero in that case
and is used as the "no reply received" sentinel. When no reply is
available, fall back to the nocase mount option so that the reported
behavior agrees with the dentry comparison operations installed on
the superblock.
The callback is registered on cifs_dir_inode_ops so that NFSD,
ksmbd, and other consumers querying case handling against a
directory get a definitive answer, and on cifs_file_inode_ops to
preserve FS_COMPR_FL reporting on regular files. cifs_set_ops()
also installs cifs_namespace_inode_operations on DFS referral
directories that carry IS_AUTOMOUNT; register the same callback
there so the answer does not depend on whether the directory is
a referral point.
Registering fileattr_get routes FS_IOC_GETFLAGS through
vfs_fileattr_get() and short-circuits the syscall's fallback to
cifs_ioctl(). That fallback invoked CIFSGetExtAttr() under
CONFIG_CIFS_POSIX and CONFIG_CIFS_ALLOW_INSECURE_LEGACY on servers
advertising CIFS_UNIX_EXTATTR_CAP, surfacing the SMB1 Unix-extension
immutable, append, and nodump bits. cifs_fileattr_get carries over
only FS_COMPR_FL from cached cifsAttrs; the SMB1 extattr fetch is
not reproduced. SMB1 is deprecated, and acquiring a netfid from
within a dentry-only callback is not worth preserving a path tied
to an insecure legacy dialect.
Chuck Lever [Thu, 7 May 2026 08:53:01 +0000 (04:53 -0400)]
xfs: Report case sensitivity in fileattr_get
Upper layers such as NFSD need to query whether a filesystem
is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
when the filesystem is formatted with the ASCIICI feature
flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr()
in xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates
bs_xflags directly from xfs_ip2xflags()), so bulkstat consumers
and per-inode queries see a consistent view of the filesystem's
case-folding behavior.
FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures
FS_IOC_FSSETXATTR strips it, and xfs_flags2diflags() has no
clause for CASEFOLD so the on-disk diflags are unaffected.
The legacy FS_IOC_SETFLAGS path in xfs_fileattr_set() also
allows FS_CASEFOLD_FL through its allowlist on ASCIICI
filesystems so that a chattr read-modify-write cycle does
not fail with EOPNOTSUPP.
XFS always preserves case. XFS is case-sensitive by default,
but supports ASCII case-insensitive lookups when formatted
with the ASCIICI feature flag.
Chuck Lever [Thu, 7 May 2026 08:53:00 +0000 (04:53 -0400)]
hfsplus: Report case sensitivity in fileattr_get
Add case sensitivity reporting to the existing hfsplus_fileattr_get()
function via the FS_XFLAG_CASEFOLD flag. HFS+ always preserves case
at rest.
Case sensitivity depends on how the volume was formatted: HFSX
volumes may be either case-sensitive or case-insensitive, indicated
by the HFSPLUS_SB_CASEFOLD superblock flag.
FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures
FS_IOC_FSSETXATTR strips it. The legacy FS_IOC_SETFLAGS path in
hfsplus_fileattr_set() also allows FS_CASEFOLD_FL through its
allowlist on case-insensitive volumes so that a chattr
read-modify-write cycle does not fail with EOPNOTSUPP.
Chuck Lever [Thu, 7 May 2026 08:52:59 +0000 (04:52 -0400)]
hfs: Implement fileattr_get for case sensitivity
Report HFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. HFS is always case-insensitive (using Mac OS Roman case
folding) and always preserves case at rest.
Chuck Lever [Thu, 7 May 2026 08:52:57 +0000 (04:52 -0400)]
exfat: Implement fileattr_get for case sensitivity
Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. exFAT compares names through the volume's upcase table; in
practice that table folds case, and case is preserved at rest.
Chuck Lever [Thu, 7 May 2026 08:52:56 +0000 (04:52 -0400)]
fat: Implement fileattr_get for case sensitivity
Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
case-insensitive by default.
MSDOS supports a 'nocase' mount option that enables case-sensitive
behavior; check this option when reporting case sensitivity.
VFAT long filename entries preserve case; without VFAT, only
uppercased 8.3 short names are stored. MSDOS with 'nocase' also
preserves case since the name-formatting code skips upcasing when
'nocase' is set. Check both options when reporting case preservation.
Chuck Lever [Thu, 7 May 2026 08:52:55 +0000 (04:52 -0400)]
fs: Add case sensitivity flags to file_kattr
Enable upper layers such as NFSD to retrieve case sensitivity
information from file systems by adding FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING flags.
Filesystems report case-insensitive or case-nonpreserving behavior
by setting these flags directly in fa->fsx_xflags. The default
(flags unset) indicates POSIX semantics: case-sensitive and
case-preserving. Both flags are added to FS_XFLAG_RDONLY_MASK so
FS_IOC_FSSETXATTR silently strips them, keeping the new xflags
strictly a reporting interface. Callers that want to toggle
casefolding continue to use FS_IOC_SETFLAGS with FS_CASEFOLD_FL,
the established UAPI on filesystems that support the operation
(ext4 and f2fs on empty directories).
Case sensitivity information is exported to userspace via the
fa_xflags field in the FS_IOC_FSGETXATTR ioctl and file_getattr()
system call.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260507-case-sensitivity-v14-2-e62cc8200435@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Chuck Lever [Thu, 7 May 2026 08:52:54 +0000 (04:52 -0400)]
fs: Move file_kattr initialization to callers
fileattr_fill_xflags() and fileattr_fill_flags() memset the
entire file_kattr struct before populating select fields, so
callers cannot pre-set fields in fa->fsx_xflags without having
their values clobbered. Darrick Wong noted that a function
named "fill_xflags" touching more than xflags forces callers
to know implementation details beyond its apparent scope.
Drop the memset from both fill functions and initialize at the
entry points instead: ioctl_setflags(), ioctl_fssetxattr(),
the file_setattr() syscall, and xfs_ioc_fsgetxattra() now
declare fa with an aggregate initializer. ioctl_getflags(),
ioctl_fsgetxattr(), and the file_getattr() syscall already
aggregate-initialize fa to pass flags_valid/fsx_valid hints
into vfs_fileattr_get().
Subsequent patches rely on this so that ->fileattr_get()
handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD,
FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill
functions run.
Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260507-case-sensitivity-v14-1-e62cc8200435@oracle.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
Kamal Dasu [Thu, 23 Apr 2026 19:18:55 +0000 (15:18 -0400)]
mmc: core: Fix host controller programming for fixed driver type
When using the fixed-emmc-driver-type device tree property, the MMC core
correctly selects the driver strength for the card but fails to program
the host controller accordingly. This causes a mismatch where the card
uses the specified driver type while the host controller defaults to
Type B (since ios->drv_type remains zero).
Split the driver type programming logic to handle both fixed and dynamic
driver type selection paths. For fixed driver types, program the host
controller with the selected drive_strength value. For dynamic selection,
use the existing drv_type as before.
This ensures both the eMMC device and host controller use matching driver
strengths, preventing potential signal integrity issues.
Fixes: 6186d06c519e ("mmc: parse new binding for eMMC fixed driver type") Signed-off-by: Kamal Dasu <kamal.dasu@broadcom.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Problem determination using s390dbf logging sometimes requires changing
the default logging level or log area size. While this is possible
using sysfs interfaces, there is no easy way to adjust these parameters
for early boot code that emits logs before userspace is available.
Add an s390dbf kernel parameter to address this shortcoming. The
parameter can be used to specify log level and area size (in units of
pages). A level of '-' turns logging off for an area. Logs can be
identified by name or a shell-style pattern.
Specified parameters are applied immediately during debug area
registration for regular log areas. For early, static debug areas,
log levels are changed during early_param() parsing, while size
changes are applied at arch_initcall-time.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Notify userspace about two important events on AP queues
when run within Secure Execution (SE) environment:
- Send AP CHANGE uevent with "SE_BIND=1" on successful bind
operation on this AP queue device.
- Send AP CHANGE uevent with "SE_ASSOC=<association_index>"
on successful association operation with the secret of the
reported index on this AP queue device.
Note there is no SE unbind/unassociate event. Unbind/unassociate
can have different triggers and technically there is no signaling
done which the AP code could catch. A user space application can,
if this information is crucial, query the sysfs attribute se_bind
on the AP queue which runs a synchronous TAPQ. If the attribute
returns with "unbound" a reset took place and SE bind and associate
states are unbound and unassociated.
Suggested-by: Marc Hartmayer mhartmay@linux.ibm.com Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
DaeMyung Kang [Sun, 10 May 2026 17:11:14 +0000 (02:11 +0900)]
ntfs: restore $MFT mirror contents check
check_mft_mirror() still computes the number of bytes to validate in each
mirrored MFT record, but the actual comparison against $MFTMirr was dropped
when the superblock code was updated.
As a result, mount misses a stale or inconsistent $MFTMirr as long as both
records pass the structural baad-record checks. Restore the comparison and
log an error when the primary $MFT record differs from its mirror copy.
Returning false lets the existing mount error handling mark the volume as
having NTFS errors and, with on_errors=remount-ro, continue read-only. The
default on_errors=continue mount policy still allows the mount to proceed.
Fixes: 6251f0b0de7d ("ntfs: update super block operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Jiayuan Chen [Mon, 30 Mar 2026 07:32:29 +0000 (15:32 +0800)]
irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT
On PREEMPT_RT, non-HARD irq_work runs in per-CPU kthreads via
run_irq_workd(), so irq_work_sync() uses rcuwait() to wait for BUSY==0.
After irq_work_single() clears BUSY via atomic_cmpxchg(), it still
dereferences @work for irq_work_is_hard() and rcuwait_wake_up().
An irq_work_sync() caller on another CPU that enters after BUSY is cleared
can observe BUSY==0 immediately, return, and free the work before those
accesses complete — causing a use-after-free.
Fix this by wrapping run_irq_workd() in guard(rcu)() so that the entire
irq_work_single() execution is within an RCU read-side critical
section. Then add synchronize_rcu() in irq_work_sync() after
rcuwait_wait_event() to ensure the caller waits for the RCU grace period
before returning, preventing premature frees.
Fixes: 810979682ccc ("irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.") Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20260330073234.303732-1-jiayuan.chen@linux.dev
Re-add GFP_DMA when allocating memory for CHSC control blocks.
On some supported machines, CHSC cannot access memory outside
the DMA zone, causing CHSC command failures.
Cc: stable@vger.kernel.org Fixes: a3a64a4def8d ("s390/cio: remove unneeded DMA zone allocation") Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
mmc: dw_mmc: exynos: increase DMA threshold value for exynos7870
Exynos 7870 compatible controllers, such as SDIO ones are not able to
perform DMA transfers for small sizes of data (~16 to ~512 bytes),
resulting in cache issues in subsequent transfers. Increase the DMA
transfer threshold to 512 to allow the shorter transfers to take place,
bypassing DMA.
mmc: dw_mmc: implement option for configuring DMA threshold
Some controllers, such as certain Exynos SDIO ones, are unable to
perform DMA transfers of small amount of bytes properly. Following the
device tree schema, implement the property to define the DMA transfer
threshold (from a hard coded value of 16 bytes) so that lesser number of
bytes can be transferred safely skipping DMA in such controllers. The
value of 16 bytes stays as the default for controllers which do not
define it. This value can be overridden by implementation-specific init
sequences.
Paolo Pisati [Fri, 8 May 2026 07:09:56 +0000 (09:09 +0200)]
platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8407AA
Use the existing zenbook duo keyboard quirk for the UX8407AA model too.
Signed-off-by: Paolo Pisati <p.pisati@gmail.com> Reviewed-by: Denis Benato <denis.benato@linux.dev> Link: https://patch.msgid.link/20260508070956.62201-1-p.pisati@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Harshal Dev [Thu, 16 Apr 2026 11:59:19 +0000 (17:29 +0530)]
soc: qcom: ice: Allow explicit votes on 'iface' clock for ICE
Since Qualcomm inline-crypto engine (ICE) is now a dedicated driver
de-coupled from the QCOM UFS driver, it explicitly votes for its required
clocks during probe. For scenarios where the 'clk_ignore_unused' flag is
not passed on the kernel command line, to avoid potential unclocked ICE
hardware register access during probe the ICE driver should additionally
vote on the 'iface' clock.
Also update the suspend and resume callbacks to handle un-voting and voting
on the 'iface' clock.
Fixes: 2afbf43a4aec6 ("soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver") Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Kuldeep Singh <kuldeep.singh@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260416-qcom_ice_power_and_clk_vote-v5-2-5ccf5d7e2846@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
The watchdog on the Apple silicon t8122 (M3) SoC is compatible with the
existing driver. Add "apple,t8122-wdt" as SoC specific compatible under
"apple,t8103-wdt" used by the driver.
Ahmed S. Darwish [Fri, 27 Mar 2026 02:15:23 +0000 (03:15 +0100)]
x86/cpuid: Introduce a centralized CPUID parser
Introduce a CPUID parser for populating the system's CPUID tables.
Since accessing a leaf within the CPUID table requires compile time
tokenization, split the parser into two stages:
(a) Compile-time macros for tokenizing the leaf/subleaf offsets within
the CPUID table.
(b) Generic runtime code to fill the CPUID data, using a parsing table
which collects these compile-time offsets.
For actual CPUID output parsing, support both generic and leaf-specific
read functions.
To ensure CPUID data early availability, invoke the parser during early
boot, early Xen boot, and at early secondary CPUs bring up.
Provide call site APIs to refresh a single leaf, or a leaf range, within
the CPUID tables. This is for sites issuing MSR writes that partially
change the CPU's CPUID layout. Doing full CPUID table rescans in such
cases will be destructive since the CPUID tables will host all of the
kernel's X86_FEATURE flags at a later stage.
Harshal Dev [Thu, 16 Apr 2026 11:59:18 +0000 (17:29 +0530)]
dt-bindings: crypto: qcom,ice: Fix missing power-domain and iface clk
The DT bindings for inline-crypto engine do not specify the UFS_PHY_GDSC
power-domain and iface clock. Without enabling the iface clock and the
associated power-domain the ICE hardware cannot function correctly and
leads to unclocked hardware accesses being observed during probe.
Fix the DT bindings for inline-crypto engine to require the UFS_PHY_GDSC
power-domain and iface clock for new devices (Eliza and Milos) introduced
in the current release (7.1) with yet-to-stabilize ABI, while preserving
backward compatibility for older devices.
Fixes: 618195a7ac3df ("dt-bindings: crypto: qcom,inline-crypto-engine: Document the Eliza ICE") Fixes: 85faec1e85555 ("dt-bindings: crypto: qcom,inline-crypto-engine: document the Milos ICE") Reviewed-by: Kuldeep Singh <kuldeep.singh@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260416-qcom_ice_power_and_clk_vote-v5-1-5ccf5d7e2846@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Janne Grunau [Wed, 31 Dec 2025 12:07:21 +0000 (13:07 +0100)]
watchdog: apple: Add "apple,t8103-wdt" compatible
After discussion with the devicetree maintainers we agreed to not extend
lists with the generic compatible "apple,wdt" anymore [1]. Use
"apple,t8103-wdt" as base compatible as it is the SoC the driver and
bindings were written for.
Thomas Richter [Tue, 5 May 2026 10:34:33 +0000 (12:34 +0200)]
s390/pai: Fix missing PAI counter increments under heavy load
Machines with a larger number of CPUs and under heavy load sometimes
loose PAI counter increments during recording using events
-e CRYPTO_ÂLL or -e NNPA_ALL. Counting is not affected.
This happens when several PAI crypto counters are incremented during
the same cryptographic operation.
During schedule out the functions
paiXXX_sched_task() (with XXX either crypt or ext)
+--> pai_have_samples()
+--> pai_have_sample()
+--> pai_copy()
+--> pai_push_sample()
are called to read out PAI counter values.
In pai_copy() the current values of PAI counters are read from the
PMU memory mapped page and compared to the values read during last
schedule out operation, which have been saved in a backup page
named PAI_SAVE_AREA(event). For each PAI counter a delta is calculated
and when the delta is positive, that PAI counter was incremented by
hardware. This positve delta is reported as raw data record attached
to a sample.
After all deltas have been calculated, the new PAI counter values
are saved in the backup page PAI_SAVE_AREA(event). However this is
done in pai_push_sample(), leaving a small window for missing hardware
triggered updates. Here is one scenario:
PAI counter idx: 0 1 2 3 4 5 6 7 .... N
+---+---+---+---+---+---+---+---+ +---+
PAI counter page:| | | X | | | | | |....| Y |
+---+---+---+---+---+---+---+---+ +---+
In pai_copy() each PAI counter value is read and compared
to its old value. This is done in a loop. When PAI counter indexed
N is read, the hardware might increment PAI counter indexed 2 again,
updating its value from X to X+1.
Later pai_push_sample() simply mem-copies the complete PAI counter
page to a backup page and the increment of X+1 is lost, because the
backup page now contains the new value.
Read each PAI counter and save this value in the backup page when
there is a positive delta. This omits any time window between read
and store. This also reduced the work load as only modified PAI
counters are saved.
Cc: stable@vger.kernel.org Fixes: fe861b0c8d06 ("s390/pai: save PAI counter value page in event structure") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Zhihao Cheng [Thu, 7 May 2026 11:23:01 +0000 (19:23 +0800)]
nsfs: fix wrong error code returned for pidns ioctls
When executing NS_GET_PID_FROM_PIDNS (or similar pidns ioctls), if the
target task cannot be found in the corresponding pid_ns, the error code
should be ESRCH instead of ENOTTY.
This bug was introduced when the extensible ioctl handling was added.
Without proper return, ret would be overwritten by the default case in
the extensible ioctl switch statement.
Fixes: a1d220d9dafa8 ("nsfs: iterate through mount namespaces") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://patch.msgid.link/20260507112301.1042757-1-chengzhihao1@huawei.com Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
Ming Lei [Sun, 10 May 2026 14:48:43 +0000 (22:48 +0800)]
ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation
blk_validate_limits() requires max_hw_sectors >= PAGE_SECTORS and fires
a WARN_ON_ONCE if this invariant is violated. ublk_validate_params()
only checked the upper bound of max_sectors against max_io_buf_bytes,
allowing userspace to pass small values (including zero) that trigger
the warning when blk_mq_alloc_disk() is called from
ublk_ctrl_start_dev().
Before 494ea040bcb5, ublk used blk_queue_max_hw_sectors() which silently
clamped small values up to PAGE_SECTORS. The conversion to passing
queue_limits directly to blk_mq_alloc_disk() lost that clamping and now
hits blk_validate_limits()'s WARN_ON_ONCE instead.
Validate that max_sectors is at least PAGE_SECTORS in
ublk_validate_params() so invalid values are rejected early with
-EINVAL instead of reaching the block layer.
Maoyi Xie [Sun, 10 May 2026 08:41:19 +0000 (16:41 +0800)]
io_uring/fdinfo: translate SqThread PID through caller's pid_ns
SQPOLL stores current->pid (init_pid_ns view) in sqd->task_pid
at thread creation. fdinfo prints it raw via
seq_printf("SqThread:\t%d\n", sq_pid). A reader inside a
non-initial pid_ns sees the host PID, not the kthread's PID in
the reader's own pid_ns.
The SQPOLL kthread is created with CLONE_THREAD and no
CLONE_NEW*, so it lives in the submitter's pid_ns. An
unprivileged user_ns + pid_ns submitter can read fdinfo and
learn the host PID of a kthread whose in-namespace PID is
different.
Reproducer (mainline 7.0, KASAN): unshare CLONE_NEWUSER |
CLONE_NEWPID | CLONE_NEWNS, mount a private /proc, then have a
grandchild that is pid 1 in the new pid_ns open an io_uring
ring with IORING_SETUP_SQPOLL. /proc/self/task lists {1, 2};
the SQPOLL kthread is pid 2. Before: fdinfo prints
SqThread = <host pid>. After: SqThread = 2.
Use task_pid_nr_ns() against the proc inode's pid_ns to compute
sq_pid, instead of reading the stored sq->task_pid (which holds
the init_pid_ns view). pidfd_show_fdinfo() in kernel/pid.c
follows the same pattern.
Chi Zhiling [Mon, 11 May 2026 09:40:07 +0000 (17:40 +0800)]
iomap: add dirty page control to iomap_zero_iter
This patch prepares the iomap framework for exFAT's upcoming migration to
iomap. During testing of the exFAT iomap branch with xfstests generic/299 on
a VM with 8GB RAM and a 40GB disk, system unresponsiveness was observed.
iomap_zero_iter() lacked dirty page throttling, which could cause memory
pressure when exFAT's valid_size mechanism triggers large-scale zeroing
operations during writes beyond valid_size.
Align iomap_zero_iter() with iomap_write_iter() by adding
balance_dirty_pages_ratelimited() to throttle dirty page generation during
large zeroing operations
When iomap_iter() finishes its iteration (returns <= 0), it is no longer
necessary to memset the entire iomap and srcmap structures.
In high-IOPS scenarios (like 4k randread NVMe polling with io_uring),
where the majority of I/Os complete in a single extent map, this wasted
memory write bandwidth, as the caller will just discard the iterator.
Use this command to test:
taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
-n1 -P1 /mnt/testfile
IOPS improve about 5% on ext4 and XFS.
However, we MUST still call iomap_iter_reset_iomap() to release the
folio_batch if IOMAP_F_FOLIO_BATCH is set, otherwise we leak page
references. Therefore, split the cleanup logic: always release the
folio_batch, but skip the memset() when ret <= 0.
Pankaj Raghav [Mon, 11 May 2026 11:19:18 +0000 (13:19 +0200)]
fs: fix forced iversion increment on lazytime timestamp updates
When updating timestamps with lazytime enabled, if only I_DIRTY_TIME is
set (pure lazytime update), inode_maybe_inc_iversion() should not be
forced to increment i_version. The force parameter should only be true
when actual data or metadata changes require an iversion bump.
The current code uses "!!dirty" which evaluates to true whenever dirty
has any bits set, including the I_DIRTY_TIME bit alone. This forces an
iversion increment on every lazytime timestamp update, which then sets
I_DIRTY_SYNC, triggering expensive log flushes on subsequent fdatasync
calls. Andres reported this issue when he noticed a perf regression[1].
Fix this by using "dirty != I_DIRTY_TIME" as the force parameter. This
passes false for pure lazytime updates (allowing the I_VERSION_QUERIED
optimization to work), while still forcing the increment when dirty
contains other flags indicating real changes that require iversion
updates.
irqchip/econet-en751221: Support MIPS 34Kc VEIC mode
The Vectored External Interrupt Controller mode present in the MIPS 34Kc
and 1004Kc variants causes the CPU to stop dispatching interrupts by the
normal code path and instead it sends those interrupts to the external
interrupt controller to be prioritized, renumbered, and sent back. When
they come back, they are handled through a different path using a dispatch
table, so plat_irq_dispatch() never sees action.
This of course subverts the traditional intc hierarchy, and on the 1004Kc
the interrupt controller is standardized (IRQ_GIC) so it can be reasonably
considered part of the CPU itself - and tighter coupling between IRQ_GIC
and arch/mips/* is tolerable. However on the 34Kc the intc is defined by
each SoC vendor, so it's required to have a modular driver - but for a
device which in fact ends up taking over the entire interrupt system.
Let the DT describe which IRQs which come from the CPU and should be
routed back and handled by the CPU intc. These particularly include the
two IPI interrupts which would otherwise necessitate duplication of all
the IPI supporting infrastructure from the CPU intc.
dt-bindings: interrupt-controller: econet: Add CPU interrupt mapping
In MIPS VEIC mode (Vectored External Interrupt Controller), the
hardware stops directly dispatching CPU interrupts such as IPIs or CPU
performance counters, and instead it communicates them to the external
interrupt controller (the hardware described here) which prioritizes,
renumbers, and integrates them with its own hardware interrupt pins.
Interrupts from the external controller are then dispatched through a
different method via a dispatch table. In effect, the external
controller subsumes the CPU controller and becomes the root.
Since there are interrupts which ought to be controlled by the CPU
controller driver - particularly the IPI interrupts - we create a
reverse mapping where those interrupts may be sent back to the CPU
intc when they are received. This maintains the fiction that there is
still a hierarchy, and keeps the DT the same no matter whether the
processor is in VEIC mode or not. The econet,cpu-interrupt-map is
optional and if omitted, it's assumed that no interrupts need to be
mapped.
Shawn Lin [Thu, 9 Apr 2026 07:48:11 +0000 (15:48 +0800)]
mmc: core: Add validation for host-provided max_segs
The max_segs field is of type unsigned short, and if a host driver
sets an excessively large value, it may be truncated to zero. This
can cause mmc_alloc_sg() to call kmalloc_objs() with a zero size
allocation request, which leads to undefined behavior.
Under the SLUB allocator, kmalloc(0) returns a special pointer
(ZERO_SIZE_PTR). The subsequent 'if (sg)' check will evaluate to
true, and sg_init_table() will then attempt to access invalid memory,
resulting in a crash:
To prevent this, add a validation check in mmc_mq_init_request() to
detect when sg_len (derived from max_segs) is zero. If sg_len is zero,
we return an error and print an error message, allowing host driver
developers to identify and fix incorrect max_segs configuration.
This is a defensive measure that ensures the MMC core fails gracefully
when host drivers provide invalid max_segs values, rather than crashing
with a page fault.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
rust: driver core: remove drvdata() and driver_type
When drvdata() was introduced in commit 6f61a2637abe ("rust: device:
introduce Device::drvdata()"), its commit message already noted that a
direct accessor to the driver's bus device private data is not commonly
required -- bus callbacks provide access through &self, and other entry
points (IRQs, workqueues, IOCTLs, etc.) carry their own private data.
The sole motivation for drvdata() was inter-driver interaction -- an
auxiliary driver deriving the parent's bus device private data from the
parent device.
However, drvdata() exposes the driver's bus device private data beyond
the driver's own scope. This creates ordering constraints; for instance
drvdata may not be set yet when the first caller of drvdata() can
appear. It also forces the driver's bus device private data to outlive
all registrations that access it, which causes unnecessary
complications.
Private data should be private to the entity that issues it, i.e. bus
device private data belongs to bus callbacks, class device private data
to class callbacks, IRQ private data to the IRQ handler, etc.
With registration-private data now available through the auxiliary bus,
there is no remaining user of drvdata(), thus remove it.
rust: auxiliary: add registration data to auxiliary devices
Add a registration_data pointer to struct auxiliary_device, allowing the
registering (parent) driver to attach private data to the device at
registration time and retrieve it later when called back by the
auxiliary (child) driver.
By tying the data to the device's registration, Rust drivers can bind
the lifetime of device resources to it, since the auxiliary bus
guarantees that the parent driver remains bound while the auxiliary
device is bound.
On the Rust side, Registration<T> takes ownership of the data via
ForeignOwnable. A TypeId is stored alongside the data for runtime type
checking, making Device::registration_data<T>() a safe method.
Rosen Penev [Sun, 10 May 2026 19:55:31 +0000 (12:55 -0700)]
gpio: spear-spics: Add COMPILE_TEST support
The SPEAr SPI chip-select GPIO driver only depends on generic platform,
OF, and MMIO interfaces, so it can be built outside SPEAr platform
configurations.
Enable compile-test coverage to catch build regressions on other
architectures.
Yong-Xuan Wang [Fri, 8 May 2026 09:31:21 +0000 (02:31 -0700)]
irqchip/riscv-imsic: Clear interrupt move state during CPU offlining
Affinity changes of IMSIC interrupts have to be careful to not lose an
interrupt in the process. Each vector keeps track of an affinity change in
progress with two pointers in struct imsic_vector.
imsic_vector::move_prev points to the previous CPU target data and
imsic_vector::move_next to the designated new CPU target data.
imsic_vector::move_prev on the new CPU can only be cleared after the
previous CPU has cleared imsic_vector::move_next, which ususally happens in
__imsic_remote_sync().
In case of CPU hot-unplug __imsic_remote_sync() is not invoked because the
CPU is already marked offline. That means imsic_vector::move_prev becomes
stale until the CPU is onlined again.
The stale pointer prevents further affinity changes for the affected
interrupts.
Solve this by clearing the imsic_vector::move_prev pointers in the CPU
hotplug offline path.
[ tglx: Replace word salad in change log ]
Fixes: 0f67911e821c ("irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260508-imsic-v2-1-e9f08dd46cf5@sifive.com
Xianwei Zhao [Fri, 8 May 2026 07:36:56 +0000 (07:36 +0000)]
irqchip/meson-gpio: Add support for Amlogic A9 SoCs
The Amlogic A9 SoCs supports the following GPIO interrupt lines:
A9 IRQ Number:
- 95:86 10 pins on bank Y
- 85:84 2 pins on bank CC
- 83:64 20 pins on bank A
- 63:48 16 pins on bank Z
- 47:30 18 pins on bank X
- 29:22 8 pins on bank H
- 21:14 8 pins on bank M
- 13:0 14 pins on bank B
A9 AO IRQ Number:
- 38 1 pins on bank TESTN
- 37:31 7 pins on bank C
- 30:13 18 pins on bank D
- 12:0 13 pins on bank AO
Mark Rutland [Thu, 7 May 2026 11:05:18 +0000 (12:05 +0100)]
genirq/chip: Don't call add_interrupt_randomness() for NMIs
Recently handle_percpu_devid_irq() was changed to call
add_interrupt_randomness(). This introduced a potential deadlock when
handle_percpu_devid_irq() is used to handle an NMI, which can be
detected with lockdep, e.g.
================================
WARNING: inconsistent lock state
7.1.0-rc2-pnmi #465 Not tainted
--------------------------------
inconsistent {INITIAL USE} -> {IN-NMI} usage.
perf/695 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff00837dfd3a18 (&base->lock){-.-.}-{2:2}, at: lock_timer_base+0x6c/0xac
{INITIAL USE} state was registered at:
_raw_spin_lock_irqsave+0x68/0xb0
lock_timer_base+0x6c/0xac
__mod_timer+0x100/0x32c
add_timer_global+0x2c/0x40
__queue_delayed_work+0xf0/0x140
queue_delayed_work_on+0x134/0x138
mem_cgroup_css_online+0x30c/0x310
online_css+0x34/0x10c
cgroup_init_subsys+0x158/0x1c8
cgroup_init+0x440/0x524
start_kernel+0x888/0x998
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&base->lock);
<Interrupt>
lock(&base->lock);
*** DEADLOCK ***
During review, Thomas pointed out it wouldn't be safe for
handle_percpu_devid_irq() to call add_interrupt_randomness() if it was
used to handle NMIs:
https://lore.kernel.org/lkml/87bjgik042.ffs@tglx/
... but evidently people missed that handle_percpu_devid_irq() *is* used
for NMIs.
While it might seem that NMIs should be handled with a separate
handle_percpu_devid_nmi() function, for various structural reasons this was
impractical, and handle_percpu_devid_irq() has been expected to be used for
NMIs since commits:
21bbbc50f398f ("irqchip/gic-v3: Switch high priority PPIs over to handle_percpu_devid_irq()") 5ff78c8de9d83 ("genirq: Kill handle_percpu_devid_fasteoi_nmi()")
Taking the above into account, avoid the deadlock by not calling
add_interrupt_randomness() when handle_percpu_devid_irq() is called in an
NMI context. This is consistent with other NNI handling flows, which do not
call add_interrupt_randomness().
At the same time, update the kernel-doc comment to make it clear that
handle_percpu_devid_irq() can be called in NMI context. The rest of
handle_percpu_devid_irq() is currently NMI safe and doesn't need to change.
Fixes: fd7400cfcbaa ("genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq()") Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20260507110518.3128248-1-mark.rutland@arm.com
Sascha Bischoff [Wed, 6 May 2026 09:37:43 +0000 (09:37 +0000)]
irqchip/gic-v5: Allocate ITS parent LPIs as a range
The ITS MSI domain no longer manages LPI allocation directly. LPIs are
allocated and freed by the parent LPI domain, which can now handle a
full range of interrupts and unwind partial allocations internally.
Make the ITS domain request and release the parent IRQs as a single
range instead of iterating over each interrupt. The ITS allocation
path then only needs to reserve EventIDs, allocate the parent range,
and fill in the ITS irq_data for each MSI. Since no operation in the
per-MSI loop can fail, the partial parent-free unwind becomes
unnecessary.
On teardown, reset the ITS irq_data for the range and then release the
parent range in one call, leaving LPI teardown to the LPI domain.
Fixes: 0f0101325876 ("irqchip/gic-v5: Add GICv5 LPI/IPI support") Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260506093634.382062-4-sascha.bischoff@arm.com
Sascha Bischoff [Wed, 6 May 2026 09:37:23 +0000 (09:37 +0000)]
irqchip/gic-v5: Support range allocation for LPIs
The per-IPI parent allocation loop returns immediately on failure and leaks
any parent interrupts allocated by earlier iterations.
The GICv5 LPI domain now owns LPI allocation and teardown internally,
but its irq_domain callbacks still reject requests where nr_irqs is
greater than one. This forces child domains to allocate and free LPIs
one at a time even when the interrupt core requests a contiguous
range.
Handle multi-interrupt allocation and teardown in the LPI domain by
iterating over the requested range and unwinding any partially
allocated state on failure.
Allocate the parent LPIs for the IPI domain with a single range
request as well, which cures the leakage problem.
Fixes: 0f0101325876 ("irqchip/gic-v5: Add GICv5 LPI/IPI support") Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260506093634.382062-3-sascha.bischoff@arm.com
Sascha Bischoff [Wed, 6 May 2026 09:37:02 +0000 (09:37 +0000)]
irqchip/gic-v5: Move LPI allocation into the LPI domain
The IPI and ITS MSI domains currently allocate and release LPIs
directly, then pass the selected LPI ID to the parent LPI domain. This
leaks the LPI domain's allocation policy into its child domains and
forces each child to duplicate part of the parent domain's teardown.
Make the LPI domain allocate LPIs in its .alloc() callback and release
them in a matching .free() callback. Child domains can then request a
parent interrupt without passing an implementation-specific LPI ID,
and the LPI lifetime is tied to the domain that owns the LPI
namespace.
Remove the gicv5_alloc_lpi() and gicv5_free_lpi() wrappers now that no
external caller needs to manage LPIs directly.
This is a preparatory change for an actual leakage problem in the
allocation code and therefore tagged with the same Fixes tag.
Fixes: 0f0101325876 ("irqchip/gic-v5: Add GICv5 LPI/IPI support") Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260506093634.382062-2-sascha.bischoff@arm.com
Jie Li [Mon, 11 May 2026 11:37:25 +0000 (13:37 +0200)]
gpiolib: add gpiod_is_single_ended() helper
The direction of a single-ended (open-drain or open-source) GPIO line
cannot always be reliably determined by reading hardware registers.
In true open-drain implementations, the "high" state is achieved by
entering a high-impedance mode, which many hardware controllers report
as "input" even if the software intends to use it as an output.
This creates issues for consumer drivers (like I2C) that rely on
gpiod_get_direction() to decide if a line can be driven.
Introduce gpiod_is_single_ended() to allow consumers to check the
software configuration (GPIO_FLAG_OPEN_DRAIN/GPIO_FLAG_OPEN_SOURCE) of
a descriptor. This provides a robust way to identify lines that are
capable of being driven, regardless of their instantaneous hardware state.
Junxi Qian [Wed, 6 May 2026 12:24:15 +0000 (20:24 +0800)]
fuse: fix writeback array overflow when max_pages is one
fuse_iomap_writeback_range() appends one folio pointer and one
fuse_folio_desc for every dirty range that is merged into the current
writeback request. The merge decision checks the byte budget against
fc->max_pages and fc->max_write, but it does not check whether the folio
and descriptor arrays still have another free slot.
This is not sufficient for fuseblk, where the filesystem block size can
be smaller than PAGE_SIZE. With writeback cache enabled and max_pages
negotiated as one, contiguous sub-page dirty ranges can fit within the
byte budget while spanning more than one folio. The next append can then
write past the one-slot folios and descs arrays.
Split the request when the number of already attached folios has reached
fc->max_pages. This keeps the folio/descriptor slot accounting in sync
with the send decision.
Fixes: ef7e7cbb323f ("fuse: use iomap for writeback") Cc: stable@vger.kernel.org Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Link: https://patch.msgid.link/20260506122415.205340-1-qjx1298677004@gmail.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
Hongling Zeng [Fri, 1 May 2026 07:10:58 +0000 (15:10 +0800)]
fs: Fix return in jfs_mkdir and orangefs_mkdir
Return NULL instead of passing to ERR_PTR while err is zero
Fixes these smatch warnings:
- fs/jfs/namei.c:311 jfs_mkdir() warn: passing zero to 'ERR_PTR'
- fs/orangefs/namei.c:369 orangefs_mkdir() warn: passing zero
to 'ERR_PTR'
Junyoung Jang [Mon, 4 May 2026 11:26:49 +0000 (20:26 +0900)]
fs/statmount: fix slab out-of-bounds write in statmount_mnt_idmap
statmount_mnt_idmap() writes one mapping with seq_printf() and then
manually advances seq->count to include the NUL separator.
If seq_printf() overflows, seq_set_overflow() sets seq->count to
seq->size. The manual seq->count++ changes this to seq->size + 1.
seq_has_overflowed() then no longer detects the overflow. The corrupted
count returns to statmount_string(), which later executes:
seq->buf[seq->count++] = '\0';
This causes a 1-byte NULL out-of-bounds write on the dynamically
allocated seq buffer.
Fix this by checking for overflow immediately after seq_printf().
Replace open-coded bitfield modifications with the standard FIELD_MODIFY()
macro across multiple SPI controller drivers. This improves readability and
adds compile-time checking without functional changes.
Each patch modifies a single driver, allowing independent review and
application.
Derek J. Clark [Sun, 10 May 2026 04:25:39 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-other: Limit adding attributes to supported devices
Adds lwmi_is_attr_01_supported, and only creates the attribute subfolder
if the attribute is supported by the hardware. Due to some poorly
implemented BIOS this is a multi-step sequence of events. This is
because:
- Some BIOS support getting the capability data from custom mode (0xff),
while others only support it in no-mode (0x00).
- Some BIOS support get/set for the current value from custom mode (0xff),
while others only support it in no-mode (0x00).
- Some BIOS report capability data for a method that is not fully
implemented.
- Some BIOS have methods fully implemented, but no complimentary
capability data.
To ensure we only expose fully implemented methods with corresponding
capability data, we check each outcome before reporting that an
attribute can be supported.
Checking for lwmi_is_attr_01_supported during remove is not done to
ensure that we don't attempt to call cd01 or send WMI events if one of
the interfaces being removed was the cause of the driver unloading.
Fixes: edc4b183b794 ("platform/x86: Add Lenovo Other Mode WMI Driver") Reported-by: Kurt Borja <kuurtb@gmail.com> Closes: https://lore.kernel.org/platform-driver-x86/DG60P3SHXR8H.3NSEHMZ6J7XRC@gmail.com/ Cc: stable@vger.kernel.org Reviewed-by: Rong Zhang <i@rong.moe> Tested-by: Rong Zhang <i@rong.moe> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com> Link: https://patch.msgid.link/20260510042546.436874-10-derekjohn.clark@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Derek J. Clark [Sun, 10 May 2026 04:25:38 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-other: Add Attribute ID helper functions
Adds lwmi_attr_id() function. In the same vein as LWMI_ATTR_ID_FAN_RPM(),
but as a generic, to de-duplicate attribute_id assignment boilerplate.
Adds tunable_attr_01_id() function that breaks out the members of a
tunable_attr_01 struct and passes them to lwmi_attr_id().
No functional change intended.
Cc: stable@vger.kernel.org Reviewed-by: Rong Zhang <i@rong.moe> Tested-by: Rong Zhang <i@rong.moe> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com> Link: https://patch.msgid.link/20260510042546.436874-9-derekjohn.clark@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Derek J. Clark [Sun, 10 May 2026 04:25:37 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-helpers: Move gamezone enums to wmi-helpers
In a later patch in the series the thermal mode enum will be accessed
across three separate drivers (wmi-capdata, wmi-gamezonem and wmi-other).
An additional patch in the series will also add a function prototype that
needs to reference this enum in wmi-helpers.h. To avoid having all these
drivers begin to import each others headers, and to avoid declaring an
opaque enum to hande the second case, move the thermal mode enum to
helpers where it can be safely accessed by everything that needs it from
a single import.
While at it, since the gamezone_events_type enum is the only remaining
item in the header, move that as well and remove the gamezone header
entirely.
Cc: stable@vger.kernel.org Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Rong Zhang <i@rong.moe> Tested-by: Rong Zhang <i@rong.moe> Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com> Link: https://patch.msgid.link/20260510042546.436874-8-derekjohn.clark@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Rong Zhang [Sun, 10 May 2026 04:25:36 +0000 (04:25 +0000)]
platform/x86: lenovo: Decouple lenovo-wmi-gamezone and lenovo-wmi-other
Currently, lenovo-wmi-gamezone depends on lenovo-wmi-other as the former
imports symbols from the latter. The imported symbols are just used to
register a notifier block. However, there is no runtime dependency
between both drivers, and either of them can run without the other,
which is the major purpose of using the notifier framework.
Such a link-time dependency is non-optimal. A previous attempt to "fix"
it made LENOVO_WMI_GAMEZONE select LENOVO_WMI_TUNING, which was
fundamentally broken and resulted in undefined Kconfig behavior, as
`select' cannot be used on a symbol with potentially unmet dependencies.
Decouple both drivers by moving the thermal mode notifier chain to
lenovo-wmi-helpers. Methods for notifier block (un)registration are
exported for lenovo-wmi-gamezone, while a method for querying the
current thermal mode are exported for lenovo-wmi-other.
This turns the dependency graph from
+------------ lenovo-wmi-gamezone
| |
v |
lenovo-wmi-helpers |
^ |
| V
+------------ lenovo-wmi-other
into
+------------ lenovo-wmi-gamezone
|
v
lenovo-wmi-helpers
^
|
+------------ lenovo-wmi-other
To make it clear, the name of the notifier chain is also renamed from
`om_chain_head' to `tm_chain_head', indicating that it's used to query
the current thermal mode.
No functional change intended.
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Fixes: 6e38b9fcbfa3 ("platform/x86: lenovo: gamezone needs "other mode"") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603252259.gHvJDyh3-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202603260302.X0NjQOda-lkp@intel.com/ Signed-off-by: Rong Zhang <i@rong.moe> Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com> Link: https://patch.msgid.link/20260510042546.436874-7-derekjohn.clark@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Derek J. Clark [Sun, 10 May 2026 04:25:35 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-other: Fix tunable_attr_01 struct members
In struct tunable_attr_01 the capdata pointer is unused and the size of
the id members is u32 when it should be u8. Fix these prior to adding
additional members.
No functional change intended.
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Cc: stable@vger.kernel.org Reviewed-by: Rong Zhang <i@rong.moe> Tested-by: Rong Zhang <i@rong.moe> Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com> Link: https://patch.msgid.link/20260510042546.436874-6-derekjohn.clark@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Derek J. Clark [Sun, 10 May 2026 04:25:34 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-other: Zero initialize WMI arguments
Adds explicit initialization of wmi_method_args_32 declarations with
zero values to prevent uninitialized data from being sent to the device
BIOS when passed.
No functional change intended.
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Fixes: 22024ac5366f ("platform/x86: Add Lenovo Gamezone WMI Driver") Fixes: edc4b183b794 ("platform/x86: Add Lenovo Other Mode WMI Driver") Reported-by: Rong Zhang <i@rong.moe> Closes: https://lore.kernel.org/platform-driver-x86/95c7e7b539dd0af41189c754fcd35cec5b6fe182.camel@rong.moe/ Cc: stable@vger.kernel.org Reviewed-by: Rong Zhang <i@rong.moe> Tested-by: Rong Zhang <i@rong.moe> Signed-off-by: Derek J. Clark <derekjohn.clark@gmail.com> Link: https://patch.msgid.link/20260510042546.436874-5-derekjohn.clark@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Rong Zhang [Sun, 10 May 2026 04:25:33 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-other: Balance component bind and unbind
When lwmi_om_master_bind() fails, the master device's components are
left bound, with the aggregate device destroyed due to the failure
(found by sashiko.dev [1]).
Balance calls to component_bind_all() and component_unbind_all() when an
error is propagated to the component framework.
Rong Zhang [Sun, 10 May 2026 04:25:32 +0000 (04:25 +0000)]
platform/x86: lenovo-wmi-other: Balance IDA id allocation and free
Currently, the IDA id is only freed on wmi-other device removal or
failure to create firmware-attributes device, kset, or attributes. It
leaks IDA ids if the wmi-other device is bound multiple times, as the
unbind callback never frees the previously allocated IDA id.
Additionally, if the wmi-other device has failed to create a
firmware-attributes device before it gets removed, the wmi-device
removal callback double frees the same IDA id.
These bugs were found by sashiko.dev [1].
Fix them by moving ida_free() into lwmi_om_fw_attr_remove() so it is
balanced with ida_alloc() in lwmi_om_fw_attr_add(). With them fixed,
properly set and utilize the validity of priv->ida_id to balance
firmware-attributes registration and removal, without relying on
propagating the registration error to the component framework, which is
more reliable and aligns with the hwmon device registration and removal
sequences.
Introduce a structured, size-efficient, per-CPU, CPUID data repository.
Use the x86-cpuid-db auto-generated data types, and custom CPUID leaf
parsers, to build that repository. Given a leaf, subleaf, and index,
provide direct memory access to the parsed and cached per-CPU CPUID output.
** Long-term goal
Remove the need for drivers and other areas in the kernel to invoke direct
CPUID queries. Only one place in the kernel should be allowed to use the
CPUID instruction: the CPUID parser code.
** Implementation
Introduce CPUID_LEAF()/CPUID_LEAF_N() to build a compact CPUID storage
layout in the form:
where each CPUID query stores its output at the designated leaf/subleaf
array and has an associated "CPUID query info" structure.
Embed the CPUID tables inside "struct cpuinfo_x86" to ensure early-boot and
per-CPU access through the CPUs capability structures.
Use an array of CPUID output storage entries for each leaf/subleaf
combination to accommodate leaves which produce the same output format for
a large subleaf range. This is typical for CPUID leaves enumerating
hierarchical objects; e.g. CPUID(0x4) cache topology enumeration,
CPUID(0xd) XSAVE enumeration, and CPUID(0x12) SGX Enclave Page Cache
enumeration.
** New CPUID APIs
Assuming a CPU capability structure 'c', provide macros to access the
parsed and cached CPUID leaf/subleaf output. These macros resolve to a
compile-time tokenization that ensures type-safety:
Fix spelling mistake in comment:
- occured -> occurred
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Md Shofiqul Islam <shofiqtest@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Merge patch series "selftests/namespaces: Fix test hangs and false failures"
Ricardo B. Marlière <rbm@suse.com> says:
This series addresses three reliability problems in the namespaces selftest
suite that cause tests to hang or report incorrect results.
The first patch fixes a hang in nsid_test where the grandchild process is
not reaped during fixture teardown, leaving it alive and holding the TAP
pipe write-end open so the test runner blocks indefinitely waiting for EOF.
The second and third patches fix two problems in listns_efault_test: a
waitpid(-1) race that can cause the iterator child to be consumed during
namespace cleanup (leading to an indefinite block on the subsequent targeted
waitpid), and a false FAIL verdict on kernels that do not implement listns()
(the EFAULT tests should SKIP in that case, consistent with every other
listns test that already handles ENOSYS correctly).
selftests/namespaces: Skip efault tests when listns() is not available
When listns() is not implemented the iterator child detects ENOSYS and
exits cleanly with status PIDFD_SKIP before the parent has a chance to
signal it. The parent sends SIGKILL (which is a harmless no-op at that
point) and then calls waitpid(), obtaining a normal-exit status. The
subsequent ASSERT_TRUE(WIFSIGNALED(status)) therefore fails, causing the
three EFAULT-focused tests to report FAIL rather than SKIP on kernels that
do not yet carry listns() support.
After collecting the iterator's exit status, check whether it exited with
PIDFD_SKIP and issue a SKIP verdict in that case, consistent with the
behaviour of every other listns test that already handles ENOSYS correctly.
selftests/namespaces: Fix waitpid race in listns_efault_test cleanup
The efault tests spawn two categories of child processes: namespace
children (each in its own mount namespace, for concurrent destruction) and
an iterator child that calls listns() in a tight loop. The cleanup loop
used waitpid(-1), which reaps any child in any order. If the iterator child
exits early (e.g. because listns() returned ENOSYS) before all namespace
children have been reaped, waitpid(-1) may consume it instead. The
subsequent targeted waitpid(iter_pid) would then block indefinitely.
Track the PIDs of the namespace children explicitly and use targeted
waitpid() calls in the cleanup loop so the iterator child cannot be
inadvertently reaped during namespace cleanup.
Merge patch series "uaccess/sockptr: copy_struct_ fixes and more helpers"
Stefan Metzmacher <metze@samba.org> says:
Here are some patches related to
copy_struct_{from,to}_{user,sockptr}()
I collected during my work on an IPPROTO_SMBDIRECT
implementation wrapping the smbdirect code used
by cifs.ko and ksmbd.ko.
The first patch fixes copy_struct_to_user()
to behave like documented.
The 2nd patch fixes the case where
copy_struct_from_user() is called by
copy_struct_from_sockptr().
The 3rd patch introduces
copy_struct_{from,to}_bounce_buffer()
as a result of a discussion about the
IPPROTO_QUIC driver in order to
be future prove when handling msg_control
messages in sendmsg and recvmsg.
But I'll likely also use them in my
IPPROTO_SMBDIRECT driver.
The 4th patch makes copy_struct_from_sockptr()
a trivial wrapper switching between
copy_struct_from_user() and
copy_struct_from_bounce_buffer()
The 5th patch introduces copy_struct_to_sockptr()
which I'll also use in my IPPROTO_SMBDIRECT driver.
* patches from https://patch.msgid.link/cover.1775576651.git.metze@samba.org:
sockptr: introduce copy_struct_to_sockptr()
sockptr: let copy_struct_from_sockptr() use copy_struct_from_bounce_buffer()
uaccess: add copy_struct_{from,to}_bounce_buffer() helpers
sockptr: fix usize check in copy_struct_from_sockptr() for user pointers
uaccess: fix ignored_trailing logic in copy_struct_to_user()
selftests/namespaces: Kill grandchild in nsid fixture teardown
The timens_separate and pidns_separate test cases fork a grandchild that
calls pause(). FIXTURE_TEARDOWN only kills the direct child, which is the
init process of the grandchild's namespace. Once the child (init) exits,
the grandchild is reparented to the host init but remains alive and
continues to hold the inherited write end of the test runner's TAP pipe
open. tap_prefix never receives EOF and blocks indefinitely, hanging the
entire test collection.
Record the grandchild PID in the fixture struct so that teardown can send
SIGKILL and reap it before dealing with the child. The grandchild must be
reaped first because the child acts as its PID namespace init; killing the
child first would kill the grandchild without giving us a chance to
waitpid() it.
We already have copy_struct_from_sockptr() as wrapper to
copy_struct_from_user() or copy_struct_from_bounce_buffer(),
so it's good to have copy_struct_to_sockptr()
as well matching the behavior of copy_struct_to_user()
or copy_struct_to_bounce_buffer().
The world would be better without sockptr_t, but having
copy_struct_to_sockptr() is better than open code it
in various places.
I'll use this in my IPPROTO_SMBDIRECT work,
but maybe it will also be useful for others...
IPPROTO_QUIC will likely also use it.
Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Dmitry Safonov <dima@arista.com> Cc: Francesco Ruggeri <fruggeri@arista.com> Cc: Salam Noureddine <noureddine@arista.com> Cc: David Ahern <dsahern@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Michal Luczaj <mhal@rbox.co> Cc: David Wei <dw@davidwei.uk> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Xin Long <lucien.xin@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Simon Horman <horms@kernel.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Christian Brauner <brauner@kernel.org> CC: Kees Cook <keescook@chromium.org> Cc: netdev@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/c950ee1578cb93b4411c3731010def9c1cd82f0d.1775576651.git.metze@samba.org Signed-off-by: Christian Brauner <brauner@kernel.org>
sockptr: let copy_struct_from_sockptr() use copy_struct_from_bounce_buffer()
The world would be better without sockptr_t, but this at least
simplifies copy_struct_from_sockptr() to be just a dispatcher for
copy_struct_from_user() or copy_struct_from_bounce_buffer() without any
special logic on its own.
Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Dmitry Safonov <dima@arista.com> Cc: Francesco Ruggeri <fruggeri@arista.com> Cc: Salam Noureddine <noureddine@arista.com> Cc: David Ahern <dsahern@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Michal Luczaj <mhal@rbox.co> Cc: David Wei <dw@davidwei.uk> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Xin Long <lucien.xin@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Kuniyuki Iwashima <kuniyu@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Simon Horman <horms@kernel.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Christian Brauner <brauner@kernel.org> CC: Kees Cook <keescook@chromium.org> Cc: netdev@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/b9b7e22664a53251d7ad099b12aead8b599c1257.1775576651.git.metze@samba.org Signed-off-by: Christian Brauner <brauner@kernel.org>