]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
2 weeks agoNFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst
Mike Snitzer [Tue, 11 Nov 2025 14:59:32 +0000 (09:59 -0500)] 
NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst

This document details the NFSD IO modes that are configurable using
NFSD's experimental debugfs interfaces:

  /sys/kernel/debug/nfsd/io_cache_read
  /sys/kernel/debug/nfsd/io_cache_write

This document will evolve as NFSD's interfaces do (e.g. if/when NFSD's
debugfs interfaces are replaced with per-export controls).

Future updates will provide more specific guidance and howto
information to help others use and evaluate NFSD's IO modes:
BUFFERED, DONTCACHE and DIRECT.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agoNFSD: Implement NFSD_IO_DIRECT for NFS WRITE
Mike Snitzer [Tue, 11 Nov 2025 14:59:31 +0000 (09:59 -0500)] 
NFSD: Implement NFSD_IO_DIRECT for NFS WRITE

When NFSD_IO_DIRECT is selected via the
/sys/kernel/debug/nfsd/io_cache_write experimental tunable, split
incoming unaligned NFS WRITE requests into a prefix, middle and
suffix segment, as needed. The middle segment is now DIO-aligned and
the prefix and/or suffix are unaligned. Synchronous buffered IO is
used for the unaligned segments, and IOCB_DIRECT is used for the
middle DIO-aligned extent.

Although IOCB_DIRECT avoids the use of the page cache, by itself it
doesn't guarantee data durability. For UNSTABLE WRITE requests,
durability is obtained by a subsequent NFS COMMIT request.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2 weeks agoNFSD: Make FILE_SYNC WRITEs comply with spec
Chuck Lever [Tue, 11 Nov 2025 14:59:30 +0000 (09:59 -0500)] 
NFSD: Make FILE_SYNC WRITEs comply with spec

Mike noted that when NFSD responds to an NFS_FILE_SYNC WRITE, it
does not also persist file time stamps. To wit, Section 18.32.3
of RFC 8881 mandates:

> The client specifies with the stable parameter the method of how
> the data is to be processed by the server. If stable is
> FILE_SYNC4, the server MUST commit the data written plus all file
> system metadata to stable storage before returning results. This
> corresponds to the NFSv2 protocol semantics. Any other behavior
> constitutes a protocol violation. If stable is DATA_SYNC4, then
> the server MUST commit all of the data to stable storage and
> enough of the metadata to retrieve the data before returning.

Commit 3f3503adb332 ("NFSD: Use vfs_iocb_iter_write()") replaced:

- flags |= RWF_SYNC;

with:

+ kiocb.ki_flags |= IOCB_DSYNC;

which appears to be correct given:

if (flags & RWF_SYNC)
kiocb_flags |= IOCB_DSYNC;

in kiocb_set_rw_flags(). However the author of that commit did not
appreciate that the previous line in kiocb_set_rw_flags() results
in IOCB_SYNC also being set:

kiocb_flags |= (__force int) (flags & RWF_SUPPORTED);

RWF_SUPPORTED contains RWF_SYNC, and RWF_SYNC is the same bit as
IOCB_SYNC. Reviewers at the time did not catch the omission.

Reported-by: Mike Snitzer <snitzer@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/20251018005431.3403-1-cel@kernel.org/T/#t
Fixes: 3f3503adb332 ("NFSD: Use vfs_iocb_iter_write()")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoNFSD: Add trace point for SCSI fencing operation.
Dai Ngo [Wed, 5 Nov 2025 20:45:55 +0000 (12:45 -0800)] 
NFSD: Add trace point for SCSI fencing operation.

Add trace point to print client IP address, net namespace number,
device name and status of SCSI pr_preempt command.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoNFSD: use correct reservation type in nfsd4_scsi_fence_client
Dai Ngo [Wed, 5 Nov 2025 20:45:54 +0000 (12:45 -0800)] 
NFSD: use correct reservation type in nfsd4_scsi_fence_client

The reservation type argument for the pr_preempt call should match the
one used in nfsd4_block_get_device_info_scsi.

Fixes: f99d4fbdae67 ("nfsd: add SCSI layout support")
Cc: stable@vger.kernel.org
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoxdrgen: Don't generate unnecessary semicolon
Chuck Lever [Wed, 5 Nov 2025 15:26:07 +0000 (10:26 -0500)] 
xdrgen: Don't generate unnecessary semicolon

The Jinja2 templates add a semicolon at the end of every function.
The C language does not require this punctuation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoxdrgen: Fix union declarations
Chuck Lever [Wed, 5 Nov 2025 15:26:06 +0000 (10:26 -0500)] 
xdrgen: Fix union declarations

Add a missing template file. This file is used when a union is
defined as a public API (ie, "pragma public <union name>;").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoNFSD: don't start nfsd if sv_permsocks is empty
Olga Kornievskaia [Mon, 3 Nov 2025 17:57:34 +0000 (12:57 -0500)] 
NFSD: don't start nfsd if sv_permsocks is empty

Previously, while trying to create a server instance, if no
listening sockets were present then default parameter udp
and tcp listeners were created. It's unclear what purpose
was of starting these listeners were and how this could have
been triggered by the userland setup. This patch proposed
to ensure the reverse that we never end in a situation where
no listener sockets are created and we are trying to create
nfsd threads.

The problem it solves is: when nfs.conf only has tcp=n (and
nothing else for the choice of transports), nfsdctl would
still start the server and create udp and tcp listeners.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoxdrgen: handle _XdrString in union encoder/decoder
Khushal Chitturi [Wed, 29 Oct 2025 06:12:36 +0000 (11:42 +0530)] 
xdrgen: handle _XdrString in union encoder/decoder

Running xdrgen on xdrgen/tests/test.x fails when
generating encoder or decoder functions for union
members of type _XdrString. It was because _XdrString
does not have a spec attribute like _XdrBasic,
leading to AttributeError.

This patch updates emit_union_case_spec_definition
and emit_union_case_spec_decoder/encoder to handle
_XdrString by assigning type_name = "char *" and
avoiding referencing to spec.

Testing: Fixed xdrgen tool was run on originally failing
test file (tools/net/sunrpc/xdrgen/tests/test.x) and now
completes without AttributeError. Modified xdrgen tool was
also run against nfs4_1.x (Documentation/sunrpc/xdr/nfs4_1.x).
The output header file matches with nfs4_1.h
(include/linux/sunrpc/xdrgen/nfs4_1.h).
This validates the patch for all XDR input files currently
within the kernel.

Changes since v2:
- Moved the shebang to the first line
- Removed SPDX header to match style of current xdrgen files

Changes since v1:
- Corrected email address in Signed-off-by.
- Wrapped patch description lines to 72 characters.

Signed-off-by: Khushal Chitturi <kc9282016@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoxdrgen: Fix the variable-length opaque field decoder template
Chuck Lever [Mon, 27 Oct 2025 13:56:33 +0000 (09:56 -0400)] 
xdrgen: Fix the variable-length opaque field decoder template

Ensure that variable-length opaques are decoded into the named
field, and do not overwrite the structure itself.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoxdrgen: Make the xdrgen script location-independent
Chuck Lever [Mon, 27 Oct 2025 13:56:32 +0000 (09:56 -0400)] 
xdrgen: Make the xdrgen script location-independent

The @pythondir@ placeholder is meant for build-time substitution,
such as with autoconf. autoconf is not used in the kernel. Let's
replace that mechanism with one that better enables the xdrgen
script to be run from any directory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 weeks agoxdrgen: Generalize/harden pathname construction
Chuck Lever [Mon, 27 Oct 2025 13:56:31 +0000 (09:56 -0400)] 
xdrgen: Generalize/harden pathname construction

Use Python's built-in Path constructor to find the Jinja templates.
This provides better error checking, proper use of path component
separators, and more reliable location of the template files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agolockd: don't allow locking on reexported NFSv2/3
Jeff Layton [Thu, 23 Oct 2025 13:12:39 +0000 (09:12 -0400)] 
lockd: don't allow locking on reexported NFSv2/3

Since commit 9254c8ae9b81 ("nfsd: disallow file locking and delegations
for NFSv4 reexport"), file locking when reexporting an NFS mount via
NFSv4 is expressly prohibited by nfsd. Do the same in lockd:

Add a new  nlmsvc_file_cannot_lock() helper that will test whether file
locking is allowed for a given file, and return nlm_lck_denied_nolocks
if it isn't.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoMAINTAINERS: add a nfsd blocklayout reviewer
Christoph Hellwig [Wed, 22 Oct 2025 11:45:30 +0000 (13:45 +0200)] 
MAINTAINERS: add a nfsd blocklayout reviewer

Add a minimal entry for the block layout driver to make sure Christoph
who wrote the code gets Cced on all patches.  The actual maintenance
stays with the nfsd maintainer team.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agonfsd: Use MD5 library instead of crypto_shash
Eric Biggers [Thu, 16 Oct 2025 18:15:34 +0000 (11:15 -0700)] 
nfsd: Use MD5 library instead of crypto_shash

Update NFSD's support for "legacy client tracking" (which uses MD5) to
use the MD5 library instead of crypto_shash.  This has several benefits:

- Simpler code.  Notably, much of the error-handling code is no longer
  needed, since the library functions can't fail.

- Improved performance due to reduced overhead.  A microbenchmark of
  nfs4_make_rec_clidname() shows a speedup from 1455 cycles to 425.

- The MD5 code can now safely be built as a loadable module when nfsd is
  built as a loadable module.  (Previously, nfsd forced the MD5 code to
  built-in, presumably to work around the unreliability of the
  name-based loading.)  Thus select MD5 from the tristate option NFSD if
  NFSD_LEGACY_CLIENT_TRACKING, instead of from the bool option NFSD_V4.

- Fixes a bug where legacy client tracking was not supported on kernels
  booted with "fips=1", due to crypto_shash not allowing MD5 to be used.
  This particular use of MD5 is not for a cryptographic purpose, though,
  so it is acceptable even when fips=1 (see
  https://lore.kernel.org/r/dae495a93cbcc482f4ca23c3a0d9360a1fd8c3a8.camel@redhat.com/).

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agonfsd: stop pretending that we cache the SEQUENCE reply.
NeilBrown [Thu, 16 Oct 2025 13:49:58 +0000 (09:49 -0400)] 
nfsd: stop pretending that we cache the SEQUENCE reply.

nfsd does not cache the reply to a SEQUENCE.  As the comment above
nfsd4_replay_cache_entry() says:

 * The sequence operation is not cached because we can use the slot and
 * session values.

The comment above nfsd4_cache_this() suggests otherwise.

 * The session reply cache only needs to cache replies that the client
 * actually asked us to.  But it's almost free for us to cache compounds
 * consisting of only a SEQUENCE op, so we may as well cache those too.
 * Also, the protocol doesn't give us a convenient response in the case
 * of a replay of a solo SEQUENCE op that wasn't cached

The code in nfsd4_store_cache_entry() makes it clear that only responses
beyond 'cstate.data_offset' are actually cached, and data_offset is set
at the end of nfsd4_encode_sequence() *after* the sequence response has
been encoded.

This patch simplifies code and removes the confusing comments.

- nfsd4_is_solo_sequence() is discarded as not-useful.
- nfsd4_cache_this() is now trivial so it too is discarded with the
  code placed in-line at the one call-site in nfsd4_store_cache_entry().
- nfsd4_enc_sequence_replay() is open-coded in to
  nfsd4_replay_cache_entry(), and then simplified to (hopefully) make
  the process of replaying a reply clearer.

Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFS: nfsd-maintainer-entry-profile: Inline function name prefixes
Bagas Sanjaya [Mon, 17 Nov 2025 10:24:17 +0000 (17:24 +0700)] 
NFS: nfsd-maintainer-entry-profile: Inline function name prefixes

Sphinx reports htmldocs warnings:

Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst:185: ERROR: Unknown target name: "nfsd". [docutils]
Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst:188: ERROR: Unknown target name: "nfsdn". [docutils]
Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst:192: ERROR: Unknown target name: "nfsd4m". [docutils]

These are due to Sphinx confusing function name prefixes for external
link syntax. Fix the warnings by inlining the prefixes.

Fixes: 3a1ce35030e1e0 ("NFSD: Add a subsystem policy document")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20251117174218.29365f30@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD: Add a subsystem policy document
Chuck Lever [Tue, 14 Oct 2025 15:09:58 +0000 (11:09 -0400)] 
NFSD: Add a subsystem policy document

Steer contributors to NFSD's patchworks instance, list our patch
submission preferences, and more. The new document is based on the
existing netdev and xfs subsystem policy documents.

This is an attempt to add transparency to the process of accepting
contributions to NFSD and getting them merged upstream.

Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NeilBrown <neil@brown.name>
[ cel: Hand-edits to address review comments ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agosunrpc: allocate a separate bvec array for socket sends
Jeff Layton [Mon, 13 Oct 2025 13:54:53 +0000 (09:54 -0400)] 
sunrpc: allocate a separate bvec array for socket sends

svc_tcp_sendmsg() calls xdr_buf_to_bvec() with the second slot of
rq_bvec as the start, but doesn't reduce the array length by one, which
could lead to an array overrun. Also, rq_bvec is always rq_maxpages in
length, which can be too short in some cases, since the TCP record
marker consumes a slot.

Fix both problems by adding a separate bvec array to the svc_sock that
is specifically for sending. For TCP, make this array one slot longer
than rq_maxpages, to account for the record marker. For UDP, only
allocate as large an array as we need since it's limited to 64k of
payload.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoSUNRPC: Improve "fragment too large" warning
Chuck Lever [Wed, 8 Oct 2025 15:39:56 +0000 (11:39 -0400)] 
SUNRPC: Improve "fragment too large" warning

Including the client IP address that generated the overrun traffic
seems like it would be helpful. The message now reads:

  kernel: svc: nfsd oversized RPC fragment (1064958 octets) from 100.64.0.11:45866

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD: Implement NFSD_IO_DIRECT for NFS READ
Chuck Lever [Wed, 8 Oct 2025 13:52:30 +0000 (09:52 -0400)] 
NFSD: Implement NFSD_IO_DIRECT for NFS READ

Add an experimental option that forces NFS READ operations to use
direct I/O instead of reading through the NFS server's page cache.

There is already at least one other layer of read caching: the page
cache on NFS clients.

The server's page cache, in many cases, is unlikely to provide
additional benefit. Some benchmarks have demonstrated that the
server's page cache is actively detrimental for workloads whose
working set is larger than the server's available physical memory.

For instance, on small NFS servers, cached NFS file content can
squeeze out local memory consumers. For large sequential workloads,
an enormous amount of data flows into and out of the page cache
and is consumed by NFS clients exactly once -- caching that data
is expensive to do and totally valueless.

For now this is a hidden option that can be enabled on test
systems for benchmarking. In the longer term, this option might
be enabled persistently or per-export. When the exported file
system does not support direct I/O, NFSD falls back to using
either DONTCACHE or buffered I/O to fulfill NFS READ requests.

Suggested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD: Relocate the xdr_reserve_space_vec() call site
Chuck Lever [Wed, 8 Oct 2025 13:52:29 +0000 (09:52 -0400)] 
NFSD: Relocate the xdr_reserve_space_vec() call site

In order to detect when a direct READ is possible, we need the send
buffer's .page_len to be zero when there is nothing in the buffer's
.pages array yet.

However, when xdr_reserve_space_vec() extends the size of the
xdr_stream to accommodate a READ payload, it adds to the send
buffer's .page_len.

It should be safe to reserve the stream space /after/ the VFS read
operation completes. This is, for example, how an NFSv3 READ works:
the VFS read goes into the rq_bvec, and is then added to the send
xdr_stream later by svcxdr_encode_opaque_pages().

Now that xdr_reserve_space_vec() uses the number of bytes actually
read, the xdr_truncate_encode() call is no longer necessary.

Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD: pass nfsd_file to nfsd_iter_read()
Mike Snitzer [Wed, 8 Oct 2025 13:52:28 +0000 (09:52 -0400)] 
NFSD: pass nfsd_file to nfsd_iter_read()

Prepare for nfsd_iter_read() to use the DIO alignment stored in
nfsd_file by passing the nfsd_file to nfsd_iter_read() rather than
just the file which is associaed with the nfsd_file.

This means nfsd4_encode_readv() now also needs the nfsd_file rather
than the file.  Instead of changing the file arg to be the nfsd_file,
we discard the file arg as the nfsd_file (and indeed the file) is
already available via the "read" argument.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD/blocklayout: Support multiple extents per LAYOUTGET
Sergey Bashirov [Fri, 3 Oct 2025 09:11:06 +0000 (12:11 +0300)] 
NFSD/blocklayout: Support multiple extents per LAYOUTGET

Allow the pNFS server to respond with multiple extents to a LAYOUTGET
request, thereby avoiding unnecessary load on the server and improving
performance for the client. The number of LAYOUTGET requests is
significantly reduced for various file access patterns, including
random and parallel writes.

Additionally, this change allows the client to request layouts with the
loga_minlength value greater than the minimum possible length of a single
extent in XFS. We use this functionality to fix a livelock in the client.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD/blocklayout: Introduce layout content structure
Sergey Bashirov [Fri, 3 Oct 2025 09:11:05 +0000 (12:11 +0300)] 
NFSD/blocklayout: Introduce layout content structure

Add a layout content structure instead of a single extent. The ability
to store and encode an array of extents is then used to implement support
for multiple extents per LAYOUTGET.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD/blocklayout: Extract extent mapping from proc_layoutget
Sergey Bashirov [Fri, 3 Oct 2025 09:11:04 +0000 (12:11 +0300)] 
NFSD/blocklayout: Extract extent mapping from proc_layoutget

No changes in functionality. Split the proc_layoutget function to
create a helper function that maps single extent to the requested
range. This helper function is then used to implement support for
multiple extents per LAYOUTGET.

Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD/blocklayout: Fix minlength check in proc_layoutget
Sergey Bashirov [Fri, 3 Oct 2025 09:11:03 +0000 (12:11 +0300)] 
NFSD/blocklayout: Fix minlength check in proc_layoutget

The extent returned by the file system may have a smaller offset than
the segment offset requested by the client. In this case, the minimum
segment length must be checked against the requested range. Otherwise,
the client may not be able to continue the read/write operation.

Fixes: 8650b8a05850 ("nfsd: pNFS block layout driver")
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agosvcrdma: Increase the server's default RPC/RDMA credit grant
Chuck Lever [Thu, 2 Oct 2025 14:00:52 +0000 (10:00 -0400)] 
svcrdma: Increase the server's default RPC/RDMA credit grant

The range of commits from commit e3274026e2ec ("SUNRPC: move all of
xprt handling into svc_xprt_handle()") to commit 15d39883ee7d
("SUNRPC: change the back-channel queue to lwq") enabled NFSD
performance to scale better as the number of nfsd threads is
increased. These commits were merged in v6.7.

Now that the nfsd thread count can scale to more threads, permit
individual clients to make more use of those threads. Increase the
RPC/RDMA per-connection credit grant from 64 to 128 -- same as the
Linux NFS client.

Simple single client fio-based benchmarking so far shows only
improvement, no regression.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD: Update comment documenting unsupported fattr4 attributes
Chuck Lever [Wed, 1 Oct 2025 13:24:31 +0000 (09:24 -0400)] 
NFSD: Update comment documenting unsupported fattr4 attributes

TIME_CREATE has been supported since commit e377a3e698fb ("nfsd: Add
support for the birth time attribute").

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agonfsd: delete unreachable confusing code in nfs4_open_delegation()
Matvey Kovalev [Mon, 29 Sep 2025 17:35:20 +0000 (20:35 +0300)] 
nfsd: delete unreachable confusing code in nfs4_open_delegation()

op_delegate_type is assigned OPEN_DELEGATE_NONE just before the if-block
where condition specifies it not be equal to OPEN_DELEGATE_NONE. Compiler
treats the block as unreachable and optimizes it out from the resulting
executable.

In that aspect commit d08d32e6e5c0 ("nfsd4: return delegation immediately
if lease fails") notably makes no difference.

Seems it's better to just drop this code instead of fiddling with memory
barriers or atomics.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Matvey Kovalev <matvey.kovalev@ispras.ru>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoNFSD: Add array bounds-checking in nfsd_iter_read()
Chuck Lever [Wed, 17 Sep 2025 14:31:40 +0000 (10:31 -0400)] 
NFSD: Add array bounds-checking in nfsd_iter_read()

The *count parameter does not appear to be explicitly restricted
to being smaller than rsize, so it might be possible to overrun
the rq_bvec or rq_pages arrays.

Rather than overrunning these arrays (damage done!) and then WARNING
once, let's harden the loop so that it terminates before the end of
the arrays are reached. This should result in a short read, which is
OK -- clients recover by sending additional READ requests for the
remaining unread bytes.

Reported-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agonfsd: switch the default for NFSD_LEGACY_CLIENT_TRACKING to "n"
Jeff Layton [Tue, 9 Sep 2025 15:48:08 +0000 (11:48 -0400)] 
nfsd: switch the default for NFSD_LEGACY_CLIENT_TRACKING to "n"

We added this Kconfig option a little over a year ago. Switch the
default to "n" in preparation for its eventual removal.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agonfsd: change nfs4_client_to_reclaim() to allocate data
NeilBrown [Mon, 8 Sep 2025 01:38:33 +0000 (11:38 +1000)] 
nfsd: change nfs4_client_to_reclaim() to allocate data

The calling convention for nfs4_client_to_reclaim() is clumsy in that
the caller needs to free memory if the function fails.  It is much
cleaner if the function frees its own memory.

This patch changes nfs4_client_to_reclaim() to re-allocate the .data
fields to be stored in the newly allocated struct nfs4_client_reclaim,
and to free everything on failure.

__cld_pipe_inprogress_downcall() needs to allocate the data anyway to
copy it from user-space, so now that data is allocated twice.  I think
that is a small price to pay for a cleaner interface.

Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agonfsd: move name lookup out of nfsd4_list_rec_dir()
NeilBrown [Mon, 15 Sep 2025 02:55:13 +0000 (12:55 +1000)] 
nfsd: move name lookup out of nfsd4_list_rec_dir()

nfsd4_list_rec_dir() is called with two different callbacks.
One of the callbacks uses vfs_rmdir() to remove the directory.
The other doesn't use the dentry at all, just the name.

As only one callback needs the dentry, this patch moves the lookup into
that callback.  This prepares of changes to how directory operations
are locked.

Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agosvcrdma: Release transport resources synchronously
Chuck Lever [Wed, 3 Sep 2025 15:53:35 +0000 (11:53 -0400)] 
svcrdma: Release transport resources synchronously

NFSD has always supported added network listeners. The new netlink
protocol now enables the removal of listeners.

Olga noticed that if an RDMA listener is removed and immediately
re-added, the deferred __svc_rdma_free() function might not have
run yet, so some or all of the old listener's RDMA resources
linger, which prevents a new listener on the same address from
being created.

Also, svc_xprt_free() does a module_put() just after calling
->xpo_free(). That means if there is deferred work going on, the
module could be unloaded before that work is even started,
resulting in a UAF.

Neil asks:
> What particular part of __svc_rdma_free() needs to run in order for a
> subsequent registration to succeed?
> Can that bit be run directory from svc_rdma_free() rather than be
> delayed?
> (I know almost nothing about rdma so forgive me if the answers to these
> questions seems obvious)

The reasons I can recall are:

 - Some of the transport tear-down work can sleep
 - Releasing a cm_id is tricky and can deadlock

We might be able to mitigate the second issue with judicious
application of transport reference counting.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/linux-nfs/20250821204328.89218-1-okorniev@redhat.com/
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
4 weeks agoLinux 6.18-rc6 v6.18-rc6
Linus Torvalds [Sun, 16 Nov 2025 22:25:38 +0000 (14:25 -0800)] 
Linux 6.18-rc6

4 weeks agoMerge tag 'perf-tools-fixes-for-v6.18-2-2025-11-16' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 16 Nov 2025 21:45:03 +0000 (13:45 -0800)] 
Merge tag 'perf-tools-fixes-for-v6.18-2-2025-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix writing bpf_prog (infos|btfs)_cnt to data file, to not generate
   invalid perf.data files in some corner cases.

 - Fix 'perf top' segfault by ensuring libbfd is initialized. This is an
   opt-in feature due to license incompatibilities.

 - Fix segfault in 'perf lock' due to missing kernel map.

 - Fix 'perf lock contention' test.

 - Don't fail fast path detection if binutils-devel isn't available.

 - Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason.

* tag 'perf-tools-fixes-for-v6.18-2-2025-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf libbfd: Ensure libbfd is initialized prior to use
  perf test: Fix lock contention test
  perf lock: Fix segfault due to missing kernel map
  tools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason
  perf build: Don't fail fast path feature detection when binutils-devel is not available
  perf header: Write bpf_prog (infos|btfs)_cnt to data file

4 weeks agoMerge tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 16 Nov 2025 21:31:14 +0000 (13:31 -0800)] 
Merge tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "7 hotfixes.  5 are cc:stable, 4 are against mm/

  All are singletons - please see the respective changelogs for details"

* tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm, swap: fix potential UAF issue for VMA readahead
  selftests/user_events: fix type cast for write_index packed member in perf_test
  lib/test_kho: check if KHO is enabled
  mm/huge_memory: fix folio split check for anon folios in swapcache
  MAINTAINERS: update David Hildenbrand's email address
  crash: fix crashkernel resource shrink
  mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb

4 weeks agoMerge tag 'firewire-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 16 Nov 2025 15:08:28 +0000 (07:08 -0800)] 
Merge tag 'firewire-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fixes from Takashi Sakamoto:
 "This includes some fixes for the topology map, newly introduced in
  v6.18 kernel"

* tag 'firewire-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix to update generation field in topology map
  firewire: core: Initialize topology_map.lock

4 weeks agoMerge tag 'edac_urgent_for_v6.18_rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 16 Nov 2025 15:05:24 +0000 (07:05 -0800)] 
Merge tag 'edac_urgent_for_v6.18_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:

 - In Versalnet, handle the reporting of non-standard hw errors whose
   information can come in more than one remote processor message.

 - Explicitly reenable ECC checking after a warm reset in Altera OCRAM
   as those registers are reset to default otherwise

 - Fix single-bit error injection in Altera EDAC to not inject errors
   directly in ECC RAM and thus lead to false double-bit errors due to
   same ECC RAM being in concurrent use

* tag 'edac_urgent_for_v6.18_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/altera: Use INTTEST register for Ethernet and USB SBE injection
  EDAC/altera: Handle OCRAM ECC enable after warm reset
  EDAC/versalnet: Handle split messages for non-standard errors

4 weeks agofirewire: core: fix to update generation field in topology map
Takashi Sakamoto [Fri, 14 Nov 2025 14:44:21 +0000 (23:44 +0900)] 
firewire: core: fix to update generation field in topology map

The generation field of topology map is updated after initialized by zero.
The updated value of generation field is always zero, and is against
specification.

This commit fixes the bug.

Fixes: 7d138cb269db ("firewire: core: use spin lock specific to topology map")
Link: https://lore.kernel.org/r/20251114144421.415278-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
4 weeks agomm, swap: fix potential UAF issue for VMA readahead
Kairui Song [Tue, 11 Nov 2025 13:36:08 +0000 (21:36 +0800)] 
mm, swap: fix potential UAF issue for VMA readahead

Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
pinning"), the common helper for allocating and preparing a folio in the
swap cache layer no longer tries to get a swap device reference
internally, because all callers of __read_swap_cache_async are already
holding a swap entry reference.  The repeated swap device pinning isn't
needed on the same swap device.

Caller of VMA readahead is also holding a reference to the target entry's
swap device, but VMA readahead walks the page table, so it might encounter
swap entries from other devices, and call __read_swap_cache_async on
another device without holding a reference to it.

So it is possible to cause a UAF when swapoff of device A raced with
swapin on device B, and VMA readahead tries to read swap entries from
device A.  It's not easy to trigger, but in theory, it could cause real
issues.

Make VMA readahead try to get the device reference first if the swap
device is a different one from the target entry.

Link: https://lkml.kernel.org/r/20251111-swap-fix-vma-uaf-v1-1-41c660e58562@tencent.com
Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agoselftests/user_events: fix type cast for write_index packed member in perf_test
Ankit Khushwaha [Thu, 6 Nov 2025 09:55:32 +0000 (15:25 +0530)] 
selftests/user_events: fix type cast for write_index packed member in perf_test

Accessing 'reg.write_index' directly triggers a -Waddress-of-packed-member
warning due to potential unaligned pointer access:

perf_test.c:239:38: warning: taking address of packed member 'write_index'
of class or structure 'user_reg' may result in an unaligned pointer value
[-Waddress-of-packed-member]
  239 |         ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
      |                                             ^~~~~~~~~~~~~~~

Since write(2) works with any alignment. Casting '&reg.write_index'
explicitly to 'void *' to suppress this warning.

Link: https://lkml.kernel.org/r/20251106095532.15185-1-ankitkhushwaha.linux@gmail.com
Fixes: 42187bdc3ca4 ("selftests/user_events: Add perf self-test for empty arguments events")
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: sunliming <sunliming@kylinos.cn>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agolib/test_kho: check if KHO is enabled
Pasha Tatashin [Thu, 6 Nov 2025 22:06:35 +0000 (17:06 -0500)] 
lib/test_kho: check if KHO is enabled

We must check whether KHO is enabled prior to issuing KHO commands,
otherwise KHO internal data structures are not initialized.

Link: https://lkml.kernel.org/r/20251106220635.2608494-1-pasha.tatashin@soleen.com
Fixes: b753522bed0b ("kho: add test for kexec handover")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202511061629.e242724-lkp@intel.com
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agomm/huge_memory: fix folio split check for anon folios in swapcache
Zi Yan [Wed, 5 Nov 2025 16:29:10 +0000 (11:29 -0500)] 
mm/huge_memory: fix folio split check for anon folios in swapcache

Both uniform and non uniform split check missed the check to prevent
splitting anon folios in swapcache to non-zero order.

Splitting anon folios in swapcache to non-zero order can cause data
corruption since swapcache only support PMD order and order-0 entries.
This can happen when one use split_huge_pages under debugfs to split
anon folios in swapcache.

In-tree callers do not perform such an illegal operation.  Only debugfs
interface could trigger it.  I will put adding a test case on my TODO
list.

Fix the check.

Link: https://lkml.kernel.org/r/20251105162910.752266-1-ziy@nvidia.com
Fixes: 58729c04cf10 ("mm/huge_memory: add buddy allocator like (non-uniform) folio_split()")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: "David Hildenbrand (Red Hat)" <david@kernel.org>
Closes: https://lore.kernel.org/all/dc0ecc2c-4089-484f-917f-920fdca4c898@kernel.org/
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agoMAINTAINERS: update David Hildenbrand's email address
David Hildenbrand (Red Hat) [Mon, 3 Nov 2025 10:36:59 +0000 (11:36 +0100)] 
MAINTAINERS: update David Hildenbrand's email address

Switch to kernel.org email address as I will be leaving Red Hat.  The old
address will remain active until end of January 2026, so performing the
change now should make sure that most mails will reach me.

Link: https://lkml.kernel.org/r/20251103103659.379335-1-david@kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agocrash: fix crashkernel resource shrink
Sourabh Jain [Sat, 1 Nov 2025 19:37:41 +0000 (01:07 +0530)] 
crash: fix crashkernel resource shrink

When crashkernel is configured with a high reservation, shrinking its
value below the low crashkernel reservation causes two issues:

1. Invalid crashkernel resource objects
2. Kernel crash if crashkernel shrinking is done twice

For example, with crashkernel=200M,high, the kernel reserves 200MB of high
memory and some default low memory (say 256MB).  The reservation appears
as:

cat /proc/iomem | grep -i crash
af000000-beffffff : Crash kernel
433000000-43f7fffff : Crash kernel

If crashkernel is then shrunk to 50MB (echo 52428800 >
/sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved:
af000000-beffffff : Crash kernel

Instead, it should show 50MB:
af000000-b21fffff : Crash kernel

Further shrinking crashkernel to 40MB causes a kernel crash with the
following trace (x86):

BUG: kernel NULL pointer dereference, address: 0000000000000038
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
<snip...>
Call Trace: <TASK>
? __die_body.cold+0x19/0x27
? page_fault_oops+0x15a/0x2f0
? search_module_extables+0x19/0x60
? search_bpf_extables+0x5f/0x80
? exc_page_fault+0x7e/0x180
? asm_exc_page_fault+0x26/0x30
? __release_resource+0xd/0xb0
release_resource+0x26/0x40
__crash_shrink_memory+0xe5/0x110
crash_shrink_memory+0x12a/0x190
kexec_crash_size_store+0x41/0x80
kernfs_fop_write_iter+0x141/0x1f0
vfs_write+0x294/0x460
ksys_write+0x6d/0xf0
<snip...>

This happens because __crash_shrink_memory()/kernel/crash_core.c
incorrectly updates the crashk_res resource object even when
crashk_low_res should be updated.

Fix this by ensuring the correct crashkernel resource object is updated
when shrinking crashkernel memory.

Link: https://lkml.kernel.org/r/20251101193741.289252-1-sourabhjain@linux.ibm.com
Fixes: 16c6006af4d4 ("kexec: enable kexec_crash_size to support two crash kernel regions")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agomm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
David Hildenbrand (Red Hat) [Fri, 14 Nov 2025 21:49:20 +0000 (22:49 +0100)] 
mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb

In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support
runtime allocation of gigantic hugetlb folios.  In the meantime it evolved
into a generic way for the architecture to state that it supports gigantic
hugetlb folios.

In commit fae7d834c43c ("mm: add __dump_folio()") we started using
CONFIG_ARCH_HAS_GIGANTIC_PAGE to decide MAX_FOLIO_ORDER: whether we could
have folios larger than what the buddy can handle.  In the context of that
commit, we started using MAX_FOLIO_ORDER to detect page corruptions when
dumping tail pages of folios.  Before that commit, we assumed that we
cannot have folios larger than the highest buddy order, which was
obviously wrong.

In commit 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes
when registering hstate"), we used MAX_FOLIO_ORDER to detect
inconsistencies, and in fact, we found some now.

Powerpc allows for configs that can allocate gigantic folio during boot
(not at runtime), that do not set CONFIG_ARCH_HAS_GIGANTIC_PAGE and can
exceed PUD_ORDER.

To fix it, let's make powerpc select CONFIG_ARCH_HAS_GIGANTIC_PAGE with
hugetlb on powerpc, and increase the maximum folio size with hugetlb to 16
GiB on 64bit (possible on arm64 and powerpc) and 1 GiB on 32 bit
(powerpc).  Note that on some powerpc configurations, whether we actually
have gigantic pages depends on the setting of CONFIG_ARCH_FORCE_MAX_ORDER,
but there is nothing really problematic about setting it unconditionally:
we just try to keep the value small so we can better detect problems in
__dump_folio() and inconsistencies around the expected largest folio in
the system.

Ideally, we'd have a better way to obtain the maximum hugetlb folio size
and detect ourselves whether we really end up with gigantic folios.  Let's
defer bigger changes and fix the warnings first.

While at it, handle gigantic DAX folios more clearly: DAX can only end up
creating gigantic folios with HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.

Add a new Kconfig option HAVE_GIGANTIC_FOLIOS to make both cases clearer.
In particular, worry about ARCH_HAS_GIGANTIC_PAGE only with HUGETLB_PAGE.

Note: with enabling CONFIG_ARCH_HAS_GIGANTIC_PAGE on powerpc, we will now
also allow for runtime allocations of folios in some more powerpc configs.
I don't think this is a problem, but if it is we could handle it through
__HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED.

While __dump_page()/__dump_folio was also problematic (not handling
dumping of tail pages of such gigantic folios correctly), it doesn't seem
critical enough to mark it as a fix.

Link: https://lkml.kernel.org/r/20251114214920.2550676-1-david@kernel.org
Fixes: 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Closes: https://lore.kernel.org/r/3e043453-3f27-48ad-b987-cc39f523060a@csgroup.eu/
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/r/94377f5c-d4f0-4c0f-b0f6-5bf1cd7305b1@linux.ibm.com/
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 weeks agoMerge tag 's390-6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 15 Nov 2025 17:01:00 +0000 (09:01 -0800)] 
Merge tag 's390-6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Heiko Carstens:

 - Fix a bug in the __ptep_rdp() inline assembly which may lead to
   missing TLB flushes

* tag 's390-6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Fix __ptep_rdp() inline assembly

4 weeks agoMerge tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 15 Nov 2025 16:55:29 +0000 (08:55 -0800)] 
Merge tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Update the list of AMD microcode minimum Entrysign revisions

 - Add additional fixed AMD RDSEED microcode revisions

 - Update the language transliteration for Kiryl Shutsemau's name
   in the MAINTAINERS entry

* tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev
  x86/CPU/AMD: Add additional fixed RDSEED microcode revisions
  MAINTAINERS: Update name spelling

4 weeks agoMerge tag 'timers-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 15 Nov 2025 16:51:43 +0000 (08:51 -0800)] 
Merge tag 'timers-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix a memory leak in the posix timer creation logic"

* tag 'timers-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timers: Plug potential memory leak in do_timer_create()

4 weeks agoMerge tag 'irq-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 15 Nov 2025 16:48:51 +0000 (08:48 -0800)] 
Merge tag 'irq-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Fix an irqchip driver release bug in the riscv-intc irqchip driver"

* tag 'irq-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/riscv-intc: Add missing free() callback in riscv_intc_domain_ops

4 weeks agoMerge tag 'core-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 15 Nov 2025 16:46:18 +0000 (08:46 -0800)] 
Merge tag 'core-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fix from Ingo Molnar:
 "Fix a broken #ifndef in the <linux/entry-virt.h> header.

  It hasn't caused problems upstream yet because no arch overrides
  arch_xfer_to_guest_mode_handle_work() at this moment"

* tag 'core-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Fix ifndef around arch_xfer_to_guest_mode_handle_work() stub

4 weeks agoMerge tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 14 Nov 2025 23:45:31 +0000 (15:45 -0800)] 
Merge tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Cache the ASPM L0s/L1 Supported bits early so quirks can override
   them if necessary (Bjorn Helgaas)

 - Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi
   device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn
   Helgaas)

* tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi
  PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports
  PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports
  PCI/ASPM: Convert quirks to override advertised link states
  PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
  PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden

4 weeks agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Fri, 14 Nov 2025 23:39:39 +0000 (15:39 -0800)] 
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix interaction between livepatch and BPF fexit programs (Song Liu)
   With Steven and Masami acks.

 - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa)
   With Steven and Masami acks.

 - Fix out of bounds access in widen_imprecise_scalars() in the verifier
   (Eduard Zingerman)

 - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen)

 - Fix net_sched storage collision with BPF data_meta/data_end (Eric
   Dumazet)

 - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking
   them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Test widen_imprecise_scalars() with different stack depth
  bpf: account for current allocated stack depth in widen_imprecise_scalars()
  bpf: Add bpf_prog_run_data_pointers()
  selftests/bpf: Add mptcp test with sockmap
  mptcp: Fix proto fallback detection with BPF
  mptcp: Disallow MPTCP subflows from sockmap
  selftests/bpf: Add stacktrace ips test for raw_tp
  selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
  x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
  Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
  bpf: add _impl suffix for bpf_stream_vprintk() kfunc
  bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
  selftests/bpf: Add tests for livepatch + bpf trampoline
  ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
  ftrace: Fix BPF fexit with livepatch

4 weeks agoMerge tag 'rust-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda...
Linus Torvalds [Fri, 14 Nov 2025 23:36:15 +0000 (15:36 -0800)] 
Merge tag 'rust-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust fix from Miguel Ojeda:

 - Fix a Rust 1.91.0 build issue due to 'bindings.o' not containing
   DWARF debug information anymore by teaching gendwarfksyms to skip
   object files without exports

* tag 'rust-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  gendwarfksyms: Skip files with no exports

4 weeks agoMerge tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 14 Nov 2025 21:44:23 +0000 (13:44 -0800)] 
Merge tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Various fixes when using NFS with TLS

 - Localio direct-IO fixes

 - Fix error handling in nfs_atomic_open_v23()

 - Fix sysfs memory leak when nfs_client kobject add fails

 - Fix an incorrect parameter when calling nfs4_call_sync()

 - Fix a failing LTP test when using delegated timestamps

* tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix LTP test failures when timestamps are delegated
  NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()
  NFS: sysfs: fix leak when nfs_client kobject add fails
  NFSv2/v3: Fix error handling in nfs_atomic_open_v23()
  nfs/localio: do not issue misaligned DIO out-of-order
  nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
  nfs/localio: backfill missing partial read support for misaligned DIO
  nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
  nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
  NFS: Check the TLS certificate fields in nfs_match_client()
  pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS
  pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect()
  pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()

4 weeks agoMerge tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 14 Nov 2025 21:39:15 +0000 (13:39 -0800)] 
Merge tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes, amdgpu and vmwgfx making up the most of it, along with
  panthor and i915/xe.

  Seems about right for this time of development, nothing major
  outstanding.

  client:
   - Fix description of module parameter

  panthor:
   - Flush writes before mapping buffers

  vmwgfx:
   - Improve command validation
   - Improve ref counting
   - Fix cursor-plane support

  amdgpu:
   - Disallow P2P DMA for GC 12 DCC surfaces
   - ctx error handling fix
   - UserQ fixes
   - VRR fix
   - ISP fix
   - JPEG 5.0.1 fix

  amdkfd:
   - Save area check fix
   - Fix GPU mappings for APU after prefetch

  i915:
   - Fix PSR's pipe to vblank conversion
   - Disable Panel Replay on MST links

  xe:
   - New HW workarounds affecting PTL and WCL platforms

* tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernel:
  drm/client: fix MODULE_PARM_DESC string for "active"
  drm/i915/dp_mst: Disable Panel Replay
  drm/amdkfd: Fix GPU mappings for APU after prefetch
  drm/amdkfd: relax checks for over allocation of save area
  drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1
  drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BO
  drm/amd/display: Allow VRR params change if unsynced with the stream
  drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_process
  drm/amdgpu: jump to the correct label on failure
  drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces
  drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpg
  drm/xe/xe3: Extend wa_14023061436
  drm/xe/xe3: Add WA_14024681466 for Xe3_LPG
  drm/i915/psr: fix pipe to vblank conversion
  drm/panthor: Flush shmem writes before mapping buffers CPU-uncached
  drm/vmwgfx: Restore Guest-Backed only cursor plane support
  drm/vmwgfx: Use kref in vmw_bo_dirty
  drm/vmwgfx: Validate command header size against SVGA_CMD_MAX_DATASIZE

4 weeks agoMerge tag 'mmc-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 14 Nov 2025 21:34:36 +0000 (13:34 -0800)] 
Merge tag 'mmc-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:

 - dw_mmc-rockchip: Fix internal phase calculation

 - pxamci: Simplify and fix ->probe() error handling

 - sdhci-of-dwcmshc: Fix strbin signal delay

 - wmt-sdmmc: Fix compile test default

* tag 'mmc-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: dw_mmc-rockchip: Fix wrong internal phase calculate
  mmc: pxamci: Simplify pxamci_probe() error handling using devm APIs
  mmc: sdhci-of-dwcmshc: Change DLL_STRBIN_TAPNUM_DEFAULT to 0x4
  mmc: wmt-sdmmc: fix compile test default

4 weeks agoMerge tag 'pmdomain-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh...
Linus Torvalds [Fri, 14 Nov 2025 21:29:15 +0000 (13:29 -0800)] 
Merge tag 'pmdomain-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:

 - imx: Fix reference count leak in ->remove()

 - samsung: Rework legacy splash-screen handover workaround

 - samsung: Fix potential memleak during ->probe()

 - arm: Fix genpd leak on provider registration failure for scmi

* tag 'pmdomain-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: imx: Fix reference count leak in imx_gpc_remove
  pmdomain: samsung: Rework legacy splash-screen handover workaround
  pmdomain: arm: scmi: Fix genpd leak on provider registration failure
  pmdomain: samsung: plug potential memleak during probe

4 weeks agoMerge tag 'cxl-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds [Fri, 14 Nov 2025 21:25:00 +0000 (13:25 -0800)] 
Merge tag 'cxl-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Dave Jiang:

 - Fix incorrect device handle check for Generic Initiator

 - Fix offset calculation for extended linear cache poison injection

 - Fix lockdep warning for hmem_register_resource()

* tag 'cxl-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  acpi/hmat: Fix lockdep warning for hmem_register_resource()
  cxl: Adjust offset calculation for poison injection
  acpi,srat: Fix incorrect device handle check for Generic Initiator

4 weeks agoMerge tag 'spi-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Fri, 14 Nov 2025 21:04:35 +0000 (13:04 -0800)] 
Merge tag 'spi-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few standard fixes here, plus one more interesting one from Hans
  which addresses an issue where a move in when we requested GPIOs on
  ACPI systems caused us to stop doing pinmuxing and leave things
  floating that we'd really rather not have floating"

* tag 'spi-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: Add TODO comment about ACPI GPIO setup
  spi: xilinx: increase number of retries before declaring stall
  spi: imx: keep dma request disabled before dma transfer setup
  spi: Try to get ACPI GPIO IRQ earlier

4 weeks agoMerge tag 'regulator-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Nov 2025 21:01:23 +0000 (13:01 -0800)] 
Merge tag 'regulator-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "One simple fix for a GPIO descriptor leak in the probe error handling
  for the fixed regulator"

* tag 'regulator-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fixed: fix GPIO descriptor leak on register failure

4 weeks agoMerge tag 'sound-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 14 Nov 2025 20:50:08 +0000 (12:50 -0800)] 
Merge tag 'sound-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes. All changes are device-specific, and
  nothing stands out.

   - A regression fix for HD-audio HDMI probe

   - USB-audio hardening patches for issues spotted by fuzzers

   - ASoC fixes for TAS278x, SoundWire and Cirrus

   - Usual HD-audio and USB-audio quirks"

* tag 'sound-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series
  ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe()
  ALSA: hda/tas2781: Correct the wrong project ID
  ALSA: usb-audio: Fix NULL pointer dereference in snd_usb_mixer_controls_badd
  ASoC: SDCA: bug fix while parsing mipi-sdca-control-cn-list
  ALSA: usb-audio: Fix potential overflow of PCM transfer buffer
  ALSA: hda/tas2781: Add new quirk for HP new projects
  ASoC: tas2781: fix getting the wrong device number
  ASoC: codecs: va-macro: fix resource leak in probe error path
  ASoC: tas2783A: Fix issues in firmware parsing
  ASoC: sdw_utils: fix device reference leak in is_sdca_endpoint_present()
  ASoC: cs4271: Fix regulator leak on probe failure
  ALSA: hda/hdmi: Fix breakage at probing nvhdmi-mcp driver
  ASoC: da7213: Use component driver suspend/resume
  ALSA: usb-audio: add min_mute quirk for SteelSeries Arctis
  ASoC: doc: cs35l56: Update firmware filename description for B0 silicon

4 weeks agoMerge tag 'block-6.18-20251114' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Nov 2025 18:18:45 +0000 (10:18 -0800)] 
Merge tag 'block-6.18-20251114' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixlet from Jens Axboe:
 "Been sitting on this one for a week or two, planning on sending it out
  when there were other block changes for 6.18. But as that hasn't
  materialized in the second week of sitting on it, let's flush it out.

  A previous commit updated my git tree locations, but one was missed as
  it was already set to the git.kernel.org one. But the git location swap
  also renamed the actual tree from linux-block to just linux, let's get
  that last one updated too"

* tag 'block-6.18-20251114' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  MAINTAINERS: correct git location for block layer tree

4 weeks agoMerge tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Nov 2025 17:57:30 +0000 (09:57 -0800)] 
Merge tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

 - Use the actual segments in a request when for bvec based buffers

 - Fix an odd case where the iovec might get leaked for a read/write
   request, if it was newly allocated, overflowed the alloc cache, and
   hit an early error

 - Minor tweak to the query API added in this release, returning the
   number of available entries

* tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
  io_uring/query: return number of available queries
  io_uring/rw: ensure allocated iovec gets cleared for early failure

5 weeks agoselftests/bpf: Test widen_imprecise_scalars() with different stack depth
Eduard Zingerman [Fri, 14 Nov 2025 02:57:30 +0000 (18:57 -0800)] 
selftests/bpf: Test widen_imprecise_scalars() with different stack depth

A test case for a situation when widen_imprecise_scalars() is called
with old->allocated_stack > cur->allocated_stack. Test structure:

    def widening_stack_size_bug():
      r1 = 0
      for r6 in 0..1:
        iterator_with_diff_stack_depth(r1)
        r1 = 42

    def iterator_with_diff_stack_depth(r1):
      if r1 != 42:
        use 128 bytes of stack
      iterator based loop

iterator_with_diff_stack_depth() is verified with r1 == 0 first and
r1 == 42 next. Causing stack usage of 128 bytes on a first visit and 8
bytes on a second. Such arrangement triggered a KASAN error in
widen_imprecise_scalars().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 weeks agobpf: account for current allocated stack depth in widen_imprecise_scalars()
Eduard Zingerman [Fri, 14 Nov 2025 02:57:29 +0000 (18:57 -0800)] 
bpf: account for current allocated stack depth in widen_imprecise_scalars()

The usage pattern for widen_imprecise_scalars() looks as follows:

    prev_st = find_prev_entry(env, ...);
    queued_st = push_stack(...);
    widen_imprecise_scalars(env, prev_st, queued_st);

Where prev_st is an ancestor of the queued_st in the explored states
tree. This ancestor is not guaranteed to have same allocated stack
depth as queued_st. E.g. in the following case:

    def main():
      for i in 1..2:
        foo(i)        // same callsite, differnt param

    def foo(i):
      if i == 1:
        use 128 bytes of stack
      iterator based loop

Here, for a second 'foo' call prev_st->allocated_stack is 128,
while queued_st->allocated_stack is much smaller.
widen_imprecise_scalars() needs to take this into account and avoid
accessing bpf_verifier_state->frame[*]->stack out of bounds.

Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks")
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 weeks agobpf: Add bpf_prog_run_data_pointers()
Eric Dumazet [Wed, 12 Nov 2025 12:55:16 +0000 (12:55 +0000)] 
bpf: Add bpf_prog_run_data_pointers()

syzbot found that cls_bpf_classify() is able to change
tc_skb_cb(skb)->drop_reason triggering a warning in sk_skb_reason_drop().

WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 __sk_skb_reason_drop net/core/skbuff.c:1189 [inline]
WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 sk_skb_reason_drop+0x76/0x170 net/core/skbuff.c:1214

struct tc_skb_cb has been added in commit ec624fe740b4 ("net/sched:
Extend qdisc control block with tc control block"), which added a wrong
interaction with db58ba459202 ("bpf: wire in data and data_end for
cls_act_bpf").

drop_reason was added later.

Add bpf_prog_run_data_pointers() helper to save/restore the net_sched
storage colliding with BPF data_meta/data_end.

Fixes: ec624fe740b4 ("net/sched: Extend qdisc control block with tc control block")
Reported-by: syzbot <syzkaller@googlegroups.com>
Closes: https://lore.kernel.org/netdev/6913437c.a70a0220.22f260.013b.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251112125516.1563021-1-edumazet@google.com
5 weeks agoMerge tag 'v6.18-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 14 Nov 2025 16:32:58 +0000 (08:32 -0800)] 
Merge tag 'v6.18-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:

 - Fix device reference leak in hisilicon

* tag 'v6.18-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hisilicon/qm - Fix device reference leak in qm_get_qos_value

5 weeks agoMerge tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 14 Nov 2025 16:30:48 +0000 (08:30 -0800)] 
Merge tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Multichannel reconnect channel selection fix

 - Fix for smbdirect (RDMA) disconnect bug

 - Fix for incorrect username length check

 - Fix memory leak in mount parm processing

* tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: let smbd_disconnect_rdma_connection() turn CREATED into DISCONNECTED
  smb: fix invalid username check in smb3_fs_context_parse_param()
  cifs: client: fix memory leak in smb3_fs_context_parse_param
  smb: client: fix cifs_pick_channel when channel needs reconnect

5 weeks agoposix-timers: Plug potential memory leak in do_timer_create()
Eslam Khafagy [Fri, 14 Nov 2025 12:27:39 +0000 (14:27 +0200)] 
posix-timers: Plug potential memory leak in do_timer_create()

When posix timer creation is set to allocate a given timer ID and the
access to the user space value faults, the function terminates without
freeing the already allocated posix timer structure.

Move the allocation after the user space access to cure that.

[ tglx: Massaged change log ]

Fixes: ec2d0c04624b3 ("posix-timers: Provide a mechanism to allocate a given timer ID")
Reported-by: syzbot+9c47ad18f978d4394986@syzkaller.appspotmail.com
Suggested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Eslam Khafagy <eslam.medhat1993@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251114122739.994326-1-eslam.medhat1993@gmail.com
Closes: https://lore.kernel.org/all/69155df4.a70a0220.3124cb.0017.GAE@google.com/T/
5 weeks agoirqchip/riscv-intc: Add missing free() callback in riscv_intc_domain_ops
Nick Hu [Fri, 14 Nov 2025 07:28:44 +0000 (15:28 +0800)] 
irqchip/riscv-intc: Add missing free() callback in riscv_intc_domain_ops

The irq_domain_free_irqs() helper requires that the irq_domain_ops->free
callback is implemented. Otherwise, the kernel reports the warning message
"NULL pointer, cannot free irq" when irq_dispose_mapping() is invoked to
release the per-HART local interrupts.

Set irq_domain_ops->free to irq_domain_free_irqs_top() to cure that.

Fixes: 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs")
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251114-rv-intc-fix-v1-1-a3edd1c1a868@sifive.com
5 weeks agos390/mm: Fix __ptep_rdp() inline assembly
Heiko Carstens [Thu, 13 Nov 2025 12:21:47 +0000 (13:21 +0100)] 
s390/mm: Fix __ptep_rdp() inline assembly

When a zero ASCE is passed to the __ptep_rdp() inline assembly, the
generated instruction should have the R3 field of the instruction set to
zero. However the inline assembly is written incorrectly: for such cases a
zero is loaded into a register allocated by the compiler and this register
is then used by the instruction.

This means that selected TLB entries may not be flushed since the specified
ASCE does not match the one which was used when the selected TLB entries
were created.

Fix this by removing the asce and opt parameters of __ptep_rdp(), since
all callers always pass zero, and use a hard-coded register zero for
the R3 field.

Fixes: 0807b856521f ("s390/mm: add support for RDP (Reset DAT-Protection)")
Cc: stable@vger.kernel.org
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
5 weeks agofirewire: core: Initialize topology_map.lock
Ville Syrjälä [Fri, 14 Nov 2025 14:18:50 +0000 (23:18 +0900)] 
firewire: core: Initialize topology_map.lock

Lockdep barfs on the new uninitialized spinlock.
Initialize it.

protip: enable lockdep (CONFIG_PROVE_LOCKING=y) when
        doing locking changes

firewire_ohci 0000:02:01.1: added OHCI v1.10 device as card 0, 4 IR + 4 IT contexts, quirks 0x11
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 UID: 0 PID: 1042 Comm: irq/17-firewire Not tainted 6.17.0-rc2-cl-bisect2-00026-g7d138cb269db #136 PREEMPT
Hardware name: Dell Inc. Latitude E5400                  /0D695C, BIOS A19 06/13/2013
Call Trace:
 <TASK>
 dump_stack_lvl+0x6d/0xa0
 register_lock_class+0x783/0x790
 ? find_held_lock+0x2b/0x80
 ? __mod_timer+0x110/0x320
 ? __mod_timer+0x110/0x320
 __lock_acquire+0x405/0x2600
 lock_acquire+0xca/0x2e0
 ? fw_core_handle_bus_reset+0x888/0xca0 [firewire_core]
 ? fw_core_handle_bus_reset+0x878/0xca0 [firewire_core]
 ? fw_core_handle_bus_reset+0x878/0xca0 [firewire_core]
 _raw_spin_lock+0x2e/0x40
 ? fw_core_handle_bus_reset+0x888/0xca0 [firewire_core]
 fw_core_handle_bus_reset+0x888/0xca0 [firewire_core]
 handle_selfid_complete_event+0x35c/0x7a0 [firewire_ohci]
 ? irq_thread+0x8d/0x280
 irq_thread_fn+0x18/0x50
 irq_thread+0x15a/0x280
 ? irq_check_status_bit+0x100/0x100
 ? lockdep_hardirqs_on+0x78/0x100
 ? irq_finalize_oneshot.part.0+0xc0/0xc0
 ? irq_forced_thread_fn+0x60/0x60
 kthread+0x114/0x200
 ? kthreads_online_cpu+0x110/0x110
 ret_from_fork+0x158/0x1e0
 ? kthreads_online_cpu+0x110/0x110
 ret_from_fork_asm+0x11/0x20
 </TASK>

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 7d138cb269db ("firewire: core: use spin lock specific to topology map")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
5 weeks agoALSA: usb-audio: Add native DSD quirks for PureAudio DAC series
Lushih Hsieh [Fri, 14 Nov 2025 05:20:53 +0000 (13:20 +0800)] 
ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series

The PureAudio APA DAC and Lotus DAC5 series are USB Audio
2.0 Class devices that support native Direct Stream Digital (DSD)
playback via specific vendor protocols.

Without these quirks, the devices may only function in standard
PCM mode, or fail to correctly report their DSD format capabilities
to the ALSA framework, preventing native DSD playback under Linux.

This commit adds new quirk entries for the mentioned DAC models
based on their respective Vendor/Product IDs (VID:PID), for example:
0x16d0:0x0ab1 (APA DAC), 0x16d0:0xeca1 (DAC5 series), etc.

The quirk ensures correct DSD format handling by setting the required
SNDRV_PCM_FMTBIT_DSD_U32_BE format bit and defining the DSD-specific
Audio Class 2.0 (AC2.0) endpoint configurations. This allows the ALSA
DSD API to correctly address the device for high-bitrate DSD streams,
bypassing the need for DoP (DSD over PCM).

Test on APA DAC and Lotus DAC5 SE under Arch Linux.

Tested-by: Lushih Hsieh <bruce@mail.kh.edu.tw>
Signed-off-by: Lushih Hsieh <bruce@mail.kh.edu.tw>
Link: https://patch.msgid.link/20251114052053.54989-1-bruce@mail.kh.edu.tw
Signed-off-by: Takashi Iwai <tiwai@suse.de>
5 weeks agox86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev
Borislav Petkov (AMD) [Fri, 14 Nov 2025 13:01:14 +0000 (14:01 +0100)] 
x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev

Add the minimum Entrysign revision for that model+stepping to the list
of minimum revisions.

Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/e94dd76b-4911-482f-8500-5c848a3df026@citrix.com
5 weeks agox86/CPU/AMD: Add additional fixed RDSEED microcode revisions
Mario Limonciello [Thu, 13 Nov 2025 22:35:50 +0000 (16:35 -0600)] 
x86/CPU/AMD: Add additional fixed RDSEED microcode revisions

Microcode that resolves the RDSEED failure (SB-7055 [1]) has been released for
additional Zen5 models to linux-firmware [2]. Update the zen5_rdseed_microcode
array to cover these new models.

Fixes: 607b9fb2ce24 ("x86/CPU/AMD: Add RDSEED fix for Zen5")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7055.html
Link: https://gitlab.com/kernel-firmware/linux-firmware/-/commit/6167e5566900cf236f7a69704e8f4c441bc7212a
Link: https://patch.msgid.link/20251113223608.1495655-1-mario.limonciello@amd.com
5 weeks agoMerge tag 'asoc-fix-v6.18-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git...
Takashi Iwai [Fri, 14 Nov 2025 08:47:28 +0000 (09:47 +0100)] 
Merge tag 'asoc-fix-v6.18-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.18

A small collection of fixes, all driver specific and none especially
remarkable unless you have the hardware (many not even then).

5 weeks agoMerge tag 'drm-xe-fixes-2025-11-13' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 14 Nov 2025 07:51:09 +0000 (17:51 +1000)] 
Merge tag 'drm-xe-fixes-2025-11-13' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
 - New HW workarounds affecting PTL and WCL platforms
   (Nitin Gote, Tangudu Tilak Tirumalesh)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/ay2qztgonodwson6tuzcv5napjmqbgwzv27so4ybfola34guux@xgufrrmbzyws
5 weeks agoMerge tag 'drm-intel-fixes-2025-11-13' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Fri, 14 Nov 2025 07:50:36 +0000 (17:50 +1000)] 
Merge tag 'drm-intel-fixes-2025-11-13' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix PSR's pipe to vblank conversion (Jani)
- Disable Panel Replay on MST links (Imre)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aRXdQnitzyFcokhF@intel.com
5 weeks agoMerge tag 'drm-misc-fixes-2025-11-13' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 14 Nov 2025 07:24:51 +0000 (17:24 +1000)] 
Merge tag 'drm-misc-fixes-2025-11-13' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

client:
- Fix description of module parameter

panthor:
- Flush writes before mapping buffers

vmwgfx:
- Improve command validation
- Improve ref counting
- Fix cursor-plane support

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251113132317.GA451885@linux.fritz.box
5 weeks agoMerge tag 'amd-drm-fixes-6.18-2025-11-12' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 14 Nov 2025 07:09:50 +0000 (17:09 +1000)] 
Merge tag 'amd-drm-fixes-6.18-2025-11-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.18-2025-11-12:

amdgpu:
- Disallow P2P DMA for GC 12 DCC surfaces
- ctx error handling fix
- UserQ fixes
- VRR fix
- ISP fix
- JPEG 5.0.1 fix

amdkfd:
- Save area check fix
- Fix GPU mappings for APU after prefetch

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251112200930.8788-1-alexander.deucher@amd.com
5 weeks agoMerge tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio
Linus Torvalds [Fri, 14 Nov 2025 01:00:40 +0000 (17:00 -0800)] 
Merge tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio

Pull VFIO seftest fixes from Alex Williamson:

 - Fix vfio selftests to remove the expectation that the IOMMU supports
   a 64-bit IOVA space.

   These manifest both in the original set of tests introduced this
   development cycle in identity mapping the IOVA to buffer virtual
   address space, as well as the more recent boundary testing.

   Implement facilities for collecting the valid IOVA ranges from the
   backend, implement a simple IOVA allocator, and use the information
   for determining extents (Alex Mastro)

* tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio:
  vfio: selftests: replace iova=vaddr with allocated iovas
  vfio: selftests: add iova allocator
  vfio: selftests: fix map limit tests to use last available iova
  vfio: selftests: add iova range query helpers

5 weeks agoMerge tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Nov 2025 00:54:36 +0000 (16:54 -0800)] 
Merge tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - gpd-fan: Fix compilation error for non-ACPI builds, and initialize EC
   when loading the driver

* tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (gpd-fan) initialize EC on driver load for Win 4
  hwmon: (gpd-fan) Fix compilation error in non-ACPI builds

5 weeks agoMerge tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 14 Nov 2025 00:31:07 +0000 (16:31 -0800)] 
Merge tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix issues related to the handling of compressed hibernation
  images and a recent intel_pstate driver regression:

   - Fix issues related to using inadequate data types and incorrect use
     of atomic variables in the compressed hibernation images handling
     code that were introduced during the 6.9 development cycle (Mario
     Limonciello)

   - Move a X86_FEATURE_IDA check from turbo_is_disabled() to the places
     where a new value for MSR_IA32_PERF_CTL is computed in intel_pstate
     to address a regression preventing users from enabling turbo
     frequencies post-boot (Srinivas Pandruvada)"

* tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
  PM: hibernate: Fix style issues in save_compressed_image()
  PM: hibernate: Use atomic64_t for compressed_size variable
  PM: hibernate: Emit an error when image writing fails

5 weeks agoMerge tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 14 Nov 2025 00:22:36 +0000 (16:22 -0800)] 
Merge tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix issues in the ACPI CPPC library and in the recently added
  parser for the ACPI MRRM table:

   - Limit some checks in the ACPI CPPC library to online CPUs to avoid
     accessing uninitialized per-CPU variables when some CPUs are
     offline to start with, like during boot with 'nosmt=force' (Gautham
     Shenoy)

   - Rework add_boot_memory_ranges() in the ACPI MRRM table parser to
     fix memory leaks and improve error handling (Kaushlendra Kumar)"

* tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: MRRM: Fix memory leaks and improve error handling
  ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
  ACPI: CPPC: Perform fast check switch only for online CPUs
  ACPI: CPPC: Check _CPC validity for only the online CPUs
  ACPI: CPPC: Detect preferred core availability on online CPUs

5 weeks agoMerge branch 'mptcp-fix-conflicts-between-mptcp-and-sockmap'
Martin KaFai Lau [Thu, 13 Nov 2025 17:15:42 +0000 (09:15 -0800)] 
Merge branch 'mptcp-fix-conflicts-between-mptcp-and-sockmap'

Jiayuan Chen says:

====================
mptcp: Fix conflicts between MPTCP and sockmap

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
implementing fast socket-level forwarding logic:
1. Users can obtain file descriptors through userspace socket()/accept()
   interfaces, then call BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
   to replace handlers when TCP connections enter ESTABLISHED state
  (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)

However, when combined with MPTCP, an issue arises: MPTCP creates subflow
sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
and may incorrectly replace their sk_prot. We need to reject such
operations. In patch 1, we set psock_update_sk_prot to NULL in the
subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the
client's TCP also uses MPTCP, we should allow the combination of subflow
and sockmap. This is because the latest Golang programs have enabled MPTCP
for listening sockets by default [2]. For programs already using sockmap,
upgrading Golang should not cause sockmap functionality to fail.

Patch 2 prevents the WARNING from occurring.

Despite these patches fixing stream corruption, users of sockmap must set
GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it.

[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

[2]: https://go-review.googlesource.com/c/go/+/607715
====================

Link: https://patch.msgid.link/20251111060307.194196-1-jiayuan.chen@linux.dev
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
5 weeks agoselftests/bpf: Add mptcp test with sockmap
Jiayuan Chen [Tue, 11 Nov 2025 06:02:52 +0000 (14:02 +0800)] 
selftests/bpf: Add mptcp test with sockmap

Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.

Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-4-jiayuan.chen@linux.dev
5 weeks agomptcp: Fix proto fallback detection with BPF
Jiayuan Chen [Tue, 11 Nov 2025 06:02:51 +0000 (14:02 +0800)] 
mptcp: Fix proto fallback detection with BPF

The sockmap feature allows bpf syscall from userspace, or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  syn_recv_sock()/subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''

When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_syn_recv_sock()
  subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
'''

Then, this subflow can be normally used by sockmap, which replaces the
native sk_prot with sockmap's custom sk_prot. The issue occurs when the
user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
Here, it uses sk->sk_prot to compare with the native sk_prot, but this
is incorrect when sockmap is used, as we may incorrectly set
sk->sk_socket->ops.

This fix uses the more generic sk_family for the comparison instead.

Additionally, this also prevents a WARNING from occurring:

result from ./scripts/decode_stacktrace.sh:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
(net/mptcp/protocol.c:4005)
Modules linked in:
...

PKRU: 55555554
Call Trace:
<TASK>
do_accept (net/socket.c:1989)
__sys_accept4 (net/socket.c:2028 net/socket.c:2057)
__x64_sys_accept (net/socket.c:2067)
x64_sys_call (arch/x86/entry/syscall_64.c:41)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f87ac92b83d

---[ end trace 0000000000000000 ]---

Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-3-jiayuan.chen@linux.dev
5 weeks agoperf libbfd: Ensure libbfd is initialized prior to use
Ian Rogers [Wed, 12 Nov 2025 07:43:11 +0000 (23:43 -0800)] 
perf libbfd: Ensure libbfd is initialized prior to use

Multiple threads may be creating and destroying BFD objects in
situations like `perf top`.

Without appropriate initialization crashes may occur during libbfd's
cache management.

BFD's locks require recursive mutexes, add support for these.

Committer testing:

This happens only when building with 'make BUILD_NONDISTRO=1' and having
the binutils-devel package (or equivalent) installed, i.e. linking with
binutils devel files, an opt-in perf build.

Before:

  root@x1:~# perf top
  perf: Segmentation fault
  -------- backtrace --------
  <SNIP multiple failed attempts at printing a backtrace>
  root@x1:~#

After this patch it works as before.

Closes: https://lore.kernel.org/lkml/aQt66zhfxSA80xwt@gentoo.org/
Fixes: 95931d9a594dd0b5 ("perf libbfd: Move libbfd functionality to its own file")
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agoperf test: Fix lock contention test
Ravi Bangoria [Thu, 13 Nov 2025 16:01:24 +0000 (16:01 +0000)] 
perf test: Fix lock contention test

Couple of independent fixes:

1. Wire in SIGSEGV handler that terminates the test with a failure code.

2. Use "--lock-cgroup" instead of "-g"; "-g" was proposed but never
   merged. See commit 4d1792d0a2564caf ("perf lock contention: Add
   --lock-cgroup option")

3. Call cleanup() on every normal exit so trap_cleanup() doesn't mistake
   it for an unexpected signal and emit a false-negative "Unexpected
   signal in main" message.

Before patch:

  # ./perf test -vv "lock contention"
   85: kernel lock contention analysis test:
  --- start ---
  test child forked, pid 610711
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --lock-cgroup
  Unexpected signal in test_aggr_cgroup
  ---- end(0) ----
   85: kernel lock contention analysis test                            : Ok

After patch:

  # ./perf test -vv "lock contention"
   85: kernel lock contention analysis test:
  --- start ---
  test child forked, pid 602637
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --lock-cgroup
  Testing perf lock contention --type-filter (w/ spinlock)
  Testing perf lock contention --lock-filter (w/ tasklist_lock)
  Testing perf lock contention --callstack-filter (w/ unix_stream)
  [Skip] Could not find 'unix_stream'
  Testing perf lock contention --callstack-filter with task aggregation
  [Skip] Could not find 'unix_stream'
  Testing perf lock contention --cgroup-filter
  Testing perf lock contention CSV output
  ---- end(0) ----
   85: kernel lock contention analysis test                            : Ok

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Tycho Andersen <tycho@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agoperf lock: Fix segfault due to missing kernel map
Ravi Bangoria [Thu, 13 Nov 2025 16:01:23 +0000 (16:01 +0000)] 
perf lock: Fix segfault due to missing kernel map

Kernel maps are encoded in PERF_RECORD_MMAP2 samples but "perf lock
report" and "perf lock contention" do not process MMAP2 samples.

Because of that, machine->vmlinux_map stays NULL and any later access
triggers a segmentation fault.

Fix it by adding ->mmap2() callbacks.

Fixes: 53b00ff358dc75b1 ("perf record: Make --buildid-mmap the default")
Reported-by: Tycho Andersen (AMD) <tycho@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Tycho Andersen (AMD) <tycho@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agotools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason
Arnaldo Carvalho de Melo [Thu, 13 Nov 2025 18:03:07 +0000 (15:03 -0300)] 
tools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason

To pick the changes in:

  9d7dfb95da2cb5c1 ("KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL")

The 'perf kvm-stat' tool uses the exit reasons that are included in the
VMX_EXIT_REASONS define, this new SEAMCALL isn't included there (TDCALL
is), so shouldn't be causing any change in behaviour, this patch ends up
being just addressess the following perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agoperf build: Don't fail fast path feature detection when binutils-devel is not available
Arnaldo Carvalho de Melo [Thu, 13 Nov 2025 00:57:08 +0000 (21:57 -0300)] 
perf build: Don't fail fast path feature detection when binutils-devel is not available

This is one more remnant of the BUILD_NONDISTRO series to make building
with binutils-devel opt-in due to license incompatibility.

In this case just the references at link time were still in place, which
make building the test-all.bin file fail, which wasn't detected before
probably because the last test was done with binutils-devel available,
doh.

Now:

  $ rpm -q binutils-devel
  package binutils-devel is not installed
  $ file /tmp/build/perf-tools/feature/test-all.bin
  /tmp/build/perf-tools/feature/test-all.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
  dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
  BuildID[sha1]=4b5388a346b51f1b993f0b0dbd49f4570769b03c, for GNU/Linux 3.2.0, not stripped
  $

Fixes: 970ae86307718c34 ("perf build: The bfd features are opt-in, stop testing for them by default")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agoperf header: Write bpf_prog (infos|btfs)_cnt to data file
Thomas Falcon [Fri, 7 Nov 2025 17:31:50 +0000 (11:31 -0600)] 
perf header: Write bpf_prog (infos|btfs)_cnt to data file

With commit f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF
info"), the write_bpf_( prog_info() | btf() ) functions exit without
writing anything if env->bpf_prog.(infos| btfs)_cnt is zero.

process_bpf_( prog_info() | btf() ), however, still expect a "count"
value to exist in the data file. If btf information is empty, for
example, process_bpf_btf will read garbage or some other data as the
number of btf nodes in the data file. As a result, the data file will
not be processed correctly.

Instead, write the count to the data file and exit if it is zero.

Fixes: f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF info")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5 weeks agoMerge branch 'pm-sleep'
Rafael J. Wysocki [Thu, 13 Nov 2025 20:05:46 +0000 (21:05 +0100)] 
Merge branch 'pm-sleep'

Merge fixes for issues related to the handling of compressed hibernation
images that were introduced during the 6.9 development cycle.

* pm-sleep:
  PM: hibernate: Fix style issues in save_compressed_image()
  PM: hibernate: Use atomic64_t for compressed_size variable
  PM: hibernate: Emit an error when image writing fails

5 weeks agoMerge tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka...
Linus Torvalds [Thu, 13 Nov 2025 19:42:44 +0000 (11:42 -0800)] 
Merge tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

 - Fix memory leak of objects from remote NUMA node when bulk freeing to
   a cache with sheaves (Harry Yoo)

* tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slub: fix memory leak in free_to_pcs_bulk()

5 weeks agoMerge branches 'acpi-cppc' and 'acpi-tables'
Rafael J. Wysocki [Thu, 13 Nov 2025 19:40:51 +0000 (20:40 +0100)] 
Merge branches 'acpi-cppc' and 'acpi-tables'

Merge ACPI CPPC library fixes and an ACPI MRRM table parser fix for
6.18-rc6.

* acpi-cppc:
  ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
  ACPI: CPPC: Perform fast check switch only for online CPUs
  ACPI: CPPC: Check _CPC validity for only the online CPUs
  ACPI: CPPC: Detect preferred core availability on online CPUs

* acpi-tables:
  ACPI: MRRM: Fix memory leaks and improve error handling

5 weeks agoMerge tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 13 Nov 2025 19:37:40 +0000 (11:37 -0800)] 
Merge tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fix from Shuah Khan:
 "Fixes event-filter-function.tc tracing test failure caused when a
  first run to sample events triggers kmem_cache_free which interferes
  with the rest of the test.

  Fix this by calling sample_events twice to eliminate the
  kmem_cache_free related noise from the sampling"

* tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Run sample events to clear page cache events