git.ipfire.org Git - thirdparty/kernel/linux.git/log
10 days ago lockd: Remove C macros that are no longer used
Chuck Lever [Tue, 12 May 2026 18:14:11 +0000 (14:14 -0400)] 
lockd: Remove C macros that are no longer used

The conversion of all NLMv3 procedures to xdrgen-generated
XDR functions is complete. The hand-rolled XDR size
calculation macros (Ck, No, St, Rg) and the nlm_void
structure definition served only the older implementations
and are now unused.

Also remove NLMDBG_FACILITY, which was set to the client
debug flag in server-side code but never referenced, and
correct a comment to specify "NLMv3 Server procedures".

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 FREE_ALL procedure
Chuck Lever [Tue, 12 May 2026 18:14:10 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 FREE_ALL procedure

With all other NLMv3 procedures now converted to xdrgen-generated
XDR functions, the FREE_ALL procedure can be converted as well.
This conversion allows the removal of nlmsvc_retrieve_args(),
a 52-line helper function that was used only by FREE_ALL to
retrieve client information from lockd's internal data
structures.

Replace the NLMPROC_FREE_ALL entry in the nlmsvc_procedures
array with an entry that uses xdrgen-built XDR decoders and
encoders. The procedure handler is updated to use the new
wrapper structure (nlm_notify_wrapper) and call
nlm3svc_lookup_host() directly, eliminating the need for the
now-removed helper function.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so the
zeroing memset performed by the dispatch layer is not needed. The
nlm_notify_wrapper structure has no members beyond the xdrgen
substructure, so no further initialization is required.
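
A minimal userspace sketch of the reasoning above, assuming the dispatch
layer clears the first pc_argzero bytes of the argument buffer and then
calls the decoder. Every name here (sketch_procedure, sketch_dispatch,
sketch_notify_wrapper, and so on) is a hypothetical stand-in and not the
kernel's definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for xdrgen-decoded arguments. */
struct sketch_notify {
	char	name[16];
	int32_t	state;
};

/* Like the wrapper described above: nothing beyond the xdrgen member. */
struct sketch_notify_wrapper {
	struct sketch_notify	xdrgen;
};

/* Hypothetical, much-simplified procedure table entry. */
struct sketch_procedure {
	int	(*pc_decode)(void *argp, const uint8_t *wire, size_t len);
	size_t	pc_argsize;
	size_t	pc_argzero;	/* bytes cleared before decoding */
};

/* Writes every field of the wrapper, so stale bytes never survive. */
static int sketch_decode_notify(void *argp, const uint8_t *wire, size_t len)
{
	struct sketch_notify_wrapper *w = argp;

	if (len != 20)
		return 0;
	memcpy(w->xdrgen.name, wire, 16);
	w->xdrgen.state = (int32_t)((uint32_t)wire[16] << 24 |
				    (uint32_t)wire[17] << 16 |
				    (uint32_t)wire[18] << 8 |
				    (uint32_t)wire[19]);
	return 1;
}

/* Assumed dispatch order: optional zeroing first, then the decoder. */
static int sketch_dispatch(const struct sketch_procedure *proc, void *argp,
			   const uint8_t *wire, size_t len)
{
	assert(proc->pc_argzero <= proc->pc_argsize);
	memset(argp, 0, proc->pc_argzero);
	return proc->pc_decode(argp, wire, len);
}

/* Decode into a deliberately dirty buffer with pc_argzero == 0. */
static int sketch_argzero_demo(void)
{
	const struct sketch_procedure proc = {
		.pc_decode	= sketch_decode_notify,
		.pc_argsize	= sizeof(struct sketch_notify_wrapper),
		.pc_argzero	= 0,
	};
	uint8_t wire[20] = "client.example";
	struct sketch_notify_wrapper w;

	wire[19] = 7;			/* big-endian state == 7 */
	memset(&w, 0xAA, sizeof(w));	/* simulate a reused argument slot */
	if (!sketch_dispatch(&proc, &w, wire, sizeof(wire)))
		return 0;
	return strcmp(w.xdrgen.name, "client.example") == 0 &&
	       w.xdrgen.state == 7;
}
```

In this sketch the handler sees fully decoded arguments even though the
buffer started out dirty, because the decoder overwrites every member of
the wrapper.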

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 NM_LOCK procedure
Chuck Lever [Tue, 12 May 2026 18:14:09 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 NM_LOCK procedure

Now that nlmsvc_do_lock() has been introduced to handle both
monitored and non-monitored lock requests, the NLMv3 NM_LOCK
procedure can be converted to use xdrgen-generated XDR
functions. This conversion allows the removal of
__nlmsvc_proc_lock(), a helper function that was previously
shared between the LOCK and NM_LOCK procedures.

Replace the NLMPROC_NM_LOCK entry in the nlmsvc_procedures
array with an entry that uses xdrgen-built XDR decoders and
encoders. The procedure handler is reduced to a thin wrapper
around nlmsvc_do_lock() with the monitored flag set to false.

The pc_argzero=0 choice was justified for the LOCK conversion
and applies unchanged here, since both procedures share the
same nlm_lockargs_wrapper layout and decoder.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 UNSHARE procedure
Chuck Lever [Tue, 12 May 2026 18:14:08 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 UNSHARE procedure

Convert the NLMv3 UNSHARE procedure to use xdrgen-generated XDR
functions nlm_svc_decode_nlm_shareargs and
nlm_svc_encode_nlm_shareres.

The procedure handler is updated to use the wrapper structures
(nlm_shareargs_wrapper and nlm_shareres_wrapper) introduced by
the SHARE conversion patch and accesses arguments through the
argp->xdrgen hierarchy.

The .pc_argzero field is set to zero because the generated
decoder fills argp->xdrgen before the procedure runs, so the
zeroing memset performed by the dispatch layer is no longer
needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 SHARE procedure
Chuck Lever [Tue, 12 May 2026 18:14:07 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 SHARE procedure

Convert the NLMv3 SHARE procedure to use xdrgen-generated XDR
functions nlm_svc_decode_nlm_shareargs and
nlm_svc_encode_nlm_shareres.

This patch introduces struct nlm_shareargs_wrapper and struct
nlm_shareres_wrapper to bridge between the xdrgen-generated
structures and the internal lockd types. The procedure handler
is updated to access arguments through the argp->xdrgen
hierarchy and uses nlm3svc_lookup_host and nlm3svc_lookup_file
for host and file resolution.

The .pc_argzero field is set to zero because the generated
decoder fills argp->xdrgen before the procedure runs, so the
zeroing memset performed by the dispatch layer is no longer
needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Convert NLMv3 server-side undefined procedures to xdrgen
Chuck Lever [Tue, 12 May 2026 18:14:06 +0000 (14:14 -0400)] 
lockd: Convert NLMv3 server-side undefined procedures to xdrgen

Complete the xdrgen migration of NLMv3 server-side
procedures by converting the three unused procedure slots
(17, 18, and 19). These slots already returned
rpc_proc_unavail; they are converted here only to retire
the last users of the hand-coded nlmsvc_decode_void and
nlmsvc_encode_void helpers.

The three undefined procedure entries now use the xdrgen
functions nlm_svc_decode_void and nlm_svc_encode_void. The
nlmsvc_proc_unused function is also moved earlier in the
file to follow the convention of placing procedure
implementations before the procedure table.

The pc_argsize, pc_ressize, and pc_argzero fields are now
set to zero since no arguments or results are processed.
Setting pc_xdrressize to XDR_void reflects that these
procedures return no reply payload; the previous value of
St over-reserved a status word in the reply buffer.
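
As a rough illustration of the sizing change, the sketch below assumes
that pc_xdrressize is counted in 4-byte XDR words, that the old St value
was one word, and that XDR_void is zero words. The SKETCH_ constants and
sketch_reply_reserve() are hypothetical names, not kernel symbols:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_XDR_void	0	/* assumed: a void result needs no XDR words */
#define SKETCH_St	1	/* assumed: one XDR word for a status value */

static_assert(SKETCH_XDR_void < SKETCH_St,
	      "a void result reserves less than a status word");

/* Bytes to set aside for a result whose size is given in XDR words. */
static size_t sketch_reply_reserve(unsigned int pc_xdrressize)
{
	return (size_t)pc_xdrressize << 2;
}
```

Under those assumptions, an entry sized with St sets aside four bytes
that a procedure with no result never uses.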

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 SM_NOTIFY procedure
Chuck Lever [Tue, 12 May 2026 18:14:05 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 SM_NOTIFY procedure

Continue the xdrgen migration by converting NLMv3 SM_NOTIFY,
a private callback from statd to notify lockd when a remote
host has rebooted.  The procedure now uses
nlm_svc_decode_nlm_notifyargs and nlm_svc_encode_void,
generated from the NLM version 3 protocol specification.

A new struct nlm_notifyargs_wrapper bridges between the
xdrgen-generated nlm_notifyargs and the lockd_reboot
structure expected by nlm_host_rebooted().  The wrapper
contains both the xdrgen-decoded arguments and a reboot
field for the existing API.
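
The bridging pattern can be sketched in userspace C as follows. The
field layouts, the SKETCH_PRIV_SIZE constant, and the
sketch_fill_reboot() helper are assumptions made for illustration; the
requirement that the xdrgen member sit at offset zero is likewise an
assumption about how a decoder would be handed the wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PRIV_SIZE 16	/* assumed size of the opaque private data */

/* Hypothetical stand-in for the xdrgen-generated argument type. */
struct sketch_notifyargs {
	const char	*name;
	uint32_t	name_len;
	int32_t		state;
	uint8_t		priv[SKETCH_PRIV_SIZE];
};

/* Hypothetical stand-in for the legacy type the existing API expects. */
struct sketch_reboot {
	const char	*mon;
	uint32_t	len;
	uint32_t	state;
	uint8_t		priv[SKETCH_PRIV_SIZE];
};

struct sketch_notifyargs_wrapper {
	struct sketch_notifyargs	xdrgen;	/* filled by the decoder */
	struct sketch_reboot		reboot;	/* filled by the handler */
};

/* The decoder is given the wrapper's address, so xdrgen comes first. */
static_assert(offsetof(struct sketch_notifyargs_wrapper, xdrgen) == 0,
	      "xdrgen member must be first");

/* Handler-side step: populate the legacy field from decoded values. */
static void sketch_fill_reboot(struct sketch_notifyargs_wrapper *argp)
{
	argp->reboot.mon = argp->xdrgen.name;
	argp->reboot.len = argp->xdrgen.name_len;
	argp->reboot.state = (uint32_t)argp->xdrgen.state;
	memcpy(argp->reboot.priv, argp->xdrgen.priv, SKETCH_PRIV_SIZE);
}

/* Start from a dirty wrapper to show every reboot field is written. */
static int sketch_wrapper_demo(void)
{
	struct sketch_notifyargs_wrapper w;

	memset(&w, 0xAA, sizeof(w));
	w.xdrgen.name = "client";
	w.xdrgen.name_len = 6;
	w.xdrgen.state = 3;
	memset(w.xdrgen.priv, 0x5A, SKETCH_PRIV_SIZE);

	sketch_fill_reboot(&w);
	return strcmp(w.reboot.mon, "client") == 0 &&
	       w.reboot.len == 6 && w.reboot.state == 3 &&
	       w.reboot.priv[SKETCH_PRIV_SIZE - 1] == 0x5A;
}
```

Keeping both representations in one wrapper lets the generated decoder
and the existing consumer each work with the type it already
understands.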

Setting pc_argzero to zero is safe because the generated
decoder fills argp->xdrgen before the procedure runs, so
the zeroing memset performed by the dispatch layer is no
longer needed.

Setting pc_xdrressize to XDR_void reflects that SM_NOTIFY
returns no data; the previous value of St over-reserved a
status word in the reply buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 GRANTED_RES procedure
Chuck Lever [Tue, 12 May 2026 18:14:04 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 GRANTED_RES procedure

Continue the xdrgen migration by converting NLMv3 GRANTED_RES,
the callback that a remote NLM uses to return async GRANTED
results to this lockd.  The procedure now uses
nlm_svc_decode_nlm_res and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification.

Setting pc_argzero to zero is safe because the generated
decoder fills the argp->xdrgen subfields before the procedure
runs, so the zeroing memset performed by the dispatch layer
is no longer needed.

Setting pc_xdrressize to XDR_void reflects that GRANTED_RES, as
a callback, returns no data; the previous value of St
over-reserved a status word in the reply buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 UNLOCK_RES procedure
Chuck Lever [Tue, 12 May 2026 18:14:03 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 UNLOCK_RES procedure

Continue the xdrgen migration by converting NLMv3 UNLOCK_RES,
the callback that a remote NLM uses to return async UNLOCK
results to this lockd.  The procedure now uses
nlm_svc_decode_nlm_res and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification.

Setting pc_argzero to zero is safe because the generated
decoder fills the argp->xdrgen subfields before the procedure
runs, so the zeroing memset performed by the dispatch layer
is no longer needed.

Setting pc_xdrressize to XDR_void reflects that UNLOCK_RES, as
a callback, returns no data; the previous value of St
over-reserved a status word in the reply buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 CANCEL_RES procedure
Chuck Lever [Tue, 12 May 2026 18:14:02 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 CANCEL_RES procedure

Continue the xdrgen migration by converting NLMv3 CANCEL_RES,
the callback that a remote NLM uses to return async CANCEL
results to this lockd.  The procedure now uses
nlm_svc_decode_nlm_res and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification.

Setting pc_argzero to zero is safe because the generated
decoder fills the argp->xdrgen subfields before the procedure
runs, so the zeroing memset performed by the dispatch layer
is no longer needed.

Setting pc_xdrressize to XDR_void reflects that CANCEL_RES, as
a callback, returns no data; the previous value of St
over-reserved a status word in the reply buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 LOCK_RES procedure
Chuck Lever [Tue, 12 May 2026 18:14:01 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 LOCK_RES procedure

Continue the xdrgen migration by converting NLMv3 LOCK_RES,
the callback that a remote NLM uses to return async LOCK
results to this lockd.  The procedure now uses
nlm_svc_decode_nlm_res and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification.

Setting pc_argzero to zero is safe because the generated
decoder fills the argp->xdrgen subfields before the procedure
runs, so the zeroing memset performed by the dispatch layer
is no longer needed.

Setting pc_xdrressize to XDR_void reflects that LOCK_RES, as
a callback, returns no data; the previous value of St
over-reserved a status word in the reply buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 TEST_RES procedure
Chuck Lever [Tue, 12 May 2026 18:14:00 +0000 (14:14 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 TEST_RES procedure

Continue the xdrgen migration by converting NLMv3 TEST_RES,
the callback that a remote NLM uses to return async TEST
results to this lockd. The procedure now uses
nlm_svc_decode_nlm_testres and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification.

Setting pc_argzero to zero is safe because the generated
decoder fills the argp->xdrgen subfields before the procedure
runs, so the zeroing memset performed by the dispatch layer
is no longer needed.

Setting pc_xdrressize to XDR_void reflects that TEST_RES, as
a callback, returns no data; the previous value of St
over-reserved a status word in the reply buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 GRANTED_MSG procedure
Chuck Lever [Tue, 12 May 2026 18:13:59 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 GRANTED_MSG procedure

Continue the xdrgen migration by converting NLMv3 GRANTED_MSG,
the async counterpart to GRANTED that a remote NLM uses to tell
this lockd that a previously blocked client lock request has
become available. The procedure now uses
nlm_svc_decode_nlm_testargs and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification. The procedure
handler reaches the xdrgen types through the
nlm_testargs_wrapper structure, which bridges between generated
code and the legacy lockd_lock representation.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so
the zeroing memset performed by the dispatch layer is not
needed. The lock member of the wrapper is populated explicitly
in __nlmsvc_proc_granted_msg() by nlm_lock_to_lockd_lock()
rather than relying on zero-initialization.

The NLM async callback mechanism uses client-side functions
which continue to take legacy results like struct lockd_res,
preventing GRANTED and GRANTED_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 UNLOCK_MSG procedure
Chuck Lever [Tue, 12 May 2026 18:13:58 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 UNLOCK_MSG procedure

Continue the xdrgen migration by converting NLMv3 UNLOCK_MSG, the
async counterpart to UNLOCK that clients use to release locks
without waiting for a reply. The procedure now uses
nlm_svc_decode_nlm_unlockargs and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification. The procedure
handler reaches the xdrgen types through the
nlm_unlockargs_wrapper structure, which bridges between generated
code and the legacy lockd_lock representation.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so
the zeroing memset performed by the dispatch layer is not needed.
The lock member of the wrapper is populated explicitly in
nlm3svc_lookup_file() rather than relying on zero-initialization.

The NLM async callback mechanism uses client-side functions which
continue to take legacy results like struct lockd_res, preventing
UNLOCK and UNLOCK_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 CANCEL_MSG procedure
Chuck Lever [Tue, 12 May 2026 18:13:57 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 CANCEL_MSG procedure

The CANCEL_MSG procedure is part of NLM's asynchronous lock
request flow, where clients send CANCEL_MSG to cancel pending
lock requests. This patch continues the xdrgen migration by
converting CANCEL_MSG to use the xdrgen functions
nlm_svc_decode_nlm_cancargs and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification. The procedure
handler reaches the xdrgen types through the
nlm_cancargs_wrapper structure, which bridges between generated
code and the legacy lockd_lock representation.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so the
zeroing memset performed by the dispatch layer is not needed. The
lock member of the wrapper is populated explicitly in
nlm3svc_lookup_file() rather than relying on zero-initialization.

The previous hand-written decoder in svcxdr_decode_cookie()
rewrote a zero-length NLM cookie into a four-byte zero cookie,
with a comment attributing the substitution to HP-UX clients.
The xdrgen-generated netobj decoder performs no such rewrite, so
a zero-length request cookie now round-trips unchanged into the
CANCEL_RES reply. HP-UX has reached end of support, and CANCEL_MSG
is fire-and-forget with no client-side reply matching on the NLM
cookie, so the workaround is dropped intentionally here.

The NLM async callback mechanism uses client-side functions
which continue to take legacy results like struct lockd_res,
preventing CANCEL and CANCEL_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 LOCK_MSG procedure
Chuck Lever [Tue, 12 May 2026 18:13:56 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 LOCK_MSG procedure

Continue the xdrgen migration by converting NLMv3 LOCK_MSG, the
async counterpart to LOCK that clients use to request locks that
may block. The procedure now uses nlm_svc_decode_nlm_lockargs and
nlm_svc_encode_void, generated from the NLM version 3 protocol
specification. The procedure handler reaches the xdrgen types
through the nlm_lockargs_wrapper structure, which bridges between
generated code and the legacy lockd_lock representation.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs,
so the zeroing memset performed by the dispatch layer is not
needed. The lock member of the wrapper is populated explicitly in
nlm3svc_lookup_file() rather than relying on zero-initialization.

The NLM async callback mechanism uses client-side functions which
continue to take legacy results like struct lockd_res, preventing
LOCK and LOCK_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 TEST_MSG procedure
Chuck Lever [Tue, 12 May 2026 18:13:55 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 TEST_MSG procedure

Continue the xdrgen migration by converting NLMv3 TEST_MSG, the
async counterpart to TEST that clients use to check lock
availability without blocking. The procedure now uses
nlm_svc_decode_nlm_testargs and nlm_svc_encode_void, generated
from the NLM version 3 protocol specification. The procedure
handler reaches the xdrgen types through the
nlm_testargs_wrapper structure, which bridges between generated
code and the legacy lockd_lock representation.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so
the zeroing memset performed by the dispatch layer is not
needed. The lock member of the wrapper is populated explicitly
in nlm3svc_lookup_file() rather than relying on
zero-initialization.

The NLM async callback mechanism uses client-side functions
which continue to take legacy results like struct lockd_res,
preventing TEST and TEST_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Refactor nlmsvc_callback()
Chuck Lever [Tue, 12 May 2026 18:13:54 +0000 (14:13 -0400)] 
lockd: Refactor nlmsvc_callback()

The xdrgen-based XDR conversion requires each RPC procedure to
extract its own arguments, since xdrgen generates distinct
argument structures for each procedure rather than using a
single shared type.

Move the host lookup logic from nlmsvc_callback() into each
of the five MSG procedure handlers (TEST_MSG, LOCK_MSG,
CANCEL_MSG, UNLOCK_MSG, and GRANTED_MSG). Each handler now
performs its own host lookup from rqstp->rq_argp and passes
the resulting host pointer to nlmsvc_callback(). This
establishes the per-procedure argument-handling pattern that
the subsequent xdrgen conversion patches require.
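
The shape of this refactoring can be sketched in userspace C. The types,
return codes, and lookup function below are hypothetical; the sketch only
illustrates that each handler interprets its own argument type and hands
an already-resolved host to a shared callback path:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_host { const char *name; };
struct sketch_rqst { void *rq_argp; };

/* Each procedure has its own argument type, as with xdrgen. */
struct sketch_testargs   { const char *caller_name; int exclusive; };
struct sketch_unlockargs { const char *caller_name; };

typedef int (*sketch_proc_fn)(struct sketch_rqst *rqstp,
			      struct sketch_host *host);

/* Hypothetical host lookup; returns NULL when the caller is unknown. */
static struct sketch_host *sketch_lookup_host(const char *caller)
{
	static struct sketch_host known = { "client.example" };

	return strcmp(caller, known.name) == 0 ? &known : NULL;
}

/* Shared path: it no longer needs to know any argument layout. */
static int sketch_callback(struct sketch_rqst *rqstp,
			   struct sketch_host *host, sketch_proc_fn func)
{
	if (!host)
		return -1;
	return func(rqstp, host);
}

static int sketch_do_test(struct sketch_rqst *rqstp, struct sketch_host *host)
{
	(void)rqstp; (void)host;
	return 1;
}

static int sketch_do_unlock(struct sketch_rqst *rqstp, struct sketch_host *host)
{
	(void)rqstp; (void)host;
	return 2;
}

/* Per-procedure handlers: each one extracts the host from its own type. */
static int sketch_proc_test_msg(struct sketch_rqst *rqstp)
{
	struct sketch_testargs *argp = rqstp->rq_argp;

	return sketch_callback(rqstp, sketch_lookup_host(argp->caller_name),
			       sketch_do_test);
}

static int sketch_proc_unlock_msg(struct sketch_rqst *rqstp)
{
	struct sketch_unlockargs *argp = rqstp->rq_argp;

	return sketch_callback(rqstp, sketch_lookup_host(argp->caller_name),
			       sketch_do_unlock);
}

/* Small drivers so the handlers can be exercised directly. */
static int sketch_run_test_msg(const char *caller)
{
	struct sketch_testargs args = { caller, 0 };
	struct sketch_rqst rqst = { &args };

	return sketch_proc_test_msg(&rqst);
}

static int sketch_run_unlock_msg(const char *caller)
{
	struct sketch_unlockargs args = { caller };
	struct sketch_rqst rqst = { &args };

	return sketch_proc_unlock_msg(&rqst);
}
```

Because the shared function receives a host pointer rather than raw
arguments, adding a procedure with a new argument type does not require
touching it.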

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 GRANTED procedure
Chuck Lever [Tue, 12 May 2026 18:13:53 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 GRANTED procedure

The NLM GRANTED procedure allows servers to notify clients when
a previously blocked lock request has been granted, completing
the asynchronous lock request flow. This patch converts the NLMv3
GRANTED procedure to use xdrgen-generated XDR functions.

The conversion replaces the legacy decoder with the xdrgen
functions nlm_svc_decode_nlm_testargs and nlm_svc_encode_nlm_res
generated from the NLM version 3 protocol specification. The
procedure handler accesses xdrgen types through a wrapper structure
that bridges between generated code and the legacy lockd_lock
representation still used by the core lockd logic.

A new helper function nlm_lock_to_lockd_lock() converts an xdrgen
nlm_lock into the legacy lockd_lock format. The helper complements
the existing nlm3svc_lookup_host() and nlm3svc_lookup_file()
functions used throughout this series.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so the
zeroing memset performed by the dispatch layer is not needed. The
helper populates each field of the wrapper's lock member that any
downstream consumer reads: fh, oh, svid, and the file_lock byte
range. Because pc_argzero no longer scrubs the rq_argp slot, the
shared nlmclnt_lock_event tracepoint class is updated to source
its byte-range fields from lock->fl.fl_start and lock->fl.fl_end,
which both the client and server populate unconditionally; the old
lock_start and lock_len fields are no longer required by the trace.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 UNLOCK procedure
Chuck Lever [Tue, 12 May 2026 18:13:52 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 UNLOCK procedure

The NLM UNLOCK procedure allows clients to release held locks,
completing the basic lock lifecycle alongside TEST, LOCK, and
CANCEL procedures already converted in this series.

Convert UNLOCK to use the xdrgen functions
nlm_svc_decode_nlm_unlockargs and nlm_svc_encode_nlm_res
generated from the NLM version 3 protocol specification, reusing
the nlm3svc_lookup_host() and nlm3svc_lookup_file() helpers
introduced earlier in the series. The procedure handler uses
xdrgen types through a wrapper structure that bridges between
generated code and the legacy lockd_lock representation still
used by the core lockd logic.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so
the zeroing memset performed by the dispatch layer is not needed.
The lock member of the wrapper is populated explicitly in
nlm3svc_lookup_file() rather than relying on zero-initialization.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 CANCEL procedure
Chuck Lever [Tue, 12 May 2026 18:13:51 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 CANCEL procedure

The NLM CANCEL procedure allows clients to cancel outstanding
blocked lock requests. CANCEL reuses the nlm3svc_lookup_host()
and nlm3svc_lookup_file() helpers established in the TEST
procedure conversion.

This patch continues the xdrgen migration by converting the
CANCEL procedure to use the xdrgen functions
nlm_svc_decode_nlm_cancargs and nlm_svc_encode_nlm_res, generated
from the NLM version 3 protocol specification. The procedure
handler uses xdrgen types through a wrapper structure that
bridges between generated code and the legacy lockd_lock
representation still used by the core lockd logic.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so the
zeroing memset performed by the dispatch layer is not needed. The
lock member of the wrapper is populated explicitly in
nlm3svc_lookup_file() rather than relying on zero-initialization.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 LOCK procedure
Chuck Lever [Tue, 12 May 2026 18:13:50 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 LOCK procedure

The NLM LOCK procedure requires the same host and file lookup
operations established in the TEST procedure conversion. This
patch extends the xdrgen migration to the LOCK procedure,
leveraging the shared nlm3svc_lookup_host() and
nlm3svc_lookup_file() helpers to establish consistent patterns
across the series.

This patch converts the LOCK procedure to use xdrgen functions
nlm_svc_decode_nlm_lockargs and nlm_svc_encode_nlm_res generated
from the NLM version 3 protocol specification. The procedure
handler uses xdrgen types through wrapper structures that bridge
between generated code and the legacy lockd_lock representation
still used by the core lockd logic.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so
the zeroing memset performed by the dispatch layer is not needed.
The cookie and lock members of the wrapper are populated
explicitly in nlm_netobj_to_cookie() and nlm3svc_lookup_file()
rather than relying on zero-initialization.

The hand-rolled svcxdr_decode_cookie() previously substituted a
four-byte zero cookie when a zero-length cookie arrived on the
wire, a compatibility shim for HP-UX clients that had been
carried in fs/lockd/ since the original import. The xdrgen
decoder reproduces the cookie verbatim, and
nlm_netobj_to_cookie() copies whatever length the peer sent. As
subsequent patches replace the remaining call sites of
svcxdr_decode_cookie(), this series retires that HP-UX compat
behavior on the server side.
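
The behavioral difference can be sketched in userspace C. The structures,
the 32-byte cap, and both function bodies are assumptions written for
illustration; they are not the kernel's svcxdr_decode_cookie() or
nlm_netobj_to_cookie():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXCOOKIELEN 32	/* assumed upper bound on cookie length */

/* Hypothetical variable-length opaque as it arrives on the wire. */
struct sketch_netobj {
	uint32_t	len;
	const uint8_t	*data;
};

/* Hypothetical internal cookie representation. */
struct sketch_cookie {
	uint8_t		data[SKETCH_MAXCOOKIELEN];
	uint32_t	len;
};

/* Legacy behavior: an empty cookie becomes four zero bytes. */
static int sketch_legacy_decode_cookie(struct sketch_cookie *c,
				       const struct sketch_netobj *obj)
{
	if (obj->len > SKETCH_MAXCOOKIELEN)
		return 0;
	if (obj->len == 0) {
		c->len = 4;
		memset(c->data, 0, 4);
		return 1;
	}
	c->len = obj->len;
	memcpy(c->data, obj->data, obj->len);
	return 1;
}

/* New behavior: copy whatever length the peer sent, including zero. */
static int sketch_netobj_to_cookie(struct sketch_cookie *c,
				   const struct sketch_netobj *obj)
{
	if (obj->len > SKETCH_MAXCOOKIELEN)
		return 0;
	c->len = obj->len;
	if (obj->len)
		memcpy(c->data, obj->data, obj->len);
	return 1;
}

/* Resulting cookie length when the peer sends a zero-length cookie. */
static uint32_t sketch_empty_cookie_len(int legacy)
{
	const struct sketch_netobj empty = { 0, NULL };
	struct sketch_cookie c;

	memset(&c, 0xAA, sizeof(c));
	if (legacy)
		sketch_legacy_decode_cookie(&c, &empty);
	else
		sketch_netobj_to_cookie(&c, &empty);
	return c.len;
}
```

With the verbatim copy, a reply built from the stored cookie carries the
same zero length the client sent, which is the round-trip behavior
described above.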

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 TEST procedure
Chuck Lever [Tue, 12 May 2026 18:13:49 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 TEST procedure

The NLM TEST procedure requires host and file lookups to check
lock state, operations that will be common across multiple NLM
procedures being migrated to xdrgen. Introducing the
nlm3svc_lookup_host() and nlm3svc_lookup_file() helpers now keeps
these common patterns in one place for subsequent conversions in
this series.

This patch converts the TEST procedure to use xdrgen functions
nlm_svc_decode_nlm_testargs and nlm_svc_encode_nlm_testres
generated from the NLM version 3 protocol specification. The
procedure handler is rewritten to use xdrgen types through wrapper
structures that bridge between generated code and the legacy
lockd_lock representation still used by the core lockd logic.

Setting pc_argzero to zero is safe because the generated decoder
fills the argp->xdrgen subfields before the procedure runs, so the
zeroing memset performed by the dispatch layer is not needed. The
lock member of the wrapper is populated explicitly in
nlm3svc_lookup_file() rather than relying on zero-initialization.

The conflicting holder's offset and length are saturated to
NLM_OFFSET_MAX when constructing the reply. A conflicting lock
established by an NLMv4 client or by a local process can sit
beyond the NLMv3 signed 32-bit range, and copying fl_start and
fl_end straight into the unsigned 32-bit XDR fields would wrap
and report a bogus range. The previous hand-written encoder in
svcxdr_encode_holder() used loff_t_to_s32() for the same reason,
but this patch series intends to separate the concerns of data
conversion (XDR) from dealing with local byte range constraints,
so clamping is hoisted into the proc function.
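
A minimal sketch of the saturation, assuming NLM_OFFSET_MAX equals the
largest signed 32-bit value and that a zero length denotes a lock
extending to end of file. The SKETCH_ constants, sketch_holder, and the
helper functions are hypothetical names chosen for this example:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NLM_OFFSET_MAX	INT32_MAX	/* assumed NLMv3 limit */
#define SKETCH_OFFSET_MAX	INT64_MAX	/* assumed local "to EOF" end */

/* Hypothetical reply fields: unsigned 32-bit offset and length. */
struct sketch_holder {
	uint32_t	l_offset;
	uint32_t	l_len;
};

/* Saturate a non-negative 64-bit value to the NLMv3 range. */
static uint32_t sketch_clamp(int64_t v)
{
	assert(v >= 0);
	if (v > SKETCH_NLM_OFFSET_MAX)
		return SKETCH_NLM_OFFSET_MAX;
	return (uint32_t)v;
}

/* Build the holder range from an inclusive [fl_start, fl_end] range. */
static void sketch_fill_holder(struct sketch_holder *h,
			       int64_t fl_start, int64_t fl_end)
{
	assert(fl_start >= 0 && fl_end >= fl_start);
	h->l_offset = sketch_clamp(fl_start);
	if (fl_end == SKETCH_OFFSET_MAX)
		h->l_len = 0;	/* zero length: lock runs to end of file */
	else
		h->l_len = sketch_clamp(fl_end - fl_start + 1);
}

static uint32_t sketch_holder_offset(int64_t fl_start, int64_t fl_end)
{
	struct sketch_holder h;

	sketch_fill_holder(&h, fl_start, fl_end);
	return h.l_offset;
}

static uint32_t sketch_holder_len(int64_t fl_start, int64_t fl_end)
{
	struct sketch_holder h;

	sketch_fill_holder(&h, fl_start, fl_end);
	return h.l_len;
}
```

For a conflicting lock that starts at 4 GiB, a plain cast to an unsigned
32-bit field would wrap the offset to zero, whereas this sketch reports
the maximum representable offset instead.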

The previous hand-written decoder in svcxdr_decode_cookie()
rewrote a zero-length NLM cookie into a four-byte zero cookie,
with a comment attributing the substitution to HP-UX clients.
The xdrgen-generated netobj decoder performs no such rewrite, so
a zero-length request cookie now round-trips unchanged into the
reply. HP-UX has reached end of support, and NLM_TEST reply
matching relies on the RPC XID rather than the NLM cookie, so
the workaround is dropped intentionally here.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Use xdrgen XDR functions for the NLMv3 NULL procedure
Chuck Lever [Tue, 12 May 2026 18:13:48 +0000 (14:13 -0400)] 
lockd: Use xdrgen XDR functions for the NLMv3 NULL procedure

Hand-written XDR encoders and decoders are difficult to maintain
and can diverge from protocol specifications. Migrating to
xdrgen-generated code improves type safety and ensures the
implementation matches the NLM version 3 protocol specification
exactly.

Convert the NULL procedure to use nlm_svc_decode_void and
nlm_svc_encode_void, generated from
Documentation/sunrpc/xdr/nlm3.x. NULL has no arguments or
results, so it is the first procedure converted.

NULL returns no XDR-encoded data, so pc_xdrressize is set to
XDR_void. The pc_argzero field is also set to zero since xdrgen
decoders initialize all decoded values.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Rename struct nlm_share to lockd_share
Chuck Lever [Tue, 12 May 2026 18:13:47 +0000 (14:13 -0400)] 
lockd: Rename struct nlm_share to lockd_share

As part of the effort to enable lockd's server-side XDR functions to
be generated from the NLM protocol specification (using xdrgen), the
internal type names must be changed to avoid conflicts with the
machine-generated type names.

Rename struct nlm_share to struct lockd_share to avoid conflicts with
the NLMv3 XDR type definitions that will be introduced when svcproc.c
is converted to use xdrgen.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Rename struct nlm_reboot to lockd_reboot
Chuck Lever [Tue, 12 May 2026 18:13:46 +0000 (14:13 -0400)] 
lockd: Rename struct nlm_reboot to lockd_reboot

As part of the effort to enable lockd's server-side XDR functions to
be generated from the NLM protocol specification (using xdrgen), the
internal type names must be changed to avoid conflicts with the
machine-generated type names.

Rename struct nlm_reboot to struct lockd_reboot for consistency with
the other renamed internal types.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Rename struct nlm_res to lockd_res
Chuck Lever [Tue, 12 May 2026 18:13:45 +0000 (14:13 -0400)] 
lockd: Rename struct nlm_res to lockd_res

As part of the effort to enable lockd's server-side XDR functions to
be generated from the NLM protocol specification (using xdrgen), the
internal type names must be changed to avoid conflicts with the
machine-generated type names.

Rename struct nlm_res to struct lockd_res to avoid conflicts with
the NLMv3 XDR type definitions that will be introduced when svcproc.c
is converted to use xdrgen.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Rename struct nlm_args to lockd_args
Chuck Lever [Tue, 12 May 2026 18:13:44 +0000 (14:13 -0400)] 
lockd: Rename struct nlm_args to lockd_args

As part of the effort to enable lockd's server-side XDR functions to
be generated from the NLM protocol specification (using xdrgen), the
internal type names must be changed to avoid conflicts with the
machine-generated type names.

Rename struct nlm_args to struct lockd_args to avoid conflicts with
the NLMv3 XDR type definitions that will be introduced when
svcproc.c is converted to use xdrgen.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Rename struct nlm_lock to lockd_lock
Chuck Lever [Tue, 12 May 2026 18:13:43 +0000 (14:13 -0400)] 
lockd: Rename struct nlm_lock to lockd_lock

A subsequent patch will convert fs/lockd/svcproc.c to use
machine-generated XDR encoding and decoding functions in a
manner similar to fs/lockd/svc4proc.c. Machine-generated
types derived from the NLM specification will conflict with
the internal types of the same name.

Rename the internal struct nlm_lock type to lockd_lock to
avoid such naming conflicts.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Rename struct nlm_cookie to lockd_cookie
Chuck Lever [Tue, 12 May 2026 18:13:42 +0000 (14:13 -0400)] 
lockd: Rename struct nlm_cookie to lockd_cookie

Machine-generated XDR types derived from the NLM specification
use names that match the protocol. Internal lockd types with
identical names cause compilation failures when machine-generated
encoders replace hand-coded ones.

Rename the internal struct nlm_cookie type to lockd_cookie to
prevent such collisions. The "lockd_" prefix distinguishes
implementation-specific types from specified NLM protocol types.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago Documentation: Add the RPC language description of NLM version 3
Chuck Lever [Tue, 12 May 2026 18:13:41 +0000 (14:13 -0400)] 
Documentation: Add the RPC language description of NLM version 3

In order to generate source code to encode and decode NLMv3 protocol
elements, include a copy of the RPC language description of NLMv3
for xdrgen to process. The language description is derived from the
Open Group's XNFS specification:

  https://pubs.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_03

The C code committed here was generated from the new nlm3.x file
using tools/net/sunrpc/xdrgen/xdrgen.

The goals of replacing hand-written XDR functions with ones that
are tool-generated are to improve memory safety and make XDR
encoding and decoding less brittle to maintain. Parts of the
NFSv4 protocol are still being extended actively. Tool-generated
XDR code reduces the time it takes to get a working implementation
of new protocol elements.

The xdrgen utility derives both the type definitions and the
encode/decode functions directly from protocol specifications,
using names and symbols familiar to anyone who knows those specs.
Unlike hand-written code that can inadvertently diverge from the
specification, xdrgen guarantees that the generated code matches
the specification exactly.

We would eventually like xdrgen to generate Rust code as well,
making the conversion of the kernel's NFS stacks to use Rust just
a little easier for us.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Do not monitor when looking up the LOCK_MSG callback host
Chuck Lever [Tue, 12 May 2026 18:13:40 +0000 (14:13 -0400)] 
lockd: Do not monitor when looking up the LOCK_MSG callback host

A LOCK_MSG handler that fails to obtain a host returns
rpc_system_err, which causes the dispatcher to send an RPC-level
error rather than an NLM LOCK_RES denial. Before the xdrgen
conversion, the outer host lookup was unmonitored, so an NSM
upcall failure was reported back to the client through LOCK_RES
with status nlm_lck_denied_nolocks generated by the inner helper.

The xdrgen conversion replaced the unmonitored lookup with
nlm4svc_lookup_host(..., true). When nsm_monitor() fails, the
outer lookup now returns NULL, so the procedure short-circuits to
rpc_system_err and __nlm4svc_proc_lock_msg() never runs. The
client therefore receives no LOCK_RES, regressing the legacy
behavior.

The inner helper still performs a monitored lookup while building
the LOCK_RES, so the outer call only needs an unmonitored host
reference for the callback path. Pass false here to restore the
previous semantics.

Fixes: b2be4e28c23a ("lockd: Use xdrgen XDR functions for the NLMv4 LOCK_MSG procedure")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Translate nlm__int__deadlock in __nlm4svc_proc_lock_msg()
Chuck Lever [Tue, 12 May 2026 18:13:39 +0000 (14:13 -0400)] 
lockd: Translate nlm__int__deadlock in __nlm4svc_proc_lock_msg()

When nlmsvc_lock() detects a deadlock it returns the internal
sentinel nlm__int__deadlock (30001), which version-specific
handlers must translate to a wire-valid status before the reply
is encoded.  The xdrgen LOCK_MSG handler stores the sentinel
unmodified in resp->status; the LOCK_RES callback then places
30001 on the v4 wire, where the client rejects the reply.

Commit 9e0d0c619407 ("lockd: Introduce nlm__int__deadlock")
established the translation boundary and updated the synchronous
v4 path nlm4svc_do_lock(), but the xdrgen LOCK_MSG handler added
later in commit b2be4e28c23a ("lockd: Use xdrgen XDR functions
for the NLMv4 LOCK_MSG procedure") missed the corresponding
remap.  Apply the same translation in __nlm4svc_proc_lock_msg()
so deadlock results are reported as nlm4_deadlock on LOCK_RES.
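
The translation boundary described above can be sketched in plain
userspace C. This is only an illustration: the toy_ names are
hypothetical, values are kept in host byte order for simplicity, and
the wire value 5 is the NLMv4 deadlock status from the protocol
specification rather than something quoted from this patch.

```c
#include <stdint.h>

/* Internal sentinel named in the commit message (host byte order here). */
#define TOY_INT_DEADLOCK	30001u
/* NLMv4 wire status for a detected deadlock, per the protocol spec. */
#define TOY_NLM4_DEADLOCK	5u

/*
 * Hypothetical helper: map a lockd-internal result onto a value that is
 * valid on the v4 wire before the reply is encoded.
 */
static uint32_t toy_nlm4_wire_status(uint32_t status)
{
	if (status == TOY_INT_DEADLOCK)
		return TOY_NLM4_DEADLOCK;
	return status;	/* assumed to be wire-valid already */
}
```

Any handler that stores a status for later encoding would need to pass
it through such a mapping first; skipping that step is what lets 30001
reach the wire.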

Fixes: b2be4e28c23a ("lockd: Use xdrgen XDR functions for the NLMv4 LOCK_MSG procedure")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Drop locks_init_lock() from nlm4_lock_to_lockd_lock()
Chuck Lever [Tue, 12 May 2026 18:13:38 +0000 (14:13 -0400)] 
lockd: Drop locks_init_lock() from nlm4_lock_to_lockd_lock()

The NLMv4 GRANTED helper passes the wrapper's lock to
nlmclnt_grant(), which compares only fl_start, fl_end, svid, and
fh, and the shared nlmclnt_lock_event tracepoint now sources its
byte-range fields from fl_start and fl_end as well. Both fl_start
and fl_end are set unconditionally by lockd_set_file_lock_range4()
on the line below, so the locks_init_lock() call left no observable
effect: every other field of struct file_lock is unread on the
GRANTED path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Correct kernel-doc status descriptions for NLMv4 GRANTED
Chuck Lever [Tue, 12 May 2026 18:13:37 +0000 (14:13 -0400)] 
lockd: Correct kernel-doc status descriptions for NLMv4 GRANTED

NLM_GRANTED is a server-to-client callback; the local node
responds in the role of the client. The kernel-doc for
nlm4svc_proc_granted attributes NLM4_DENIED and
NLM4_DENIED_GRACE_PERIOD to "the server", but per the Open
Group XNFS specification the responder for this procedure is
the client host, and NLM4_DENIED_GRACE_PERIOD identifies the
client's own grace period after a reboot, not the server's.

Rewrite the descriptions to match the spec: NLM4_DENIED
reflects the generic internal-resource-constraint failure, and
NLM4_DENIED_GRACE_PERIOD attributes the grace period to the
client host that received the callback.

Fixes: 7a9f7c8f934e ("lockd: Use xdrgen XDR functions for the NLMv4 GRANTED procedure")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago lockd: Stop warning on nlm__int__drop_reply in !V4 cast_status
Chuck Lever [Tue, 12 May 2026 18:13:36 +0000 (14:13 -0400)] 
lockd: Stop warning on nlm__int__drop_reply in !V4 cast_status

cast_status folds internal lock-daemon sentinels into NLMv1/v3
wire status codes.  The !CONFIG_LOCKD_V4 variant warns when an
unrecognized status falls into the internal-sentinel range,
gated by be32_to_cpu(status) >= 30000.

nlm__int__drop_reply is defined as cpu_to_be32(30000), so it
sits at the lower edge of that range and trips pr_warn_once
("lockd: unhandled internal status %u").  The status is
returned unchanged so the reply is still dropped, but every
dropped reply on a !CONFIG_LOCKD_V4 build emits a spurious
warning.

Compare against nlm__int__drop_reply directly so the warning
still catches the genuinely unexpected sentinels deadlock,
stale_fh, and failed (30001 through 30003) but excludes the
legitimate dropped-reply marker.
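
A rough userspace sketch of the two warning gates follows. It is not
the kernel code: the function names are hypothetical, statuses are
plain host-order integers, and the exact shape of the new comparison
is an assumption based only on the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value given in the commit message for nlm__int__drop_reply. */
#define TOY_INT_DROP_REPLY	30000u

/* Old gate: anything in the internal-sentinel range warns. */
static bool toy_warns_before(uint32_t status)
{
	return status >= 30000u;
}

/* New gate (assumed shape): same range, minus the drop-reply marker. */
static bool toy_warns_after(uint32_t status)
{
	return status >= 30000u && status != TOY_INT_DROP_REPLY;
}
```

With this gate, 30000 no longer warns while 30001 through 30003 still
do.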

Fixes: d343fce148a4 ("[PATCH] knfsd: Allow lockd to drop replies as appropriate")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago svcrdma: Defer send context release to xpo_release_ctxt
Chuck Lever [Wed, 6 May 2026 15:26:51 +0000 (11:26 -0400)] 
svcrdma: Defer send context release to xpo_release_ctxt

Send completion currently queues a work item to an unbound
workqueue for each completed send context. Under load, the
Send Completion handlers contend for the shared workqueue
pool lock.

Replace the workqueue with a per-transport lock-free list
(llist). The Send completion handler appends the send_ctxt
to sc_send_release_list and does no further teardown. The
nfsd thread drains the list in xpo_release_ctxt between
RPCs, performing DMA unmapping, chunk I/O resource release,
and page release in a batch.

This eliminates both the workqueue pool lock and the DMA
unmap cost from the Send completion path. DMA unmapping can
be expensive when an IOMMU is present in strict mode, as
each unmap triggers a synchronous hardware IOTLB
invalidation. Moving it to the nfsd thread, where that
latency is harmless, avoids penalizing completion handler
throughput.

The nfsd threads absorb the release cost at a point where
the client is no longer waiting on a reply, and natural
batching amortizes the overhead when completions arrive
faster than RPCs complete.

A self-enqueue backstops drain on a quiescing transport.
When svc_rdma_send_ctxt_put() observes that its llist_add()
transitions sc_send_release_list from empty to non-empty,
it sets XPT_DATA and calls svc_xprt_enqueue() so that
svc_xprt_ready() schedules an nfsd thread. The thread
enters svc_rdma_recvfrom(), finds no pending receive,
clears XPT_DATA, and returns 0; svc_xprt_release() then
runs xpo_release_ctxt and drains the list. Under steady
load the foreground drain keeps the list non-empty between
adds and no enqueue fires; only the trailing edge of a
burst pays for a wakeup. Without this path, a Send
completion arriving after the last xpo_release_ctxt on an
idle connection would leave the send_ctxt's DMA mappings
and reply pages pinned until the next RPC, send-context
exhaustion, or transport close.
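
The empty-to-non-empty test that drives the self-enqueue can be
illustrated with C11 atomics. This is a userspace sketch, not the
kernel's llist implementation; every toy_ name is hypothetical. The
add returns true only when the list was empty beforehand, which is the
condition under which a producer would request a wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *next;
};

struct toy_release_list {
	_Atomic(struct toy_node *) first;
};

/* Push one node; report whether the list was empty beforehand. */
static bool toy_list_add(struct toy_release_list *list, struct toy_node *node)
{
	struct toy_node *old = atomic_load(&list->first);

	do {
		node->next = old;	/* old is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak(&list->first, &old, node));
	return old == NULL;
}

/* Detach the whole list in one step and count what was released. */
static size_t toy_list_drain(struct toy_release_list *list)
{
	struct toy_node *node =
		atomic_exchange(&list->first, (struct toy_node *)NULL);
	size_t released = 0;

	while (node) {
		struct toy_node *next = node->next;

		/* per-context teardown (unmap, page release) would go here */
		released++;
		node = next;
	}
	return released;
}
```

Only the first add after a drain returns true, so a burst of
completions costs at most one wakeup.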

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago svcrdma: Release write chunk resources without re-queuing
Chuck Lever [Wed, 6 May 2026 15:26:50 +0000 (11:26 -0400)] 
svcrdma: Release write chunk resources without re-queuing

Each RDMA Send completion triggers a cascade of work items on the
svcrdma_wq unbound workqueue:

  ib_cq_poll_work (on ib_comp_wq, per-CPU)
    -> svc_rdma_send_ctxt_put -> queue_work    [work item 1]
      -> svc_rdma_write_info_free -> queue_work [work item 2]

Every transition through queue_work contends on the unbound
pool's spinlock. Profiling an 8KB NFSv3 read/write workload
over RDMA shows about 4% of total CPU cycles spent on this
lock, with the cascading re-queue of write_info release
contributing roughly 1%.

The initial queue_work in svc_rdma_send_ctxt_put is needed to
move release work off the CQ completion context (which runs on
a per-CPU bound workqueue). However, once executing on
svcrdma_wq, there is no need to re-queue for each write_info
structure. svc_rdma_reply_chunk_release already calls
svc_rdma_cc_release inline from the same svcrdma_wq context,
and svc_rdma_recv_ctxt_put does the same from nfsd thread
context.

Release write chunk resources inline in
svc_rdma_write_info_free, removing the intermediate
svc_rdma_write_info_free_async work item and the wi_work
field from struct svc_rdma_write_info.

Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove dead rpcsec_gss_krb5 definitions
Chuck Lever [Mon, 27 Apr 2026 13:51:02 +0000 (09:51 -0400)] 
SUNRPC: Remove dead rpcsec_gss_krb5 definitions

The migration to crypto/krb5 eliminated the per-enctype
function dispatch and direct crypto API usage, leaving
behind a number of orphaned definitions.

Remove the following from gss_krb5.h:

 - GSS_KRB5_K5CLENGTH, used only by removed key derivation
 - KG_TOK_MIC_MSG and KG_TOK_WRAP_MSG (Kerberos v1 token
   types; v1 support was dropped earlier)
 - KG2_TOK_INITIAL and KG2_TOK_RESPONSE (context
   establishment token types; no remaining users)
 - KG2_RESP_FLAG_ERROR and KG2_RESP_FLAG_DELEG_OK
 - enum sgn_alg and enum seal_alg (v1 algorithm constants)
 - All CKSUMTYPE_* definitions, now duplicated by
   KRB5_CKSUMTYPE_* in <crypto/krb5.h>
 - The KG_ error constants from gssapi_err_krb5.h, which
   have no remaining users
 - The ENCTYPE_* constant block, replaced by KRB5_ENCTYPE_*
   from <crypto/krb5.h>
 - KG_USAGE_SEAL/SIGN/SEQ (3DES usage constants)
 - KEY_USAGE_SEED_CHECKSUM/ENCRYPTION/INTEGRITY, duplicated
   by <crypto/krb5.h>
 - #include <crypto/skcipher.h>, no longer needed

Remove the cksum[] field from struct krb5_ctx in
gss_krb5_internal.h; no code reads or writes it after the
key derivation removal.

Switch gss_krb5_enctypes[] in gss_krb5_mech.c to the
canonical KRB5_ENCTYPE_* names from <crypto/krb5.h>.

Remove stale #include directives:
 - <crypto/skcipher.h> from gss_krb5_wrap.c
 - <linux/random.h> and <linux/crypto.h> from
   gss_krb5_seal.c

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove redundant crypto Kconfig dependencies
Chuck Lever [Mon, 27 Apr 2026 13:51:01 +0000 (09:51 -0400)] 
SUNRPC: Remove redundant crypto Kconfig dependencies

With all per-message crypto operations now routed through
crypto/krb5, rpcsec_gss_krb5 no longer calls individual
crypto algorithms directly. The CRYPTO_KRB5 symbol already
selects CRYPTO_SKCIPHER and CRYPTO_HASH (the latter
transitively via CRYPTO_HMAC).

Drop the top-level select CRYPTO_SKCIPHER and select
CRYPTO_HASH from RPCSEC_GSS_KRB5, as these are redundant
with CRYPTO_KRB5's own dependencies.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove per-enctype Kconfig options
Chuck Lever [Mon, 27 Apr 2026 13:51:00 +0000 (09:51 -0400)] 
SUNRPC: Remove per-enctype Kconfig options

The RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA1,
RPCSEC_GSS_KRB5_ENCTYPES_CAMELLIA, and
RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2 Kconfig options
originally gated both algorithm availability and the
advertised enctype list. Now that per-message crypto
operations are routed through crypto/krb5, these options
control only which enctype numbers appear in the gssd
upcall string; the underlying algorithms are always
present.

Remove the per-enctype Kconfig options and replace the
ifdef-gated enctype table with a candidate list looked
up in the crypto/krb5 enctype table at module init
time. Each enctype is included in the advertised list
only if crypto_krb5_find_enctype() finds it in the
library's enctype table. When a new enctype is added
to crypto/krb5, adding its constant to the candidate
array is sufficient to begin advertising it.
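
The candidate-list approach can be sketched in userspace C as below.
The lookup function stands in for crypto_krb5_find_enctype(), whose
real signature is not shown here; the table contents, the candidate
order, and the comma-separated output format are all assumptions made
for illustration. The enctype numbers are the standard Kerberos
assignments (17/18 AES-SHA1, 19/20 AES-SHA2, 25/26 Camellia).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Stand-in for the library's enctype table (hypothetical contents). */
static const unsigned int toy_library_enctypes[] = { 17, 18, 19, 20 };

/* Stand-in for the library lookup: is this enctype known? */
static bool toy_find_enctype(unsigned int etype)
{
	size_t i;

	for (i = 0; i < TOY_ARRAY_SIZE(toy_library_enctypes); i++)
		if (toy_library_enctypes[i] == etype)
			return true;
	return false;
}

/* Candidates in preference order (the order is an assumption). */
static const unsigned int toy_candidates[] = { 20, 19, 26, 25, 18, 17 };

/* Build the advertised list from candidates the library knows about. */
static size_t toy_build_enctype_list(char *buf, size_t len)
{
	size_t used = 0, i;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < TOY_ARRAY_SIZE(toy_candidates); i++) {
		int n;

		if (!toy_find_enctype(toy_candidates[i]))
			continue;
		n = snprintf(buf + used, len - used, "%s%u",
			     used ? "," : "", toy_candidates[i]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial entry */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Adding a number to toy_library_enctypes makes the matching candidate
appear in the output without touching the builder, which mirrors the
maintenance property the commit message describes.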

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove dead code from rpcsec_gss_krb5
Chuck Lever [Mon, 27 Apr 2026 13:50:59 +0000 (09:50 -0400)] 
SUNRPC: Remove dead code from rpcsec_gss_krb5

With all per-message crypto operations routed through crypto/krb5,
a substantial body of code in rpcsec_gss_krb5 has no remaining
callers. The internal key derivation functions (krb5_derive_key_v2,
krb5_kdf_hmac_sha2, krb5_kdf_feedback_cmac) and the low-level
crypto primitives (krb5_encrypt, gss_krb5_checksum, krb5_cbc_cts_
encrypt/decrypt, krb5_etm_checksum) are unreachable because their
only call sites were the per-enctype function pointers removed in
previous patches. Delete gss_krb5_keys.c entirely and strip the
dead functions from gss_krb5_crypto.c.

The KUnit test suite in gss_krb5_test.c exercised exactly these
internal functions: RFC 3961 n-fold, RFC 3962 key derivation,
RFC 6803 Camellia key derivation, and RFC 8009 AES-SHA2 key
derivation, plus encryption self-tests that drove the now-removed
encrypt routines. The corresponding test coverage is provided by
the crypto/krb5 selftests in crypto/krb5/selftest.c. Remove the
test file, the RPCSEC_GSS_KRB5_KUNIT_TEST Kconfig symbol, the
.kunitconfig, and all VISIBLE_IF_KUNIT / EXPORT_SYMBOL_IF_KUNIT
annotations.

xdr_process_buf() walked xdr_buf segments through a per-segment
callback and existed solely for the crypto routines in
gss_krb5_crypto.c. With that file removed, xdr_process_buf()
has no remaining callers. Its successor, xdr_buf_to_sg(),
populates a scatterlist directly from an xdr_buf byte range
and was introduced earlier in this series.

With every consumer of struct gss_krb5_enctype removed, replace
its remaining uses with the equivalent fields from struct
krb5_enctype (key_len). Remove struct gss_krb5_enctype, the
supported_gss_krb5_enctypes[] table, gss_krb5_lookup_enctype(),
and the gk5e pointer from krb5_ctx.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove legacy skcipher/ahash handles from krb5_ctx
Chuck Lever [Mon, 27 Apr 2026 13:50:58 +0000 (09:50 -0400)] 
SUNRPC: Remove legacy skcipher/ahash handles from krb5_ctx

Previous patches switched all per-message crypto operations
(encrypt, decrypt, get_mic, verify_mic) from the internal
skcipher/ahash primitives to crypto/krb5 AEAD and shash
handles. The old crypto_sync_skcipher and crypto_ahash fields in
struct krb5_ctx are no longer referenced at runtime.

Remove the ten legacy handle fields from struct krb5_ctx
along with the key derivation and handle allocation code in
gss_krb5_import_ctx_v2() that populated them. Context import
now prepares only the four crypto/krb5 handles (two AEAD for
encryption, two shash for checksums). The corresponding cleanup
in gss_krb5_delete_sec_context() and the error path is likewise
reduced.

The krb5_derive_key() inline wrapper, gss_krb5_alloc_cipher_v2(),
and gss_krb5_alloc_hash_v2() become unused and are removed.
The per-enctype encrypt/decrypt functions (gss_krb5_aes_encrypt,
gss_krb5_aes_decrypt, krb5_etm_encrypt, krb5_etm_decrypt) that
were the sole remaining consumers of these fields are also removed;
their function-pointer call sites were already deleted in earlier
patches.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove encrypt/decrypt function pointers from enctype table
Chuck Lever [Mon, 27 Apr 2026 13:50:57 +0000 (09:50 -0400)] 
SUNRPC: Remove encrypt/decrypt function pointers from enctype table

All enctypes now route through gss_krb5_aead_encrypt() and
gss_krb5_aead_decrypt(). The per-enctype .encrypt and .decrypt
function pointers served the same purpose as .get_mic and
.wrap before them: dispatching v1 versus v2 implementations.
With v1 support long removed and the Camellia decrypt path
migrated in a preceding patch, every table entry points to
the same pair of functions.

Call gss_krb5_aead_encrypt() and gss_krb5_aead_decrypt()
directly from gss_krb5_wrap_v2() and gss_krb5_unwrap_v2(),
and drop the function pointers from struct gss_krb5_enctype.

While here, propagate the GSS status code returned by
gss_krb5_aead_decrypt() instead of discarding it.
The old indirect call sites returned GSS_S_FAILURE
unconditionally, losing the distinction between an
integrity failure (GSS_S_BAD_SIG) and a structural
error (GSS_S_DEFECTIVE_TOKEN).

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove wrap/unwrap function pointers from enctype table
Chuck Lever [Mon, 27 Apr 2026 13:50:56 +0000 (09:50 -0400)] 
SUNRPC: Remove wrap/unwrap function pointers from enctype table

Every enctype points .wrap and .unwrap at gss_krb5_wrap_v2()
and gss_krb5_unwrap_v2(). As with get_mic/verify_mic, the
indirection dates from when v1 enctypes had different wrap
implementations. Call the functions directly and remove the
pointers from struct gss_krb5_enctype.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Remove get_mic/verify_mic function pointers from enctype table
Chuck Lever [Mon, 27 Apr 2026 13:50:55 +0000 (09:50 -0400)] 
SUNRPC: Remove get_mic/verify_mic function pointers from enctype table

Every enctype in the table points .get_mic and .verify_mic at
the same pair of functions. The indirection served no purpose
after the v1 enctype support was removed. Call
gss_krb5_get_mic_v2() and gss_krb5_verify_mic_v2() directly
from the GSS mechanism dispatch and drop the function pointers
from struct gss_krb5_enctype.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Switch MIC token verification to crypto/krb5
Chuck Lever [Mon, 27 Apr 2026 13:50:54 +0000 (09:50 -0400)] 
SUNRPC: Switch MIC token verification to crypto/krb5

gss_krb5_verify_mic_v2() currently recomputes a checksum using
gss_krb5_checksum() and then compares it against the received
checksum with memcmp(). Replace this with a call to
crypto_krb5_verify_mic(), which performs the hash, comparison,
and offset/length adjustment in a single operation through the
crypto/krb5 library.

The scatterlist layout required by RFC 4121 Section 4.2.4 is
constructed via gss_krb5_mic_build_sg(), the shared helper
introduced in the preceding commit. The received checksum
occupies the first scatterlist entry, pointing directly into
the token buffer.

The errno result from crypto_krb5_verify_mic() is mapped to a
GSS major status code via gss_krb5_errno_to_status(), which
returns GSS_S_BAD_SIG for -EBADMSG (checksum mismatch).
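
A minimal sketch of such an errno-to-status mapping is shown below.
Only the -EBADMSG case is taken from the text above; the -EPROTO case
and the default are assumptions added for illustration, and the toy_
names are hypothetical. The numeric values follow the GSS-API C
bindings (routine errors shifted left by 16).

```c
#include <errno.h>
#include <stdint.h>

#define TOY_GSS_S_COMPLETE		0u
#define TOY_GSS_S_BAD_SIG		(6u << 16)
#define TOY_GSS_S_DEFECTIVE_TOKEN	(9u << 16)
#define TOY_GSS_S_FAILURE		(13u << 16)

/* Hypothetical stand-in for gss_krb5_errno_to_status(). */
static uint32_t toy_errno_to_gss_status(int err)
{
	switch (err) {
	case 0:
		return TOY_GSS_S_COMPLETE;
	case -EBADMSG:			/* checksum mismatch, per the text */
		return TOY_GSS_S_BAD_SIG;
	case -EPROTO:			/* assumption: malformed token */
		return TOY_GSS_S_DEFECTIVE_TOKEN;
	default:			/* assumption: everything else */
		return TOY_GSS_S_FAILURE;
	}
}
```

Keeping the mapping in one function means each wrapper reports the
same status for the same library error.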

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Switch MIC token generation to crypto/krb5
Chuck Lever [Mon, 27 Apr 2026 13:50:53 +0000 (09:50 -0400)] 
SUNRPC: Switch MIC token generation to crypto/krb5

gss_krb5_get_mic_v2() currently computes the MIC checksum by
driving a crypto_ahash directly, calling gss_krb5_checksum()
with the message body and GSS token header. Replace this with
a call to crypto_krb5_get_mic(), which performs the same keyed
hash operation through the crypto/krb5 library.

RFC 4121 Section 4.2.4 specifies that the checksum covers the
message body followed by the token header. Because the
crypto/krb5 metadata parameter is hashed before the data, the
GSS header cannot be passed as metadata. Instead, the header
is appended to the scatterlist after the body data, producing
the correct hash input ordering without using the metadata
parameter.

The scatterlist layout is:
  [checksum_output | message_body | gss_header]

The first scatterlist entry points directly into the
token buffer, so the checksum is written in place.
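
The ordering argument can be illustrated with a toy segment list in
userspace C. FNV-1a is used purely as a stand-in hash; it is not a
Kerberos checksum, and the toy_ types are hypothetical and unrelated
to the kernel's struct scatterlist. The point is only that the first
segment receives the output while the remaining segments are hashed
in list order, body first and header last.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_seg {
	unsigned char *base;
	size_t len;
};

/* FNV-1a step; a placeholder for the real keyed hash. */
static uint32_t toy_hash_update(uint32_t h, const unsigned char *p, size_t n)
{
	while (n--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/*
 * segs[0] is the checksum output slot (at least 4 bytes here);
 * segs[1..nsegs-1] are hashed in order.
 */
static void toy_get_mic(struct toy_seg *segs, size_t nsegs)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 1; i < nsegs; i++)
		h = toy_hash_update(h, segs[i].base, segs[i].len);
	memcpy(segs[0].base, &h, sizeof(h));	/* written in place */
}
```

Placing the header segment after the body segment yields the same
digest as hashing the two concatenated in that order, with no copy of
either.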

A shared helper, gss_krb5_mic_build_sg(), is introduced in
gss_krb5_crypto.c to construct this scatterlist layout. The
helper handles overflow allocation and scatterlist chaining
for large xdr_buf page arrays. It is reused by the verify_mic
counterpart in the following commit.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Switch Camellia decrypt to crypto/krb5
Chuck Lever [Mon, 27 Apr 2026 13:50:52 +0000 (09:50 -0400)] 
SUNRPC: Switch Camellia decrypt to crypto/krb5

The Camellia enctypes (RFC 6803) use the same MtE authenticated
encryption construction as AES-SHA1 (RFC 3962), implemented in
crypto/krb5 by the rfc3961_simplified profile. The encrypt path
already uses gss_krb5_aead_encrypt() for Camellia, but the decrypt
path was left on the old gss_krb5_aes_decrypt() code when the AES
enctypes were migrated.

Switch the Camellia .decrypt callback to gss_krb5_aead_decrypt() to
complete the AEAD migration for all enctypes. The conf_len and
cksum_len values in crypto/krb5's Camellia enctype descriptors match
the block size and checksum length that gss_krb5_aes_decrypt() was
using, so the headskip and tailskip returned to the unwrap layer are
unchanged.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Switch wrap token decryption to crypto/krb5
Chuck Lever [Mon, 27 Apr 2026 13:50:51 +0000 (09:50 -0400)] 
SUNRPC: Switch wrap token decryption to crypto/krb5

Replace the per-enctype .decrypt callbacks (gss_krb5_aes_decrypt
and krb5_etm_decrypt) with a single gss_krb5_aead_decrypt()
wrapper that delegates to crypto_krb5_decrypt().

The new wrapper builds a scatterlist covering the secured
region (confounder through checksum), passes it to the AEAD
decrypt operation, and derives the confounder and checksum
lengths from the data offset and length that
crypto_krb5_decrypt() reports. The caller's token header
verification and buffer adjustment logic is unchanged.
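
The length derivation amounts to simple arithmetic, sketched here
under the assumption that the reported data offset is relative to the
start of the secured region. The structure and function names are
hypothetical.

```c
#include <stddef.h>

struct toy_skips {
	size_t headskip;	/* confounder length */
	size_t tailskip;	/* checksum length */
};

/*
 * secured_len: bytes from confounder through checksum.
 * data_offset/data_len: plaintext position reported after decryption.
 * Returns 0 on success, -1 if the reported range does not fit.
 */
static int toy_derive_skips(size_t secured_len, size_t data_offset,
			    size_t data_len, struct toy_skips *out)
{
	if (data_offset > secured_len ||
	    data_len > secured_len - data_offset)
		return -1;
	out->headskip = data_offset;
	out->tailskip = secured_len - data_offset - data_len;
	return 0;
}
```

For example, a 128-byte secured region whose plaintext is reported at
offset 16 with length 100 gives a 16-byte confounder and a 12-byte
checksum.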

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Switch wrap token encryption to crypto/krb5
Chuck Lever [Mon, 27 Apr 2026 13:50:50 +0000 (09:50 -0400)] 
SUNRPC: Switch wrap token encryption to crypto/krb5

Replace the per-enctype .encrypt callbacks (gss_krb5_aes_encrypt and
krb5_etm_encrypt) with a single gss_krb5_aead_encrypt() wrapper that
delegates to crypto_krb5_encrypt().

The xdr_buf setup -- GSS header insertion, confounder space
allocation, and token header copy -- remains unchanged. The
difference is that the CBC-CTS encryption and HMAC computation are
now a single AEAD operation through the crypto/krb5 library. Both
the MtE construction (RFC 3962) and the EtM construction (RFC 8009)
are handled transparently by the AEAD transform.

The plaintext page data must be copied from the page cache pages to
the scratch output pages before building the scatterlist, since the
AEAD operates in-place rather than using separate input and output
scatterlists.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Prepare crypto/krb5 encryption and checksum handles
Chuck Lever [Mon, 27 Apr 2026 13:50:49 +0000 (09:50 -0400)] 
SUNRPC: Prepare crypto/krb5 encryption and checksum handles

Allocate crypto_aead handles for encryption (one per direction)
and crypto_shash handles for checksumming (one per direction)
using the crypto/krb5 library's key preparation functions.

These four handles derive their subkeys from the session key
and the RFC 4121 usage numbers and are ready for use in
encrypt, decrypt, get_mic, and verify_mic operations.

The existing crypto_sync_skcipher and crypto_ahash handles
remain in place for now; subsequent patches switch the
per-message operations to the new handles and then remove
the old ones.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Add errno-to-GSS status conversion helper
Chuck Lever [Mon, 27 Apr 2026 13:50:48 +0000 (09:50 -0400)] 
SUNRPC: Add errno-to-GSS status conversion helper

The crypto/krb5 library returns standard negative errno values,
but the GSS mechanism layer reports results as GSS_S_* major
status codes. A translation is needed at each call site that
will be switched to the new library.

Rather than open-coding the mapping in every wrapper, provide a
single helper function.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Add helpers to convert xdr_buf byte ranges to scatterlists
Chuck Lever [Mon, 27 Apr 2026 13:50:47 +0000 (09:50 -0400)] 
SUNRPC: Add helpers to convert xdr_buf byte ranges to scatterlists

The crypto/krb5 library accepts data in scatterlist form, but
the GSS-API layer presents RPC payloads as struct xdr_buf.
Bridge that gap with a pair of helper functions:

  xdr_buf_to_sg()        - populate a caller-supplied scatterlist
                           array from a byte range
  xdr_buf_to_sg_alloc()  - populate a caller-supplied inline
                           scatterlist, chaining to a heap-
                           allocated overflow for large payloads

The inline array (typically stack-allocated at eight entries)
covers the common case of small RPCs with no heap allocation
on the encrypt/decrypt path. Only buffers spanning many pages
incur a kmalloc for the chained extension.

The segment-walking logic follows the same head, page array,
tail traversal as xdr_process_buf(), but populates a
scatterlist directly rather than invoking a per-segment
callback. sg_next() traversal makes the walker safe for
chained scatterlists. Once subsequent patches reroute all
per-message crypto operations through crypto/krb5,
xdr_process_buf() loses its last callers and is removed.
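
The inline-array-with-overflow idea can be sketched in userspace C as
follows. This is a simplification with hypothetical toy_ names: it
converts a whole buffer rather than a byte range, and on overflow it
switches to a single heap array instead of chaining an extension onto
the inline entries as the kernel helpers are described as doing.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_INLINE_SEGS	8
#define TOY_PAGE_SIZE	4096u

struct toy_seg {
	const unsigned char *base;
	size_t len;
};

/* Simplified stand-in for struct xdr_buf: head, page array, tail. */
struct toy_xdr_buf {
	const unsigned char *head;
	size_t head_len;
	unsigned char **pages;
	size_t page_len;	/* bytes held in the page array */
	const unsigned char *tail;
	size_t tail_len;
};

/*
 * Fill a segment array in head, pages, tail order. Uses the caller's
 * inline array when it is large enough, otherwise a heap array that
 * the caller must free(). Returns NULL on allocation failure.
 */
static struct toy_seg *toy_buf_to_segs(const struct toy_xdr_buf *b,
				       struct toy_seg *inline_segs,
				       size_t *nsegs)
{
	size_t need = (b->head_len ? 1 : 0) +
		      (b->page_len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE +
		      (b->tail_len ? 1 : 0);
	size_t left = b->page_len, i = 0, p = 0;
	struct toy_seg *out = inline_segs;

	if (need > TOY_INLINE_SEGS) {
		out = malloc(need * sizeof(*out));
		if (!out)
			return NULL;
	}
	if (b->head_len)
		out[i++] = (struct toy_seg){ b->head, b->head_len };
	while (left) {
		size_t chunk = left < TOY_PAGE_SIZE ? left : TOY_PAGE_SIZE;

		out[i++] = (struct toy_seg){ b->pages[p++], chunk };
		left -= chunk;
	}
	if (b->tail_len)
		out[i++] = (struct toy_seg){ b->tail, b->tail_len };
	*nsegs = i;
	return out;
}
```

A caller compares the returned pointer with its inline array to decide
whether anything needs freeing; small buffers never touch the heap.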

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Add crypto/krb5 enctype lookup to krb5_ctx
Chuck Lever [Mon, 27 Apr 2026 13:50:46 +0000 (09:50 -0400)] 
SUNRPC: Add crypto/krb5 enctype lookup to krb5_ctx

Each krb5_ctx currently points to a gss_krb5_enctype, the
rpcsec_gss_krb5 module's own enctype descriptor. To begin
using the common crypto/krb5 library, store a pointer to the
corresponding struct krb5_enctype (from <crypto/krb5.h>) as
well.

The lookup is performed in gss_import_v2_context() immediately
after the existing gss_krb5_lookup_enctype() call. If
crypto_krb5_find_enctype() cannot find a matching enctype the
context import fails, ensuring the module never operates with
a partially-initialized krb5_ctx.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days ago SUNRPC: Add Kconfig dependency on CRYPTO_KRB5
Chuck Lever [Mon, 27 Apr 2026 13:50:45 +0000 (09:50 -0400)] 
SUNRPC: Add Kconfig dependency on CRYPTO_KRB5

The rpcsec_gss_krb5 module currently contains its own Kerberos 5
crypto implementation (key derivation, encryption, checksumming)
that duplicates functionality available in the common crypto/krb5
library. As a first step toward migrating to that library, add a
Kconfig select so that building rpcsec_gss_krb5 pulls in the
common Kerberos 5 crypto support.

The per-enctype Kconfig options (AES_SHA1, CAMELLIA, AES_SHA2)
remain: they continue to gate which encryption types are offered
by the GSS mechanism. The individual crypto algorithm selects
they carry become redundant once the migration is complete, since
CRYPTO_KRB5 already selects all needed ciphers and hashes.

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Increase the default max_block_size to 4MB
Chuck Lever [Mon, 20 Apr 2026 15:38:30 +0000 (11:38 -0400)] 
NFSD: Increase the default max_block_size to 4MB

Commit 8a81f16de64f ("NFSD: Add a "default" block size") introduced
NFSSVC_DEFBLKSIZE at 1MB, well below the 4MB NFSSVC_MAXBLKSIZE
ceiling, with the stated intent that a later change would raise the
default.

Raising the default reduces per-RPC overhead on fast networks by
amortizing header processing and scheduling costs across larger
payloads. The halving loop in nfsd_get_default_max_blksize()
constrains the returned value to 1/4096 of available RAM, so the
new 4MB default takes effect only on systems with at least 16GB of
RAM. Smaller machines continue to receive the same computed value
as before. Administrators can still override the computed value
through /proc/fs/nfsd/max_block_size.
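
The interaction between the default and the RAM-based cap can be
sketched in userspace C. This is a simplified model of the halving
loop described above, not the kernel function itself: the name
default_max_blksize(), the SKETCH_DEFBLKSIZE constant, the 8KB floor,
and passing the RAM size as a parameter are assumptions made for
illustration.

```c
#include <stdint.h>

/* Hypothetical constant for this sketch: the new 4MB default. */
#define SKETCH_DEFBLKSIZE (4u * 1024 * 1024)

/*
 * Halve the default until it is no larger than 1/4096 of RAM.
 * The 8KB floor is an assumption of this sketch.
 */
static uint32_t default_max_blksize(uint64_t ram_bytes)
{
	uint64_t target = ram_bytes >> 12;	/* 1/4096 of RAM */
	uint32_t ret = SKETCH_DEFBLKSIZE;

	while (ret > target && ret >= 2 * 8 * 1024)
		ret /= 2;
	return ret;
}
```

Under these assumptions a 16GB machine keeps the full 4MB, while a
4GB machine is halved twice and ends up at 1MB.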

On systems where the new default takes effect,
svc_sock_setbufsize() sizes each service socket's send and receive
buffers as nreqs * max_mesg * 2. Quadrupling max_mesg therefore
quadruples the per-socket buffer reservation at a fixed thread
count, which operators tuning large thread pools should account
for.

Note well: Your NFS client implementation must support large read
and write size settings to benefit from this change.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Close cached file handles when revoking export state
Chuck Lever [Sun, 19 Apr 2026 18:53:07 +0000 (14:53 -0400)] 
NFSD: Close cached file handles when revoking export state

When NFSD_CMD_UNLOCK_EXPORT revokes NFSv4 state for an export path,
GC-managed nfsd_file entries for files under that path may remain
in the file cache.  These cached handles hold the underlying
filesystem busy, preventing a subsequent unmount.

Add nfsd_file_close_export(), which walks the nfsd_file hash table
and closes GC-eligible entries whose underlying file resides on the
same filesystem and is a descendant of the export path.  Because
nfsd_file entries do not carry an export reference, the ancestry
check uses is_subdir() on the file's dentry.  False positives --
closing a cached handle that did not originate from the target
export -- are harmless; the handle is simply reopened on the next
access.

The handler calls nfsd_file_close_export() before revoking NFSv4
state, mirroring the order used by NFSD_CMD_UNLOCK_FILESYSTEM
(which cancels copies and releases NLM locks before revoking
state).  Both calls run under nfsd_mutex.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Add NFSD_CMD_UNLOCK_EXPORT netlink command
Chuck Lever [Sun, 19 Apr 2026 18:53:06 +0000 (14:53 -0400)] 
NFSD: Add NFSD_CMD_UNLOCK_EXPORT netlink command

When a filesystem is exported to NFS clients, NFSv4 state
(opens, locks, delegations, layouts) holds references that
prevent the underlying filesystem from being unmounted.
NFSD_CMD_UNLOCK_FILESYSTEM addresses this at superblock
granularity, but administrators unexporting a single path on a
shared filesystem (e.g., one of several exports on the same device)
need finer control.

Add NFSD_CMD_UNLOCK_EXPORT, which revokes NFSv4 state acquired
through exports of a specific path.  Matching is by path identity
(dentry + vfsmount) via the sc_export field on each nfs4_stid,
so multiple svc_export objects for the same path -- one per
auth_domain -- are handled correctly without requiring the caller
to name a specific client.

The command takes a single "path" attribute.  Userspace (exportfs
-u) sends this after removing the last client for a given path,
enabling the underlying filesystem to be unmounted.  When multiple
clients share an export path, individual unexports do not trigger
state revocation; only the final one does.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Track svc_export in nfs4_stid
Chuck Lever [Sun, 19 Apr 2026 18:53:05 +0000 (14:53 -0400)] 
NFSD: Track svc_export in nfs4_stid

Add an sc_export field to struct nfs4_stid so that each stateid
records the export under which it was acquired.  The export
reference is taken via exp_get() at stateid creation and released
via exp_put() in nfs4_put_stid().

Open stateids record the export from current_fh->fh_export.
Lock stateids and delegations inherit the export from their
parent open stateid. Layout stateids inherit from their
parent stateid. Directory delegations record the export from
cstate->current_fh.

A subsequent commit uses sc_export to scope state revocation to a
specific export, avoiding the need to walk inode dentry aliases at
revocation time.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Replace idr_for_each_entry_ul in find_one_sb_stid()
Chuck Lever [Sun, 19 Apr 2026 18:53:04 +0000 (14:53 -0400)] 
NFSD: Replace idr_for_each_entry_ul in find_one_sb_stid()

Replace idr_for_each_entry_ul() with a while loop over
idr_get_next_ul() for consistency with find_one_export_stid(),
added in a subsequent commit.

No change in behavior.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Add NFSD_CMD_UNLOCK_FILESYSTEM netlink command
Chuck Lever [Sun, 19 Apr 2026 18:53:03 +0000 (14:53 -0400)] 
NFSD: Add NFSD_CMD_UNLOCK_FILESYSTEM netlink command

Add NFSD_CMD_UNLOCK_FILESYSTEM as a dedicated netlink command for
revoking NFS state under a filesystem path, providing a netlink
equivalent of /proc/fs/nfsd/unlock_fs.

The command requires a "path" string attribute containing the
filesystem path whose state should be released. The handler
resolves the path to its superblock, then cancels async copies,
releases NLM locks, and revokes NFSv4 state on that superblock.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Add NFSD_CMD_UNLOCK_IP netlink command
Chuck Lever [Sun, 19 Apr 2026 18:53:02 +0000 (14:53 -0400)] 
NFSD: Add NFSD_CMD_UNLOCK_IP netlink command

The existing write_unlock_ip procfs interface releases NLM file
locks held by a specific client IP address, but procfs provides
no structured way to extend that operation to other scopes such as
revoking NFSv4 state.

Add NFSD_CMD_UNLOCK_IP as a dedicated netlink command for
releasing NLM locks by client address. The command accepts a
binary sockaddr_in or sockaddr_in6 in its address attribute.
The handler validates the address family and length, then calls
nlmsvc_unlock_all_by_ip() to release matching NLM locks.  Because
lockd is a single global instance, that call operates across
all network namespaces regardless of which namespace the caller
inhabits.
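
The family and length validation can be illustrated with a small
userspace sketch. The helper name check_unlock_addr() is hypothetical,
and requiring the length to match the structure size exactly is an
assumption of this sketch rather than a statement about the handler.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: return 0 if the blob is exactly one
 * sockaddr_in or sockaddr_in6, -1 otherwise.
 */
static int check_unlock_addr(const void *blob, size_t len)
{
	struct sockaddr sa;

	if (len < sizeof(sa))
		return -1;
	memcpy(&sa, blob, sizeof(sa));	/* avoid unaligned access */

	switch (sa.sa_family) {
	case AF_INET:
		return len == sizeof(struct sockaddr_in) ? 0 : -1;
	case AF_INET6:
		return len == sizeof(struct sockaddr_in6) ? 0 : -1;
	default:
		return -1;
	}
}
```

Checking the length against the family prevents a short blob from
being read as a longer address structure.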

A separate netlink command for filesystem-scoped unlock is added in
a subsequent commit.

The nfsd_ctl_unlock_ip tracepoint is updated from string-based
address logging to __sockaddr, which stores the binary sockaddr
and formats it with %pISpc. This affects both the new netlink path
and the existing procfs write_unlock_ip path, giving consistent
structured output in both cases.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Extract revoke_one_stid() utility function
Chuck Lever [Sun, 19 Apr 2026 18:53:01 +0000 (14:53 -0400)] 
NFSD: Extract revoke_one_stid() utility function

The per-stateid revocation logic in nfsd4_revoke_states() handles
four stateid types in a deeply nested switch. Extract two helpers:

revoke_ol_stid() performs admin-revocation of an open or lock
stateid with st_mutex already held: marks the stateid as
SC_STATUS_ADMIN_REVOKED, closes POSIX locks for lock stateids,
and releases file access.

revoke_one_stid() dispatches by sc_type, acquires st_mutex with
the appropriate lockdep class for open and lock stateids, and
handles delegation unhash and layout close inline.

No functional change. Preparation for adding export-scoped state
revocation which reuses revoke_one_stid().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Handle layout stid in nfsd4_drop_revoked_stid()
Chuck Lever [Sun, 19 Apr 2026 18:53:00 +0000 (14:53 -0400)] 
NFSD: Handle layout stid in nfsd4_drop_revoked_stid()

nfsd4_drop_revoked_stid() has no SC_TYPE_LAYOUT case, so when a
client sends FREE_STATEID for an admin-revoked layout stid, the
default branch releases cl_lock and returns without unhashing or
releasing the stid.  The stid remains in the IDR and on the
per-client list until the client is destroyed.

Remove the layout stid from the per-client list and call
nfs4_put_stid() to drop the creation reference.  When the
refcount reaches zero, nfsd4_free_layout_stateid() handles the
remaining cleanup: cancelling the fence worker, removing from
the per-file list, and freeing the slab object.

Fixes: 1e33e1414bec ("nfsd: allow layout state to be admin-revoked.")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoNFSD: Update my maintainer email addresses
Chuck Lever [Tue, 31 Mar 2026 15:35:57 +0000 (11:35 -0400)] 
NFSD: Update my maintainer email addresses

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
10 days agoiommu/dma: Do not try to iommu_map a 0 length region in swiotlb
Jason Gunthorpe [Mon, 8 Jun 2026 18:10:04 +0000 (15:10 -0300)] 
iommu/dma: Do not try to iommu_map a 0 length region in swiotlb

iommu_dma_iova_link_swiotlb() processes an unaligned mapping in three
parts: the head, middle and trailer. If the middle is empty because there
are no aligned pages, it calls down to iommu_map() with a 0 size,
which the iommupt implementation fails as illegal.

It then tries to do an error unwind and starts from the wrong spot
corrupting the mapping so the eventual destruction triggers a WARN_ON.

Check for 0 length and avoid mapping and use offset not 0 as the starting
point to unlink.
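
How the middle part can end up empty is easiest to see with a small
userspace sketch of the three-way split. The struct range_split type
and the split_range() helper are hypothetical and only model the
arithmetic; they are not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

struct range_split {
	size_t head;	/* unaligned bytes before the first granule boundary */
	size_t middle;	/* whole granules, may be 0 */
	size_t tail;	/* unaligned bytes after the last granule boundary */
};

/* granule must be a power of two. */
static struct range_split split_range(uint64_t phys, size_t size,
				      size_t granule)
{
	struct range_split s = { 0, 0, 0 };
	size_t start_pad = phys & (granule - 1);

	if (start_pad) {
		size_t room = granule - start_pad;

		s.head = size < room ? size : room;
		phys += s.head;
		size -= s.head;
	}
	if (!size)
		return s;

	s.tail = (phys + size) & (granule - 1);
	s.middle = size - s.tail;
	return s;
}
```

With a 4KB granule, a 4KB buffer starting at 0x1800 splits into a 2KB
head and a 2KB tail with a zero-length middle, which is the case a
caller would have to skip rather than map.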

This is frequently triggered by using some kinds of thunderbolt NVMe
drives that trigger forced SWIOTLB for unaligned memory. NVMe seems to
pass in oddly aligned buffers for the passthrough commands from smartctl
that hit this condition.

Cc: stable@vger.kernel.org
Fixes: 433a76207dcf ("dma-mapping: Implement link/unlink ranges API")
Reported-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/0-v1-8536728bc89f+469-swiotlb_warn_jgg@nvidia.com
10 days agoMerge branch 'bpf-lpm_trie-allow-sleepable-bpf-programs-to-use-lpm-tries'
Alexei Starovoitov [Tue, 9 Jun 2026 19:42:04 +0000 (12:42 -0700)] 
Merge branch 'bpf-lpm_trie-allow-sleepable-bpf-programs-to-use-lpm-tries'

Vlad Poenaru says:

====================
bpf, lpm_trie: Allow sleepable BPF programs to use LPM tries

trie_lookup_elem() annotates its rcu_dereference_check() walks with only
rcu_read_lock_bh_held(), so a sleepable BPF program that touches an LPM
trie (e.g. a sleepable LSM hook calling bpf_map_lookup_elem()) trips a
"suspicious RCU usage" lockdep splat on debug kernels: it holds only
rcu_read_lock_trace(), which that annotation does not accept.

Patch 1 relaxes the rcu_dereference annotations in the trie walks so they
no longer trip lockdep from the Tasks Trace context, including the
trie_update_elem()/trie_delete_elem() writer walks (protected by
trie->lock). Patch 2 adds BPF_MAP_TYPE_LPM_TRIE to the verifier's
sleepable map whitelist so sleepable programs can reference an LPM trie
directly, not just as the inner map of a map-of-maps. LPM trie nodes are
reclaimed via bpf_mem_cache_free_rcu(), which chains a regular RCU grace
period into a Tasks Trace grace period before freeing -- the same
discipline BPF_MAP_TYPE_HASH relies on for sleepable access.

Changes since v1:
- Split into a 2-patch series.
- Patch 1 now also converts the trie_update_elem()/trie_delete_elem()
  walks from rcu_dereference() to rcu_dereference_protected(*p, 1),
  addressing review feedback that v1 only fixed the lookup path and left
  the same splat on the writer paths.
- New patch 2 adds the verifier whitelist entry so the fix is actually
  reachable for directly-referenced LPM tries.
- Retitled v1 ("Allow lookups from sleepable BPF programs").

v1: https://lore.kernel.org/all/20260529174233.2954240-1-vlad.wing@gmail.com/
====================

Link: https://patch.msgid.link/20260609135558.193287-1-vlad.wing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 days agobpf: Allow sleepable programs to use LPM trie maps directly
Vlad Poenaru [Tue, 9 Jun 2026 13:55:58 +0000 (06:55 -0700)] 
bpf: Allow sleepable programs to use LPM trie maps directly

The previous change relaxed the rcu_dereference annotations in
lpm_trie.c so the trie walks no longer trip lockdep when reached from a
sleepable BPF program holding only rcu_read_lock_trace().  By itself
that only helps tries reached as the inner map of a map-of-maps, or
from the classic-RCU syscall path: a sleepable program that references
an LPM trie directly is still rejected at load time by
check_map_prog_compatibility(), whose sleepable whitelist omits
BPF_MAP_TYPE_LPM_TRIE:

  Sleepable programs can only use array, hash, ringbuf and local storage maps

LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
into a Tasks Trace grace period before the node -- and the value
embedded in it that trie_lookup_elem() returns to the program -- is
released.  That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
on for sleepable access, so a value handed to a sleepable reader cannot
be freed while the program is still running under rcu_read_lock_trace().
The writer paths take trie->lock across the walk and never relied on the
RCU read-side lock to keep nodes alive.

Add BPF_MAP_TYPE_LPM_TRIE to the sleepable map whitelist so these
programs can use LPM tries directly.

Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260609135558.193287-3-vlad.wing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 days agobpf: Allow LPM map access from sleepable BPF programs
Vlad Poenaru [Tue, 9 Jun 2026 13:55:57 +0000 (06:55 -0700)] 
bpf: Allow LPM map access from sleepable BPF programs

trie_lookup_elem() annotates its rcu_dereference_check() walks with
only rcu_read_lock_bh_held().  Because rcu_dereference_check(p, c)
resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
classic RCU readers but fails for sleepable BPF programs, which enter
via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().

trie_update_elem() and trie_delete_elem() have the same problem in a
different form: they walk the trie with plain rcu_dereference(), which
asserts rcu_read_lock_held() unconditionally.  Both are reachable from
sleepable BPF programs via the bpf_map_update_elem / bpf_map_delete_elem
helpers, and from the syscall path under classic rcu_read_lock().  In
the writer paths the trie is actually protected by trie->lock (an
rqspinlock taken across the walk); we never relied on the RCU read-side
lock to keep nodes alive there.

A sleepable LSM hook that ends up touching an LPM trie therefore
triggers lockdep on debug kernels:

  =============================
  WARNING: suspicious RCU usage
  7.1.0-... Tainted: G            E
  -----------------------------
  kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
  1 lock held by net_tests/540:
   #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
       at: __bpf_prog_enter_sleepable+0x26/0x280
  Call Trace:
   dump_stack_lvl
   lockdep_rcu_suspicious
   trie_lookup_elem
   bpf_prog_..._enforce_security_socket_connect
   bpf_trampoline_...
   security_socket_connect
   __sys_connect
   do_syscall_64

This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
against the trie's reclaim path -- but it spams the console once per
distinct callsite on every debug kernel running a sleepable BPF LSM
that touches an LPM trie, which is increasingly common.

For the lookup path, switch the rcu_dereference_check() annotation
from rcu_read_lock_bh_held() to bpf_rcu_lock_held(), which accepts all
three contexts (classic, BH, Tasks Trace).  Other map types already
follow this convention.
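
The difference between the two annotations can be modelled as plain
boolean predicates. This is a toy userspace model: struct rcu_ctx and
both helper functions are hypothetical names, and the real checks are
lockdep queries, not struct fields.

```c
#include <stdbool.h>

/* Which read-side protections the caller holds (a toy model). */
struct rcu_ctx {
	bool classic;	/* rcu_read_lock() */
	bool bh;	/* rcu_read_lock_bh() */
	bool trace;	/* rcu_read_lock_trace() */
};

/* Old annotation: rcu_dereference_check(p, rcu_read_lock_bh_held()) */
static bool old_check_passes(struct rcu_ctx c)
{
	return c.bh || c.classic;
}

/* New annotation: accepts classic, BH, or Tasks Trace readers. */
static bool new_check_passes(struct rcu_ctx c)
{
	return c.classic || c.bh || c.trace;
}
```

A sleepable program sets only the trace flag in this model, so it
fails the old predicate and passes the new one.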

For trie_update_elem() and trie_delete_elem(), annotate the walks as
rcu_dereference_protected(*p, 1) -- matching trie_free() in the same
file -- since trie->lock is held across the walk.  rqspinlock has no
lockdep_map, so the predicate degenerates to '1' rather than
lockdep_is_held(&trie->lock); the protection is real but not
machine-verifiable.  trie_get_next_key() also uses bare
rcu_dereference() but is reachable only from the BPF syscall, which
holds classic rcu_read_lock() before dispatching, so it is left
untouched.

Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
Cc: stable@vger.kernel.org
Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260609135558.193287-2-vlad.wing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 days agoRDMA/efa: Implement the query port speed verb
Tom Sela [Mon, 8 Jun 2026 08:39:27 +0000 (08:39 +0000)] 
RDMA/efa: Implement the query port speed verb

Implement the query port speed callback to report the port effective
bandwidth directly in 100 Mb/s granularity.

Link: https://patch.msgid.link/r/20260608083927.4116-1-tomsela@amazon.com
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Tom Sela <tomsela@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
10 days agoRDMA/efa: Report 800 and 1600 Gbps link speed
Tom Sela [Mon, 8 Jun 2026 08:37:36 +0000 (08:37 +0000)] 
RDMA/efa: Report 800 and 1600 Gbps link speed

Add support for reporting 800 Gbps as 8X NDR and 1600 Gbps as 8X XDR
link speeds.

Link: https://patch.msgid.link/r/20260608083736.48454-1-tomsela@amazon.com
Reviewed-by: Michael Margolin <mrgolin@amazon.com>
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Tom Sela <tomsela@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
10 days agodrm/vc4: fix krealloc() memory leak
Alexander A. Klimov [Sat, 6 Jun 2026 12:38:10 +0000 (14:38 +0200)] 
drm/vc4: fix krealloc() memory leak

Don't just overwrite the original pointer passed to krealloc()
with its return value without checking the latter:

    MEM = krealloc(MEM, SZ, GFP);

If krealloc() returns NULL, that erases the pointer
to the still allocated memory, hence leaks this memory.
Instead, use a temporary variable, check it's not NULL
and only then assign it to the original pointer:

    TMP = krealloc(MEM, SZ, GFP);
    if (!TMP) return;
    MEM = TMP;

While on it, use krealloc_array().
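
The same pattern applies to realloc() in userspace. The sketch below
is an analogue for illustration only: grow_array() is a hypothetical
helper, and its explicit multiplication check stands in for the
overflow check that an array-style reallocation helper provides.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Grow *arr to hold n elements of the given size. On failure the
 * original pointer is left untouched, so the caller can still free it.
 */
static int grow_array(void **arr, size_t n, size_t size)
{
	void *tmp;

	/* Reject empty requests and n * size overflow. */
	if (!n || !size || n > SIZE_MAX / size)
		return -1;

	tmp = realloc(*arr, n * size);
	if (!tmp)
		return -1;

	*arr = tmp;
	return 0;
}
```

Because the assignment happens only after the NULL check, a failed
call leaves the caller's pointer valid.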

Fixes: 6d45c81d229d ("drm/vc4: Add support for branching in shader validation.")
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patch.msgid.link/20260606123817.37222-1-grandmaster@al2klimov.de
10 days agoi2c: qcom-geni: Use pm_runtime_force_{suspend,resume} helpers
Praveen Talari [Wed, 20 May 2026 07:14:29 +0000 (12:44 +0530)] 
i2c: qcom-geni: Use pm_runtime_force_{suspend,resume} helpers

The driver carries custom system suspend/resume handling that manually
tracks a suspended state and conditionally calls
geni_i2c_runtime_suspend() from the noirq suspend path, then adjusts
runtime PM state by hand. This duplicates PM core behavior and adds
unnecessary complexity.

Drop the manual state tracking and switch to pm_runtime_force_suspend()
and pm_runtime_force_resume() for system sleep. These helpers already
perform the required checks, call the runtime PM callbacks when needed,
and keep runtime PM state transitions consistent.

Reviewed-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260520-use_pm_runtime_apis-v1-1-6a5238fc6cb6@oss.qualcomm.com
10 days agoselftests/bpf: Avoid spurious spmc parallel selftest errors in libarena
Emil Tsalapatis [Tue, 9 Jun 2026 06:36:30 +0000 (02:36 -0400)] 
selftests/bpf: Avoid spurious spmc parallel selftest errors in libarena

The libarena parallel spmc selftest is nondeterministic by design.
As a result it depends up to a point on the relative timing between the
producer and consumer threads. This introduces the possibility for two
kinds of spurious failures that this patch addresses.

1) Spurious timeouts. The test proceeds in phases, and threads use a
   common counter as a barrier to avoid proceeding to the next phase
   until all threads are ready to do so. If a thread takes too long to
   reach the barrier, the already waiting threads may time out.

   Increase the current timeout. The timeout's value is a balance
   between the maximum amount of time spent on the test and the
   possibility of spurious failures. Right now the timeout is too short.
   Err on the side of caution and significantly increase it to avoid
   spurious failures.

2) Spurious resize failures. Some selftests require the spmc queue to
   resize itself. This in turn requires the producer side to be
   materially faster than the consumer side so that the queue gets full
   enough for a resize. However, in the benchmark the spmc queue's producer
   is outnumbered 3:1. To offset it we add busy waits for consume
   queues. However, we still see occasional failures due to the queue
   never resizing.

   Minimize the possibility for this in two ways: First, remove one of
   the consumers. The 2 consumers still exercise the "race between
   consumers" scenario. Second, increase the busy wait duration to
   decrease the rate at which the consumers act on the queue.

   While at it, also replace a stray invalid error value "153" with EINVAL.

Fixes: 42998f819256 ("selftests/bpf: libarena: parallel test harness and spmc parallel selftest")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260609063630.10245-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 days agoregulator: mt6359: Fix vbbck default internal supply name
Chen-Yu Tsai [Tue, 9 Jun 2026 08:36:27 +0000 (16:36 +0800)] 
regulator: mt6359: Fix vbbck default internal supply name

This issue was pointed out by Sashiko.

vbbck is fed internally from vio18. For the MT6359, the default supply
name was incorrectly set as "VIO18", instead of the supply's default
"VIO18". In practice this still works, but it causes the regulator
description copy and replace to always happen. For the MT6359P the
name is correct.

Fix the supply name for MT6359 so that both instances are the same and
correct. Also copy the comment about the internal supply from the MT6359
list to the MT6359P list.

Fixes: 10be8fc1d534 ("regulator: mt6359: Add regulator supply names")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20260609083630.1600070-1-wenst@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>
10 days agoRDMA/mlx5: Use strscpy() to copy strings into arrays
David Laight [Mon, 8 Jun 2026 09:54:57 +0000 (10:54 +0100)] 
RDMA/mlx5: Use strscpy() to copy strings into arrays

Replacing strcpy() with strscpy() ensures that overflow of the target
buffer cannot happen.
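
strscpy() is a kernel function, so the sketch below is a userspace
model of the bounded-copy behavior rather than the kernel
implementation. The name bounded_copy() is hypothetical. It assumes
the semantics of always NUL-terminating the destination and returning
either the copied length or -E2BIG on truncation.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Copy src into dst, writing at most count bytes including the NUL.
 * Returns the number of characters copied, or -E2BIG if src did not fit.
 */
static long bounded_copy(char *dst, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;

	for (i = 0; i < count - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Unlike strcpy(), the copy never writes past count bytes, and a
truncated result is reported to the caller instead of passing
silently.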

Link: https://patch.msgid.link/r/20260608095500.2567-2-david.laight.linux@gmail.com
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
10 days agoRDMA/usnic: User strscpy() to copy device name
David Laight [Sat, 6 Jun 2026 20:26:05 +0000 (21:26 +0100)] 
RDMA/usnic: User strscpy() to copy device name

Link: https://patch.msgid.link/r/20260606202633.5018-11-david.laight.linux@gmail.com
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
10 days agoRDMA/iwcm: User strscpy() to copy device name
David Laight [Sat, 6 Jun 2026 20:26:04 +0000 (21:26 +0100)] 
RDMA/iwcm: User strscpy() to copy device name

Link: https://patch.msgid.link/r/20260606202633.5018-10-david.laight.linux@gmail.com
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
10 days agoIB/mlx4: Fill in the access_flags if IB_MR_REREG_ACCESS is not specified
Jason Gunthorpe [Fri, 5 Jun 2026 11:53:35 +0000 (08:53 -0300)] 
IB/mlx4: Fill in the access_flags if IB_MR_REREG_ACCESS is not specified

Sashiko noticed mlx4 was using whatever random access flags were provided
when IB_MR_REREG_ACCESS is not used. Since IB_MR_REREG_TRANS needs
access_flags it used the random ones which means it doesn't work sensibly
if userspace provides only IB_MR_REREG_TRANS.

Keep track of the current access_flags of the MR and use them if the user
does not specify any.

Also fix up a little confusion around mmr.access: it holds the HW access
flags, so the convert_access() call was missing. However, nothing reads
this by the time rereg_mr can happen.

Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
Link: https://patch.msgid.link/r/0-v1-29ca7a402625+ddd6-mlx4_rereg_flags_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
11 days agoASoC: SOF: amd: honor machine_check in SoundWire machine select
Vijendar Mukunda [Tue, 9 Jun 2026 14:40:00 +0000 (20:10 +0530)] 
ASoC: SOF: amd: honor machine_check in SoundWire machine select

Only accept an ACPI machine table entry when machine_check is absent
or returns true, matching other AMD SoundWire machine select paths.

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://patch.msgid.link/20260609144146.3311301-1-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days agocxl/region: Avoid variable shadowing in region attach paths
Alison Schofield [Fri, 5 Jun 2026 04:05:01 +0000 (21:05 -0700)] 
cxl/region: Avoid variable shadowing in region attach paths

A couple of symbol declarations shadow earlier variables in the region
attach paths. Shadowing makes it harder to tell which object is being
referenced and can obscure future bugs.

Reuse the existing 'cxld' variable in cxl_port_attach_region() and
rename the endpoint decoder iterator in cxl_region_attach() to avoid
shadowing the function parameter.

No functional change.

Found with sparse.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260605040504.865728-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
11 days agoASoC: amd: ps: honor machine_check in SoundWire machine select
Vijendar Mukunda [Tue, 9 Jun 2026 14:32:12 +0000 (20:02 +0530)] 
ASoC: amd: ps: honor machine_check in SoundWire machine select

Only accept an ACPI machine table entry when machine_check is absent
or returns true, matching other AMD SoundWire machine select paths.

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Link: https://patch.msgid.link/20260609143230.3310356-1-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days agoASoC: sma1307: Fix uevent string leaks in fault worker
Cássio Gabriel [Tue, 9 Jun 2026 12:03:56 +0000 (09:03 -0300)] 
ASoC: sma1307: Fix uevent string leaks in fault worker

sma1307_check_fault_worker() stores dynamically allocated uevent strings in
envp[0]. Several fault conditions are checked in sequence, so a later fault
can overwrite envp[0] before the final kfree() and leak the previous
allocation.

The same flow can leave an OT1 volume entry in envp[1] while envp[0]
has been overwritten by a later non-OT1 fault, causing an inconsistent
uevent payload.

Use static STATUS strings and a stack buffer for the optional VOLUME entry.
This removes the allocations from the worker and keeps VOLUME tied only
to the OT1 events that produce it.

Fixes: 576c57e6b4c1 ("ASoC: sma1307: Add driver for Iron Device SMA1307")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260609-asoc-sma1307-uevent-leak-v1-1-cd7f5b062ab7@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days agoASoC: SOF: ipc3/ipc4-control: harden kcontrol payload handling
Mark Brown [Tue, 9 Jun 2026 17:41:15 +0000 (18:41 +0100)] 
ASoC: SOF: ipc3/ipc4-control: harden kcontrol payload handling

Peter Ujfalusi <peter.ujfalusi@linux.intel.com> says:

This series hardens SOF kcontrol data paths for both IPC3 and IPC4 by
fixing size-handling bugs in put/get/update flows and tightening bounds
checks around firmware/user-provided payload lengths.

The changes include:

- Fix TOCTOU-style size misuse in IPC3/IPC4 bytes put paths by validating and
  using the incoming payload size.
- Add notification/update payload size validation before parsing control data.
- Use overflow-checked arithmetic when computing expected IPC3 control sizes.
- Ensure update/copy bounds are validated against actual allocation limits.
- Fix IPC3 bytes_ext bounds checks to account for struct header offset, closing
  a heap overflow/over-read issue from unprivileged userspace TLV access.

Overall, the series makes control payload processing robust against malformed or
inconsistent sizes and prevents out-of-bounds accesses.

Link: https://patch.msgid.link/20260609083458.31193-1-peter.ujfalusi@linux.intel.com
11 days agoASoC: SOF: ipc3-control: Fix heap overflow in bytes_ext put/get
Peter Ujfalusi [Tue, 9 Jun 2026 08:34:58 +0000 (11:34 +0300)] 
ASoC: SOF: ipc3-control: Fix heap overflow in bytes_ext put/get

The ipc_control_data buffer is allocated as kzalloc(max_size), where
max_size covers the entire struct sof_ipc_ctrl_data including its
flexible array payload. However, the bounds checks in bytes_ext_put
and _bytes_ext_get compared user data lengths against max_size
directly, ignoring that cdata->data sits at an offset of
sizeof(struct sof_ipc_ctrl_data) bytes into the allocation.

This allowed writing up to sizeof(struct sof_ipc_ctrl_data) bytes past
the end of the heap buffer from unprivileged userspace via the ALSA TLV
kcontrol interface, and similarly allowed over-reading adjacent heap
data on the get path.

Fix all bounds checks to subtract sizeof(*cdata) from max_size so they
reflect the actual space available at the cdata->data offset. Also fix
the error-path restore in bytes_ext_put which wrote to cdata->data
instead of cdata, causing the same overflow.
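
The corrected bounds check can be sketched in userspace C. The
struct ctrl_data_sketch layout and the payload_fits() helper are
hypothetical stand-ins, not the real struct sof_ipc_ctrl_data or the
driver code; only the arithmetic is meant to match the description
above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header followed by a flexible array payload. */
struct ctrl_data_sketch {
	uint32_t cmd;
	uint32_t num_elems;
	unsigned char data[];
};

/*
 * max_size is the size of the whole allocation, header included.
 * The payload starts after the header, so only the remainder is
 * available for len bytes of user data.
 */
static int payload_fits(size_t max_size, size_t len)
{
	if (max_size < sizeof(struct ctrl_data_sketch))
		return 0;
	return len <= max_size - sizeof(struct ctrl_data_sketch);
}
```

Comparing len against max_size directly would accept a payload as
large as the whole allocation, overrunning the buffer by the size of
the header.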

Fixes: 67ec2a091630 ("ASoC: SOF: Add bytes_ext control IPC ops for IPC3")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260609083458.31193-7-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days agoASoC: SOF: ipc3-control: Fix TOCTOU in bytes_put and bytes_get
Peter Ujfalusi [Tue, 9 Jun 2026 08:34:57 +0000 (11:34 +0300)] 
ASoC: SOF: ipc3-control: Fix TOCTOU in bytes_put and bytes_get

In sof_ipc3_bytes_put(), the size used for the memcpy is derived from
the old data->size already in the buffer, not the incoming new data's
size field. If the new data has a different size, the copy length is
wrong: it may truncate valid data or copy stale bytes.

Similarly, sof_ipc3_bytes_get() checks data->size against max_size
without accounting for the sizeof(struct sof_ipc_ctrl_data) offset
of the flex array within the allocation.

Fix bytes_put to validate and use the incoming data's sof_abi_hdr.size
from ucontrol before copying. Fix bytes_get to subtract sizeof(*cdata)
from the bounds check to match the actual available space.
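
The put-side idea can be sketched in userspace as follows. This is only an
illustration under assumptions: struct abi_hdr is a simplified, hypothetical
stand-in for struct sof_abi_hdr, and blob_put() and dst_bytes are invented
names. The point shown is that the copy length is taken from the incoming
header and validated, never from the stale header already in the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sof_abi_hdr: a size field plus payload. */
struct abi_hdr {
	uint32_t size;		/* number of payload bytes that follow */
	uint8_t data[];
};

/*
 * Replace the blob in dst with src. dst_bytes is the room available for
 * header plus payload. The length is read from the incoming header and
 * validated before anything is copied.
 */
int blob_put(struct abi_hdr *dst, size_t dst_bytes, const struct abi_hdr *src)
{
	assert(dst != NULL && src != NULL);
	if (dst_bytes < sizeof(*dst))
		return -1;
	if (src->size > dst_bytes - sizeof(*dst))
		return -1;	/* incoming blob does not fit, dst untouched */
	memcpy(dst, src, sizeof(*src) + src->size);
	return 0;
}
```

Because the stale dst->size never enters the length calculation, a new blob
of a different size is neither truncated nor padded with old bytes.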

Fixes: 544ac8858f24 ("ASoC: SOF: Add bytes_get/put control IPC ops for IPC3")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260609083458.31193-6-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago ASoC: SOF: ipc3-control: Validate size in snd_sof_update_control
Peter Ujfalusi [Tue, 9 Jun 2026 08:34:56 +0000 (11:34 +0300)] 
ASoC: SOF: ipc3-control: Validate size in snd_sof_update_control

In snd_sof_update_control(), firmware-provided cdata->num_elems is
checked against local_cdata->data->size but never against the actual
allocation size. If local_cdata->data->size was previously set to an
inconsistent value, the memcpy could write past the allocated buffer.

Add a bounds check to ensure num_elems fits within the available space
in the ipc_control_data allocation before copying.

Fixes: 10f461d79c2d ("ASoC: SOF: Add IPC3 topology control ops")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260609083458.31193-5-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago ASoC: SOF: ipc3-control: Use overflow checks in control_update size calc
Peter Ujfalusi [Tue, 9 Jun 2026 08:34:55 +0000 (11:34 +0300)] 
ASoC: SOF: ipc3-control: Use overflow checks in control_update size calc

In sof_ipc3_control_update(), the expected_size calculation uses
firmware-provided cdata->num_elems in arithmetic that could overflow
on 32-bit platforms, wrapping to a small value. This would allow the
cdata->rhdr.hdr.size comparison to pass with mismatched sizes,
potentially leading to out-of-bounds access in snd_sof_update_control.

Use check_mul_overflow() and check_add_overflow() to detect and reject
overflowed size calculations.
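
A minimal userspace sketch of the wrap and its detection, assuming a GCC or
Clang toolchain: it uses the __builtin_mul_overflow() and
__builtin_add_overflow() compiler builtins that the kernel's
check_mul_overflow() and check_add_overflow() helpers wrap. The function
names and the hdr_size/elem_size parameters are hypothetical and do not
mirror the driver's actual expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked 32-bit arithmetic: silently wraps for a large num_elems. */
uint32_t expected_size_unchecked(uint32_t hdr_size, uint32_t num_elems,
				 uint32_t elem_size)
{
	return hdr_size + num_elems * elem_size;
}

/*
 * Checked variant: returns false instead of a wrapped result.
 * On success *out holds hdr_size + num_elems * elem_size.
 */
bool expected_size_checked(uint32_t hdr_size, uint32_t num_elems,
			   uint32_t elem_size, uint32_t *out)
{
	uint32_t payload;

	assert(out != NULL);
	if (__builtin_mul_overflow(num_elems, elem_size, &payload))
		return false;
	if (__builtin_add_overflow(hdr_size, payload, out))
		return false;
	return true;
}
```

For example, with hdr_size 16, elem_size 8 and num_elems 0x20000000 the
unchecked version wraps to 16, which could match a small message size,
whereas the checked version reports the overflow.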

Fixes: 10f461d79c2d ("ASoC: SOF: Add IPC3 topology control ops")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260609083458.31193-4-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago ASoC: SOF: ipc4-control: Validate notification payload size
Peter Ujfalusi [Tue, 9 Jun 2026 08:34:54 +0000 (11:34 +0300)] 
ASoC: SOF: ipc4-control: Validate notification payload size

Validate MODULE_NOTIFICATION payload length before reading
bytes/channel data in control update handling.

Fixes: 2a28b5240f2b ("ASoC: SOF: ipc4-control: Add support for generic bytes control")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260609083458.31193-3-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago ASoC: SOF: ipc4-control: Fix TOCTOU in sof_ipc4_bytes_put
Peter Ujfalusi [Tue, 9 Jun 2026 08:34:53 +0000 (11:34 +0300)] 
ASoC: SOF: ipc4-control: Fix TOCTOU in sof_ipc4_bytes_put

In sof_ipc4_bytes_put(), the copy size is derived from the old
data->size in the buffer rather than the incoming new data's size
field from ucontrol. If the new data has a different size, the copy
uses the wrong length: it may truncate valid data or copy stale bytes.

Fix by validating and using the incoming data's sof_abi_hdr.size from
ucontrol before copying.

Fixes: a062c8899fed ("ASoC: SOF: ipc4-control: Add support for bytes control get and put")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260609083458.31193-2-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago s390/ap: Fix locking issue in SE bind and associate sysfs functions
Harald Freudenberger [Wed, 3 Jun 2026 13:04:56 +0000 (15:04 +0200)] 
s390/ap: Fix locking issue in SE bind and associate sysfs functions

Revisit and reorganize the locking and lock coverage of the
ap->lock spinlock as used in the two sysfs functions
se_bind_store() and se_associate_store().

A kernel run reported a possible deadlock situation, caused by
holding the spinlock (ap->lock) while triggering a uevent.
The fix rearranges the code protected by the spinlock by excluding
the uevent invocation, which does not require protection.

Additionally, the start of the protected region is moved earlier
to cover more lines, ensuring a consistent view of the AP queue
state between reading and updating its struct fields.
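
The resulting shape of the critical section can be sketched with a toy
model. Everything here is hypothetical (toy_lock, toy_queue, toy_send_event()
and toy_bind_store() are not the AP bus code), and the "lock" is only a flag
so that the ordering can be checked deterministically: state is read and
updated under one lock hold, and the event is sent after the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: just records whether it is held, to make the ordering visible. */
struct toy_lock {
	bool held;
};

void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_queue {
	struct toy_lock lock;
	int se_state;
	int events_sent;
	int events_sent_under_lock;	/* must stay 0 */
};

/* Stands in for the uevent, which may allocate and so must run unlocked. */
void toy_send_event(struct toy_queue *q)
{
	if (q->lock.held)
		q->events_sent_under_lock++;
	q->events_sent++;
}

/* Returns true if the state changed and an event was sent. */
bool toy_bind_store(struct toy_queue *q, int new_state)
{
	bool changed;

	toy_lock_acquire(&q->lock);
	/* Read and update happen under the same lock hold. */
	changed = (q->se_state != new_state);
	if (changed)
		q->se_state = new_state;
	toy_lock_release(&q->lock);

	/* Notification happens only after the lock is released. */
	if (changed)
		toy_send_event(q);
	return changed;
}
```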

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
7.1.0-20260601.rc6.git12.516b5dbd4d4a.300.fc44.s390x+debug #1 Not tainted
-----------------------------------------------------
setupseguest.sh/11034 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
000001c991f498e8 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_cache_noprof+0x5a/0x6d0
and this task is already holding:
000000c4a1a12378 (&aq->lock){+.-.}-{2:2}, at: se_bind_store+0x96/0x3a0
which would create a new lock dependency:
 (&aq->lock){+.-.}-{2:2} -> (fs_reclaim){+.+.}-{0:0}
but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&aq->lock){+.-.}-{2:2}
... which became SOFTIRQ-irq-safe at:
  __lock_acquire+0x5ae/0x15a0
  lock_acquire+0x14c/0x400
  _raw_spin_lock_bh+0x58/0xb0
  ap_tasklet_fn+0x72/0xd0
  tasklet_action_common+0x174/0x1b0
  handle_softirqs+0x180/0x5c0
  irq_exit_rcu+0x196/0x200
  do_ext_irq+0x12a/0x4d0
  ext_int_handler+0xc6/0xf0
  folio_zero_user+0x1c6/0x240
  folio_zero_user+0x182/0x240
  vma_alloc_anon_folio_pmd+0xa0/0x1d0
  __do_huge_pmd_anonymous_page+0x3a/0x200
  __handle_mm_fault+0x56c/0x590
  handle_mm_fault+0xa2/0x370
  do_exception+0x292/0x590
  __do_pgm_check+0x136/0x3e0
  pgm_check_handler+0x114/0x160
to a SOFTIRQ-irq-unsafe lock:
 (fs_reclaim){+.+.}-{0:0}
... which became SOFTIRQ-irq-unsafe at:
...
  __lock_acquire+0x5ae/0x15a0
  lock_acquire+0x14c/0x400
  __fs_reclaim_acquire+0x44/0x50
  fs_reclaim_acquire+0xbe/0x100
  fs_reclaim_correct_nesting+0x20/0x70
  dotest+0x5e/0x148
  locking_selftest+0x2854/0x2a88
  start_kernel+0x3b2/0x4f0
  startup_continue+0x2e/0x40
other info that might help us debug this:
 Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
       local_irq_disable();
       lock(&aq->lock);
       lock(fs_reclaim);
  <Interrupt>
    lock(&aq->lock);
 *** DEADLOCK ***
4 locks held by setupseguest.sh/11034:
 #0: 000000c485d01440 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x2fc/0x380
 #1: 000000c4d2283288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x12a/0x270
 #2: 000000c4a1830e48 (kn->active#172){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x1e/0x270
 #3: 000000c4a1a12378 (&aq->lock){+.-.}-{2:2}, at: se_bind_store+0x96/0x3a0
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&aq->lock){+.-.}-{2:2} {
   HARDIRQ-ON-W at:
    __lock_acquire+0x5ae/0x15a0
    lock_acquire+0x14c/0x400
    _raw_spin_lock_bh+0x58/0xb0
    ap_queue_init_state+0x2e/0x50
    ap_scan_domains+0x5d6/0x620
    ap_scan_adapter+0x4c0/0x810
    ap_scan_bus+0x70/0x350
    ap_scan_bus_wq_callback+0x56/0x80
    process_one_work+0x2ba/0x820
    worker_thread+0x21a/0x400
    kthread+0x164/0x190
    __ret_from_fork+0x4c/0x340
    ret_from_fork+0xa/0x30
   IN-SOFTIRQ-W at:
    __lock_acquire+0x5ae/0x15a0
    lock_acquire+0x14c/0x400
    _raw_spin_lock_bh+0x58/0xb0
    ap_tasklet_fn+0x72/0xd0
    tasklet_action_common+0x174/0x1b0
    handle_softirqs+0x180/0x5c0
    irq_exit_rcu+0x196/0x200
    do_ext_irq+0x12a/0x4d0
    ext_int_handler+0xc6/0xf0
    folio_zero_user+0x1c6/0x240
    folio_zero_user+0x182/0x240
    vma_alloc_anon_folio_pmd+0xa0/0x1d0
    __do_huge_pmd_anonymous_page+0x3a/0x200
    __handle_mm_fault+0x56c/0x590
    handle_mm_fault+0xa2/0x370
    do_exception+0x292/0x590
    __do_pgm_check+0x136/0x3e0
    pgm_check_handler+0x114/0x160
   INITIAL USE at:
   __lock_acquire+0x5ae/0x15a0
   lock_acquire+0x14c/0x400
   _raw_spin_lock_bh+0x58/0xb0
   ap_queue_init_state+0x2e/0x50
   ap_scan_domains+0x5d6/0x620
   ap_scan_adapter+0x4c0/0x810
   ap_scan_bus+0x70/0x350
   ap_scan_bus_wq_callback+0x56/0x80
   process_one_work+0x2ba/0x820
   worker_thread+0x21a/0x400
   kthread+0x164/0x190
   __ret_from_fork+0x4c/0x340
   ret_from_fork+0xa/0x30
 }
 ... key      at: [<000001c9936e8aa0>] __key.7+0x0/0x10
the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (fs_reclaim){+.+.}-{0:0} {
   HARDIRQ-ON-W at:
    __lock_acquire+0x5ae/0x15a0
    lock_acquire+0x14c/0x400
    __fs_reclaim_acquire+0x44/0x50
    fs_reclaim_acquire+0xbe/0x100
    fs_reclaim_correct_nesting+0x20/0x70
    dotest+0x5e/0x148
    locking_selftest+0x2854/0x2a88
    start_kernel+0x3b2/0x4f0
    startup_continue+0x2e/0x40
   SOFTIRQ-ON-W at:
    __lock_acquire+0x5ae/0x15a0
    lock_acquire+0x14c/0x400
    __fs_reclaim_acquire+0x44/0x50
    fs_reclaim_acquire+0xbe/0x100
    fs_reclaim_correct_nesting+0x20/0x70
    dotest+0x5e/0x148
    locking_selftest+0x2854/0x2a88
    start_kernel+0x3b2/0x4f0
    startup_continue+0x2e/0x40
   INITIAL USE at:
   __lock_acquire+0x5ae/0x15a0
   lock_acquire+0x14c/0x400
   __fs_reclaim_acquire+0x44/0x50
   fs_reclaim_acquire+0xbe/0x100
   fs_reclaim_correct_nesting+0x20/0x70
   dotest+0x5e/0x148
   locking_selftest+0x2854/0x2a88
   start_kernel+0x3b2/0x4f0
   startup_continue+0x2e/0x40
 }
 ... key      at: [<000001c991f498e8>] __fs_reclaim_map+0x0/0x30
 ... acquired at:
   check_prev_add+0x178/0xf40
   __lock_acquire+0x12aa/0x15a0
   lock_acquire+0x14c/0x400
   __fs_reclaim_acquire+0x44/0x50
   fs_reclaim_acquire+0xbe/0x100
   __kmalloc_cache_noprof+0x5a/0x6d0
   kobject_uevent_env+0xd4/0x420
   ap_send_se_bind_uevent+0x48/0x70
   se_bind_store+0x146/0x3a0
   kernfs_fop_write_iter+0x18c/0x270
   vfs_write+0x23c/0x380
   ksys_write+0x88/0x120
   __do_syscall+0x170/0x750
   system_call+0x72/0x90
stack backtrace:
CPU: 6 UID: 0 PID: 11034 Comm: setupseguest.sh Not tainted 7.1.0-20260601.rc6.git2.516b5dbd4d4a.300.fc44.s390x+debug #1 PREEMPT
Hardware name: IBM 9175 ME1 701 (KVM/Linux)
Call Trace:
 [<000001c98ffa0a7e>] dump_stack_lvl+0xae/0x108
 [<000001c9900a6d7a>] print_bad_irq_dependency+0x47a/0x480
 [<000001c9900a7184>] check_irq_usage+0x404/0x4c0
 [<000001c9900a73b8>] check_prev_add+0x178/0xf40
 [<000001c9900aaf1a>] __lock_acquire+0x12aa/0x15a0
 [<000001c9900ab35c>] lock_acquire+0x14c/0x400
 [<000001c9903be454>] __fs_reclaim_acquire+0x44/0x50
 [<000001c9903be51e>] fs_reclaim_acquire+0xbe/0x100
 [<000001c9903cf4ca>] __kmalloc_cache_noprof+0x5a/0x6d0
 [<000001c9910ca9d4>] kobject_uevent_env+0xd4/0x420
 [<000001c990d84098>] ap_send_se_bind_uevent+0x48/0x70
 [<000001c990d87416>] se_bind_store+0x146/0x3a0
 [<000001c99057da7c>] kernfs_fop_write_iter+0x18c/0x270
 [<000001c99047712c>] vfs_write+0x23c/0x380
 [<000001c990477438>] ksys_write+0x88/0x120
 [<000001c9910f64e0>] __do_syscall+0x170/0x750
 [<000001c99110a412>] system_call+0x72/0x90
INFO: lockdep is turned off.

Fixes: 4179c3984227 ("s390/ap: Implement SE bind and associate uevents")
Reported-by: Ingo Franzki <ifranzki@linux.ibm.com>
Suggested-by: Finn Callies <fcallies@linux.ibm.com>
Reviewed-by: Finn Callies <fcallies@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
11 days ago ASoC: SOF: amd: set ipc flags to zero
Vijendar Mukunda [Tue, 9 Jun 2026 16:08:45 +0000 (21:38 +0530)] 
ASoC: SOF: amd: set ipc flags to zero

As per design, set IPC conf structure flags to zero during acp init
sequence.

Link: https://github.com/thesofproject/linux/pull/5642
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Tested-by: Umang Jain <uajain@igalia.com>
Link: https://patch.msgid.link/20260609160938.3717513-2-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago ASoC: SOF: amd: fix for ipc flags check
Vijendar Mukunda [Tue, 9 Jun 2026 16:08:44 +0000 (21:38 +0530)] 
ASoC: SOF: amd: fix for ipc flags check

The firmware sets dsp_ack to 1 when it sends a response to an IPC
command issued by the host. Similarly, the dsp_msg flag is updated to 1.

During ACP D0 entry, the value read from the sof_dsp_ack_write scratch
flag can be uninitialized. A non-zero garbage value is treated as a
pending DSP IPC ack before SOF_FW_BOOT_COMPLETE, causing a spurious
"IPC reply before FW_BOOT_COMPLETE" log.

Fix the condition checks for ipc flags.

Fixes: 738a2b5e2cc9 ("ASoC: SOF: amd: Add IPC support for ACP IP block")
Link: https://github.com/thesofproject/linux/pull/5642
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Tested-by: Umang Jain <uajain@igalia.com>
Link: https://patch.msgid.link/20260609160938.3717513-1-Vijendar.Mukunda@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
11 days ago Merge branch 'net-ethtool-let-ops-locked-drivers-run-without-rtnl_lock'
Jakub Kicinski [Tue, 9 Jun 2026 17:13:08 +0000 (10:13 -0700)] 
Merge branch 'net-ethtool-let-ops-locked-drivers-run-without-rtnl_lock'

Jakub Kicinski says:

====================
net: ethtool: let ops locked drivers run without rtnl_lock

With the ethtool_get_link_ksettings() situation hopefully ironed out by
the previous series (commit 6a5d837f0ce2), let's return to the main
part of the series.

We have been slowly moving towards removing the rtnl_lock dependency
in driver ops since the concept of "ops-locked" drivers was
introduced last year. Since last year we take the netdev instance
lock before invoking any ndo or ethtool op of "ops-locked" drivers.

We dipped our toes into rtnl_lock-less ops with the queue binding API.
Queue stats, NAPI, and other netdev-netlink objects are also queried
without holding rtnl_lock already. It's time to take the next logical
step and lift the requirement from ethtool ops.

The direct motivation for this patchset is that ethtool ops often
involve communicating with device FW, and may take a long time
to complete. Aggressive polling of device state on machines
with 10+ NICs has been shown to significantly increase rtnl_lock
pressure.

There's a handful of areas which still need rtnl_lock (see below).
I decided to convert everything to rtnl_lock-less by default, and
add a set of flags which let the drivers request rtnl_lock to still
be taken. I don't love this, but I'm worried that opt-in would be
even more confusing.
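
A rough sketch of that default-off, flag-to-opt-back-in decision, under
stated assumptions: the toy_op bits, struct toy_dev and must_take_rtnl() are
hypothetical names for illustration only and are not the flags added by this
series. It also assumes that drivers which are not ops-locked keep running
under rtnl_lock as before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-op bits a driver could set to request rtnl_lock. */
enum toy_op {
	TOY_OP_SET_CHANNELS	= 1u << 0,
	TOY_OP_SET_PRIV_FLAGS	= 1u << 1,
	TOY_OP_GET_STATS	= 1u << 2,
};

struct toy_dev {
	bool ops_locked;	/* driver relies on the netdev instance lock */
	uint32_t needs_rtnl;	/* ops for which the driver still wants rtnl_lock */
};

/* Decide whether the core should take rtnl_lock before calling this op. */
bool must_take_rtnl(const struct toy_dev *dev, uint32_t op)
{
	assert(dev != NULL);
	if (!dev->ops_locked)
		return true;	/* assumed: non-ops-locked drivers keep rtnl_lock */
	return (dev->needs_rtnl & op) != 0;
}
```

In this model an ops-locked driver that sets only TOY_OP_SET_CHANNELS gets
rtnl_lock for that op and runs every other op with just the instance lock.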

Known issues / exclusions:
 - qdiscs - qdisc configuration currently assumes rtnl_lock, this
   is mostly impacting set_channels callback. qdisc config is probably
   the easiest one of the exclusions to tackle, it's fairly self-contained.
 - features - even though feature changes are (correctly) plumbed to
   the driver through ndos, they are part of ethtool uAPI. ethtool itself
   calls netdev_features_change() if it spots a device feature change
   when comparing features before vs after the callback. Some drivers also call
   netdev_features_change() directly in response to various changes,
   e.g. setting priv flags.
   Since features have to propagate to upper and lower devices anything
   that touches features is quite hard to move from under rtnl_lock.
 - phylink - phylink and SFP depend on rtnl_lock today, I suspect
   that this is purely for historic reasons. I started poking at
   it and don't really see a need for a global lock. But accessing
   the netdev instance lock from the SFP entry points will require
   some attention from the phylink folks.
 - phydev - similar to phylink, looks quite doable. But no ops-locked
   driver currently has a phydev (fbnic only uses phylink) so phydev
   related paths retain an ASSERT_RTNL() for now.

Tested on mlx5, bnxt and fbnic.
====================

Link: https://patch.msgid.link/20260605002912.3456868-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days ago docs: net: ethtool: document ops-locked drivers and op_needs_rtnl
Jakub Kicinski [Fri, 5 Jun 2026 00:29:12 +0000 (17:29 -0700)] 
docs: net: ethtool: document ops-locked drivers and op_needs_rtnl

Catch up various bits of documentation after the locking changes.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260605002912.3456868-13-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days ago net: ethtool: optionally skip rtnl_lock on IOCTL path
Jakub Kicinski [Fri, 5 Jun 2026 00:29:11 +0000 (17:29 -0700)] 
net: ethtool: optionally skip rtnl_lock on IOCTL path

Convert the IOCTL path similarly to how we converted Netlink.
The device lookup gets a little hairy. We could take rtnl_lock
unconditionally and drop it before calling the driver (this would
avoid the reference + liveness check). But I think being able
to make progress even if rtnl is dead-locked is quite useful.

First extra concern is handling features. List all the cmds which
modify features and always take rtnl_lock. We could fold this list
into ethtool_ioctl_needs_rtnl() but it seems cleaner to keep
ethtool_ioctl_needs_rtnl() driver-related. If a driver changed
features and we were not holding rtnl_lock - warn about it.
It can only happen on buggy ops-locked drivers (buggy because
they should have set the appropriate "I need rtnl for op X" bit).

Second wrinkle is the PHY ID hack which drops the locks while
sleeping. Convert its static "busy" variable which used to
be protected by rtnl_lock to a field in struct ethtool_netdev_state.
This feature is about identifying an adapter or a port within
a system, so being able to blink multiple LEDs at the same
time is likely not very useful in practice. But it's the simplest
fix, we can add a mutex if someone thinks a system should only
be ID'ing one port at a time.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260605002912.3456868-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days ago net: ethtool: ioctl: concentrate the locking
Jakub Kicinski [Fri, 5 Jun 2026 00:29:10 +0000 (17:29 -0700)] 
net: ethtool: ioctl: concentrate the locking

Add another layer of helper functions to make upcoming locking
changes easier. Otherwise we'd need a pretty complex goto
structure. netdev instance lock is now taken slightly sooner
but that should not be an issue since rtnl_lock is already held,
anyway.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260605002912.3456868-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days ago net: ethtool: optionally skip rtnl_lock in RSS context handlers
Jakub Kicinski [Fri, 5 Jun 2026 00:29:09 +0000 (17:29 -0700)] 
net: ethtool: optionally skip rtnl_lock in RSS context handlers

Skip rtnl_lock in RSS context handlers if device is ops-locked.
Fairly trivial conversion. bnxt needed rtnl_lock for changing
the main context but looks like additional contexts are fine
without it.

Note (for review bots?) that ethnl_ops_begin() checks whether
the device is still registered.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260605002912.3456868-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days ago net: ethtool: optionally skip rtnl_lock in ethnl_act_module_fw_flash()
Jakub Kicinski [Fri, 5 Jun 2026 00:29:08 +0000 (17:29 -0700)] 
net: ethtool: optionally skip rtnl_lock in ethnl_act_module_fw_flash()

Module firmware flashing reads SFF-8024 identifier bytes via
.get_module_eeprom_by_page(). Other than that it modifies
a bit in the netdev->ethtool struct. Both should be ops-locked
at this point.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260605002912.3456868-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>