git.ipfire.org Git - thirdparty/linux.git/log

net: dsa: realtek: rtl8365mb: reject unsupported topologies

Explicitly enforce the presence of a CPU port (-EINVAL) and reject DSA
cascade links (-EOPNOTSUPP) during setup to prevent silent failures.

These topologies were already non-functional. Without a CPU port, the
driver does not activate CPU tagging. Additionally, the switch hardware
was not designed to be cascaded, and DSA links never worked because
CPU tagging is not enabled for them.

Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-2-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: use ERR_PTR

Convert numeric error codes into human-readable strings by using %pe
together with ERR_PTR() in dev_err() messages. Also use dev_err_probe()
instead of checking for -EPROBE_DEFER.

Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-1-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: restore RX state on reload failure

lan966x_fdma_reload() backs up rx->page_pool and rx->fdma before
reallocating the RX resources for the new MTU. If the allocation fails,
the restore path puts these fields back before restarting RX.

However, the reload path also updates rx->page_order and rx->max_mtu
before calling lan966x_fdma_rx_alloc(). These fields are not restored on
failure, so RX can be restarted with the old pages, old FDMA state and
old page pool, but with the page geometry from the failed new MTU.

This can make the XDP path advertise a frame size derived from the new
page_order while the actual RX pages still come from the old allocation.
For example, after a failed reload to a jumbo MTU, xdp_init_buff() may be
called with a frame size larger than the restored RX pages.

lan966x_fdma_rx_alloc_page_pool() also registers the newly allocated page
pool with each port's XDP RXQ before fdma_alloc_coherent() is called. If
fdma_alloc_coherent() fails, the new page pool is destroyed, but the
rollback path does not restore the per-port XDP RXQ mem model
registration either.

Save and restore rx->page_order and rx->max_mtu, and restore the old page
pool registration for each port's XDP RXQ before RX is started again.
This keeps the restored RX state consistent after a failed reload.

Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260607145747.1494514-1-lgs201920130244@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ocp: fix resource freeing order

Commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable
events") added a call to ptp_disable_all_events() which changes the
configuration of pins if they support EXTTS events. In ptp_ocp_detach()
pins resources are freed before ptp_clock_unregister() and it leads to
use-after-free during driver removal. Fix it by changing the order of
free/unregister calls. To avoid irq handler running on the other core
while ptp device unregistering, call synchronize_irq() after HW is
configured to stop producing irqs and no irqs are in-flight.

Fixes: a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260608155952.240304-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pse-pd: pd692x0: support disabling disable ports GPIO

Microchip PSE controllers have a dedicated disable ports input that like it
name says disables PoE on all ports.

So lets support parsing that GPIO and using the GPIO flags to set it low
by default and enable PoE on all ports during probe.

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260607165600.1260210-2-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: pse-pd: microchip,pd692x0: add port disable GPIO

Microchip PSE controllers have a dedicated port disable input that like it
name suggest, will disable PoE on all ports.

So, lets document that GPIO.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260607165600.1260210-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add getsockopt_iter binary to .gitignore

The generated binary for getsockopt_iter.c shouldn't show up as an
untracked git file after running:

make -C tools/testing/selftests TARGETS=net install

Let's just ignore it.

Fixes: d39887f55d8e ("net: selftests: add getsockopt_iter regression tests")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260608112259.4022-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tun: zero the whole vnet header in tun_put_user()

tun_put_user() declares an on-stack struct virtio_net_hdr_v1_hash_tunnel
without zeroing it. For a non-tunnel skb, virtio_net_hdr_tnl_from_skb()
only initializes the first 10 bytes (sizeof(struct virtio_net_hdr)),
leaving bytes 10..23 (num_buffers and the hash/tunnel fields) as stack
garbage.

An unprivileged user can set the vnet header size to 24 with
TUNSETVNETHDRSZ, so __tun_vnet_hdr_put() copies all 24 bytes of the
partially-initialized struct to userspace, leaking 14 bytes of kernel
stack on every read of a non-tunnel packet.

Fix it the same way tun_get_user() already does by zeroing the whole
header right after declaration.

Fixes: 288f30435132 ("tun: enable gso over UDP tunnel support.")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260607054428.3050243-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rds: fix NULL deref in rds_ib_send_cqe_handler() on masked atomic completion

rds_ib_xmit_atomic() always programs a masked atomic opcode
(IB_WR_MASKED_ATOMIC_CMP_AND_SWP or IB_WR_MASKED_ATOMIC_FETCH_AND_ADD)
for every RDS atomic cmsg.  But the completion-side switch in
rds_ib_send_unmap_op() only handles the non-masked opcodes, so a masked
atomic completion falls through to default and returns rm == NULL while
send->s_op is left set.  rds_ib_send_cqe_handler() then dereferences the
NULL rm via rm->m_final_op, oopsing in softirq context.  An unprivileged
AF_RDS sendmsg() of an atomic cmsg over an active RDS/IB connection
triggers it; on hardware that natively accepts masked atomics (mlx4,
mlx5) no extra setup is needed.

  RDS/IB: rds_ib_send_unmap_op: unexpected opcode 0xd in WR!
  Oops: general protection fault [#1] SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000190-0x0000000000000197]
  RIP: rds_ib_send_cqe_handler+0x25c/0xb10 (net/rds/ib_send.c:282)
  Call Trace:
   <IRQ>
   rds_ib_send_cqe_handler (net/rds/ib_send.c:282)
   poll_scq (net/rds/ib_cm.c:274)
   rds_ib_tasklet_fn_send (net/rds/ib_cm.c:294)
   tasklet_action_common (kernel/softirq.c:943)
   handle_softirqs (kernel/softirq.c:573)
   run_ksoftirqd (kernel/softirq.c:479)
   </IRQ>
  Kernel panic - not syncing: Fatal exception in interrupt

Handle the masked atomic opcodes in the same case as the non-masked
ones: they map to the same struct rds_message.atomic union member, so
the existing container_of()/rds_ib_send_unmap_atomic() body is correct
for them.

Fixes: 20c72bd5f5f9 ("RDS: Implement masked atomic operations")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260606192447.1179255-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: guard timestamp cmsgs to real error queue skbs

skb_is_err_queue() treats PACKET_OUTGOING as the sole marker for an skb
from sk_error_queue. That assumption is not true for AF_PACKET sockets:
outgoing packet taps are also delivered to packet sockets with
skb->pkt_type == PACKET_OUTGOING, but their skb->cb is owned by AF_PACKET
instead of struct sock_exterr_skb.

If such an skb is received with timestamping enabled, the generic
timestamp cmsg path can read AF_PACKET control-buffer state as
sock_exterr_skb::opt_stats. With SO_RXQ_OVFL enabled, the packet drop
counter overlaps opt_stats. An odd drop count makes the path emit
SCM_TIMESTAMPING_OPT_STATS with skb->len and skb->data. For non-linear
skbs this copies past the linear head and can trigger hardened usercopy or
disclose adjacent heap contents.

Keep skb_is_err_queue() local to net/socket.c, but make it verify that
the PACKET_OUTGOING marker is paired with the sock_rmem_free destructor
installed by sock_queue_err_skb(). AF_PACKET receive skbs use normal
receive ownership and no longer pass as error-queue skbs, while legitimate
sk_error_queue entries keep the PACKET_OUTGOING marker and sock_rmem_free
ownership.

Fixes: 8605330aac5a ("tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs")
Signed-off-by: Kyle Zeng <kylebot@openai.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260607021819.49698-1-kylebot@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: validate embedded INIT chunk and address list lengths in cookie

sctp_unpack_cookie() only checked that the embedded INIT chunk length
did not exceed the remaining cookie payload, but did not ensure that the
INIT chunk is large enough to contain a complete INIT header.

A malformed COOKIE_ECHO can therefore carry a truncated INIT chunk whose
length field is smaller than sizeof(struct sctp_init_chunk).  Later,
sctp_process_init() accesses INIT parameters unconditionally, which may
lead to out-of-bounds reads.

In addition, raw_addr_list_len is not fully validated against the
remaining cookie payload. When cookie authentication is disabled, an
attacker can supply an oversized raw_addr_list_len and cause
sctp_raw_to_bind_addrs() to read beyond the end of the cookie. The
address parser also lacks sufficient bounds checks for parameter headers
and lengths, allowing malformed address parameters to trigger
out-of-bounds reads.

Fix this by:

- requiring the embedded INIT chunk length to be at least sizeof(struct
  sctp_init_chunk);
- validating that the INIT chunk and raw address list together fit
  within the cookie payload;
- verifying sufficient data exists for each address parameter header and
  payload before parsing it.

Note that sctp_verify_init() must be called after sctp_unpack_cookie()
and before sctp_process_init() when cookie authentication is disabled.
This will be addressed in a separate patch.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/75af23a89adf881a0895d511775e4770da367cbf.1780873427.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-add-retry-mechanism-to-ndo_set_rx_mode_async'

Stanislav Fomichev says:

====================
net: add retry mechanism to ndo_set_rx_mode_async

Original async ndo_set_rx_mode work left one place where we do netdev_WARN
in response to a ENOMEM. The intent was to see whether actual real
users can hit that (adding uc/mc under memory pressure seems like a
very unlikely thing to do). However, it was quickly triggered by
syzbot's failslab. Add a retry mechanism and downgrade netdev_WARN
to netdev_err. The retry logic is a typical exponential backoff:
1, 2, 4, 8 seconds, 15 in total, hopefully enough for a system to resolve
memory pressure.
====================

Link: https://patch.msgid.link/20260608154014.227538-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6_vti: set netns_immutable on the fallback device.

john1988 and Noam Rathaus reported that vti6_init_net() does not set the
netns_immutable flag on the per-netns fallback tunnel device (ip6_vti0).

Other similar tunnel drivers (like ip6_tunnel, sit, ip6_gre, and ip_tunnel)
correctly set this flag during their fallback device initialization to
prevent them from being moved to another network namespace.

Fixes: 61220ab34948 ("vti6: Enable namespace changing")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20260608155918.787644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt: convert to core rx_mode retry mechanism

Remove the driver-specific BNXT_STATE_L2_FILTER_RETRY + timer + sp_task
retry mechanism and rely on the core stack's ndo_set_rx_mode_async retry
instead.

bnxt_cfg_rx_mode() now returns errors instead of swallowing them. The
PF-unavailable case (-ENODEV from HWRM on a VF) is normalized to
-EAGAIN at the boundary so callers can match on a single "retry me"
errno without re-implementing the VF/-ENODEV check. Other errors
propagate unchanged.

This removes:
- BNXT_STATE_L2_FILTER_RETRY state bit
- BNXT_RX_MASK_SP_EVENT sp_event bit
- Retry trigger from bnxt_timer()
- BNXT_RX_MASK_SP_EVENT handling from bnxt_sp_task()

bnxt_init_chip() still calls bnxt_cfg_rx_mode() directly during open.
On a fresh open dev->uc is empty and the call effectively cannot fail
on the unicast path. But on FW reset reopen (bnxt_fw_reset_task ->
bnxt_open) a VF may have a populated dev->uc and the PF may be
transiently unavailable; since that path doesn't go through
__dev_open(), the follow-up rx_mode call that would otherwise drive
the core retry doesn't fire. On -EAGAIN, swallow the error and call
netif_rx_mode_schedule_retry() explicitly. The unicast filter loop
truncates vnic->uc_filter_count on failure, so the retry's delta check
sees pending work and reinstalls.

Cc: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260608154014.227538-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add retry mechanism to ndo_set_rx_mode_async

When ndo_set_rx_mode_async returns an error, schedule a retry with
exponential backoff (1s, 2s, 4s, 8s -- 15s total). Give up after the
4th retry and log an error via netdev_err().

This moves retry logic from individual drivers into the core stack.

Timer callback does not hold a ref on dev. Safe because the timer can
only be armed when dev is IFF_UP, and __dev_close_many runs
timer_delete_sync before clearing IFF_UP. Unregister always closes
IFF_UP devices first, so by the time dev can be freed the timer is
dead and cannot be re-armed.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260608154014.227538-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: change ndo_set_rx_mode_async return type to int

Change the return type of ndo_set_rx_mode_async from void to int to
allow drivers to report failures back to the core stack. This is a
prerequisite for adding retry logic in the core when drivers fail to
program RX filters (e.g. bnxt VF when PF is unavailable).

All existing implementations return 0 for now, maintaining current
behavior.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260608154014.227538-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: fix uninit-value in __sctp_rcv_asconf_lookup()

__sctp_rcv_asconf_lookup() in net/sctp/input.c only checks that the ASCONF
chunk can hold the ADDIP header and a parameter header, then calls
af->from_addr_param(), which reads the full address (16 bytes for IPv6)
trusting the parameter's declared length.

An unauthenticated peer can send a truncated trailing ASCONF chunk that
declares an IPv6 address parameter but stops after the 4-byte parameter
header; reached from the no-association lookup path, from_addr_param() then
reads uninitialized bytes past the parameter.

Impact: an unauthenticated SCTP peer makes the receive path read up to 16
bytes of uninitialized memory past a truncated ASCONF address parameter.

The sibling __sctp_rcv_init_lookup() bounds parameters with
sctp_walk_params(); this path open-codes the fetch and omits the bound.
Verify the whole address parameter lies within the chunk before
from_addr_param() reads it, the same class of fix as commit 51e5ad549c43
("net: sctp: fix KMSAN uninit-value in sctp_inq_pop").

Fixes: df2185771439 ("[SCTP]: Update association lookup to look at ASCONF chunks as well")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20260608122234.459098-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: drv: Replace strcpy() + strlcat() with snprintf()

Avoid string function that are due to be deprecated.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20260608095523.2606-36-david.laight.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: eth: benet: Use strscpy() to copy strings into arrays

Replacing strcpy() with strscpy() ensures that overflow of the target
buffer cannot happen.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20260608095500.2567-4-david.laight.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries

mana_query_link_cfg() sends an HWC command to firmware on every call,
but the link speed and QoS values it returns only change when the
driver explicitly calls mana_set_bw_clamp(). This function is called
not only by userspace via ethtool get_link_ksettings, but also
periodically by hv_netvsc through netvsc_get_link_ksettings and by
the sysfs speed_show attribute via dev_attr_show, resulting in
unnecessary HWC traffic every few minutes.

Add a link_cfg_error field to mana_port_context to cache the query
result. The field uses three states: 1 (not yet queried, initial
value set during mana_probe_port), 0 (success, speed/max_speed are
valid), or a negative errno for permanent errors like -EOPNOTSUPP
when the hardware does not support the command. Transient errors and
qos_unconfigured responses are not cached so that subsequent calls
will retry.

MANA is ops-locked because it implements net_shaper_ops, so the core
already takes netdev_lock() around all ethtool_ops and net_shaper_ops
entry points. Reuse that lock to serialize mana_query_link_cfg() and
mana_set_bw_clamp(). This prevents a concurrent mana_set_bw_clamp()
from racing with an in-flight query and publishing stale pre-clamp
speed/max_speed.

Invalidate the cache inside mana_set_bw_clamp() on success, so all
current and future callers that change the link configuration
automatically trigger a fresh query on the next mana_query_link_cfg()
call. Also reset link_cfg_error during resume in mana_probe() under
netdev_lock(), so that any query already in flight cannot later
store 0 and silently overwrite the post-resume invalidation.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260606133301.2180073-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: validate netdev belongs to switch in .netdev_to_port()

The .netdev_to_port() currently takes only a net_device and returns the
port index, without verifying the netdev actually belongs to the switch
being operated on. This can cause flower rule parsing to silently
resolve to a wrong port on the local hardware.

Update both implementations felix_netdev_to_port() and
ocelot_netdev_to_port() to validate ownership. Also update the callers
in ocelot_flower.c to pass through the ocelot context.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260606125247.305167-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix NULL pointer dereference

PCIe errors detected by a Root Port or Downstream Port cause error
recovery services to run on all subordinate devices regardless of
administrative state.

The .error_detected() callback, bnxt_io_error_detected(), disables
and synchronizes IRQs via bnxt_disable_int_sync(), which calls
bnxt_cp_num_to_irq_num() to map completion rings to IRQs using
bp->bnapi.

Since bp->bnapi is allocated on NIC open and freed on NIC close, PCIe
error recovery on a closed NIC can dereference a NULL pointer.

Check if bp->bnapi is NULL before disabling and synchronizing IRQs.

Fixes: e5811b8c09df ("bnxt_en: Add IRQ remapping logic.")
Cc: stable@vger.kernel.org
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/aiNM1CY2-StPilxW@hpe.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Add support for PF device 0x00C1

Update the device id table to include the new device id 0x00C1.
This device's BAR layout is similar to VF's, update the function,
mana_gd_init_registers(), accordingly.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20260605212302.2135499-1-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: qca8k: Add support for force mode for fixed link topology

A fixed link topology is commonly used to connect this switch (on port
0 or 6) to a SoC's MAC over SGMII. When inband negotiation is not used,
the switch needs to be configured to operate in force mode. As such,
enable support for force mode.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Link: https://patch.msgid.link/20260605-qca8337-force-mode-v2-1-d9a6b6545bfa@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-motorcomm-8531s-set-ds-func-and-8522-driver'

Minda Chen says:

====================
Add motorcomm 8531s set ds func and 8522 driver

This patch is for Starfive JHB100 EVB board. JHB100 contain
1 RGMII/RMII and 1 RMII synopsys GMAC cores. In the EVB board, RGMII
interface connect with YT8531s Ethernet PHY. RMII interface connect
with YT8522 ethernet PHY. So patch 1-2 is for RGMII interface
patch 3 is RMII is for RMII interface.

JHB100 is a Starfive new RISC-V SoC for datacenter BMC (BaseBoard
Managent Controller). Similar with Aspeed 27x0.

The JHB100 minimal system upstream is in progress:
https://patchwork.kernel.org/project/linux-riscv/cover/20260508053632.818548-1-changhuang.liang@starfivetech.com/
====================

Link: https://patch.msgid.link/20260605060212.41895-1-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: Add YT8522 100M RMII PHY support

Add YT8522 100M RMII ethernet PHY base driver support, including
PHY ID and base config init function.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260605060212.41895-4-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: motorcomm: phy: set drive strength in YT8531s RGMII

Set RXD and RX CLK pin drive strength while in YT8531s connect
with RGMII. Need to check 8531s PHY ID because 8521 and 8531s
pin drive strength is different, 8521 can not call
yt8531_set_ds().

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260605060212.41895-3-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: move mdio lock out from yt8531_set_ds()

yt8531_set_ds() default set register with mdio lock and only called
with YT8531 PHY. But new type YT8531s support RGMII and has the same
pin strength setting with YT8531, YT8531s need to call yt8531_set_ds()
setting pin drive strength. But YT8531s config init function
yt8521_config_init() already get the mdio lock with phy_select_page().
If calling yt8521_config_init() with mdio lock will cause dead lock.

Need to get the lock before calling yt8531_set_ds() and move mdio
lock out from it for YT8531s.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260605060212.41895-2-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: stream: fully roll back denied add-stream state

When ADD_OUT_STREAMS is denied, SCTP only shrinks the queued chunks and
then lowers outcnt. That leaves removed stream metadata behind, so a
later re-add can reuse a stale ext and hit a null-pointer dereference in
the scheduler get path.

Fix the rollback by tearing down the removed stream state the same way
other stream resizes do. Unschedule the current scheduler state, drop
the removed stream ext state with sctp_stream_outq_migrate(), and then
reschedule the remaining streams.

This keeps scheduler-private RR/FC/PRIO lists consistent while fully
rolling back denied outgoing stream additions.

Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/d78954ecd94954653ee299400e98d74a03a6f7d3.1780603399.git.bronzed_45_vested@icloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mana-per-vport-eq'

Long Li says:

====================
net: mana: Per-vPort EQ and MSI-X management

This series moves EQ ownership from the shared mana_context to per-vPort
mana_port_context, enabling each vPort to have dedicated MSI-X vectors
when the hardware provides enough vectors. When vectors are limited, the
driver falls back to sharing MSI-X among vPorts.

The series introduces a GDMA IRQ Context (GIC) abstraction with reference
counting to manage interrupt context lifecycle. This allows both Ethernet
and RDMA EQs to dynamically acquire dedicated or shared MSI-X vectors at
vPort creation time rather than pre-allocating all vectors at probe time.
====================

Link: https://patch.msgid.link/20260605005717.2059954-1-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

RDMA/mana_ib: Allocate interrupt contexts on EQs

Use the GIC functions to allocate interrupt contexts for RDMA EQs. These
interrupt contexts may be shared with Ethernet EQs when MSI-X vectors
are limited.

The driver now supports allocating dedicated MSI-X for each EQ. Indicate
this capability through driver capability bits. The RDMA EQs pass
use_msi_bitmap=false to share MSI-X vectors with Ethernet, while the
capability flag advertises that the driver supports per-vPort EQ
separation when hardware has sufficient vectors.

Populate eq.irq on all RDMA EQs for consistency with the Ethernet path.

Also relocate the GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE define to its
numeric BIT(6) position among the other capability flags.

Signed-off-by: Long Li <longli@microsoft.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Link: https://patch.msgid.link/20260605005717.2059954-7-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Allocate interrupt context for each EQ when creating vPort

Use GIC functions to create a dedicated interrupt context or acquire a
shared interrupt context for each EQ when setting up a vPort.

The caller now owns the GIC reference across the EQ create/destroy
lifecycle: mana_create_eq() calls mana_gd_get_gic() before creating
each EQ and mana_destroy_eq() calls mana_gd_put_gic() after destroying
it. The msix_index invalidation is moved from mana_gd_deregister_irq()
to the mana_gd_create_eq() error path so that mana_destroy_eq() can
read the index before teardown.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/20260605005717.2059954-6-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Use GIC functions to allocate global EQs

Replace the GDMA global interrupt setup code with the new GIC allocation
and release functions for managing interrupt contexts.

This changes the per-queue interrupt names in /proc/interrupts from
mana_q0, mana_q1, ... to mana_msi1, mana_msi2, ... to reflect the
MSI-X index rather than a zero-based queue number. The HWC interrupt
name (mana_hwc) is unchanged.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/20260605005717.2059954-5-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Introduce GIC context with refcounting for interrupt management

To allow Ethernet EQs to use dedicated or shared MSI-X vectors and RDMA
EQs to share the same MSI-X, introduce a GIC (GDMA IRQ Context) with
reference counting. This allows the driver to create an interrupt context
on an assigned or unassigned MSI-X vector and share it across multiple
EQ consumers.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/20260605005717.2059954-4-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Query device capabilities and configure MSI-X sharing for EQs

When querying the device, adjust the max number of queues to allow
dedicated MSI-X vectors for each vPort. The per-vPort queue count is
clamped towards MANA_DEF_NUM_QUEUES but will not exceed the hardware
maximum reported by the device.

MSI-X sharing among vPorts is enabled when there are not enough MSI-X
vectors for dedicated allocation, or when the platform does not support
dynamic MSI-X allocation (in which case all vectors are pre-allocated
at probe time and sharing is always used). The msi_sharing flag is
reset at the top of mana_gd_query_max_resources() so it is recomputed
from current hardware state on each probe or resume cycle.

Clamp apc->max_queues to gc->max_num_queues_vport in mana_init_port()
so that on resume, if max_num_queues_vport has decreased due to fewer
MSI-X vectors, num_queues is reduced accordingly before EQ allocation.

A device reporting zero ports now results in a fatal probe error since
the per-vPort MSI-X math requires at least one port.

Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is
used at GDMA device probe time for querying device capabilities.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/20260605005717.2059954-3-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Create separate EQs for each vPort

To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
sharing among the vPorts and create dedicated EQs for each vPort.

Move the EQ definition from struct mana_context to struct mana_port_context
and update related support functions. Export mana_create_eq() and
mana_destroy_eq() for use by the MANA RDMA driver.

RSS QPs now take a vport reference via pd->vport_use_count to ensure
EQs outlive all QP consumers. The vport must already be configured by
a raw QP before an RSS QP can be created. EQs are only destroyed when
the last QP (raw or RSS) on the PD releases its reference.

Restrict each vport to a single RSS QP. The hardware only supports one
steering configuration (indirection table / hash key) per vport, and
mana_disable_vport_rx() on QP destroy disables RX globally for the
vport. Previously, creating a second RSS QP would silently overwrite
the first QP's steering config and destroy would blackhole all traffic.
This is now explicitly rejected with -EBUSY. Existing applications
(DPDK being the primary RDMA consumer) always create one RSS QP per
vport, so no real-world flows are affected.

Reject cross-port PD sharing for both raw and RSS QPs. Since EQs and
vport configuration are per-port, a PD is bound to the port used by
its first raw QP. Subsequent QPs on the same PD must use the same
port or the creation fails with -EINVAL. Previously this was silently
broken: with shared EQs it appeared to work, but with per-vPort EQs
a cross-port PD would cause wrong-port EQ teardown and corruption.
DPDK creates one PD per port so no existing flows are affected.

Serialize mana_set_channels() and the async per-port queue reset
handler against RDMA vport configuration to prevent RDMA from claiming
the vport during the detach/attach window. A channel_changing flag is
set under apc->vport_mutex before detach and checked by
mana_cfg_vport() when called from the RDMA path, blocking RDMA from
grabbing the vport during the entire window. When the port is down
and RDMA already holds the vport, the channel change is rejected with
-EBUSY.

Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/20260605005717.2059954-2-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull runtime verifier fixes from Steven Rostedt:

- Fix reset ordering on per-task destruction

   Reset the task before dropping the slot instead of after, which was
   causing out-of-bound memory accesses.

- Fix HA monitor synchronization and cleanup

   Ensure synchronous cleanup for HA monitors by running timer callbacks
   in RCU read-side critical sections and using synchronize_rcu() during
   destruction.

- Avoid armed timers after tasks exit

   Add automatic cleanup for per-task HA monitors to prevent timers from
   firing after task exit.

- Fix memory ordering for DA/HA monitors

   Fix race conditions during monitor start by using release-acquire
   semantics for the monitoring flag.

- Fix initialization for DA/HA monitors

   Ensure monitors are not initialized relying on potentially corrupted
   state like the monitoring flag, that is not reset by all monitors
   type and may have an unknown state in monitors reusing the storage
   (per-task).

- Fix memory safety in per-task and per-object monitors

   Prevent use-after-free and out-of-bounds access by synchronizing with
   in-flight tracepoint probes using tracepoint_synchronize_unregister()
   before freeing monitor storage or releasing task slots.

- Adjust monitors for preemptible tracepoints

   Fix monitors that relied on tracepoints disabling preemption.
   Explicitly disable task migration when per-CPU monitors handle events
   to avoid accessing the wrong state and update the opid monitor logic.

- Fix incorrect __user specifier usage

   Remove __user from a non-pointer variable in the extract_params()
   helper.

- Fix bugs in the rv tool

   Ensure strings are NUL-terminated, fix substring matching in monitor
   searches, and improve cleanup and exit status handling.

- Fix several bugs in rvgen

   Fix LTL literal stringification, subparsers' options handling, and
   suffix stripping in dot2k.

* tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  verification/rvgen: Fix ltl2k writing True as a literal
  verification/rvgen: Fix options shared among commands
  verification/rvgen: Fix suffix strip in dot2k
  tools/rv: Fix cleanup after failed trace setup
  tools/rv: Fix substring match when listing container monitors
  tools/rv: Fix substring match bug in monitor name search
  tools/rv: Ensure monitor name and desc are NUL-terminated
  rv: Use 0 to check preemption enabled in opid
  rv: Prevent task migration while handling per-CPU events
  rv: Ensure synchronous cleanup for HA monitors
  rv: Add automatic cleanup handlers for per-task HA monitors
  rv: Do not rely on clean monitor when initialising HA
  rv: Fix monitor start ordering and memory ordering for monitoring flag
  rv: Ensure all pending probes terminate on per-obj monitor destroy
  rv: Prevent in-flight per-task handlers from using invalid slots
  rv: Reset per-task DA monitors before releasing the slot
  rv: Fix __user specifier usage in extract_params()

net: ncsi: Set ncsi_stop_dev() to inline while NET_NCSI not enabled

While NET_NCSI not enabled, ncsi_stop_dev() is not inline
and call with it, casue compile waring:

linux/include/net/ncsi.h:63:13: warning: 'ncsi_stop_dev'
defined but not used [-Wunused-function]
static void ncsi_stop_dev(struct ncsi_dev *nd)

Setting ncsi_stop_dev() to inline like other function to
remove compile warnings.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Link: https://patch.msgid.link/20260605033607.37630-1-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'trace-tools-v7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull RTLA fix from Steven Rostedt:

- Fix multi-character short option parsing

   Fix regression in parsing of multiple-character short options
   (eg -p100 /= -p 100/, -un /= -u -n/) caused by getopt_long()
   internal state corruption after a refactoring.

* tag 'trace-tools-v7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla: Fix parsing of multi-character short options

Merge branch 'consolidate-fcrypt-and-pcbc-code-into-net-rxrpc'

Eric Biggers says:

====================
Consolidate FCrypt and PCBC code into net/rxrpc/

The FCrypt "block cipher" and the PCBC mode of operation are obsolete
and insecure. Since their only user is net/rxrpc/, they belong there,
not in the crypto API.

Therefore, this series removes these algorithms from the crypto API and
replaces them with local implementations in net/rxrpc/.

The local implementations are simpler too, as they avoid the crypto API
boilerplate.

I don't know how to test all the code in net/rxrpc/, but everything
should still work. I added a KUnit test for the crypto functions.
====================

Link: https://patch.msgid.link/20260522050740.84561-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: pcbc - Remove support for PCBC mode

The only user of PCBC mode (Propagating Cipher Block Chaining mode) was
net/rxrpc/rxkad.c, which now uses local code instead.

While PCBC was an interesting cryptographic experiment, it has largely
been relegated to the history books and academic exercises. It is
non-parallelizable (i.e., very slow) and doesn't actually achieve the
integrity properties it was apparently intended to achieve.

Remove support for it from the crypto API.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260522050740.84561-6-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: fcrypt - Remove support for FCrypt block cipher

Remove the insecure FCrypt block cipher from the crypto API.  Its only
user was net/rxrpc/, but now net/rxrpc/ implements it locally.  The
crypto API implementation is no longer needed.

For some additional context: FCrypt was designed in 1988 and is
essentially a weakened version of DES.  It has the same 56-bit key size
as DES, which is easily brute forced.  Moreover, it's cryptographically
weak and doesn't even provide the intended 56-bit security level.  Its
author considers it to be a mistake, as well
(https://lists.openafs.org/pipermail/openafs-devel/2000-December/005320.html).

But fortunately this 1980s-era homebrew block cipher was never adopted
outside of net/rxrpc/.  So its code can just be kept there.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260522050740.84561-5-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rxrpc: Reimplement DES-PCBC using DES library

Since the use of "pcbc(des)" in rxkad_decrypt_ticket() is the only
remaining user of the crypto API "pcbc" template, just implement
DES-PCBC by locally implementing PCBC mode on top of the DES library.
Note that only the decryption direction is needed.

This will allow support for the obsolete PCBC mode to be removed from
the crypto API.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260522050740.84561-4-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rxrpc: Use local FCrypt-PCBC implementation

Use the local implementation of FCrypt-PCBC instead of the crypto API
one. This will allow the crypto API one to be removed. It also
simplifies the code quite a bit.

The local FCrypt-PCBC implementation is also significantly faster than
the crypto API one, since the crypto API one had a lot of overhead. For
example, benchmarking on an x86_64 CPU, I see that FCrypt-PCBC
decryption throughput improved from 83 MB/s to 157 MB/s.

(Meanwhile, AES-256-GCM decryption is 8064 MB/s on the same CPU.
Clearly, anyone looking for good performance, or anything that is
actually secure for that matter, needs to look elsewhere anyway.)

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260522050740.84561-3-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rxrpc: Add local FCrypt-PCBC implementation

Add a local implementation of FCrypt-PCBC encryption and decryption.
This will be used instead of the crypto API one, allowing the crypto API
one to be removed.  It will also simplify rxkad.c quite a bit.

A KUnit test is included.  The FCrypt-PCBC test vectors are borrowed
from the existing ones in crypto/testmgr.h.  Note that this adds the
first KUnit test for net/rxrpc/, which previously had no KUnit tests.

The FCrypt code is based on crypto/fcrypt.c, but I simplified it a bit.
The PCBC part is straightforward and I just wrote it from scratch.

Tested with:

    kunit.py run --kunitconfig net/rxrpc/

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://patch.msgid.link/20260522050740.84561-2-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ASoC: renesas: fsi: Add SPU clock control in hw_startup/shutdown

Enable and disable the SPU clock in fsi_hw_startup() and
fsi_hw_shutdown() to ensure the clock is active while the
driver accesses hardware registers.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-12-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: add fsi_clk_prepare/unprepare()

Add fsi_clk_prepare() and fsi_clk_unprepare() helpers and call them
from fsi_dai_startup() and fsi_dai_shutdown().
This ensures clk_prepare() and clk_unprepare() are executed from
sleepable contexts and keeps clocks prepared only while audio streams
are active.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-11-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: Add SPU clock support

FSI register accesses on the r8a7740 require the SPU bus clock to be
enabled. Add support for acquiring and managing the SPU clock via the
device tree to ensure proper register access.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-10-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: refactor clock initialization

Move fsi_clk_init() from set_fmt() to the probe path.
This ensures that clock resources are acquired only once during device
initialization, instead of being looked up repeatedly whenever set_fmt()
is called.
Together with the previous conversion to devm_clk_get_optional(), the
driver can now probe successfully even when optional clocks are absent.
The set_rate() callbacks continue to validate that all required clocks
are available before applying hardware-specific configuration.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-9-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: Use devm_clk_get_optional() for optional clocks

The xck, ick, and div clocks are optional. Switch from devm_clk_get()
to devm_clk_get_optional() to correctly handle cases where these clocks
are missing.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-8-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: Move fsi_clk_init()

Move fsi_clk_init() after set_rate() functions to prepare for subsequent
refactoring.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-7-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: Fix register access from in-flight IRQ after shutdown

In-flight IRQs may still be running when the SPU clock is disabled,
leading to register access after shutdown and causing system hangs.

Fix this to use fsi_stream_is_working() when handling in-flight IRQ
handlers. If no streams are active, the handler now returns immediately
to prevent hardware access.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-6-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: Move fsi_stream_is_working()

Move fsi_stream_is_working() before fsi_count_fifo_err().
This prepares for a subsequent patch that needs to check stream status
when handling in-flight IRQ handlers. No functional changwqes intended.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-5-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: renesas: fsi: Fix trigger stop ordering

Call fsi_stream_stop() before fsi_hw_shutdown(). This matches the existing
order in the suspend path.
This change ensures all register accesses during stream shutdown are fully
completed before disabling the clocks.

Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Link: https://patch.msgid.link/20260609113836.45079-4-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: renesas,fsi: add support multiple clocks

The FSI on r8a7740 requires the SPU bus/bridge clock to be enabled before
accessing its registers. Without this clock, any register access leads to
a system hang as the FSI block sits behind the SPU bus.
Update the binding to support multiple clocks to properly describe the
hardware clock tree, including:
  - SPU bus/bridge clock (spu) for register access.
  - CPG DIV6 clocks (icka/b) as functional clock.
  - FSI dividers (diva/b) for audio clock generation.
  - External clock inputs (xcka/b) provided by the board.
The hardware supports several valid clock configurations. For example,
when both FSIA and FSIB operate as slaves, only the fck and spu clocks
are required. When a port operates as a master, it can use either an
internal clock source (ickx + divx) or an external clock source
(ickx + xckx). Therefore, while fck and spu are mandatory on r8a7740,
the remaining clocks (icka/b, diva/b and xcka/b) are optional and depend
on the selected master/slave configuration and clock source.
Both sh73a0 and r8a7740 define the SPU DIV6 clock control register at
0xe6150084. The binding therefore documents the clocks supported by the
FSI driver for these variants.

Signed-off-by: bui duc phuc <phucduc.bui@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260609113836.45079-2-phucduc.bui@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

kconfig: tests: fix typo in comment

scripts/kconfig/tests/no_write_if_dep_unmet/__init__.py contains a typo
"COFIG_" for "CONFIG_". Fix it.

Discovered while searching for typos in CONFIG_* variable references.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260609021712.7965-1-enelsonmoore@gmail.com
Signed-off-by: Nathan Chancellor <nathan@kernel.org>

ASoC: codecs: aw88261: fixes and cleanup

Val Packett <val@packett.cool> says:

The Awinic smart speaker/amp drivers were merged in a very
"downstream-brained" state, where configuration was only really
determined by the binary "firmware" (register list) file instead
of properly participating in the ASoC system. Let's start
untangling this mess. This series makes aw88261 actually usable
on devices like fairphone-fp5, motorola-dubai and xiaomi-pipa.

Link: https://patch.msgid.link/20260529200550.529719-1-val@packett.cool

ASoC: codecs: aw88261: make volume control usable

- Invert the value to match userspace expectations (in the hardware,
  positive numbers represent negative dB attenuation)
- Provide TLV metadata for the dB scale (and divide the raw values by 2
  as the excessive precision used by HW is not representable in TLV)
- Do not unnecessarily reset the volume while switching profiles
- Simplify aw88261_dev_set_volume using regmap_update_bits
- Do not add the initial volume from the profile to the requested volume
  as that would throw off the dB mapping (if a lower max limit is
  desired, it can be set in the UCM profile in userspace)

With this change, it's actually possible to use this hardware volume
control as PlaybackVolume in an ALSA UCM profile.

Fixes: 028a2ae25691 ("ASoC: codecs: Add aw88261 amplifier driver")
Signed-off-by: Val Packett <val@packett.cool>
Link: https://patch.msgid.link/20260529200550.529719-8-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: aw88261: fix incorrect masks for boost regs

The boost-related register fields used in aw88261_reg_force_set use the
exact same definitions as the rest of the fields, where the mask must be
inverted when passing it to regmap_update_bits, but they weren't
inverted here.

Fixes: 028a2ae25691 ("ASoC: codecs: Add aw88261 amplifier driver")
Signed-off-by: Val Packett <val@packett.cool>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://patch.msgid.link/20260529200550.529719-7-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: aw88261: remove async start

Codec drivers are not supposed to do anything like this. The result was
that the first second or so of playback was essentially inaudible, and
very short alert sounds could be missed entirely. Let's not do this.

Signed-off-by: Val Packett <val@packett.cool>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://patch.msgid.link/20260529200550.529719-6-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: aw88261: remove fade in/out on start/stop

This "feature" was copied from downstream, but it does not belong in
the kernel at all. Remove it to simplify the driver.

Signed-off-by: Val Packett <val@packett.cool>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://patch.msgid.link/20260529200550.529719-5-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: aw88261: reduce log spam

This driver would create a wall of logspam during initialization due to
e.g. the PLL not being ready while waiting for it to stabilize. Change
intermediate dev_err() calls to dev_dbg() to reduce the noise.

While here, log the detected chip ID when that check fails.

Signed-off-by: Val Packett <val@packett.cool>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://patch.msgid.link/20260529200550.529719-4-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: aw88261: add TDM support

This amp supports TDM mode, so implement the set_tdm_slot operation to
let the SoC driver configure the TDM slot number, width, and masks.

Signed-off-by: Val Packett <val@packett.cool>
Link: https://patch.msgid.link/20260529200550.529719-3-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: aw88261: support changing sample rate and bit width

The aw88261 driver only worked with 32-bit 48kHz streams so far due to
the lack of a proper PLL initialization sequence. Fix by selecting all
the necessary PLL settings based on what was passed to us by the
hw_params/set_fmt ops. This replaces the strange downstream routine
that tries two divider modes in sequence.

Fixes: 028a2ae25691 ("ASoC: codecs: Add aw88261 amplifier driver")
Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcm6490-fairphone-fp5
Signed-off-by: Val Packett <val@packett.cool>
Link: https://patch.msgid.link/20260529200550.529719-2-val@packett.cool
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: dw: fix race between IRQ handler and error handler on SMP

On SMP systems, dw_spi_handle_err() can be called from the SPI core
kthread while the IRQ handler is still accessing the FIFO on another
CPU. Resetting the chip via dw_spi_reset_chip() during an active FIFO
read/write causes a bus error.

Fix this by calling disable_irq() before the chip reset, which masks
the IRQ and waits for any in-flight handler to complete via
synchronize_irq(). This ensures no handler is accessing the FIFO when
the reset occurs.

Signed-off-by: Peng Yang <pyangyyd@amazon.com>
Suggested-by: Jonathan Chocron <jonnyc@amazon.com>
Link: https://patch.msgid.link/20260608095849.3446-1-pyangyyd@amazon.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: amd: yc: Add DMI quirk for ASUS EXPERTBOOK PM1403CDA

Add a DMI quirk for the ASUS EXPERTBOOK PM1403CDA fixing the issue
where the internal microphone was not detected.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=221608
Signed-off-by: Zhang Heng <zhangheng@kylinos.cn>
Link: https://patch.msgid.link/20260604125815.42297-1-zhangheng@kylinos.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: meson-spifc: fix runtime PM leak on remove

pm_runtime_get_sync() increments the runtime PM usage counter even when it
returns an error. meson_spifc_remove() uses it to resume the controller
before disabling runtime PM, but never drops the usage counter again.

Balance the get with pm_runtime_put_noidle() after disabling runtime PM,
matching the teardown pattern used by other SPI controller drivers.

Found by static analysis. I do not have hardware to test this.

Fixes: c3e4bc5434d2 ("spi: meson: Add support for Amlogic Meson SPIFC")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Link: https://patch.msgid.link/20260609052647.5-1-ruoyuw560@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

software node: allow passing reference args to PROPERTY_ENTRY_REF()

When dynamically creating software nodes and properties for subsequent
use with software_node_register() current implementation of
PROPERTY_ENTRY_REF() is not suitable because it creates a temporary
instance of struct software_node_ref_args on stack which will later
disappear, and software_node_register() only does shallow copy of
properties.

Fix this by allowing to pass address of reference arguments structure
directly into PROPERTY_ENTRY_REF(), so that caller can manage lifetime
of the object properly.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/aiTo4dvKu8pyimHA@google.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

regulator: dt-bindings: mt6311: Convert to DT schema

Convert mediatek,mt6311 to DT schema.

Signed-off-by: Ninad Naik <ninadnaik07@gmail.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260604162624.644241-1-ninadnaik07@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: qcom_smd-regulator: Add PM8019

Stephan Gerhold <stephan.gerhold@linaro.org> says:

Add the definitions and dt-bindings for the regulators in PM8019 to allow
controlling them through the RPM firmware. PM8019 is typically used
together with the MDM9607 SoC.

Link: https://patch.msgid.link/20260608-rpm-smd-regulator-pm8019-v1-0-c671388b9ea5@linaro.org

regulator: qcom_smd-regulator: Add PM8019

Add the definitions for the regulators in PM8019 to allow controlling them
through the RPM firmware. Reading the TYPE/SUBTYPE registers using SPMI
reveals that PM8019 uses a mixture of regulators from PMA8084 (hfsmps,
pldo) and PM8916 (nldo).

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260608-rpm-smd-regulator-pm8019-v1-2-c671388b9ea5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: dt-bindings: qcom,smd-rpm-regulator: Add PM8019

Add the qcom,rpm-pm8019-regulators compatible to allow describing
regulators controlled by the RPM firmware on platforms that use PM8019.

Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Link: https://patch.msgid.link/20260608-rpm-smd-regulator-pm8019-v1-1-c671388b9ea5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: Use named initializers for platform_device_id arrays

Named initializers are better readable and more robust to changes of the
struct definition. This robustness is relevant for a planned change to
struct platform_device_id replacing .driver_data by an anonymous union.

While touching these arrays unify spacing and usage of commas.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/3fcd432a505bb1bb7f8ef0fba9162243200b3347.1780606153.git.u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: rzv2h-rspi: Add suspend/resume support

Add suspend/resume support to the rzv2h-rspi driver by implementing
suspend and resume callbacks that delegate to spi_controller_suspend()
and spi_controller_resume() respectively.

Signed-off-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Link: https://patch.msgid.link/20260608202509.3651345-1-tommaso.merciai.xr@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: qcom-geni: Fix cs_change handling on the last transfer

TPM TIS SPI probe fails with:

tpm_tis_spi: probe of spi11.0 failed with error -110

TPM TIS SPI sets cs_change=1 on single-transfer messages to keep CS
asserted across the header, wait-state, and data phases of a transaction.
CS deassertion between these phases violates the TCG SPI flow control
specification.

This bug was introduced by commit b99181cdf9fa ("spi-geni-qcom: remove
manual CS control"), which replaced manual CS control with automatic CS
control via the FRAGMENTATION bit. The FRAGMENTATION bit controls CS
behavior after a transfer: when set to 1, CS remains asserted; when
cleared to 0, CS is deasserted.

The commit correctly sets FRAGMENTATION for non-last transfers with
cs_change=0 to keep CS asserted between chained transfers, but misses the
case where cs_change=1 is set on the last transfer. When cs_change=1 on
the last transfer, the client requests CS to remain asserted after the
message completes, so FRAGMENTATION must be set to 1 in this case as well.

Fix setup_se_xfer() to set FRAGMENTATION when cs_change=1 on the last
transfer.

Also fix the same missing case in setup_gsi_xfer() and correct it to
write 1 instead of the raw bitmask FRAGMENTATION (value 4) to
peripheral.fragmentation. This field is a 1-bit boolean consumed by
gpi_create_spi_tre() via u32_encode_bits(..., TRE_SPI_GO_FRAG). Writing 4
to a 1-bit field causes u32_encode_bits() to mask it to 0, silently
disabling the FRAGMENTATION bit in the GPI TRE regardless of the
cs_change logic.

Fixes: b99181cdf9fa ("spi-geni-qcom: remove manual CS control")
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Link: https://patch.msgid.link/20260609-fix-spi-fragmentation-bit-logic-v2-1-e18efc255563@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

hwmon: (ina238) Add update_interval_us attribute

The INA238 family supports eight conversion time steps from 50 us to
4120 us (SQ52206: 66 us to 8230 us). At the millisecond granularity of
update_interval, the four shortest steps (50, 84, 150, 280 us) all
round to the same value and cannot be individually selected.

Add support for the generic update_interval_us attribute, which reports
and programs the same ADC cycle time as update_interval but in
microseconds, giving userspace full access to all conversion time steps.

Both attributes reflect the total cycle time including the active
averaging count: the reported value is the raw conversion time
multiplied by the number of averaged samples, and writes apply the
inverse mapping.

Signed-off-by: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>
Link: https://lore.kernel.org/r/20260609-hwmon-ina238-update-interval-us-v2-v3-3-016b55567950@advastore.com
[groeck: Fixed some multi-line alignment issues]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

hwmon: Add update_interval_us chip attribute

Some hardware monitoring chips support update intervals below one
millisecond. The existing update_interval attribute uses millisecond
granularity, which causes sub-millisecond steps to round to the same
value and become inaccessible from userspace.

Introduce update_interval_us, a companion chip-level attribute that
expresses the same update interval in microseconds. Drivers
implementing this attribute should also implement update_interval for
compatibility with millisecond-based userspace interfaces.

Signed-off-by: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>
Link: https://lore.kernel.org/r/20260609-hwmon-ina238-update-interval-us-v2-v3-2-016b55567950@advastore.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

hwmon: (ina238) Add support for samples and update_interval

Expose INA238 ADC averaging count (AVG) and conversion timing
(VBUSCT/VSHCT/VTCT) through chip-level hwmon attributes:

chip/samples
chip/update_interval

Use per-chip conversion-time lookup tables so the same helpers work
for INA228/INA237/INA238/INA700/INA780 and SQ52206. Cache ADC_CONFIG
in driver data and update it on writes to avoid extra register reads
during read-modify-write updates.

Report update_interval in milliseconds as required by the hwmon ABI.
Compute it from raw ADC cycle time multiplied by the active averaging
count, and apply the inverse mapping on writes so programmed conversion
time tracks the selected sample count.

Clamp user-provided update_interval before unit scaling to prevent
overflow in arithmetic conversions.

Also combine chip attributes in HWMON_CHANNEL_INFO using a bitwise OR
for a single logical chip channel.

Signed-off-by: Ferdinand Schwenk <ferdinand.schwenk@advastore.com>
Link: https://lore.kernel.org/r/20260609-hwmon-ina238-update-interval-us-v2-v3-1-016b55567950@advastore.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

svcrdma: wake sq waiters when the transport closes

Threads parked in svc_rdma_sq_wait() on sc_sq_ticket_wait or
sc_send_wait can hang indefinitely in TASK_UNINTERRUPTIBLE state
across transport teardown, pinning svc_xprt references and
blocking svc_rdma_free().

The close path sets XPT_CLOSE before invoking xpo_detach and both
wait_event predicates include an XPT_CLOSE term, but the
predicates are re-evaluated only on wakeup. sc_sq_ticket_wait has
no completion-driven wake path; it is advanced solely by the
chained ticket handoff inside svc_rdma_sq_wait() itself. Without
an explicit wake at close, parked threads never observe
XPT_CLOSE, hold their svc_xprt_get reference forever, and
svc_rdma_free() blocks on xpt_ref dropping to zero.

Two close entry points reach this transport. Local teardown runs
svc_rdma_detach() from svc_handle_xprt() -> svc_delete_xprt() ->
xpo_detach() on a worker thread. A remote disconnect arrives at
svc_rdma_cma_handler(), which calls svc_xprt_deferred_close():
that sets XPT_CLOSE and enqueues the transport but does not
access either RDMA waitqueue, so a worker already parked in
svc_rdma_sq_wait() never re-evaluates its predicate. With every
worker parked on this transport, no thread is available to run
the local teardown either, and the wake site there is
unreachable.

Introduce svc_rdma_xprt_deferred_close(), a thin svcrdma wrapper
that calls svc_xprt_deferred_close() and then wakes both
sc_sq_ticket_wait and sc_send_wait. Convert the svcrdma producers
that called svc_xprt_deferred_close() directly:
svc_rdma_cma_handler(), qp_event_handler(),
svc_rdma_post_send_err(), svc_rdma_wc_send(), the sendto drop
path, the rw completion error paths, and the recvfrom flush and
read-list error paths.

Wake both waitqueues from svc_rdma_detach() as well. The
synchronous svc_xprt_close() path (backchannel ENOTCONN, device
removal via svc_rdma_xprt_done) reaches detach without flowing
through svc_xprt_deferred_close() and therefore does not invoke
the new helper.

Fixes: ccc89b9d1ed2 ("svcrdma: Add fair queuing for Send Queue access")
Cc: stable@vger.kernel.org
Assisted-by: kres (claude-opus-4-7)
Signed-off-by: Chris Mason <clm@meta.com>
[ cel: add svc_rdma_xprt_deferred_close() to complete the fix ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: reset write verifier on deferred writeback errors

nfsd_vfs_write() and nfsd_commit() both call filemap_check_wb_err() to
detect deferred writeback errors, but neither rotates the server's write
verifier (nn->writeverf) when this check fails. Every other
durable-storage-failure path in these functions calls
commit_reset_write_verifier() before returning an error.

The missing rotation means clients holding UNSTABLE write data under the
current verifier will COMMIT, receive the unchanged verifier back, and
conclude their data is durable — silently dropping data that failed
writeback. This violates the UNSTABLE+COMMIT durability contract
(RFC 1813 §3.3.7, RFC 8881 §18.32).

Add commit_reset_write_verifier() calls at both filemap_check_wb_err()
error sites, matching the pattern used by adjacent error paths in the
same functions. The helper already filters -EAGAIN and -ESTALE
internally, so the calls are unconditionally safe.

Reported-by: Chris Mason <clm@meta.com>
Fixes: 555dbf1a9aac ("nfsd: Replace use of rwsem with errseq_t")
Cc: stable@vger.kernel.org
Assisted-by: kres:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race

When find_or_alloc_open_stateowner() encounters an unconfirmed owner, it
calls release_openowner() and sets oo = NULL. Control then falls through
past the `if (oo)` guard -- which would have freed any pre-allocated
`new` -- and unconditionally executes `new = alloc_stateowner(...)`. If
`new` was already allocated on a prior iteration, the pointer is
silently overwritten and the previous allocation (slab object + owner
name buffer) is leaked.

This requires a race: two NFSv4.0 OPEN threads with the same owner
string, where a concurrent thread inserts a new unconfirmed owner into
the hash between retry iterations. The window is narrow but repeatable
under adversarial conditions.

Fix by adding `goto retry` after `oo = NULL` so the already-allocated
`new` is reused on the next iteration rather than overwritten.

Reported-by: Chris Mason <clm@meta.com>
Fixes: 23df17788c62 ("nfsd: perform all find_openstateowner_str calls in the one place.")
Cc: stable@vger.kernel.org
Assisted-by: kres:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

sunrpc: wait for in-flight TLS handshake callback when cancel loses race

When wait_for_completion_interruptible_timeout() in
svc_tcp_handshake() returns 0 (timeout) or -ERESTARTSYS (signal) and
tls_handshake_cancel() then returns false, handshake_complete() has
won the cancellation race: it has set HANDSHAKE_F_REQ_COMPLETED and
is about to invoke svc_tcp_handshake_done(), but the callback's
side effects on xpt_flags and on svsk->sk_handshake_done have not
yet committed.

The current code reads xpt_flags immediately to decide whether the
session succeeded. Two races result.

If the callback has executed set_bit(XPT_TLS_SESSION) but not yet
clear_bit(XPT_HANDSHAKE), svc_tcp_handshake() sees a session,
enqueues the transport, and returns. svc_xprt_received() then
clears XPT_BUSY, a worker thread picks the transport up, the
dispatcher in svc_handle_xprt() observes XPT_HANDSHAKE still set,
and xpo_handshake is invoked a second time. That svc_tcp_handshake()
calls init_completion(&svsk->sk_handshake_done) while the original
callback concurrently calls complete_all() on it, corrupting the
embedded swait_queue.

If the callback has set HANDSHAKE_F_REQ_COMPLETED but not yet
entered svc_tcp_handshake_done(), svc_tcp_handshake() reads
XPT_TLS_SESSION as clear and tears the connection down even though
the handshake is about to succeed.

Wait for the callback to commit before inspecting xpt_flags. The
completion is guaranteed to fire because handshake_complete()
invokes svc_tcp_handshake_done() unconditionally once it has set
HANDSHAKE_F_REQ_COMPLETED.

Fixes: b3cbf98e2fdf ("SUNRPC: Support TLS handshake in the server-side TCP socket code")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

sunrpc: pin svc_xprt across the asynchronous TLS handshake callback

svc_tcp_handshake() stores the raw svc_xprt pointer in
tls_handshake_args.ta_data and submits the request through
tls_server_hello_x509(). The handshake core takes only
sock_hold(req->hr_sk); nothing references the embedding struct
svc_sock that svc_tcp_handshake_done() reaches via container_of().

Two close races leave the in-flight callback writing through a freed
svc_sock. svc_sock_free() calls tls_handshake_cancel() and discards
its return value: a false return means handshake_complete() has
already set HANDSHAKE_F_REQ_COMPLETED but hp_done() may not have
finished, yet svc_sock_free() proceeds to kfree(svsk). The
cancel-loser fall-through inside svc_tcp_handshake() itself produces
the same window: when wait_for_completion_interruptible_timeout()
returns <= 0 (timeout or signal) and tls_handshake_cancel() returns
false, the function does not drain, returns, and svc_handle_xprt()
calls svc_xprt_received(), which clears XPT_BUSY and can drop the
last reference. A concurrent close then runs svc_sock_free() while
svc_tcp_handshake_done() is still updating xpt_flags and walking
svsk->sk_handshake_done.

The corruption surfaces as set_bit/clear_bit RMW into the freed
xpt_flags slab slot and as complete_all() walking and writing the
freed wait_queue_head_t list embedded in sk_handshake_done -- a
slab-corruption primitive, not a benign read. The path is reachable
on any TLS-enabled NFS server whenever a connection close overlaps
the tlshd downcall delivery window; the interruptible wait means
signal delivery suffices, not just SVC_HANDSHAKE_TO expiry.

Take svc_xprt_get(xprt) immediately before tls_server_hello_x509()
so the in-flight callback owns its own reference. Release it on the
two edges where the callback is guaranteed not to fire -- submission
failure from tls_server_hello_x509() and a successful
tls_handshake_cancel() -- and at the tail of
svc_tcp_handshake_done() after complete_all().

Fixes: b3cbf98e2fdf ("SUNRPC: Support TLS handshake in the server-side TCP socket code")
Cc: stable@vger.kernel.org
Signed-off-by: Chris Mason <clm@meta.com>
Assisted-by: kres (claude-opus-4-7)
[cel: rewrote commit message to describe the actual change]
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: fix posix_acl leak on SETACL decode failure

nfsaclsvc_decode_setaclargs() and nfs3svc_decode_setaclargs() each
call nfs_stream_decode_acl() twice, first for NFS_ACL and then for
NFS_DFACL.  Each successful call transfers ownership of a freshly
allocated posix_acl into argp->acl_access or argp->acl_default.  If
the first call succeeds but the second fails, the decoder returns
false and argp->acl_access is left dangling.

ACLPROC2_SETACL.pc_release was wired to nfssvc_release_attrstat and
ACLPROC3_SETACL.pc_release was wired to nfs3svc_release_fhandle.
Both only call fh_put() and have no knowledge of the ACL fields on
argp.  The posix_acl_release() pairs sat at the out: labels inside
nfsacld_proc_setacl() and nfsd3_proc_setacl(), but svc_process()
skips pc_func when pc_decode returns false, so that cleanup is
unreachable on decode failure:

    svc_process_common()
      pc_decode()                  /* decode_setaclargs: false */
      /* pc_func skipped */
      pc_release()                 /* fh_put only -- ACLs leaked */

The orphaned posix_acl is leaked for the lifetime of the server.

Fix by adding nfsaclsvc_release_setacl() and nfs3svc_release_setacl(),
which release both argp->acl_access and argp->acl_default in addition
to fh_put(), and wiring them as pc_release for their respective SETACL
procedures.  pc_release runs on every path svc_process() takes after
decode, including decode failure, so the posix_acl_release() pairs are
removed from the proc functions' out: labels to keep ownership in one
place.  This matches the existing release_getacl() pattern used by
the sibling GETACL procedures.

Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.")
Cc: stable@vger.kernel.org
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: fix posix_acl leak and ignored error in nfsd4_create_file

nfsd4_create_file() has two bugs in its ACL handling:

The return value of nfsd4_acl_to_attr() is silently discarded.  When
the NFSv4-to-POSIX ACL conversion fails (e.g., -EINVAL for
unsupported ACE types), the file is created without any ACL and the
client receives NFS4_OK.  This violates RFC 7530/8881 which require
the server to reject unsupported attributes on CREATE.

When start_creating() fails after ACL attributes have been populated
in attrs (either via nfsd4_acl_to_attr or via ownership transfer from
open->op_dpacl/op_pacl), the function jumps to out_write which skips
nfsd_attrs_free().  The posix_acl allocations are leaked.  A client
can trigger this repeatedly with OPEN(CREATE), ACL attributes, and an
invalid filename (e.g., longer than NAME_MAX).

Fix both by capturing the nfsd4_acl_to_attr() return value and by
changing the early error paths to jump to out instead of out_write.
Initialize child to ERR_PTR(-EINVAL) so that end_creating() is safe
to call even if start_creating() was never reached.

Reported-by: Chris Mason <clm@meta.com>
Fixes: 7ab96df840e6 ("VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()")
Cc: stable@vger.kernel.org
Assisted-by: kres:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: check get_user() return when reading princhashlen

In __cld_pipe_inprogress_downcall(), the get_user() that reads
princhashlen from the userspace cld_msg_v2 buffer does not check its
return value. A failing copy leaves princhashlen with uninitialised
stack contents, which are then used to drive memdup_user() and stored
as princhash.len on the resulting reclaim record. The other get_user()
calls in this function all check the return; only this one is missed,
which is most likely a copy-paste oversight from when v2 upcalls were
introduced.

Mirror the existing pattern used a few lines above for namelen.
namecopy is declared with __free(kfree) so the early return cleans up
the already-allocated buffer automatically.

Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
Cc: stable@vger.kernel.org
Signed-off-by: Dominik Woźniak <stalion@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: fix inverted cp_ttl check in async copy reaper

nfsd4_async_copy_reaper() is supposed to keep completed async copy
state around for NFSD_COPY_INITIAL_TTL (10) laundromat ticks so
that OFFLOAD_STATUS can report the result, then reap the state once
the countdown expires.

The TTL predicate is inverted: `if (--copy->cp_ttl)` is true while
ticks remain and false when the counter reaches zero. This causes
the copy to be reaped on the very first tick (cp_ttl goes from 10
to 9, which is non-zero) instead of after all 10 ticks elapse.
Once reaped, OFFLOAD_STATUS returns NFS4ERR_BAD_STATEID because
the copy state has already been freed.

Fix by negating the test so that cleanup runs when the TTL expires.

Fixes: aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live")
Cc: stable@vger.kernel.org
Reported-by: Chris Mason <clm@meta.com>
Assisted-by: kres:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: fix dead ACL conflict guard in nfsd4_create

nfsd4_create() steals create->cr_dpacl/cr_pacl into the local
nfsd_attrs via the designated initializer, then immediately sets the
source pointers to NULL. The subsequent conflict guard tests the
already-nilled source fields, making it permanently dead code:

if (create->cr_acl) {
if (create->cr_dpacl || create->cr_pacl) /* always false */

When a client encodes both FATTR4_WORD0_ACL and
FATTR4_WORD2_POSIX_{DEFAULT,ACCESS}_ACL in the same CREATE fattr
bitmap, nfsd4_acl_to_attr() overwrites attrs.na_pacl/na_dpacl without
releasing the originals, leaking two posix_acl slab objects per
request. Repeated requests cause unbounded slab exhaustion.

Fix by checking attrs.na_dpacl/na_pacl (the stolen values) instead of
the nilled create->cr_dpacl/cr_pacl, matching the correct pattern
already used in nfsd4_setattr().

Reported-by: Chris Mason <clm@meta.com>
Assisted-by: kres:claude-opus-4-6
Fixes: d2ca50606f5f ("NFSD: Add support for POSIX draft ACLs for file creation")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix SECINFO_NO_NAME decode error cleanup

nfsd4_decode_secinfo_no_name() currently initializes sin_exp after
decoding sin_style. If the XDR stream is truncated, the decoder returns
nfserr_bad_xdr before sin_exp is initialized.

Since commit 3fdc54646234 ("NFSD: Reduce amount of struct
nfsd4_compoundargs that needs clearing"), the inline iops array is not
cleared between RPC calls. A failed SECINFO_NO_NAME decode can therefore
leave sin_exp holding stale union contents from a previous operation.

The error response path still invokes nfsd4_secinfo_no_name_release(),
which calls exp_put() on a non-NULL sin_exp.

Initialize sin_exp before the first failable decode step, matching
nfsd4_decode_secinfo().

Fixes: 3fdc54646234 ("NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing")
Cc: stable@vger.kernel.org
Signed-off-by: Guannan Wang <wgnbuaa@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

sunrpc: harden rq_procinfo lifecycle to prevent double-free

The svc_release_rqst() function executes the callback inside
rqstp->rq_procinfo->pc_release. However, if a worker thread begins
processing a new request and encounters an early error path (e.g.,
unsupported protocol, short frame, or bad auth) before a valid
rq_procinfo is installed, a stale release hook can be re-triggered
against reused state from the previous RPC, resulting in a double-free
or use-after-free vulnerability.

Harden the lifecycle of rq_procinfo by:
1. Ensuring svc_release_rqst() always clears rq_procinfo after the
optional pc_release() call, regardless of whether the hook exists.
2. Explicitly clearing rq_procinfo at request entry in svc_process()
before any early decode or drop paths.
3. Ensuring svc_process_bc() does the same at backchannel entry.

This guarantees that error flows will not encounter a non-NULL stale
rq_procinfo pointer when there is nothing to release.

Fixes: d9adbb6e10bf ("sunrpc: delay pc_release callback until after the reply is sent")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Suggested-by: Chuck Lever <cel@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Luxiao Xu <rakukuip@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Return an error from xdr_buf_to_bvec() on overflow

xdr_buf_to_bvec() returns a slot count even when the caller's bvec
budget is exhausted partway through the xdr_buf. Callers feed that
count into iov_iter_bvec() and continue as if the conversion had
succeeded, silently sending or writing fewer bytes than the data
length declares. For an NFS WRITE the server reports the truncated
transfer to the client as full success.

The overflow represents an internal invariant violation: a higher
layer reserved a bvec budget too small for the xdr_buf it then
asked the encoder to convert. That is a server-side fault, not a
media I/O failure and not a malformed client argument.

Change xdr_buf_to_bvec() to return a signed int and have the
overflow label return -ESERVERFAULT. Update the three callers to
detect the negative return and fail the request: nfsd_vfs_write()
folds the error into host_err, which nfserrno() translates to
nfserr_serverfault for the WRITE reply; svc_udp_sendto() and
svc_tcp_sendmsg() propagate the error out of the send path.

Reported-by: Chris Mason <clm@meta.com>
Fixes: 2eb2b9358181 ("SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Bound-check xdr_buf_to_bvec() stores before writing

xdr_buf_to_bvec() writes a bio_vec into the caller's array before
testing whether that slot is in range, and the head branch performs
the store with no check at all. When the caller's budget is exactly
used up, the next store lands one element past the end of the array.
The overflow label returns count - 1, which masks the surplus store
but cannot undo it.

rq_bvec, the array passed by nfsd_vfs_write(), is allocated to
exactly rq_maxpages entries with no slack. The OOB store can land in
adjacent slab memory; the bv_len and bv_offset fields written there
are derived from client-supplied RPC payload sizes.

Move the in-range check ahead of the store in the head, page-loop,
and tail branches. With the check at the top of each sequence, count
is incremented only after a successful store, so the overflow label
can return count directly.

Reported-by: Chris Mason <clm@meta.com>
Fixes: 2eb2b9358181 ("SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: release layout stid on setlease failure

nfs4_alloc_stid() publishes the new stid into cl->cl_stateids via
idr_alloc_cyclic() under cl_lock before returning to
nfsd4_alloc_layout_stateid(). When nfsd4_layout_setlease() then
fails, the error path frees the layout stateid directly with
kmem_cache_free() without ever calling idr_remove(), leaving the
IDR slot pointing at freed slab memory. Any subsequent IDR walker
(states_show, client teardown) dereferences the dangling pointer.

The correct teardown for an IDR-published stid is nfs4_put_stid(),
which removes the IDR slot under cl_lock, dispatches sc_free
(nfsd4_free_layout_stateid) to release ls->ls_file via
nfsd4_close_layout(), and drops the nfs4_file reference in its
tail.

A second issue blocks that switch: nfsd4_free_layout_stateid()
unconditionally inspects ls->ls_fence_work via
delayed_work_pending() under ls_lock, but
INIT_DELAYED_WORK(&ls->ls_fence_work, ...) currently runs only
after the setlease call. On the setlease-failure path the
destructor would touch an uninitialized delayed_work.

    nfsd4_alloc_layout_stateid()
      nfs4_alloc_stid()           /* idr_alloc_cyclic under cl_lock */
      nfsd4_layout_setlease()     /* fails */
        nfs4_put_stid()
          nfsd4_free_layout_stateid()
            delayed_work_pending(&ls->ls_fence_work)  /* needs INIT */
            nfsd4_close_layout()  /* nfsd_file_put(ls->ls_file) */
          put_nfs4_file()

Fix by hoisting the ls_fenced / ls_fence_delay / INIT_DELAYED_WORK
initialization above the nfsd4_layout_setlease() call, and replace
the manual nfsd_file_put + put_nfs4_file + kmem_cache_free cleanup
with a single nfs4_put_stid(stp).

Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls")
Cc: stable@vger.kernel.org
Assisted-by: kres (claude-opus-4-7)
Signed-off-by: Chris Mason <clm@meta.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

lockd: Avoid hashing uninitialized bytes in nlm4svc_lookup_file()

file_hash() digests the first LOCKD_FH_HASH_SIZE bytes of
nfs_fh.data when bucketing nlm_files[], independent of fh.size.
Commit 3de744ee4e45 ("lockd: Use xdrgen XDR functions for the
NLMv4 TEST procedure") set .pc_argzero to zero for the converted
procedures and moved file-handle population into
nlm4svc_lookup_file(), which copies only xdr_lock->fh.len bytes
into lock->fh.data.

When an NLMv4 client presents a file handle shorter than
LOCKD_FH_HASH_SIZE, bytes fh.len..31 retain whatever the argument
buffer held from an earlier request. The same wire handle then
hashes to different buckets across calls; nlm_lookup_file() misses
the existing nlm_file entry, and lock-state lookups fail.

Zero only the tail bytes that file_hash() would otherwise consume.
Handles of LOCKD_FH_HASH_SIZE or larger already populate every byte
that file_hash() reads.

Reported-by: Jeff Layton <jlayton@kernel.org>
Closes: https://lore.kernel.org/r/5229a9746d723a3f830120c0b966510f75badfc2.camel@kernel.org
Fixes: 3de744ee4e45 ("lockd: Use xdrgen XDR functions for the NLMv4 TEST procedure")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

lockd: Plug nlm_file refcount leak on cached nlm_do_fopen() failure

The cached-file path in nlm_lookup_file() reaches the found: label
unconditionally, even when nlm_do_fopen() fails. At that label
*result and file->f_count are updated before the error is returned.
The wrappers nlm3svc_lookup_file() and nlm4svc_lookup_file() then
bail out of their switch without copying *result back to their
caller, so the proc handler's local nlm_file pointer remains NULL
and the cleanup path skips nlm_release_file(). The f_count
increment is never released, and nlm_traverse_files() can no
longer reap the file because its refcount never returns to zero
between requests.

Short-circuit the cached path so neither *result nor f_count is
touched when nlm_do_fopen() fails on a hashed nlm_file.

Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

lockd: Plug nlm_file leak when nlm_do_fopen() fails

A client can repeatedly drive nlm_do_fopen() failures by presenting
file handles that the underlying export rejects. After kzalloc_obj()
succeeds in nlm_lookup_file(), the freshly allocated nlm_file is not
yet inserted into nlm_files[]. The nlm_do_fopen() failure path jumps
to out_unlock, which releases nlm_file_mutex and returns without
freeing the allocation, so each failure leaks one nlm_file.

Route the failure through out_free so kfree() runs before the
function returns.

Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Revert "NFSD: Defer sub-object cleanup in export put callbacks"

This reverts commit 48db892356d6cb80f6942885545de4a6dd8d2a29.

Commit 48db892356d6 ("NFSD: Defer sub-object cleanup in export
put callbacks") moved path_put() and auth_domain_put() out of
svc_export_put() and expkey_put() and behind queue_rcu_work() to
close a claimed use-after-free in e_show() and c_show() against
ex_path and ex_client->name. Discussion in [1] shows neither
the diagnosis nor the remedy survives review.

The downstream teardown of both sub-objects is already RCU-deferred.
auth_domain_put() reaches svcauth_unix_domain_release(), which frees
the unix_domain and its ->name through call_rcu(). path_put()
reaches dentry_free(), which frees the dentry through call_rcu(),
and prepend_path() is already structured to tolerate concurrent
dentry teardown. A reader in cache_seq_start_rcu() therefore
observes both sub-objects through the next grace period regardless
of whether svc_export_put() runs synchronously, so the synchronous
form was never unsafe.

The crash signature in the report cited by commit 48db892356d6
("NFSD: Defer sub-object cleanup in export put callbacks") has a
different root cause: a /proc/net/rpc cache file held open across
network-namespace exit lets cache_destroy_net() free cd->hash_table
while a reader is still walking it. The correct fix pins cd->net for
the open fd's lifetime and does not require any deferral inside
svc_export_put().

Meanwhile, deferring path_put() out of svc_export_put() reintroduces
the regression that commit 69d803c40ede ("nfsd: Revert "nfsd:
release svc_expkey/svc_export with rcu_work"") repaired: after
"exportfs -r" drops the last cache reference, the mount reference
held through ex_path lingers in the workqueue, so a subsequent
umount fails with EBUSY.

Restore the synchronous path_put() and auth_domain_put() in
svc_export_put() and expkey_put() and the call_rcu()/kfree_rcu()
free of the containing structures. The unrelated fix for
ex_uuid/ex_stats from commit 2530766492ec ("nfsd: fix UAF when
access ex_uuid or ex_stats") is preserved.

Link: https://lore.kernel.org/all/10019b42-4589-4f9f-8d5b-d8197db1ce3c@huawei.com/
Fixes: 48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Alexandr Alexandrov <alexandr.alexandrov@oracle.com>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Revert "svcrdma: Use contiguous pages for RDMA Read sink buffers"

Jonathan Flynn reports that commit 18755b8c2f24 ("svcrdma: Use
contiguous pages for RDMA Read sink buffers") regresses NFS/RDMA
WRITE throughput from 73.9 GiB/s to 30.3 GiB/s on a 128-core
single-NUMA-node server driving dual 400Gb/s links with 640 nfsd
threads. Server CPU utilization rises from 8.5% to 76%, with
roughly three quarters of all cycles spent spinning on zone->lock.

The sink buffers are allocated as high-order page blocks, split
into single pages so each sub-page carries an independent refcount,
and later released one page at a time through folio batches. The
per-CPU page caches cannot satisfy an allocation stream whose alloc
order differs from its free order, so every sink buffer page makes
a round trip through the buddy allocator's free lists, serialized
on the zone lock of the single NUMA node. The rq_pages entries that
the split pages displace, bulk-allocated moments earlier by
svc_alloc_arg(), are freed without ever being used, doubling the
allocator traffic.

The regression cannot be addressed trivially. Revert the commit
now; a reworked approach can return in an upcoming merge window.

Reported-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Reported-by: Mike Snitzer <snitzer@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/aiHlPmeZq3WgMwoJ@kernel.org/
Closes: https://lore.kernel.org/linux-nfs/3cb119b4b2a8aada30c0c60286778a54@mail.gmail.com/
Fixes: 18755b8c2f24 ("svcrdma: Use contiguous pages for RDMA Read sink buffers")
Cc: stable@vger.kernel.org
Tested-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

lockd: Unify cast_status

cast_status folds internal lock-daemon sentinels into NLMv1/v3
wire status codes for the v3 reply path.  Two variants have
existed since the original kernel import: a strict allowlist
under CONFIG_LOCKD_V4 and a sentinel-translation form for the
!CONFIG_LOCKD_V4 build.  The split was never grounded in a
behavioural difference -- nlmsvc_testlock and nlmsvc_lock,
which feed cast_status, return the same set of values in both
configurations -- and recent xdrgen conversions have narrowed
the caller set further: nlm__int__stale_fh and nlm__int__failed
are now translated at their point of origin in nlm3svc_lookup_file,
and the cast_status wraps around nlmsvc_cancel_blocked and
nlmsvc_unlock have been dropped because those functions return
only wire codes.

Collapse the two variants into one.  The unified form keeps the
CONFIG_LOCKD_V4 arm's allowlist, retains the nlm__int__deadlock
translation, and folds anything else to nlm_lck_denied_nolocks
after a pr_warn_once so an unexpected sentinel from a future
refactor remains visible in the kernel log instead of being
silently passed to the wire encoder.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

lockd: Remove dead code from fs/lockd/xdr.c

All NLMv3 server-side procedures now dispatch through
xdrgen-generated encoder and decoder functions, leaving the
hand-written XDR processing in fs/lockd/xdr.c with no remaining
callers.

Remove fs/lockd/xdr.c, the fs/lockd/svcxdr.h header it included,
the Makefile entry, and the now-unused nlmsvc_decode_* /
nlmsvc_encode_* prototypes from fs/lockd/xdr.h. The structure
definitions and status code macros in xdr.h are retained as they
are still used by the client-side code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>