git.ipfire.org Git - thirdparty/kernel/linux.git/log

sched_ext: Sub-allocator over kernel-claimed BPF arena pages

Build a per-scheduler sub-allocator on top of pages claimed from the BPF
arena registered in the previous patch. Subsequent kernel-managed
arena-resident structures (e.g. per-CPU set_cmask cmask) carve their storage
from this pool.

scx_arena_pool_init() creates a gen_pool. scx_arena_alloc() returns the
kernel VA. On exhaustion, the pool grows by claiming more pages via
bpf_arena_alloc_pages_sleepable(). Chunks are added at the kernel-side
mapping address. Callers translate to the BPF-arena form themselves if
needed.
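
The grow-and-retry flow described above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel implementation: toy_pool, toy_pool_try_alloc(), toy_pool_grow() and toy_pool_alloc() are hypothetical names, a bump allocator stands in for gen_pool, and calloc() stands in for claiming arena pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CHUNK_SIZE 4096u	/* stand-in for one claimed arena page */
#define TOY_MAX_CHUNKS 8u

struct toy_pool {
	char *chunk[TOY_MAX_CHUNKS];
	size_t used[TOY_MAX_CHUNKS];
	size_t nr_chunks;
};

/* Carve @size bytes from the first chunk with room; NULL if none fits. */
void *toy_pool_try_alloc(struct toy_pool *p, size_t size)
{
	for (size_t i = 0; i < p->nr_chunks; i++) {
		if (TOY_CHUNK_SIZE - p->used[i] >= size) {
			void *ret = p->chunk[i] + p->used[i];

			p->used[i] += size;
			return ret;
		}
	}
	return NULL;
}

/* Add one more chunk to the pool; 0 on success, -1 on failure. */
int toy_pool_grow(struct toy_pool *p)
{
	if (p->nr_chunks == TOY_MAX_CHUNKS)
		return -1;
	p->chunk[p->nr_chunks] = calloc(1, TOY_CHUNK_SIZE);
	if (!p->chunk[p->nr_chunks])
		return -1;
	p->used[p->nr_chunks++] = 0;
	return 0;
}

/* Allocate, growing the pool in a loop until the request fits or growth fails. */
void *toy_pool_alloc(struct toy_pool *p, size_t size)
{
	void *ret;

	assert(p);
	if (size == 0 || size > TOY_CHUNK_SIZE)
		return NULL;
	while (!(ret = toy_pool_try_alloc(p, size))) {
		if (toy_pool_grow(p))
			return NULL;
	}
	return ret;
}

/* Demo helper: number of chunks in use after @n allocations of @size bytes. */
size_t toy_pool_chunks_after(size_t n, size_t size)
{
	struct toy_pool p = { .nr_chunks = 0 };
	size_t nr;

	for (size_t i = 0; i < n; i++) {
		if (!toy_pool_alloc(&p, size))
			break;
	}
	nr = p.nr_chunks;
	for (size_t i = 0; i < p.nr_chunks; i++)
		free(p.chunk[i]);
	return nr;
}
```

In this model, toy_pool_chunks_after(3, 2048) grows the pool twice: the first two requests fill one chunk and the third forces a second one.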

Allocations sleep (GFP_KERNEL) - they may grow the pool through vzalloc and
arena page allocation. All current consumers run from the enable path (after
ops.init() and the kernel-side arena auto-discovery, before validate_ops()),
where sleeping is fine.

scx_arena_pool_destroy() walks each chunk, returns outstanding ranges to the
gen_pool with gen_pool_free() and then calls gen_pool_destroy(). The
underlying arena pages are released when the arena map itself is torn down,
so the pool destroy doesn't free them explicitly.

v2: Switch scx_arena_alloc() to a loop. (Andrea)

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

sched_ext: Require an arena for cid-form schedulers

Upcoming patches will let the kernel place arena-resident scratch shared
with the BPF program (e.g. per-CPU set_cmask cmask) so the BPF side can
dereference it directly via __arena pointers, replacing the current
cmask_copy_from_kernel() probe-read loop. That requires each cid-form
scheduler to expose its arena to the kernel. Kernel-side accesses are
recovered by the per-arena scratch-page mechanism.

bpf_scx_reg_cid() walks the struct_ops member progs via
bpf_struct_ops_for_each_prog() and reads each prog's arena via
bpf_prog_arena(). The verifier enforces one arena per program, so each
member prog contributes at most one arena. All non-NULL contributions must
match and at least one member prog must use an arena. The map ref is held on
scx_sched and dropped on sched destroy. cpu-form schedulers (bpf_scx_reg)
are unchanged - no arena requirement.
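
The matching rule above (every non-NULL contribution must be the same arena, and at least one must exist) can be sketched as a standalone function. This is an illustrative userspace model, not the code in bpf_scx_reg_cid(): struct toy_arena, toy_pick_arena() and the two error values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_arena {
	int id;
};

#define TOY_ERR_MISMATCH (-1)	/* two member progs use different arenas */
#define TOY_ERR_NO_ARENA (-2)	/* no member prog uses an arena */

/*
 * @prog_arena holds one entry per member prog: its arena, or NULL if the
 * prog does not use one. On success, store the common arena in @out.
 */
int toy_pick_arena(struct toy_arena *const *prog_arena, size_t nr,
		   struct toy_arena **out)
{
	struct toy_arena *found = NULL;

	assert(out);
	for (size_t i = 0; i < nr; i++) {
		if (!prog_arena[i])
			continue;
		if (found && found != prog_arena[i])
			return TOY_ERR_MISMATCH;
		found = prog_arena[i];
	}
	if (!found)
		return TOY_ERR_NO_ARENA;
	*out = found;
	return 0;
}
```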

Signed-off-by: Tejun Heo <tj@kernel.org>

macsec: fix replay protection at XPN lower-PN wrap

In macsec_post_decrypt(), when pn is U32_MAX, pn + 1 overflows u32 to 0
and the first branch never fires. If next_pn_halves.lower is also in the
upper half, pn_same_half(pn, lower) is true and the XPN else-if does not
fire either, leaving next_pn_halves unchanged. An attacker that captures
the legitimate frame carrying pn == 0xFFFFFFFF on an XPN association
can then replay it indefinitely, since lowest_pn never rises above
the captured pn and macsec_decrypt() reconstructs the same IV.

Extend the XPN else-if to also fire when pn + 1 wraps to 0, so receipt
of pn == U32_MAX advances next_pn_halves to (upper + 1, 0).
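
The wrap case can be reproduced in a small userspace model of the next_pn_halves update. This is a sketch, assuming the update has the two-branch shape described above; struct toy_pn, toy_advance_next_pn() and the handle_wrap switch are hypothetical, and the exact condition used in the patch may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_pn {
	uint32_t upper;
	uint32_t lower;
};

/* True when both values have the same top bit. */
static bool toy_pn_same_half(uint32_t a, uint32_t b)
{
	return !((a >> 31) ^ (b >> 31));
}

/* XPN model: advance @next after accepting a frame carrying @pn. */
void toy_advance_next_pn(struct toy_pn *next, uint32_t pn, bool handle_wrap)
{
	uint32_t pn1 = (uint32_t)(pn + 1);	/* wraps to 0 for UINT32_MAX */

	assert(next);
	if (pn1 > next->lower) {
		next->lower = pn1;
	} else if (!toy_pn_same_half(pn, next->lower) ||
		   (handle_wrap && pn1 == 0)) {
		next->upper++;
		next->lower = pn1;
	}
}

/* Demo helper: return the 64-bit next PN after receiving @pn. */
uint64_t toy_next_pn_after(uint32_t upper, uint32_t lower, uint32_t pn,
			   bool handle_wrap)
{
	struct toy_pn next = { upper, lower };

	toy_advance_next_pn(&next, pn, handle_wrap);
	return ((uint64_t)next.upper << 32) | next.lower;
}
```

With upper = 0 and lower = 0xFFFFFFF0, receiving pn = UINT32_MAX leaves the model unchanged when handle_wrap is false and moves it to (1, 0) when handle_wrap is true.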

Fixes: a21ecf0e0338 ("macsec: Support XPN frame handling - IEEE 802.1AEbw")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/SYBPR01MB78813FD49E58F253B989F197AF012@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'arena_direct_access' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next into for-7.2

Merge tag 'bootconfig-fixes-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull bootconfig fix from Masami Hiramatsu:

- Fix buf leak in apply_xbc

* tag 'bootconfig-fixes-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/bootconfig: Fix buf leaks in apply_xbc

rds: filter RDS_INFO_* getsockopt by caller's netns

The RDS_INFO_* family of getsockopt(2) options reads several
file-scope global lists that are not per-netns:

  rds_sock_info / rds6_sock_info,
  rds_sock_inc_info / rds6_sock_inc_info        -> rds_sock_list
  rds_tcp_tc_info / rds6_tcp_tc_info            -> rds_tcp_tc_list
  rds_conn_info / rds6_conn_info,
  rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
  *_RETRANS_MESSAGES variants),
  rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
                                                -> rds_conn_hash[]

The handlers do not filter by the caller's network namespace.
rds_info_getsockopt() has no netns or capable() check, and
rds_create() has no capable() check, so AF_RDS is reachable from
an unprivileged user namespace. As a result, an unprivileged
caller in a fresh user_ns plus netns can read the bound address
and sock inode of every RDS socket on the host, the peer address
of incoming messages on every RDS socket on the host, the peer
address and TCP sequence numbers of every rds-tcp connection on
the host, and the peer address and RDS sequence numbers of every
RDS connection on the host.

The rds-tcp transport is reachable from a non-initial netns (see
rds_set_transport()), so a one-shot init_net gate at
rds_info_getsockopt() would deny legitimate per-netns visibility
to rds-tcp callers. Instead, filter at each handler by comparing
the netns of the caller's socket to the netns of the list entry,
or to rds_conn_net(conn) for connection paths. Only copy entries
whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
aggregate statistics and remain global.
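
A minimal userspace sketch of the per-handler filtering, assuming a count pass followed by a copy pass over a global list. All names here (toy_net, toy_sock_entry, toy_info_copy()) are hypothetical and the entry layout is invented for illustration; the real handlers copy rds_info_* structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_net {
	int id;
};

struct toy_sock_entry {
	const struct toy_net *net;	/* netns that owns this socket */
	unsigned int inode;
};

/*
 * Copy the inode of every entry that belongs to @caller_net into @out.
 * Returns the number of visible entries. If @out_cap is too small,
 * nothing is copied and only the required count is returned.
 */
size_t toy_info_copy(const struct toy_sock_entry *list, size_t nr,
		     const struct toy_net *caller_net,
		     unsigned int *out, size_t out_cap)
{
	size_t visible = 0, copied = 0;

	assert(caller_net);

	/* First pass: count the entries in the caller's netns. */
	for (size_t i = 0; i < nr; i++) {
		if (list[i].net == caller_net)
			visible++;
	}
	if (visible > out_cap)
		return visible;

	/* Second pass: copy only the matching entries. */
	for (size_t i = 0; i < nr; i++) {
		if (list[i].net == caller_net)
			out[copied++] = list[i].inode;
	}
	return copied;
}
```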

Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
binds 127.0.0.1:4242 in init_net as root. A child process enters
a fresh user_ns plus netns and opens AF_RDS there, then calls
getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
child sees the init_net socket. After this change, the child
sees zero entries.

Drop the rds_sock_count, rds_tcp_tc_count, and rds6_tcp_tc_count
globals. v2 used them for the size precheck and lens->nr; v3
replaced the precheck with a per-ns count from a first pass over
the list, so the globals have no remaining readers. The matching
increments and decrements in rds_create()/rds_destroy_sock() and
rds_tcp_set_callbacks()/rds_tcp_restore_callbacks() go away with
them. Reported by the kernel test robot under clang W=1.

Suggested-by: Allison Henderson <achender@kernel.org>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Link: https://patch.msgid.link/20260520084236.2724349-1-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlabel: fix IPv6 unlabeled address add error handling

netlbl_unlhsh_add_addr6() always returned zero after
netlbl_af6list_add(), masking failures such as duplicate
IPv6 static label entries.

Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20260522022910.398416-1-zhaochenguang@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rds: annotate data-race around rs_seen_congestion

rs_seen_congestion is read in rds_poll() and written in rds_sendmsg()
and rds_poll() without any lock. Use READ_ONCE()/WRITE_ONCE() to
annotate these lockless accesses and silence KCSAN.
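
The annotation pattern looks roughly like this in a userspace approximation. The TOY_READ_ONCE()/TOY_WRITE_ONCE() macros are simplified stand-ins for the kernel macros (they rely on the GNU __typeof__ extension), and struct toy_rds_sock with its two helpers is a hypothetical reduction of the real paths.

```c
#include <assert.h>

/* Simplified stand-ins: force a single volatile load or store. */
#define TOY_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

struct toy_rds_sock {
	int seen_congestion;
};

/* Writer side: the send path notes that it hit congestion. */
void toy_sendmsg_congested(struct toy_rds_sock *rs)
{
	assert(rs);
	TOY_WRITE_ONCE(rs->seen_congestion, 1);
}

/* Poll side: read the flag locklessly and clear it once observed. */
int toy_poll_seen_congestion(struct toy_rds_sock *rs)
{
	int seen;

	assert(rs);
	seen = TOY_READ_ONCE(rs->seen_congestion);
	if (seen)
		TOY_WRITE_ONCE(rs->seen_congestion, 0);
	return seen;
}
```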

Reported-by: syzbot+fbf3648ae7f5bdb05c59@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a0f8d94.050a0220.6b33c.0000.GAE@google.com/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Allison Henderson <achender@kernel.org>
Tested-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260522011621.304470-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gpu: nova-core: vbios: remove unused rom_header field

This is only used during construction, so we can remove it.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-22-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: move constants and functions to be associated

Move constants and functions to be inside the impls of the types they
are related to. This makes it more obvious what each type and value is
for.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-21-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: drop redundant TryFrom import

This is unused.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-20-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: drop unused image wrappers

These are unused currently, and it is probably sufficient to just check
the type of BIOS image in the future.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-19-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: remove unnecessary fields in PciRomHeader

Remove unnecessary fields in PciRomHeader. This allows a simplification
to use `FromBytes` instead of reading fields piecemeal. A lot of these
checks were redundant as well since it checks the size of the `data`
first in `BiosImage`.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-18-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use let-else in Vbios::new

Improve readability by moving the success path outside of a nested
branch.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-17-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use single logical block for the FWSEC section

Currently, FWSEC takes the first image and the last image. Treat the
first FWSEC image and all following image data as one logical block
for building the final FWSEC image. This avoids explicitly tracking
two FWSEC images.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-16-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use the first PCI-AT image

Currently, PCI-AT takes the final image if multiple exist. Use the
first one instead, to match nouveau behaviour.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-15-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

net/sched: sch_ets: make cl->quantum lockless

cl->quantum does not need to be protected by RTNL or qdisc spinlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260522110356.1403343-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: igmp: annotate data-races around im->users

/proc/net/igmp walks IPv4 multicast memberships under RCU and
prints im->users without holding RTNL, while multicast join and leave
paths update the field while holding RTNL. Annotate this intentional
lockless snapshot with READ_ONCE() and the matching writers with
WRITE_ONCE().

Signed-off-by: Yuyang Huang <sigefriedhyy@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260522093906.39764-1-sigefriedhyy@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo()

ipv6_hop_jumbo() calls pskb_trim_rcsum(), which can change skb pointers.
Let's recompute nh pointer to make sure any change won't mess things up.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Justin Iurman <justin.iurman@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260522112013.12342-1-justin.iurman@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: exthdrs: refresh nh after handling HAO option

ip6_parse_tlv() caches skb_network_header(skb) in nh while walking
IPv6 TLVs.

ipv6_dest_hao() may call pskb_expand_head() for a cloned skb, which can
move the skb head and invalidate the cached network header pointer.
Refresh nh after ipv6_dest_hao() returns so any trailing padding or TLVs
are parsed from the current skb head.

This matches the existing pattern used in ip6_parse_tlv() after helpers
that can modify skb header storage.
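
The stale-pointer hazard and the refresh can be illustrated with a plain heap buffer. This is a userspace sketch, not skb code: struct toy_skb, toy_network_header(), toy_expand_head() and toy_read_after_expand() are hypothetical, and toy_expand_head() always moves the buffer to make the effect deterministic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_skb {
	unsigned char *head;	/* start of the buffer */
	size_t len;		/* bytes in the buffer */
	size_t nh_off;		/* network header offset from head */
};

unsigned char *toy_network_header(const struct toy_skb *skb)
{
	return skb->head + skb->nh_off;
}

/* Stand-in for a helper that may reallocate the head; here it always does. */
int toy_expand_head(struct toy_skb *skb, size_t extra)
{
	unsigned char *new_head = calloc(1, skb->len + extra);

	if (!new_head)
		return -1;
	memcpy(new_head, skb->head, skb->len);
	free(skb->head);	/* pointers into the old head are now stale */
	skb->head = new_head;
	skb->len += extra;
	return 0;
}

/* Read byte @off of the network header after the head moved; -1 on error. */
int toy_read_after_expand(const unsigned char *pkt, size_t len,
			  size_t nh_off, size_t off)
{
	struct toy_skb skb = { malloc(len), len, nh_off };
	const unsigned char *nh;
	int ret = -1;

	assert(pkt);
	if (!skb.head)
		return -1;
	memcpy(skb.head, pkt, len);

	nh = toy_network_header(&skb);		/* cached pointer */
	if (nh_off + off < len && toy_expand_head(&skb, 64) == 0) {
		nh = toy_network_header(&skb);	/* refresh after the move */
		ret = nh[off];
	}
	free(skb.head);
	return ret;
}
```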

Fixes: a831f5bbc89a ("[IPV6] MIP6: Add inbound interface of home address option.")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Co-developed-by: Luxing Yin <tr0jan@lzu.edu.cn>
Signed-off-by: Luxing Yin <tr0jan@lzu.edu.cn>
Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/7aba1debc2196189172499e5769802b026f8caf8.1779247873.git.zcliangcn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: nf_conntrack_ftp: avoid u16 overflows

get_port() and try_number() parse comma-separated decimal values from FTP PORT
and EPRT commands into a u_int32_t array, but do not validate that each
value fits in a single octet. RFC 959 specifies that PORT parameters
are decimal integers in the range 0-255, representing the four octets
of an IP address followed by two octets encoding the port number.

Values exceeding 255 are silently accepted. In try_rfc959(), the raw
u32 values are combined via shift-and-OR to form the IP and port:

  cmd->u3.ip = htonl((array[0] << 24) | (array[1] << 16) |
                     (array[2] << 8) | array[3]);
  cmd->u.tcp.port = htons((array[4] << 8) | array[5]);

When array elements exceed 255, bits from one field bleed into adjacent
fields after shifting, producing IP addresses and port numbers that
differ from what the text representation suggests. For example,
"PORT 10,0,1,2,256,22" yields port (256<<8)|22 = 65558, truncated to
u16 = 22. This mismatch between the textual and computed values can
confuse network monitoring tools that parse FTP commands independently.

Ignore the command by returning 0 (no match) when any accumulated
value exceeds 255 so that no expectation is created.
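
The bounds check can be sketched as a standalone parser. This is a simplified userspace model, not the conntrack helper itself: toy_parse_fields() and toy_rfc959_port() are hypothetical names, and the input is assumed to be only the argument string that follows "PORT ".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_FIELDS 6	/* four address octets plus two port octets */

/* Parse "h1,h2,h3,h4,p1,p2". Returns 1 on a match, 0 on no match. */
int toy_parse_fields(const char *s, uint32_t array[TOY_NR_FIELDS])
{
	size_t i = 0;
	int have_digit = 0;

	assert(s);
	array[0] = 0;
	for (; *s; s++) {
		if (*s >= '0' && *s <= '9') {
			array[i] = array[i] * 10 + (uint32_t)(*s - '0');
			if (array[i] > 255)
				return 0;	/* not an octet: ignore command */
			have_digit = 1;
		} else if (*s == ',' && have_digit && i + 1 < TOY_NR_FIELDS) {
			array[++i] = 0;
			have_digit = 0;
		} else {
			return 0;
		}
	}
	return have_digit && i + 1 == TOY_NR_FIELDS;
}

/* Return the port encoded by the last two fields, or -1 on no match. */
int toy_rfc959_port(const char *args)
{
	uint32_t array[TOY_NR_FIELDS];

	if (!toy_parse_fields(args, array))
		return -1;
	return (int)((array[4] << 8) | array[5]);
}
```

Because every field is checked as it accumulates, the shift-and-OR step only ever sees values that fit in one octet, so the computed port always matches the text.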

Signed-off-by: Giuseppe Caruso <giuseppecaruso0990@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>

memblock: don't touch memblock arrays when memblock_free() is called late

When memblock_free() is called after memblock_discard() on architectures
that don't select ARCH_KEEP_MEMBLOCK, it tries to update memblock.reserved,
which was already discarded. This causes a use-after-free, for example:

[    8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
[    8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
[    8.514775] Call Trace:
[    8.514775]  <TASK>
[    8.514775]  kasan_report+0xb2/0x1b0
[    8.514775]  memblock_isolate_range+0x4ac/0x650
[    8.514775]  memblock_phys_free+0xc4/0x190
[    8.514775]  housekeeping_late_init+0x257/0x280
[    8.514775]  do_one_initcall+0xaa/0x470
[    8.514775]  do_initcalls+0x1b4/0x1f0
[    8.514775]  kernel_init_freeable+0x4b5/0x550
[    8.514775]  kernel_init+0x1c/0x150
[    8.514775]  ret_from_fork+0x5dc/0x8e0
[    8.514775]  ret_from_fork_asm+0x1a/0x30
[    8.514775]  </TASK>

Make sure memblock_free() updates memblock.reserved only when called early
enough or when ARCH_KEEP_MEMBLOCK is enabled.

Reported-by: Waiman Long <longman@redhat.com>
Reported-by: Breno Leitao <leitao@debian.org>
Closes: https://lore.kernel.org/all/20260505051821.1107133-1-longman@redhat.com
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Breno Leitao <leitao@debian.org>
Fixes: 87ce9e83ab8b ("memblock, treewide: make memblock_free() handle late freeing")
Link: https://patch.msgid.link/20260513105122.502506-1-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Merge tag 'nf-26-05-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

Patches 7+8 fix a regression from 7.1-rc1. Everything else
is from 2.6.x to 5.3 releases.  There are additional known
issues with these patches (drive-by-findings in related code).

There are many old bugs all over netfilter and our ability to review
feature patches has come to a complete halt due to lack of time.
There are further security bugs that we cannot address
due to lack of time, maintainers and reviewers.

Other remarks: The xtables 32bit compat interface is already
off in many vendor kernels, the plan is to remove it soon.

1) Prevent RST packets with invalid sequence numbers from forcing TCP
   connections into the CLOSE state without a direction check.
   From Hamza Mahfooz.
2) Re-derive the TCP header pointer after skb_ensure_writable in
   synproxy_tstamp_adjust. Prevent use-after-free and invalid checksum
   updates caused by stale pointers during buffer expansion.
   From Chris Mason.
3) Fix a race condition causing keymap list corruption in conntrack's gre/pptp
   helper.
4) Use raw_smp_processor_id() in xt_cpu to prevent splats under
   PREEMPT_RCU.
5) Disable netfilter payload mangling in user namespaces (nft_payload.c
   and nf_queue).
   TCP option mangling via nft_exthdr.c remains enabled.
   There will be followups here to restrict or revalidate
   headers.
6) Fix an out-of-bounds read in ebtables's compat_mtw_from_user function.
7) Use list_for_each_entry_rcu() to traverse fib6_siblings in
   nft_fib6_info_nh_uses_dev(). Ensure safe list walking under RCU.
8) Fix an out-of-bounds read in nft_fib_ipv6 caused by incorrect list
   traversal.
9) Add nft_fib_nexthop selftest to netfilter. Cover nexthop enumeration for
    single, group, and multipath route shapes.
    All three nft_fib6 fixes from Jiayuan Chen.
10) Fix destination corruption in shift operations when source and destination
    registers overlap.  Reject partial register overlap for all operations
    from control plane.  From Fernando Fernandez Mancera.

* tag 'nf-26-05-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix dst corruption in same register operation
  selftests: netfilter: add nft_fib_nexthop test
  netfilter: nft_fib_ipv6: handle routes via external nexthop
  netfilter: nft_fib_ipv6: walk fib6_siblings under RCU
  netfilter: ebtables: fix OOB read in compat_mtw_from_user
  netfilter: disable payload mangling in userns
  netfilter: xt_cpu: prefer raw_smp_processor_id
  netfilter: nf_conntrack_gre: fix gre keymap list corruption
  netfilter: synproxy: refresh tcphdr after skb_ensure_writable
  netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check
====================

Link: https://patch.msgid.link/20260522104257.2008-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'v7.1-rc5' into rdma.git for-next

For dependencies in the following patches

Resolve conflicts, use the goto labels from the rc tag.

* tag 'v7.1-rc5': (1526 commits)

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

isofs: replace __get_free_page() with kmalloc()

isofs_readdir() allocates a temporary buffer with __get_free_page().

kmalloc() is a better API for such use and it also provides better
scalability and more debugging possibilities.

Replace use of __get_free_page() with kmalloc().

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Link: https://patch.msgid.link/20260523-b4-fs-v1-11-275e36a83f0e@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

quota: allocate dquot_hash with kmalloc()

dquot_init() allocates a single page for dquot_hash with
__get_free_pages().

kmalloc() is a better API for such use and it also provides better
scalability and more debugging possibilities.

Replace use of __get_free_pages() with kmalloc() and get rid of the order
variable that remained 0 for more than 20 years.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Link: https://patch.msgid.link/20260523-b4-fs-v1-1-275e36a83f0e@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

dm-inlinecrypt: add support for hardware-wrapped keys

Add support for hardware-wrapped encryption keys to the
dm-inlinecrypt target.

Introduce a new optional argument <key_type> to indicate
whether the provided key is a raw key or a hardware-wrapped
key. Based on this flag, the appropriate blk-crypto key type
is selected when initializing the key.

This allows dm-inlinecrypt to work with hardware that requires
keys to be wrapped and managed by the underlying inline
encryption engine.

Update the target argument parsing accordingly and pass the
key type to blk_crypto_init_key(). Documentation is also
updated to reflect the new parameter and usage.

Signed-off-by: Linlin Zhang <linlin.zhang@oss.qualcomm.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: e7f57d2c47e2 ("dm-inlinecrypt: add target for inline block device encryption")

RDMA/counter: Fix incorrect port index in rdma_counter_init() error cleanup

The error cleanup loop in rdma_counter_init() iterates with variable
'i' but accesses dev->port_data[port] instead of dev->port_data[i].
This causes the failed port's hstats to be freed multiple times while
leaking hstats of previously initialized ports.
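
The indexing bug is easy to see in a reduced model of the unwind loop. This is an illustrative sketch only: struct toy_port, toy_counter_init() and the fail_port parameter are hypothetical, ports are numbered from 0 here, and the real loop bounds in rdma_counter_init() may differ.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_PORTS 4

struct toy_port {
	int *hstats;
};

/*
 * Allocate hstats for each port. @fail_port simulates an allocation
 * failure at that port (pass -1 for no simulated failure).
 */
int toy_counter_init(struct toy_port *ports, int fail_port)
{
	int port, i;

	assert(ports);
	for (port = 0; port < TOY_NR_PORTS; port++) {
		if (port == fail_port)
			goto fail;
		ports[port].hstats = calloc(1, sizeof(*ports[port].hstats));
		if (!ports[port].hstats)
			goto fail;
	}
	return 0;

fail:
	/* Unwind the ports set up so far: index with i, not port. */
	for (i = port - 1; i >= 0; i--) {
		free(ports[i].hstats);
		ports[i].hstats = NULL;
	}
	return -1;
}
```

Indexing the cleanup with port instead of i would touch the failed port on every iteration and never release ports 0 to port - 1.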

Fixes: 56594ae1d250 ("RDMA/core: Annotate destroy of mutex to ensure that it is released as unlocked")
Link: https://patch.msgid.link/r/20260520104546.1776253-3-cuitao@kylinos.cn
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/counter: Fix num_counters leak on bind_qp failure in alloc_and_bind()

When __rdma_counter_bind_qp() fails in alloc_and_bind(), the error path
jumps to err_mode which frees the counter without decrementing
port_counter->num_counters. The only place that decrements is
rdma_counter_free(), which is unreachable since the counter was never
successfully bound.

This leak accumulates across repeated failures, permanently preventing
the port from switching to AUTO mode (-EBUSY in __counter_set_mode())
and blocking the MANUAL→NONE auto-revert in rdma_counter_free(). When
the mode was NONE before the call, the MANUAL mode set by
__counter_set_mode() also leaks since the revert logic is never
reached.

Add an err_bind label between the num_counters increment and the
existing err_mode label. It decrements num_counters and mirrors the
MANUAL→NONE revert from rdma_counter_free(), ensuring the port state
is fully restored on bind failure.
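
The unwind order can be sketched with a reduced state machine. This is a hypothetical model, not the driver code: struct toy_port_counter, the toy_mode values and toy_alloc_and_bind() are invented for illustration, and @bind_ret stands in for the return value of __rdma_counter_bind_qp().

```c
#include <assert.h>

enum toy_mode { TOY_NONE, TOY_AUTO, TOY_MANUAL };

struct toy_port_counter {
	enum toy_mode mode;
	unsigned int num_counters;
};

/* Model of the bind path: switch to MANUAL, count the counter, bind. */
int toy_alloc_and_bind(struct toy_port_counter *pc, int bind_ret)
{
	int ret;

	assert(pc);
	if (pc->mode == TOY_AUTO)
		return -1;		/* manual counters not allowed */
	pc->mode = TOY_MANUAL;
	pc->num_counters++;

	ret = bind_ret;			/* simulated bind result */
	if (ret)
		goto err_bind;
	return 0;

err_bind:
	/* Undo the increment, then revert MANUAL to NONE if it was the last. */
	pc->num_counters--;
	if (!pc->num_counters && pc->mode == TOY_MANUAL)
		pc->mode = TOY_NONE;
	return ret;
}
```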

Link: https://patch.msgid.link/r/20260520104546.1776253-2-cuitao@kylinos.cn
Signed-off-by: Tao Cui <cuitao@kylinos.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

net/mlx5: HWS: Reject unsupported remove-header action

mlx5_cmd_hws_packet_reformat_alloc() handles
MLX5_REFORMAT_TYPE_REMOVE_HDR by looking up a matching HWS remove-header
action.

If mlx5_fs_get_action_remove_header_vlan() returns NULL, the code only
logs an error and continues. The function then returns success with a NULL
HWS action stored in the packet-reformat object.

Return an error when no matching remove-header action is available.

Fixes: aecd9d1020e3 ("net/mlx5: fs, add HWS packet reformat API function")
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260506000054.51797-1-prathameshdeshpande7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'arena_direct_access'

Tejun Heo says:

====================
This makes BPF arena memory directly dereferenceable from kernel code
(struct_ops callbacks, kfuncs). Each arena gets a per-arena scratch page
that an arch fault hook installs into empty PTEs on kernel-side faults,
after KFENCE. The faulting instruction retries and the violation is reported
through the program's BPF stream.

v4:
- Patch 1: note that the strict-zero cmpxchg is narrower than pte_none() in
  inline comments on both x86 and arm64. (Andrea)
- Patch 2: stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL via a
  new include/linux/bpf_defs.h. (lkp)
- Patch 7: scx_arena_alloc() retries via a loop instead of a single retry on
  pool growth. (Andrea)
- Picked up Reviewed-by tags from Emil and Andrea.

v3: https://lore.kernel.org/r/20260520235052.4180316-1-tj@kernel.org
v2: https://lore.kernel.org/r/20260517211232.1670594-1-tj@kernel.org
v1 (RFC): https://lore.kernel.org/r/20260427105109.2554518-1-tj@kernel.org

Motivation
----------

sched_ext's ops_cid.set_cmask() hands the BPF scheduler a struct scx_cmask
*. The kernel translates a kernel cpumask to a cmask, but it had no way to
write into the arena, so the cmask lived in kernel memory and was passed as
a trusted pointer. BPF cmask helpers all operate on arena cmasks though, so
the BPF side had to word-by-word probe-read the kernel cmask into an arena
cmask via cmask_copy_from_kernel() before any helper could touch it. It
works, but is clumsy.

The shape isn't unique to set_cmask. Sub-scheduler support is on the way and
more sched_ext callbacks will want to pass structured data to BPF. Anywhere
a kfunc or struct_ops callback wants to hand a struct to a BPF program,
arena residence is the natural answer.

Approach
--------

Each arena gets a per-arena scratch page. Arenas stay sparsely mapped as
today - PTEs are populated only for allocated pages. A new arch fault hook
(bpf_arena_handle_page_fault) is wired into x86 page_fault_oops() and arm64
__do_kernel_fault(), after KFENCE. When a kernel-side access faults inside
an arena's kern_vm range, the helper walks the stack to find the BPF program
responsible, range-checks the fault address against prog->aux->arena, and
atomically installs the scratch page into the empty PTE via the new
ptep_try_set() wrapper. The kernel instruction retries and reads/writes the
scratch page. Free paths and map destruction treat scratch as non-owned.
Real allocation refuses to overwrite scratch (apply_range_set_cb returns
-EBUSY). A scratched address stays dead until map destroy, since its
presence means the BPF program has already malfunctioned.
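
The "install only into an empty slot" step is, at its core, a compare-and-swap against zero. A minimal C11 sketch of that idea follows; it is not the ptep_try_set() implementation, whose signature is not shown here, and toy_slot_try_set() with its plain 64-bit slot is a hypothetical stand-in for a PTE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Atomically store @val into @slot only if the slot is exactly zero.
 * Returns true if this caller installed the value, false if the slot
 * was already populated (by a real mapping or a racing installer).
 */
bool toy_slot_try_set(_Atomic uint64_t *slot, uint64_t val)
{
	uint64_t expected = 0;

	assert(val != 0);	/* installing "empty" would be meaningless */
	return atomic_compare_exchange_strong(slot, &expected, val);
}
```

If two faulting paths race on the same slot, exactly one compare-and-swap succeeds and the loser leaves the installed value untouched.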

The mechanism is default behavior - no UAPI flag.

What this preserves
-------------------

All the debugging properties of today's sparse-PTE design are preserved:

* BPF programs still fault on unmapped arena accesses. The fault semantics
  (instruction retry with rdst = 0) and the violation report through
  bpf_streams are unchanged for prog-side accesses.

* The first kernel-side touch of an unmapped address is reported via
  bpf_streams the same way as a prog-side fault, with the stack walk
  attributing it to the originating prog.

* User-side fault on a never-scratched address still lazy-allocates a real
  page (or returns SIGSEGV under BPF_F_SEGV_ON_FAULT). User-side fault on a
  scratched address SIGSEGVs.

What changes for the kernel-side caller is just that an unmapped deref no
longer oopses - it retries through the scratch page and emits a violation
report. The same shape today's BPF instruction faults have.

Patches 1-2 (atomic PTE install + arena scratch-page recovery)
--------------------------------------------------------------

  mm: Add ptep_try_set() for lockless empty-slot installs
  bpf: Recover arena kernel faults with scratch page

Patches 3-5 (helpers used by struct_ops registration)
-----------------------------------------------------

  bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers
  bpf: Add bpf_struct_ops_for_each_prog()
  bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()
====================

Link: https://lore.kernel.org/bpf/20260522172219.1423324-1-tj@kernel.org/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ACPI: video: Switch over to devres-based resource management

Turn acpi_video_bus_remove_notify_handler() into a devm
action added by acpi_video_bus_probe() after calling
acpi_video_bus_add_notify_handler() and use the newly introduced
devm_acpi_install_notify_handler() to install an ACPI notify
handler for the video bus device.

This replaces the rollback path remnant in acpi_video_bus_probe()
and allows acpi_video_bus_remove() to be dropped altogether.

No intentional functional impact.
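
For readers unfamiliar with devm actions, the key property is that registered cleanup callbacks run in reverse order of registration when the device goes away. A toy userspace model of that ordering, with hypothetical names throughout (toy_devres, toy_add_action(), toy_release_all() and the logging callback are not kernel APIs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ACTIONS 8

typedef void (*toy_action_fn)(void *data);

struct toy_devres {
	toy_action_fn fn[TOY_MAX_ACTIONS];
	void *data[TOY_MAX_ACTIONS];
	size_t nr;
};

/* Register a cleanup action; 0 on success, -1 if the table is full. */
int toy_add_action(struct toy_devres *dr, toy_action_fn fn, void *data)
{
	assert(fn);
	if (dr->nr == TOY_MAX_ACTIONS)
		return -1;
	dr->fn[dr->nr] = fn;
	dr->data[dr->nr] = data;
	dr->nr++;
	return 0;
}

/* Run all registered actions, most recently added first. */
void toy_release_all(struct toy_devres *dr)
{
	while (dr->nr) {
		dr->nr--;
		dr->fn[dr->nr](dr->data[dr->nr]);
	}
}

/* Demo action: append one tag character to a log buffer. */
struct toy_log {
	char buf[16];
	size_t len;
};

struct toy_tag {
	struct toy_log *log;
	char c;
};

void toy_log_tag(void *data)
{
	struct toy_tag *t = data;

	if (t->log->len + 1 < sizeof(t->log->buf))
		t->log->buf[t->log->len++] = t->c;
}
```

Registering tag 'A' and then tag 'B' leaves "BA" in the log after toy_release_all(), which is the same reverse ordering that lets a probe function drop its hand-written rollback path.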

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2556320.jE0xQCEvom@rafael.j.wysocki

ACPI: video: Use devm for video->entry and backlight cleanup

Introduce acpi_video_bus_del() for removing the video bus object
from the video_bus_head list and unregistering backlight and make
acpi_video_bus_probe() add it as a devm action after adding the
video bus object to the video_bus_head list.

Accordingly, remove the code superseded by it from
acpi_video_bus_remove() and from the rollback path in
acpi_video_bus_probe().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2279582.Icojqenx9y@rafael.j.wysocki

ACPI: video: Use devm action for freeing video devices

Rename acpi_video_bus_get_devices() to devm_acpi_video_bus_get_devices()
and turn acpi_video_bus_put_devices() into a devm action added by it for
freeing the video devices allocated by it and the attached_array memory.

Accordingly, remove the acpi_video_bus_put_devices() calls and
attached_array freeing from acpi_video_bus_remove() and the rollback
path in acpi_video_bus_probe().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1932803.atdPhlSkOF@rafael.j.wysocki

ACPI: video: Use devm action for video bus object cleanup

Introduce acpi_video_bus_free() for freeing video bus object memory
and reversing changes related to it made during ACPI video bus device
probe, modify acpi_video_bus_probe() to add acpi_video_bus_free() as
a devm action, and remove the code superseded by it from
acpi_video_bus_remove().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3892168.MHq7AAxBmi@rafael.j.wysocki

ACPI: video: Rearrange probe and remove code

Rearrange some ACPI video bus probe and remove code so that it is clearer
that probe and removal are carried out in reverse order, which will also
facilitate subsequent changes.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2276683.Mh6RI2rZIc@rafael.j.wysocki

ACPI: video: Reduce the number of auxiliary device dereferences

Store the &aux_dev->dev pointer in a separate local variable in
acpi_video_bus_probe() to avoid dereferencing aux_dev many times.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2707186.Lt9SDvczpP@rafael.j.wysocki

ACPI: PAD: Switch over to devres-based resource management

Use the newly introduced devm_acpi_install_notify_handler() for
installing an ACPI notify handler and, since that function checks the
ACPI companion of the owner device against NULL internally, remove the
explicit ACPI companion check from acpi_pad_probe().

However, to prevent the notify handler from running acpi_pad_idle_cpus()
with the number of idle CPUs greater than zero after acpi_pad_remove()
has returned, add a bool static variable for synchronization between
the two.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1964581.CQOukoFCf9@rafael.j.wysocki

ACPI: PAD: Fix teardown ordering in acpi_pad_remove()

The ACPI notify handler installed by acpi_pad_probe() needs to be
removed before calling acpi_pad_idle_cpus() in acpi_pad_remove()
so it doesn't schedule idle time injection on some CPUs again.

Fixes: 8e0af5141ab9 ("ACPI: create Processor Aggregator Device driver")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2064153.usQuhbGJ8B@rafael.j.wysocki

ACPI: PAD: Pass struct device pointer to acpi_pad_notify()

Use the struct device pointer to the dev member in the struct
platform_device object representing the platform device used for driver
binding as the last argument of acpi_dev_install_notify_handler() and
accordingly update acpi_pad_notify() to pass that pointer directly to
dev_name() when generating the netlink event.

Since the dev_name() value for an ACPI-enumerated platform device is the
same as the dev_name() value for the dev member of its ACPI companion
object, as per acpi_create_platform_device(), the above code modification
is not expected to cause functionality to change.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1862521.VLH7GnMWUR@rafael.j.wysocki

ACPI: PAD: Rearrange acpi_pad_notify()

Use an if () in acpi_pad_notify() instead of a switch () statement to
make the code somewhat easier to follow and reduce its indentation
level.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3345485.5fSG56mABF@rafael.j.wysocki

ACPI: thermal: Switch over to devres-based resource management

Switch over the ACPI thermal zone driver to devres-based resource
management by making the following changes:

* Turn acpi_thermal_zone_free() into a devm action added from
   acpi_thermal_probe() after allocating the struct acpi_thermal object.

* Rename acpi_thermal_unregister_thermal_zone() to
   acpi_thermal_zone_unregister(), add acpi_thermal_pm_queue flushing to
   it, and turn it into a devm action added by acpi_thermal_probe()
   after calling acpi_thermal_register_thermal_zone().

* Use the newly introduced devm_acpi_install_notify_handler() for
   installing an ACPI notify handler.

* Drop acpi_thermal_remove() that is not necessary any more.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3698719.iIbC2pHGDl@rafael.j.wysocki

ACPI: HED: Switch over to devres-based resource management

Use the newly introduced devm_acpi_install_notify_handler() for
installing an ACPI notify handler and since that function checks the
ACPI companion of the owner device against NULL internally, remove the
explicit ACPI companion check from acpi_hed_probe().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/7950702.EvYhyI6sBW@rafael.j.wysocki

ACPI: HED: Refine guarding against adding a second instance

There can be only one ACPI hardware event device (HED) in use at a time,
so acpi_hed_probe() uses static variable hed_handle for guarding against
adding a second HED instance, but there is no reason for that variable
to hold an ACPI handle, so change it to a bool one.

While at it, also set that variable at the end of acpi_hed_probe() to
avoid the need to clear it when installing the ACPI notify handler
fails.

Note that ACPI devices are enumerated sequentially, so there's no need
for additional locking around the accesses to that variable.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2042970.PYKUYFuaPT@rafael.j.wysocki

ACPI: battery: Switch over to devres-based resource management

The ACPI battery driver already uses devm_kzalloc() for allocating
memory and devm_mutex_init() for mutex initialization, but it still
carries out some manual rollback in acpi_battery_probe().

Switch it over to devres-based resource management completely by
making three changes:

* Rename acpi_battery_update_retry() to devm_acpi_battery_update_retry(),
   turn sysfs_battery_cleanup() into a devm action and modify the former
   to add it.

* Add devm_acpi_battery_init_wakeup() for initializing the wakeup
   source and make it add a custom devm action to automatically remove
   the wakeup source registered by it.

* Make acpi_battery_probe() use devm_acpi_install_notify_handler()
   that has just been introduced for installing an ACPI notify handler.

Note that the code ordering change related to the last of the above
changes does not matter because there is no functional dependency
between the PM notifier and the wakeup source or the ACPI notify
handler.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/10856906.nUPlyArG6x@rafael.j.wysocki

ACPI: AC: Switch over to devres-based resource management

Use devm_kzalloc() for allocating memory, devm_power_supply_register()
for registering a power supply class device and the newly introduced
devm_acpi_install_notify_handler() for installing an ACPI notify handler.

Note that the code ordering change related to the third of the above
modifications does not matter because there is no order dependency
between the battery notifier and the ACPI notify handler.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3422377.44csPzL39Z@rafael.j.wysocki

ACPI: NFIT: core: Use devm_acpi_install_notify_handler()

Now that devm_acpi_install_notify_handler() is available, use it in
acpi_nfit_probe() instead of a custom devm action removing an ACPI
notify handler installed via acpi_dev_install_notify_handler().

Also drop the explicit ACPI_COMPANION() check against NULL that is
not necessary any more because devm_acpi_install_notify_handler()
carries out an equivalent check internally and use ACPI_HANDLE() to
retrieve the platform device's ACPI handle.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3048737.e9J7NaK4W3@rafael.j.wysocki

ACPI: bus: Introduce devm_acpi_install_notify_handler()

Introduce devm_acpi_install_notify_handler() for installing an ACPI
notify handler managed by devres that will be removed automatically on
driver detach.

It installs the notify handler on the device object in the ACPI
namespace that corresponds to the owner device's ACPI companion, if
present (an error is returned if the owner device doesn't have an ACPI
companion).

Currently, there is no way to manually remove the notify handler
installed by it because none of its users brought on subsequently
will need to do that.
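
As a rough illustration of the devres idea used throughout this series, the following is a userspace model only, not the kernel implementation. Cleanup actions are pushed as probe makes progress and run in reverse order on detach. All names here (device_model, model_add_action(), model_release_all(), log_release()) are hypothetical and exist only for this sketch.

```c
#include <stddef.h>

/* Hypothetical userspace model of a devres-style action stack. */
#define MAX_ACTIONS 8

typedef void (*action_fn)(void *data);

struct action {
	action_fn fn;
	void *data;
};

struct device_model {
	struct action actions[MAX_ACTIONS];
	size_t count;
};

/* Register a cleanup action; returns 0 on success, -1 if the stack is full. */
static int model_add_action(struct device_model *dev, action_fn fn, void *data)
{
	if (dev->count >= MAX_ACTIONS)
		return -1;
	dev->actions[dev->count].fn = fn;
	dev->actions[dev->count].data = data;
	dev->count++;
	return 0;
}

/* Run all registered actions in reverse registration order (LIFO). */
static void model_release_all(struct device_model *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->actions[dev->count].fn(dev->actions[dev->count].data);
	}
}

/* Example action that records the order in which actions ran. */
static char release_log[MAX_ACTIONS + 1];
static size_t release_len;

static void log_release(void *data)
{
	if (release_len < MAX_ACTIONS)
		release_log[release_len++] = *(const char *)data;
	release_log[release_len] = '\0';
}
```

Registering actions tagged 'A', 'B' and 'C' and then calling model_release_all() leaves "CBA" in release_log, which mirrors teardown happening in the reverse order of setup.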

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Kerneldoc comment refinement ]
Link: https://patch.msgid.link/2268031.irdbgypaU6@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

RDMA/hns: Fix log flood after cmd_mbox failure

hns_roce_cmd_mbox() is the command interface between driver and
hardware. When hardware is abnormal, the unlimited error printings
after hns_roce_cmd_mbox() failure will cause log flood and even
system crash.

Replace ibdev_err() and ibdev_warn() with their ratelimited versions
in the error handling path after hns_roce_cmd_mbox() (and its wrappers
hns_roce_create_hw_ctx/hns_roce_destroy_hw_ctx) fails.
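
To show why rate limiting bounds the log volume, here is a simplified userspace model of an interval/burst limiter. It is an assumption-laden sketch and not the kernel's ratelimit code: the struct, the function name and the tick-based clock are all hypothetical.

```c
/* Simplified userspace model of an interval/burst rate limiter. */
struct ratelimit_model {
	unsigned long interval;	/* window length, in arbitrary ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* messages emitted in the current window */
};

/* Return 1 if a message may be emitted at tick `now`, 0 if it is suppressed. */
static int ratelimit_model_allow(struct ratelimit_model *rl, unsigned long now)
{
	/* Start a new window once the interval has elapsed. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	return 0;
}
```

With interval 10 and burst 2, calls at ticks 0 and 1 are allowed, calls at ticks 2 and 9 are suppressed, and a call at tick 10 is allowed again because a new window starts.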

Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://patch.msgid.link/r/20260520055759.2354037-4-huangjunxian6@hisilicon.com
Signed-off-by: Lianfa Weng <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/hns: Fix warning in poll cq direct mode

CQs allocated by ib_alloc_cq() always have a comp_handler. Though
in direct mode this handler is never expected to be called, it
is still called when the driver is reset, triggering the following
WARN_ONCE():

Call trace:
ib_cq_completion_direct+0x38/0x60
hns_roce_cq_completion+0x54/0x90 [hns_roce_hw_v2]
hns_roce_handle_device_err+0x1c8/0x340 [hns_roce_hw_v2]
hns_roce_hw_v2_uninit_instance.constprop.0+0x34/0x70 [hns_roce_hw_v2]
hns_roce_hw_v2_reset_notify+0xc4/0xe0 [hns_roce_hw_v2]
hclge_notify_roce_client+0x60/0xbc [hclge]
hclge_reset_rebuild+0x48/0x34c [hclge]
hclge_reset_subtask+0xcc/0xec [hclge]
hclge_reset_service_task+0x80/0x160 [hclge]
hclge_service_task+0x50/0x80 [hclge]
process_one_work+0x1cc/0x4d0
worker_thread+0x154/0x414
kthread+0x104/0x144
ret_from_fork+0x10/0x18

Fixes: f295e4cece5c ("RDMA/hns: Delete unnecessary callback functions for cq")
Link: https://patch.msgid.link/r/20260520055759.2354037-3-huangjunxian6@hisilicon.com
Signed-off-by: Lianfa Weng <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

gpu: nova-core: vbios: construct `FwSecBiosImage` directly from BIOS images

`FwSecBiosBuilder` now only contains `falcon_ucode_offset` which just
gets passed directly into `FwSecBiosImage`. Remove `FwSecBiosBuilder`
and construct `FwSecBiosImage` directly, as a simplification.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-14-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: store PMU lookup entries in a KVVec

The current code copies the data into a KVec and parses it on demand. We
can simplify the code by storing the parsed entries.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-13-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: read PMU lookup entries using FromBytes

This simplifies the construction of `PmuLookupTableEntry` and is
allowed now that the driver can assume it is little endian.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-12-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: simplify setup_falcon_data

The code first computes `pmu_in_first_fwsec` or adjusts the offset and
then uses it in a branch just once to get the correct source for the PMU
table. This can be simplified to a single branch while also avoiding the
mutation of `offset`. Also, adjust the code after this to keep the
success case non-nested.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-11-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: compute FWSEC-relative Falcon data offset

Push the computation of the falcon data offset into a helper function.
The subtraction to create the offset should be checked, and by doing
this the check can be folded into the existing check in
`falcon_data_ptr`.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-10-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: keep PmuLookupTable local in setup_falcon_data

This does not need to be stored in `FwSecBiosBuilder` so we can remove
it from there, and just create and use it locally in
`setup_falcon_data`.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-9-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: drop unused falcon_data_offset from FwSecBiosBuilder

This is unused, so we can remove it.

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-8-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use checked accesses in `setup_falcon_data`

Use checked arithmetic for `ucode_offset` in `setup_falcon_data`. This
prevents a malformed firmware from causing a panic.

Fixes: dc70c6ae2441 ("gpu: nova-core: vbios: Add support to look up PMU table in FWSEC")
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-7-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use checked access in `FwSecBiosImage::header`

Use checked access in `FwSecBiosImage::header` for getting the header
version since the value is firmware derived.

Fixes: 47c4846e4319 ("gpu: nova-core: vbios: Add support for FWSEC ucode extraction")
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-6-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use checked ops and accesses in `FwSecBiosImage::ucode`

Use checked arithmetic and access for extracting the microcode since the
offsets are firmware derived.

Fixes: 47c4846e4319 ("gpu: nova-core: vbios: Add support for FWSEC ucode extraction")
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-5-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: read BitToken using FromBytes

If `header.token_size` is smaller than `BitToken`, then we currently can
read past the end of `image.base.data`. Use checked arithmetic for
computing offsets and simplify reading it in using `FromBytes`.
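
The driver itself is written in Rust, so the following C sketch only illustrates the general pattern under stated assumptions: compute the entry offset with overflow checks, then copy a fixed-size record only if the whole record fits in the buffer. The struct layout and the names token_model and read_token_model() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size record; the real token layout may differ. */
struct token_model {
	uint8_t id;
	uint8_t version;
	uint16_t size;
	uint16_t offset;
};

/*
 * Read record number `index` from a table that starts at `base` and uses
 * `stride` bytes per entry. Returns 0 on success, -1 if the offset
 * computation overflows or the record does not fit inside the buffer.
 */
static int read_token_model(const uint8_t *buf, size_t buf_len, size_t base,
			    size_t index, size_t stride,
			    struct token_model *out)
{
	size_t off;

	/* Reject base + index * stride if it would overflow size_t. */
	if (stride != 0 && index > (SIZE_MAX - base) / stride)
		return -1;
	off = base + index * stride;

	/* The whole record must lie inside the buffer. */
	if (off > buf_len || buf_len - off < sizeof(*out))
		return -1;

	memcpy(out, buf + off, sizeof(*out));
	return 0;
}
```

A stride smaller than the record size is the interesting case: the last entries start inside the buffer, but the record would extend past its end, so the read is refused instead of running off the buffer.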

Fixes: dc70c6ae2441 ("gpu: nova-core: vbios: Add support to look up PMU table in FWSEC")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-4-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: avoid reading too far in read_more_at_offset

Fix bug where `read_more_at_offset` would unnecessarily read more data.
This happens when the window to read has some part cached and some part
not. It would read `len` bytes instead of just the uncached portion,
which could read past `BIOS_MAX_SCAN_LEN`.
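
A minimal C sketch of the arithmetic involved, assuming the cache always holds a contiguous prefix [0, cached_len) of the image. The helper name bytes_to_fetch() is hypothetical and the real driver code is Rust.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compute how many additional bytes must be fetched so that the window
 * [offset, offset + len) is covered, given that bytes [0, cached_len) are
 * already cached. Returns 0 and stores the count in *out, or returns -1 if
 * offset + len overflows.
 */
static int bytes_to_fetch(size_t cached_len, size_t offset, size_t len,
			  size_t *out)
{
	size_t end;

	if (len > SIZE_MAX - offset)
		return -1;
	end = offset + len;

	/* Only the part of the window beyond the cached prefix is needed. */
	*out = end > cached_len ? end - cached_len : 0;
	return 0;
}
```

For example, with 10 bytes cached and a request for 8 bytes at offset 6, only 4 more bytes are needed, not the full 8.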

Fixes: 6fda04e7f0cd ("gpu: nova-core: vbios: Add base support for VBIOS construction and iteration")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-3-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: use checked arithmetic for bios image range end

`read_bios_image_at_offset` is called with a length from the VBIOS
header, so we should be more defensive here and use checked arithmetic.

Fixes: 6fda04e7f0cd ("gpu: nova-core: vbios: Add base support for VBIOS construction and iteration")
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-2-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

gpu: nova-core: vbios: stop scanning at BIOS_MAX_SCAN_LEN

Current code lets `current_offset` go to `BIOS_MAX_SCAN_LEN` which is
one byte too far.
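
The off-by-one can be pictured with two loop predicates. This is a sketch that assumes the fix amounts to making the bound exclusive. The function names are hypothetical and the actual change is in Rust.

```c
#include <stddef.h>

/* Inclusive bound: still allows offset == limit, one byte past the window. */
static int scan_allowed_inclusive(size_t offset, size_t limit)
{
	return offset <= limit;
}

/* Exclusive bound: the last valid offset is limit - 1. */
static int scan_allowed_exclusive(size_t offset, size_t limit)
{
	return offset < limit;
}
```

Both predicates agree for every offset below the limit. They differ only at offset == limit, which the inclusive form accepts and the exclusive form rejects.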

Fixes: 6fda04e7f0cd ("gpu: nova-core: vbios: Add base support for VBIOS construction and iteration")
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260525-fix-vbios-v5-1-e5e455251537@nvidia.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>

IB/mlx4: Fix refcount leak in add_port() error path

After kobject_init_and_add(), the lifetime of the embedded struct
kobject is expected to be managed through the kobject core reference
counting.

In add_port(), failure paths after kobject_init_and_add() must not free
struct mlx4_port directly, because the embedded kobject is then managed
by the kobject core. Freeing it directly leaves the kobject reference
counting unbalanced and can lead to incorrect lifetime handling.

Allocate the pkey and gid attribute arrays before kobject_init_and_add(),
so failures before kobject initialization can be handled by directly
freeing the allocated memory. Once kobject_init_and_add() has been
called, unwind later failures by removing any successfully created sysfs
groups, calling kobject_del(), and then releasing the embedded kobject
with kobject_put().
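
The ownership rule can be modelled in userspace as follows. This is a hypothetical sketch and not the kobject API: refobj, refobj_init(), refobj_put() and add_port_model() are invented names, and the failure injection flags exist only for illustration.

```c
#include <stdlib.h>

/* Minimal reference-counted object with a release callback. */
struct refobj {
	int refcount;
	void (*release)(struct refobj *obj);
};

static int release_calls;

static void port_release(struct refobj *obj)
{
	release_calls++;
	free(obj);
}

static void refobj_init(struct refobj *obj, void (*release)(struct refobj *))
{
	obj->refcount = 1;
	obj->release = release;
}

/* Drop one reference; the release callback frees the object at zero. */
static void refobj_put(struct refobj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);
}

/* Returns 0 on success, -1 on failure; the flags inject failures. */
static int add_port_model(int fail_before_init, int fail_after_init)
{
	struct refobj *port = malloc(sizeof(*port));

	if (!port)
		return -1;
	if (fail_before_init) {
		free(port);		/* not yet refcounted: plain free is fine */
		return -1;
	}
	refobj_init(port, port_release);
	if (fail_after_init) {
		refobj_put(port);	/* refcounted: release via put, never free() */
		return -1;
	}
	refobj_put(port);		/* model only: drop the reference so nothing leaks */
	return 0;
}
```

A failure before initialization never invokes the release callback, while a failure afterwards goes through it exactly once.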

Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Link: https://patch.msgid.link/r/20260518021910.972900-1-lgs201920130244@gmail.com
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/rxe: Fix a use-after-free problem in rxe_mmap

rxe_mmap() removes a rxe_mmap_info struct from the pending_mmaps list
and releases pending_lock while the struct's kref is still at 1:

   list_del_init(&ip->pending_mmaps);
   spin_unlock_bh(&rxe->pending_lock);   /* ref == 1, no lock held */
   ret = remap_vmalloc_range(vma, ip->obj, 0);  /* walks PTEs */
   [...]
   rxe_vma_open(vma);                    /* kref_get, ref → 2 */

remap_vmalloc_range_partial() walks PTEs without any lock.

A concurrent DESTROY_CQ ioctl on another CPU calls:

    kref_put(&q->ip->ref, rxe_mmap_release)   /* ref 1→0 */
    vfree(ip->obj)   /* clears vmalloc PTEs mid-walk */
    kfree(ip)        /* frees rxe_mmap_info */

This yields:

   1. Kernel crash, vmalloc_to_page() returns NULL when vfree wins the
   per-PTE race -> vm_insert_page(NULL) → GPF in validate_page_before_insert

   2. Page UAF, vmalloc_to_page() reads a stale PTE before vfree clears
   it. User VMA holds a PTE to a free'd page which might eventually get
   reallocated later by vmalloc which allows the attacker to get a clean
   page-level UAF.

   It is worth noting that even though a page-level UAF is possible given
   the strong primitive, it is statistically very difficult to achieve
   given the very short time window (after the last insert_page and before
   the kref_get).

The call trace is as follows:

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
  CPU: 0 UID: 1000 PID: 413 Comm: poc Not tainted 7.0.0-rc5-dirty #28 PREEMPT(lazy)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:validate_page_before_insert+0x32/0x300
  Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 e8 93 b5 a3 ff 48 8d 7b 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 7b 02 00 00 4c 8b 63 08 31 ff 4d 89 e5 41 83 e5
  RSP: 0018:ffff88811b15f2f0 EFLAGS: 00000202
  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
  RBP: ffff88811b15f318 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881181eee00
  R13: 0000000000000000 R14: ffff8881181eee00 R15: ffff8881181eee20
  FS:  00007b1e000f76c0(0000) GS:ffff8884268e0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007b1e00a24ac0 CR3: 0000000116eb3000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   insert_page+0x8f/0x190
   ? __pfx_insert_page+0x10/0x10
   ? kasan_save_alloc_info+0x38/0x60
   vm_insert_page+0x2e7/0x400
   remap_vmalloc_range_partial+0x212/0x3e0
   remap_vmalloc_range+0x6e/0xb0
   ? __kasan_check_write+0x14/0x30
   rxe_mmap+0x2e9/0x5d0
   ib_uverbs_mmap+0x1ad/0x2c0
   __mmap_region+0x12c2/0x2ad0
   ? __pfx___mmap_region+0x10/0x10
   ? __sanitizer_cov_trace_switch+0x58/0xb0
   ? mas_prev_slot+0x360/0x39c0
   ? __sanitizer_cov_trace_switch+0x58/0xb0
   ? mas_next_slot+0x1e5b/0x2f40
   ? __sanitizer_cov_trace_cmp8+0x18/0x30
   ? unmapped_area_topdown+0x4dd/0x610
   ? kfree+0x1b1/0x440
   ? free_cpumask_var+0x16/0x30
   ? __kasan_slab_free+0x7d/0xa0
   ? __sanitizer_cov_trace_cmp8+0x18/0x30
   mmap_region+0x2e6/0x3c0
   do_mmap+0xa3e/0x12a0
   ? __pfx_do_mmap+0x10/0x10
   ? __kasan_check_write+0x14/0x30
   ? down_write_killable+0xba/0x160
   ? __pfx_down_write_killable+0x10/0x10
   ? __sanitizer_cov_trace_cmp4+0x16/0x30
   vm_mmap_pgoff+0x2d4/0x4a0
   ? __pfx_vm_mmap_pgoff+0x10/0x10
   ? fget+0x1bf/0x270
   ksys_mmap_pgoff+0x40c/0x690
   ? __sanitizer_cov_trace_const_cmp4+0x16/0x30
   ? __pfx_ksys_mmap_pgoff+0x10/0x10
   ? __kasan_check_write+0x14/0x30
   ? _raw_spin_trylock+0xbb/0x130
   ? __pfx__raw_spin_trylock+0x10/0x10
   __x64_sys_mmap+0x135/0x1e0
   x64_sys_call+0x1c14/0x2790
   do_syscall_64+0xd2/0x1050
   ? rcu_core+0x352/0x7d0
   ? rcu_core_si+0xe/0x20
   ? handle_softirqs+0x1aa/0x650
   ? __sanitizer_cov_trace_cmp4+0x16/0x30
   ? fpregs_assert_state_consistent+0xe1/0x160
   ? irqentry_exit+0xb1/0x670
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://patch.msgid.link/r/20260515002537.6209-1-yanjun.zhu@linux.dev
Reported-and-tested-by: nasm <n4sm@protonmail.com>
Suggested-by: nasm <n4sm@protonmail.com>
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

firmware: zynqmp: Add dynamic CSU register discovery and sysfs interface

Add support for dynamically discovering and exposing Configuration
Security Unit (CSU) registers through sysfs. Leverage the existing
PM_QUERY_DATA API to discover available registers at runtime, making
the interface flexible and maintainable.

Key features:
- Dynamic register discovery using PM_QUERY_DATA API
  * PM_QID_GET_NODE_COUNT: Query number of available registers
  * PM_QID_GET_NODE_NAME: Query register names by index
- Automatic sysfs attribute creation under csu_registers/ group
- Read operations via existing IOCTL_READ_REG API
- Write operations via existing IOCTL_MASK_WRITE_REG API

The sysfs interface is created at:
  /sys/devices/platform/firmware:zynqmp-firmware/csu_registers/

Currently supported registers include:
  - multiboot (CSU_MULTI_BOOT)
  - idcode (CSU_IDCODE, read-only)
  - pcap-status (CSU_PCAP_STATUS, read-only)

The dynamic discovery approach allows firmware to control which
registers are exposed without requiring kernel changes, improving
maintainability and security.

The firmware does not currently expose per-register access mode
information, so the kernel cannot distinguish read-only registers
from read-write ones at discovery time. All discovered registers are
therefore created with sysfs mode 0644, and the firmware is
responsible for rejecting writes to registers it treats as read-only
(for example idcode and pcap-status); that error is propagated back
to userspace from the store callback. If a per-register access-mode
query is added to the firmware in the future, sysfs permissions can
be tightened to match.

CSU register discovery is an optional feature: on firmware that lacks
support for PM_QID_GET_NODE_COUNT or PM_QID_GET_NODE_NAME, the probe
returns gracefully without exposing any sysfs entries. To keep the
memory footprint minimal on that path, partial devm allocations made
during discovery are explicitly released on failure so that no memory
lingers until device unbind when the feature is unavailable.

Signed-off-by: Ronak Jain <ronak.jain@amd.com>
Signed-off-by: Michal Simek <michal.simek@amd.com>
Link: https://lore.kernel.org/r/20260520093654.3303917-3-ronak.jain@amd.com

Documentation: ABI: add sysfs interface for ZynqMP CSU registers

Document the new sysfs interface that exposes Configuration Security
Unit (CSU) registers through the zynqmp-firmware driver.

The interface is available under:

  /sys/devices/platform/firmware:zynqmp-firmware/csu_registers/

The CSU registers are discovered at boot time using the PM_QUERY_DATA
firmware API. The following registers are currently supported:

  - multiboot     (CSU_MULTI_BOOT)
  - idcode        (CSU_IDCODE, read-only)
  - pcap-status   (CSU_PCAP_STATUS, read-only)

Read operations use the existing IOCTL_READ_REG firmware interface,
while write operations use IOCTL_MASK_WRITE_REG.

Access control is enforced by the firmware. Write attempts to
read-only registers are rejected by firmware even though the sysfs file
permissions allow writes.

Document the ABI entry accordingly.

Signed-off-by: Ronak Jain <ronak.jain@amd.com>
Signed-off-by: Michal Simek <michal.simek@amd.com>
Link: https://lore.kernel.org/r/20260520093654.3303917-2-ronak.jain@amd.com

RDMA/irdma: Fix out-of-bounds write in irdma_copy_user_pgaddrs

The irdma_copy_user_pgaddrs function loops through all of the umem DMA
blocks to populate the PBLEs and will stop when either the last DMA
block is reached or palloc->total_cnt is reached. The issue is that
the logic for checking palloc->total_cnt would only work for non-zero
values.

When irdma_setup_pbles is called with lvl==0, it
calls irdma_copy_user_pgaddrs with palloc->total_cnt==0, which means
the only way to break out of the loop is to reach the last umem DMA
block, which means it could end up going beyond the end of the fixed-size,
4-entry iwmr->pgaddrmem array that is used in the lvl==0 case.

In the case of QP/CQ/SRQ rings, the value of lvl is determined by a
separate input (for example, req.cq_pages in the case of a CQ). So,
we must perform explicit checking to ensure we don't overflow the
pgaddrmem array if the user provides a umem that consists of more
blocks than their provided req.cq_pages.
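
A minimal userspace sketch of such an explicit check, assuming the copy is rejected when the source has more blocks than the destination can hold. Whether the real fix returns an error or handles the case differently is not stated here. copy_block_addrs() and PGADDR_SLOTS are hypothetical names. The value 4 follows the array size mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define PGADDR_SLOTS 4	/* fixed array size in the lvl==0 case, per the text */

/*
 * Copy `nblocks` block addresses into `dst`, which has `capacity` slots.
 * Returns 0 on success, -1 if the source holds more blocks than fit.
 */
static int copy_block_addrs(uint64_t *dst, size_t capacity,
			    const uint64_t *blocks, size_t nblocks)
{
	for (size_t i = 0; i < nblocks; i++) {
		/* Check the bound on every iteration, even if capacity is 0. */
		if (i >= capacity)
			return -1;
		dst[i] = blocks[i];
	}
	return 0;
}
```

The bound is tested inside the loop because the sketch mirrors an iterator-driven walk where the number of blocks is not known up front.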

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://patch.msgid.link/r/20260512183852.614045-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5

Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

hpfs: fix a crash if hpfs_map_dnode_bitmap fails

If hpfs_map_dnode_bitmap fails, the code would call hpfs_brelse4 on
uninitialized quad buffer head, causing a crash.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Farhad Alemi <farhad.alemi@berkeley.edu>
Cc: stable@vger.kernel.org

ASoC: qcom: q6asm-dai: fix error handling

Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> says:

Here is a set of patches that fixes one of the issues reported by
Richard Acayan. While working on the fix for the reported issue, I found
various other issues in the existing code.

This set contains some of those cleanups along with a few trivial coding
style patches for code that was uncomfortable to read.

Patch 1 should be enough to fix the issue reported.

Tested this on UNO-Q.

Link: https://patch.msgid.link/20260518092347.3446946-1-srinivas.kandagatla@oss.qualcomm.com

ASoC: qcom: q6asm-dai: use pointer type with kzalloc_obj()

Use kzalloc_obj(*prtd) instead of explicitly naming the structure type.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260518092347.3446946-6-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: qcom: q6asm-dai: remove unnecessary braces

The ASM_CLIENT_EVENT_DATA_WRITE_DONE case does not declare any local
variables or require a separate scope, so drop the unnecessary braces.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260518092347.3446946-5-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: qcom: q6asm-dai: fix error handling in prepare and set_params

Fix error handling in q6asm_dai_compr_set_params() and q6asm_dai_prepare()
for both CMD_CLOSE and q6asm_unmap_memory_regions().

In both the functions, we are doing q6asm_audio_client_free in failure
cases, which means if prepare or set_params fail, we can never recover.
Now open and close are done in respective dai_open/close functions.

Fixes: 2a9e92d371db ("ASoC: qdsp6: q6asm: Add q6asm dai driver")
Cc: Stable@vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260518092347.3446946-4-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: qcom: q6asm-dai: close stream only when running

q6asm_dai_close() and q6asm_dai_compr_free() currently issue CMD_CLOSE
whenever prtd->state is non-zero.

After prepare() closes an existing stream, the state is updated to
Q6ASM_STREAM_STOPPED. Since this state is also non-zero, the close and
free paths can send CMD_CLOSE again for a stream that has already been
closed.

Restrict CMD_CLOSE to the Q6ASM_STREAM_RUNNING state so the command is
sent only when the ASM stream is still active.
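
The difference between the two checks can be sketched with a small state enum. The enum names and values below are hypothetical stand-ins for the driver's state constants. Only the "zero means idle" convention is taken from the description above.

```c
/* Hypothetical stream states; 0 is the idle state. */
enum stream_state_model {
	STREAM_MODEL_IDLE = 0,
	STREAM_MODEL_RUNNING,
	STREAM_MODEL_STOPPED,
};

/* Old check: any non-zero state triggers a close command. */
static int should_close_old(enum stream_state_model state)
{
	return state != STREAM_MODEL_IDLE;
}

/* New check: only a running stream is closed. */
static int should_close_new(enum stream_state_model state)
{
	return state == STREAM_MODEL_RUNNING;
}
```

The two predicates disagree only for the stopped state, which is exactly the case where the stream has already been closed by prepare().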

Fixes: 2a9e92d371db ("ASoC: qdsp6: q6asm: Add q6asm dai driver")
Cc: Stable@vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260518092347.3446946-3-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: qcom: q6asm-dai: do not set stream state in event and trigger callbacks

The q6asm-dai stream state is used by prepare() to decide whether an
existing stream setup needs to be closed before opening/configuring a new
one. Updating the state from trigger or asynchronous DSP callbacks can make
that state stale or incorrect relative to the actual setup lifetime.

In particular, setting Q6ASM_STREAM_STOPPED on STOP or EOS completion can
make prepare() believe there is no active setup to close, which can result
in opening/configuring the same stream more than once.

Keep stream state updates tied to prepare(), where the stream is actually
closed and reopened, and stop changing it from trigger and EOS callbacks.

Fixes: bfbb12dfa144 ("ASoC: qcom: q6asm-dai: perform correct state check before closing")
Cc: Stable@vger.kernel.org
Closes: https://lore.kernel.org/all/afS7rTHdc9TyIeLx@rdacayan/
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260518092347.3446946-2-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: max98090: switch to standard set_jack callback

Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> says:

The MAX98090 codec driver currently exposes a custom
max98090_mic_detect() helper for machine drivers to register a headset
jack.

This series converts the driver to use the standard component
.set_jack callback and updates the mt8173-max98090 machine driver to use
snd_soc_component_set_jack() instead of the codec-specific helper.

Using the standard callback removes the need for a custom exported
symbol and allows machine drivers to use the common ASoC jack
registration interface. This also improves compatibility with machine
drivers, such as Qualcomm platforms, that already rely on
snd_soc_component_set_jack().

Link: https://patch.msgid.link/20260520155002.145306-1-srinivas.kandagatla@oss.qualcomm.com

ASoC: codecs: max98090: use component set_jack callback

The MAX98090 driver provides a custom max98090_mic_detect() helper for
machine drivers to register a jack.

This can be implemented using the standard component set_jack callback
instead. Doing so allows machine drivers to use
snd_soc_component_set_jack(), which is also the interface used by
machine drivers including Qualcomm ones.

Convert max98090_mic_detect() to a component set_jack callback and remove
the exported helper.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260520155002.145306-3-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mt8173-max98090: use standard callback to set jack

Use snd_soc_component_set_jack() instead of the custom callback
exported by the max98090 codec.

This lets the machine driver exercise the same standard path as other
drivers instead of a codec-specific helper.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20260520155002.145306-2-srinivas.kandagatla@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: soc-core: Add core support for ignoring suspend on selected DAPM widgets

Chancel Liu <chancel.liu@nxp.com> says:

Some audio systems require specific DAPM widgets to remain powered
during system suspend. Introduce a generic and reusable mechanism in
the ASoC core to mark selected DAPM widgets as ignore_suspend.

The unified mechanism consists of two parts:
1. Parse and store the name list of widgets to ignore suspend in
struct snd_soc_card

The list of widgets can be provided either by the machine driver or
parsed from Device Tree. Different machines have different routing and
power requirements. Each machine can specify its own widgets to ignore
suspend through a DT property. This enables a flexible policy without
hard-coding. A new helper, snd_soc_of_parse_ignore_suspend_widgets(), is
added for this purpose.

2. Apply ignore_suspend flags during snd_soc_bind_card()

After all components have been probed and all DAPM widgets have been
registered, snd_soc_bind_card() performs a unified lookup of the
configured widget names across all DAPM contexts of the card and marks
the matching widgets with ignore_suspend = 1.

Switch the imx-rpmsg driver to the core ignore-suspend-widgets support.

Chancel Liu (3):
  ASoC: dapm: Fix widget lookup with prefixed names across DAPM contexts
  ASoC: soc-core: Add core support for ignoring suspend on selected DAPM
    widgets
  ASoC: fsl: imx-rpmsg: Switch to core ignore-suspend-widgets support

Link: https://patch.msgid.link/20260507013654.2945915-1-chancel.liu@nxp.com

ASoC: fsl: imx-rpmsg: Switch to core ignore-suspend-widgets support

The imx-rpmsg machine driver currently implements its own logic to
parse ignore-suspend-widgets from Device Tree and manually traverse
DAPM widgets to mark them as ignore_suspend.

It also has a potential issue that some widgets listed in the property
(e.g. "Headphone Jack") belong to card or CPU DAI DAPM context.

Switch to use snd_soc_of_parse_ignore_suspend_widgets() with the
introduction of a generic ignore-suspend-widgets mechanism in the ASoC
core.

Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Link: https://patch.msgid.link/20260507013654.2945915-4-chancel.liu@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: soc-core: Add core support for ignoring suspend on selected DAPM widgets

Some audio systems require specific DAPM widgets to remain powered
during system suspend. Introduce a generic and reusable mechanism in
the ASoC core to mark selected DAPM widgets as ignore_suspend.

The unified mechanism consists of two parts:
1. Parse and store the name list of widgets to ignore suspend in
struct snd_soc_card

The list of widgets can be provided either by the machine driver or
parsed from Device Tree. Different machines have different routing and
power requirements. Each machine can specify its own widgets to ignore
suspend through a DT property. This enables a flexible policy without
hard-coding. A new helper, snd_soc_of_parse_ignore_suspend_widgets(), is
added for this purpose.

2. Apply ignore_suspend flags during snd_soc_bind_card()

After all components have been probed and all DAPM widgets have been
registered, snd_soc_bind_card() performs a unified lookup of the
configured widget names across all DAPM contexts of the card and marks
the matching widgets with ignore_suspend = 1.

Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Link: https://patch.msgid.link/20260507013654.2945915-3-chancel.liu@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dapm: Fix widget lookup with prefixed names across DAPM contexts

Currently dapm_find_widget() manually constructs a prefixed widget name
based on the provided DAPM context and compares it using strcmp(). This
happens to work in most cases because callers usually know which DAPM
context the target widget belongs to and pass in the matching DAPM
context.

However, this assumption breaks when search_other_contexts is enabled.
In such cases, callers may intentionally pass a different DAPM context,
while searching for a widget that actually belongs to another DAPM
context.

For example, when searching for a "DAC" widget, the widget belongs to
the codec DAPM and is registered with a codec prefix, while the caller
passes card->dapm and intends to search across all DAPM contexts. The
current implementation incorrectly constructs the name from the
caller's card DAPM context, causing the lookup to fail even though the
widget exists on the card.

Improve the matching strategy to support both use cases:
1. When the caller provides a fully qualified name with prefix, perform
   exact string matching. This preserves the ability to use prefixes for
   disambiguation.
2. When the caller provides a bare widget name without prefix, try exact
   matching first, then fall back to prefix-stripped comparison using
   snd_soc_dapm_widget_name_cmp().

To determine whether the pin name includes a prefix, a new helper
function snd_soc_dapm_pin_has_prefix() is introduced. It checks if the
pin name starts with any known component prefix on the card.

This fixes widget lookup failures when searching across different DAPM
contexts while maintaining backward compatibility for explicitly
prefixed lookups.
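
The two-step matching strategy described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct widget, pin_has_prefix(), bare_name() and find_widget() are
hypothetical stand-ins, not the kernel's dapm_find_widget(),
snd_soc_dapm_pin_has_prefix() or snd_soc_dapm_widget_name_cmp().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered widget and its owner's prefix. */
struct widget {
	const char *name;	/* name as registered, e.g. "Codec DAC" */
	const char *prefix;	/* owning component prefix, or NULL */
};

/* True if pin starts with "<prefix> " for any prefix known on the card. */
static bool pin_has_prefix(const struct widget *w, size_t n, const char *pin)
{
	for (size_t i = 0; i < n; i++) {
		const char *p = w[i].prefix;
		size_t len = p ? strlen(p) : 0;

		if (len && !strncmp(pin, p, len) && pin[len] == ' ')
			return true;
	}
	return false;
}

/* Registered name with the owner's "<prefix> " stripped, if present. */
static const char *bare_name(const struct widget *w)
{
	size_t len = w->prefix ? strlen(w->prefix) : 0;

	if (len && !strncmp(w->name, w->prefix, len) && w->name[len] == ' ')
		return w->name + len + 1;
	return w->name;
}

static const struct widget *find_widget(const struct widget *w, size_t n,
					const char *pin)
{
	/* Pass 1: exact match, tried for prefixed and bare names alike. */
	for (size_t i = 0; i < n; i++)
		if (!strcmp(w[i].name, pin))
			return &w[i];

	/* A fully qualified name must match exactly; no fallback. */
	if (pin_has_prefix(w, n, pin))
		return NULL;

	/* Pass 2: bare name, compared against prefix-stripped names. */
	for (size_t i = 0; i < n; i++)
		if (!strcmp(bare_name(&w[i]), pin))
			return &w[i];
	return NULL;
}
```

With widgets "Codec DAC" and "Amp DAC" on one card, a lookup for "DAC"
returns the first registered match in this sketch, while "Amp DAC"
still selects the second one explicitly.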

Fixes: ae4fc532244b ("ASoC: dapm: use component prefix when checking widget names")
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Assisted-by: Cody:Claude-4.5-Sonnet
Link: https://patch.msgid.link/20260507013654.2945915-2-chancel.liu@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: omap2-mcspi: Use of_device_get_match_data()

Use of_device_get_match_data() to fetch platform match data directly
instead of open-coding an of_match_device() lookup.

This also lets the driver drop the of_device.h include.

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260519004352.627148-1-rosenp@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: bytcht_es8316: Fix MCLK leak on init errors

byt_cht_es8316_init() enables MCLK before configuring the codec sysclk
and creating the headset jack. If either of those later steps fails, the
function returns without disabling MCLK, leaving the clock enabled after
card registration fails.

Track whether this driver enabled MCLK and disable it on the init error
paths. Add the matching DAI link exit callback so the same clock enable
is also balanced when ASoC cleans up a successfully initialized link.
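
The enable tracking described above can be sketched as follows. This is
a userspace illustration, not the driver code: struct fake_clk, struct
card_priv and the link_*() functions are hypothetical, and the
sysclk_ret/jack_ret parameters merely simulate the results of the later
init steps.

```c
#include <stdbool.h>

/* Hypothetical clock that only counts enables. */
struct fake_clk {
	int enable_count;
};

struct card_priv {
	struct fake_clk *mclk;
	bool mclk_enabled;	/* set only when this code enabled MCLK */
};

static int fake_clk_enable(struct fake_clk *c)
{
	c->enable_count++;
	return 0;
}

static void fake_clk_disable(struct fake_clk *c)
{
	c->enable_count--;
}

/* Disable MCLK at most once per successful enable. */
static void link_disable_mclk(struct card_priv *priv)
{
	if (priv->mclk_enabled) {
		fake_clk_disable(priv->mclk);
		priv->mclk_enabled = false;
	}
}

static int link_init(struct card_priv *priv, int sysclk_ret, int jack_ret)
{
	int ret = fake_clk_enable(priv->mclk);

	if (ret)
		return ret;
	priv->mclk_enabled = true;

	ret = sysclk_ret;	/* simulated codec sysclk setup */
	if (ret)
		goto err_disable;

	ret = jack_ret;		/* simulated headset jack creation */
	if (ret)
		goto err_disable;

	return 0;

err_disable:
	link_disable_mclk(priv);
	return ret;
}

/* Counterpart of a successful link_init(). */
static void link_exit(struct card_priv *priv)
{
	link_disable_mclk(priv);
}
```

Keeping the flag next to the clock pointer makes both the error path
and the exit callback idempotent, so a second link_exit() call in this
sketch does not unbalance the enable count.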

Fixes: a03bdaa565cb ("ASoC: Intel: add machine driver for BYT/CHT + ES8316")
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260519-asoc-bytcht-es8316-mclk-leak-v1-1-b4a11cdc2afd@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

drm/i915/dp: Detect changes in common link parameters

Detect DPRX capability changes without a long HPD or RX_CAP_CHANGED
signal and queue a corresponding link params reset.

Besides detecting the above unexpected capability changes, this also
avoids races between queuing and handling a deferred link params reset.

v2: (Ville)
- Query/set intel_dp::reset_link_params instead of using helpers for
these.
- Assert matching types for old/new common rate elements as well.
- Add TODO: for adding a struct tracking both rates and number of rates.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260522160514.2628249-5-imre.deak@intel.com

drm/i915/dp: Cache max common lane count

Cache the maximum common lane count together with the common link
rates.

This is safe because the cached value is updated:
- during driver probe, before the connector is registered and can be
used for mode validation or modesetting
- during resume, before output HW state readout can query it
- during connector detection, right after updating the sink/link
capabilities

Caching the value allows detecting max common lane count changes in
a follow-up change and keeps the tracking of max common lane count
aligned with that of common rates.

Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260522160514.2628249-4-imre.deak@intel.com

drm/i915/dp: Add helper to set common link params

Add intel_dp_set_common_link_params() to prepare for updating the
maximum common lane count together with the common rates.

Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260522160514.2628249-3-imre.deak@intel.com

drm/i915/dp: Reset link params after a DPRX capability change

There is no reason to distinguish between DPRX capability changes
signaled via a long HPD and via an RX_CAP_CHANGED HPD IRQ.

Both cases result in reading out the DPRX capabilities and updating the
corresponding sink and common capabilities cached in intel_dp, however
only the long HPD resets the link training/recovery state and MST link
probe parameters correspondingly. The link training/recovery state may
contain reduced maximum link rate/lane count values left over from a
previous link training failure.

As a result, after an RX_CAP_CHANGED IRQ increases the sink's link rate
or lane count, the maximum link rate/lane count in the link
training/recovery state may remain below the new values, leaving the
newly added valid configurations inconsistently unavailable for
subsequent modesets.

Handle RX_CAP_CHANGED IRQs the same way as long HPDs and reset the link
recovery state and MST link probe parameters in that case as well.

v2: Set intel_dp::reset_link_params instead of using a helper for this.
(Ville).

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260522160514.2628249-2-imre.deak@intel.com

m68k: defconfig: Update defconfigs for v7.1-rc1

  - Drop CONFIG_MPLS_IPTUNNEL=m (depends on LWTUNNEL, which is no longer
    auto-enabled since commit 309b905deee59561 ("ipv6: convert
    CONFIG_IPV6 to built-in only and clean up Kconfigs")),
  - Drop CONFIG_HID_ITE=n and CONFIG_HID_REDRAGON=n (disabled by default
    since commit 3d39be2a76d1dfed ("HID: drop 'default !EXPERT' from
    tristate symbols")),
  - Enable modular build of the CMAC, MD5, SHA-512, and SHA-3 algorithms
    (no longer auto-enabled since commits 4c1c07820a0e4d82 ("smb:
    client: Remove obsolete cmac(aes) allocation"), 7aa0f56d4b48fb1a
    ("scsi: iscsi_tcp: Remove unneeded selections of CRYPTO and
    CRYPTO_MD5"), commit 4061bc8c03975e64 ("crypto: rng - Don't pull in
    DRBG when CRYPTO_FIPS=n"), resp. ce260754bb435aea ("crypto:
    jitterentropy - Use SHA-3 library")),
  - Drop CONFIG_CRYPTO_DRBG_HASH=y and CONFIG_CRYPTO_DRBG_CTR=y (depend
    on CRYPTO_DRBG_MENU, which is no longer auto-enabled since commit
    4061bc8c03975e64 ("crypto: rng - Don't pull in DRBG when
    CRYPTO_FIPS=n")),
  - Enable modular build of all CRC functions and crypto library code
    for KUnit tests,
  - Enable benchmarking in the (modular) string functions KUnit test,
  - Enable modular build of the new test module for stress/performance
    analysis of workqueue.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/d20ee047f2133570673e108d1ffb0c6400a2e240.1777290987.git.geert@linux-m68k.org

drm/hyperv: validate VMBus packet size in receive callback

hyperv_receive_sub() reads msg->vid_hdr.type and dispatches into one
of four message-type branches without knowing how many bytes the host
wrote into hv->recv_buf. The completion path then runs
memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE), so the consumer that
wakes on wait_for_completion_timeout() can read up to 16 KiB of
residue from a prior message as if it were the response payload.

Pass bytes_recvd into hyperv_receive_sub() and reject any packet that
does not cover the pipe + synthvid header. A single switch on
msg->vid_hdr.type then computes the type-specific payload size: the
three completion-driving types (SYNTHVID_VERSION_RESPONSE,
SYNTHVID_RESOLUTION_RESPONSE, SYNTHVID_VRAM_LOCATION_ACK) fall through
to a shared exit that requires that size before memcpy/complete, while
SYNTHVID_FEATURE_CHANGE validates its own payload and returns before
reading is_dirt_needed. Unknown types are dropped.

SYNTHVID_RESOLUTION_RESPONSE is variable length: the host fills
resolution_count entries, not the full SYNTHVID_MAX_RESOLUTION_COUNT
array. Validate the fixed prefix first so resolution_count can be
read, bound it against the array, then require only the count-sized
array, so the shorter responses the host actually sends are accepted.
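
The three-step check for the variable-length response can be sketched
like this. The layout below is a simplified, hypothetical one (the real
synthvid structures and the pipe + synthvid headers are not
reproduced), and bytes_recvd here counts only the payload of this
struct.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RESOLUTION_COUNT 64	/* array bound, as an assumption */

/* Simplified, hypothetical payload layout. */
struct resolution {
	uint16_t width;
	uint16_t height;
};

struct resolution_resp {
	uint8_t edid_block;
	uint8_t resolution_count;
	uint8_t default_resolution_index;
	uint8_t is_standard;
	struct resolution supported[MAX_RESOLUTION_COUNT];
};

static bool resolution_resp_valid(const struct resolution_resp *resp,
				  size_t bytes_recvd)
{
	size_t fixed = offsetof(struct resolution_resp, supported);

	/* 1. The fixed prefix must be present before reading the count. */
	if (bytes_recvd < fixed)
		return false;

	/* 2. The count must not exceed the array. */
	if (resp->resolution_count > MAX_RESOLUTION_COUNT)
		return false;

	/* 3. Only the count-sized part of the array is required. */
	return bytes_recvd >=
	       fixed + resp->resolution_count * sizeof(struct resolution);
}
```

A response carrying two entries is accepted in this sketch as soon as
the prefix plus two entries have arrived, even though the full array
would be much larger.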

Only run the sub-handler when vmbus_recvpacket() returned success. The
memcpy length is bytes_recvd, which is bounded by VMBUS_MAX_PACKET_SIZE
only on a successful receive; on -ENOBUFS vmbus_recvpacket() instead
reports the required length, which can exceed hv->recv_buf, so copying
bytes_recvd would read and write past the 16 KiB buffers. Gating on the
success return keeps the copy bounded. The nonzero-return path is itself
a malformed-message case and is now logged rather than silently skipped;
channel recovery is not attempted.

Rejected packets are reported via drm_err_ratelimited() rather than
silently dropped, matching the CoCo-hardened pattern in
hv_kvp_onchannelcallback().

Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Cc: stable@vger.kernel.org # 5.14+
Signed-off-by: Berkant Koc <me@berkoc.com>
Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Link: https://patch.msgid.link/8200dbc199c7a9b75ac7e8af6c748d2189b5ebd5.1779542874.git.me@berkoc.com

drm/hyperv: validate resolution_count and fix WIN8 fallback

A SYNTHVID_RESOLUTION_RESPONSE with resolution_count > 64 walks past
the supported_resolution[SYNTHVID_MAX_RESOLUTION_COUNT] array in the
parse loop. Bound resolution_count against the array size, folded
into the existing zero-check.

When the WIN10 resolution probe fails, the caller in
hyperv_connect_vsp() left hv->screen_*_max / preferred_* unpopulated,
which sets mode_config.max_width / max_height to 0 and makes
drm_internal_framebuffer_create() reject every userspace framebuffer
with -EINVAL. The pre-WIN10 branch had the same gap for
preferred_width / preferred_height. Use a single post-probe fallback
guarded by screen_width_max == 0 so both paths converge on the WIN8
defaults.

Signed-off-by: Berkant Koc <me@berkoc.com>
Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Cc: stable@vger.kernel.org # 5.14+
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Link: https://patch.msgid.link/6945b22419c7d404b4954a113de2ac9c900dba93.1779542874.git.me@berkoc.com

ASoC: add shared BCLK rate constraint for cross-DAI coordination

Troy Mitchell <troy.mitchell@linux.spacemit.com> says:

On some SoCs (e.g. SpacemiT K3), multiple I2S controllers share the
same physical BCLK. When one controller is already streaming, the
others must use hw_params that result in the same BCLK rate, otherwise
the shared clock would be reconfigured and corrupt the active stream.

This series adds framework-level support for this constraint:

Patch 1 adds the dt-bindings for the spacemit,k3-i2s compatible.
The K3 SoC uses the same I2S IP as K1 but requires additional clocks:
a dedicated sysclk_div, along with c_sysclk and c_bclk which are
shared across multiple I2S controllers.

Patch 2 adds a DEFINE_GUARD wrapping snd_soc_card_mutex_lock() and
snd_soc_card_mutex_unlock() so that scope-based locking picks up the
SND_SOC_CARD_CLASS_RUNTIME lockdep subclass.

Patch 3 adds the constraint logic in soc-pcm.c. During PCM open,
every DAI that has a bclk clock pointer gets a hw_rule registered
unconditionally. The rule callback runs at hw_refine time: it scans
the card for an active peer sharing the same physical BCLK (via
clk_is_match()) that has already completed hw_params, then constrains
the current stream's rate to match the established BCLK rate. The
first DAI to complete hw_params is unconstrained; subsequent DAIs
must match. Two modes are supported:

  - Default (I2S): BCLK = rate * channels * sample_bits. The rule
    derives the valid rate range from the current channel and
    sample_bits intervals.

  - Explicit ratio (TDM): if the driver sets dai->bclk_ratio
    (e.g. slots * slot_width), the rule computes the single valid
    rate as active_bclk_rate / bclk_ratio.

This series was prompted by review feedback on the SpacemiT K3 I2S
series, where a vendor-specific fixed-sample-rate property was rejected
in favor of a generic framework solution:
https://lore.kernel.org/all/afFqgF6ZRwYdfUmL@sirena.co.uk/

Link: https://patch.msgid.link/20260522-i2s-same-blk-v4-0-a71a86faaa20@linux.spacemit.com

ASoC: soc-pcm: constrain hw_params when DAIs share the same BCLK

When multiple CPU DAIs on the same sound card share the same physical
BCLK, add a hw_rule during PCM open that constrains the sample rate so
the resulting BCLK rate stays consistent across all sharing DAIs.

The rule callback scans all DAIs on the card at hw_refine time, looking
for an active peer that shares the same physical BCLK (via
clk_is_match()) and has already completed hw_params (checked via
dai->symmetric_rate != 0). This ensures the constraint uses the real
BCLK rate established by the peer's clk_set_rate() in hw_params, not a
stale boot-time default.

The first DAI to complete hw_params is unconstrained (no active peer
yet); subsequent DAIs are constrained to match.

The rule supports two modes:
- If the DAI has an explicit bclk_ratio set (e.g. for TDM where
  BCLK = rate * slots * slot_width), the rate is constrained to
  active_bclk_rate / bclk_ratio.
- Otherwise, the default formula BCLK = rate * channels * sample_bits
  is used to derive the valid rate range.
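
The arithmetic behind the two modes can be sketched in userspace C. The
helpers and struct interval below are hypothetical and do not mirror
the soc-pcm.c rule callback; rejecting a BCLK that the ratio does not
divide evenly is a choice of this sketch, not something stated above.

```c
#include <stdbool.h>

/* Hypothetical closed integer interval, loosely like an ALSA interval. */
struct interval {
	unsigned int min;
	unsigned int max;
};

/* Explicit-ratio mode: rate = active_bclk_rate / bclk_ratio. */
static bool bclk_rate_from_ratio(unsigned long bclk, unsigned int ratio,
				 unsigned int *rate)
{
	if (!ratio || bclk % ratio)
		return false;
	*rate = (unsigned int)(bclk / ratio);
	return true;
}

/*
 * Default mode: BCLK = rate * channels * sample_bits, so the valid
 * rates follow from the current channels and sample_bits intervals.
 */
static bool bclk_rate_range(unsigned long bclk, struct interval channels,
			    struct interval bits, struct interval *rate)
{
	unsigned long lo_div = (unsigned long)channels.min * bits.min;
	unsigned long hi_div = (unsigned long)channels.max * bits.max;

	if (!lo_div || !hi_div)
		return false;

	rate->min = (unsigned int)((bclk + hi_div - 1) / hi_div); /* round up */
	rate->max = (unsigned int)(bclk / lo_div);		  /* round down */
	return rate->min <= rate->max;
}
```

For an active BCLK of 3072000 Hz, a TDM ratio of 256 (8 slots of 32
bits) gives a single rate of 12000 Hz, while stereo with 16 to 32
sample bits gives a range of 48000 to 96000 Hz.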

The constraint is purely additive: DAIs that do not set a bclk clock
pointer are completely unaffected.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://patch.msgid.link/20260522-i2s-same-blk-v4-3-a71a86faaa20@linux.spacemit.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: soc-pcm: add DEFINE_GUARD for snd_soc_card_mutex

Define a guard class wrapping snd_soc_card_mutex_lock() and
snd_soc_card_mutex_unlock() so that scope-based locking can be used
while still picking up the SND_SOC_CARD_CLASS_RUNTIME lockdep subclass.
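
The kernel's DEFINE_GUARD() builds on the compiler's cleanup attribute.
A userspace sketch of the same idea, with a hypothetical struct card
and card_guard() macro standing in for the real mutex helpers (GCC or
Clang is assumed for __attribute__((cleanup))):

```c
/* Hypothetical card lock that records which subclass took it. */
enum { CARD_CLASS_ROOT, CARD_CLASS_RUNTIME };

struct card {
	int locked;
	int last_subclass;
};

/* Stand-in for a lock wrapper that always uses the runtime subclass. */
static void card_mutex_lock(struct card *c)
{
	c->locked = 1;
	c->last_subclass = CARD_CLASS_RUNTIME;
}

static void card_mutex_unlock(struct card *c)
{
	c->locked = 0;
}

/* Called automatically when the guard variable goes out of scope. */
static void card_guard_release(struct card **c)
{
	card_mutex_unlock(*c);
}

/* Lock now, unlock when the enclosing scope ends. */
#define card_guard(c)							\
	struct card *card_guard_var					\
		__attribute__((cleanup(card_guard_release))) =		\
		(card_mutex_lock(c), (c))

static int card_read_locked_state(struct card *c)
{
	card_guard(c);

	/* The return value is computed before the unlock runs. */
	return c->locked;
}
```

Because the guard goes through the wrapper rather than a raw mutex
lock, every scope-based user in this sketch records the runtime
subclass without having to name it.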

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://patch.msgid.link/20260522-i2s-same-blk-v4-2-a71a86faaa20@linux.spacemit.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: soc-dai: add shared BCLK clock for cross-DAI rate constraints

Add a bclk field to struct snd_soc_dai and a helper function
snd_soc_dai_set_bclk_clk() that platform drivers can use to declare
which clock is their BCLK.

Also cache the bclk_ratio in snd_soc_dai_set_bclk_ratio() so that
the framework can use it later in hw_rule evaluation for TDM
configurations where BCLK = rate * slots * slot_width.

When multiple DAIs on the same card share the same physical BCLK
(detected via clk_is_match()), the ASoC core can automatically
constrain their hw_params so that the resulting BCLK rates are
compatible. This commit adds the data structure support; the actual
constraint logic follows in the next patch.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://patch.msgid.link/20260522-i2s-same-blk-v4-1-a71a86faaa20@linux.spacemit.com
Signed-off-by: Mark Brown <broonie@kernel.org>

drm/i915/psr: Allow SCL=0 on platforms with always-on VRR TG

For Legacy timing generator, if there are no panel replay/sel_update or
other SRD constraints, the Set context latency (SCL) window should be
at least 1.

However, for the VRR timing generator the SCL window can be 0. It has
other guardband constraints, but those are checked during guardband
computation.

Allow SCL to be 0 for platforms that have VRR TG always on.

Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patch.msgid.link/20260517142753.2813959-3-ankit.k.nautiyal@intel.com

drm/i915/psr: Simplify the conditions for SCL computation

'needs_sel_update' is common for both display version branches, so check it
once and keep the version specific checks as separate early returns.

v2: Split into separate early returns. (Jani)

Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patch.msgid.link/20260517142753.2813959-2-ankit.k.nautiyal@intel.com

soc: renesas: Identify R-Car R8A779MD M3Le SoC

Add support for identifying the R-Car M3Le (R8A779MD) SoC.

The Renesas R-Car R8A779MD M3Le SoC is a variant of the already
supported R-Car M3-N SoC with reduced peripherals.
Enable support for the M3Le SoC through the already existing
ARCH_R8A77965 configuration symbol. PRR reads 0x67c05501.
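
As a rough illustration, the PRR value above can be split into fields.
The bit layout used here (product ID in bits 14:8, cut number in bits
7:4 and 3:0) is an assumption of this sketch and the helper names are
hypothetical; the SoC documentation and the driver remain the
authority.

```c
#include <stdint.h>

/* Assumed layout: product ID in bits 14:8. */
static unsigned int prr_product(uint32_t prr)
{
	return (prr & 0x7f00) >> 8;
}

/* Assumed layout: bits 7:4 hold the major cut number minus one. */
static unsigned int prr_cut_major(uint32_t prr)
{
	return ((prr >> 4) & 0x0f) + 1;
}

/* Assumed layout: bits 3:0 hold the minor cut number. */
static unsigned int prr_cut_minor(uint32_t prr)
{
	return prr & 0x0f;
}
```

Under these assumptions 0x67c05501 decodes to product ID 0x55 with cut
number 1.1.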

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Link: https://patch.msgid.link/20260504144534.43745-6-marek.vasut+renesas@mailbox.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>