git.ipfire.org Git - thirdparty/linux.git/log

smb/server: implement FSCTL_SET_COMPRESSION ioctl handler

Example:

  1. client: smbinfo setcompression no /mnt/file
  2. client: smbinfo getcompression /mnt/file
             Compression: 0 (NONE)
  3. client: smbinfo setcompression lznt1 /mnt/file
  4. client: smbinfo getcompression /mnt/file
             Compression: 2 (LZNT1)
  5. client: smbinfo setcompression default /mnt/file
  6. client: smbinfo getcompression /mnt/file
             Compression: 2 (LZNT1)

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/server: implement FSCTL_GET_COMPRESSION ioctl handler

Example:

  1. server: chattr +c /export/file
  2. client: smbinfo getcompression /mnt/file
             Compression: 2 (LZNT1)

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/server: get compression file attribute on open

Example:

  1. server: chattr +c /export/file
  2. client: lsattr /mnt/file
             --------c------------- /mnt/file

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: move compression definitions into common/fscc.h

These definitions will also be used by ksmbd, move them into
common header file.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: remove duplicate server/smbfsctl.h

Rename the following places:

  - FSCTL_COPYCHUNK -> FSCTL_SRV_COPYCHUNK
  - FSCTL_COPYCHUNK_WRITE -> FSCTL_SRV_COPYCHUNK_WRITE
  - FSCTL_REQUEST_RESUME_KEY -> FSCTL_SRV_REQUEST_RESUME_KEY

server/smbfsctl.h contains the following additional definitions compared to
common/smbfsctl.h:

  - IO_REPARSE_TAG_LX_SYMLINK_LE
  - IO_REPARSE_TAG_AF_UNIX_LE
  - IO_REPARSE_TAG_LX_FIFO_LE
  - IO_REPARSE_TAG_LX_CHR_LE
  - IO_REPARSE_TAG_LX_BLK_LE

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: prevent path traversal bypass by restricting caseless retry

ksmbd_vfs_path_lookup() enforces LOOKUP_BENEATH to restrict path
resolution within the share root. When a crafted path attempts to
escape the share boundary using parent-directory components ('..'),
vfs_path_parent_lookup() detects this and immediately fails,
returning -EXDEV.

However, a bug exists in __ksmbd_vfs_kern_path() under caseless mode.
The function fails to intercept the -EXDEV error and erroneously
falls through to the caseless retry logic, which is intended only
for genuinely missing files. During this retry process, the path
is reconstructed, leading to an unintended LOOKUP_BENEATH bypass
that allows write-capable users to create zero-length files or
directories outside the exported share.

Fix this by ensuring that the execution only proceeds to the caseless
lookup retry when the error is specifically -ENOENT. Any other errors,
such as -EXDEV from a path traversal attempt, must be returned immediately.

Cc: stable@vger.kernel.org
Reported-by: Y s65 <yu4ys@outlook.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix UAF of struct file_lock in SMB2_LOCK deferred-lock cancellation

When a blocking byte-range lock request is deferred in the
FILE_LOCK_DEFERRED path, ksmbd registers the asynchronous work into
the connection's async_requests list via setup_async_work(). The cancel
callback smb2_remove_blocked_lock() holds a reference to the flock.

If the lock waiter is subsequently woken up but the work state is no
longer KSMBD_WORK_ACTIVE (e.g., due to a concurrent cancellation), the
cleanup path calls locks_free_lock(flock) without dequeuing the work from
the async_requests list. Concurrently, smb2_cancel() walks the list
under conn->request_lock and invokes the cancel callback, which then
dereferences the already freed 'flock'. This leads to a slab-use-after-free
inside __wake_up_common.

Fix this by restructuring the cleanup logic after the worker returns
from ksmbd_vfs_posix_lock_wait(). Move list_del(&smb_lock->llist) and
release_async_work(work) to the top of the cleanup block. This guarantees
that the async work is completely dequeued and serialized under
conn->request_lock before locks_free_lock(flock) is called, rendering
the flock unreachable for any concurrent smb2_cancel().

Cc: stable@vger.kernel.org
Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix use-after-free in same_client_has_lease()

same_client_has_lease() returns an opinfo pointer from ci->m_op_list
after dropping ci->m_lock without taking a reference.

smb_grant_oplock() then dereferences that pointer in copy_lease() and
when checking breaking_cnt. A concurrent close can remove the old lease
from ci->m_op_list and drop the last reference before the caller uses
the returned pointer, leading to a use-after-free.

Take a reference when same_client_has_lease() selects an existing lease,
drop any previous match while scanning, and release the returned
reference in smb_grant_oplock() after copying the lease state.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

ksmbd: fix out-of-bounds read in smb_check_perm_dacl()

The permission-check ACE walk in smb_check_perm_dacl() validates the ACE
header size and caps sid.num_subauth at SID_MAX_SUB_AUTHORITIES, but it
never checks that ace->size is actually large enough to contain
num_subauth sub-authorities before compare_sids() dereferences them.

CIFS_SID_BASE_SIZE covers the SID header up to but excluding the
sub_auth[] array, and offsetof(struct smb_ace, sid) is the ACE header,
so the existing guards only guarantee the 8-byte SID base, i.e. zero
sub-authorities. compare_sids() then reads ace->sid.sub_auth[i] for
i < min(local_sid->num_subauth, ace->sid.num_subauth). The local
comparison SIDs (sid_everyone, sid_unix_NFS_mode, and the id_to_sid()
result) always have at least one sub-authority, and an attacker controls
the ACE revision and authority bytes (which lie within the in-bounds SID
base), so they can match one of those SIDs and force the sub_auth read.

A crafted ACE with size == 16 and num_subauth >= 1 placed at the tail of
the security descriptor therefore causes a heap out-of-bounds read of up
to SID_MAX_SUB_AUTHORITIES * sizeof(__le32) bytes past the pntsd
allocation. The security descriptor is loaded by ksmbd_vfs_get_sd_xattr()
into a buffer sized exactly to the on-disk data (kzalloc(sd_size) in
ndr_decode_v4_ntacl()), so the read lands past the allocation. The
malformed descriptor can be stored verbatim via SMB2_SET_INFO (the DACL
is not normalised before being written to the security.NTACL xattr) and
the read fires on a subsequent SMB2_CREATE access check, making this
reachable by an authenticated client on a share that uses ACL xattrs.

Add the missing num_subauth-versus-ace_size check, mirroring the
identical guards already present in the sibling parsers parse_dacl() and
smb_inherit_dacl().

Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()")
Cc: stable@vger.kernel.org
Signed-off-by: Hem Parekh <hemparekh1596@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

dt-bindings: i2c: microchip,corei2c: permit resets

Both CoreI2C and the hardened versions of it on mpfs and pic64gx have a
reset pin. For the former, usually this is wired to a common fabric
reset not managed by software and for the latter two the platform
firmware takes them out of reset on first-party boards (or those using
modified versions of the vendor firmware), but not all boards may take
this approach. Permit providing a reset in devicetree for Linux, or
other devicetree-consuming software, to use.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260506-bronchial-kitten-e3697fb66ba7@spud

rust: bitfield: mark `Debug` impl as `#[inline]`

A `Debug` impl is for debugging and is normally not used, and therefore
should ideally not be code-generated unless used. However, Rust has no way
of knowing if a dependent crate is going to use the trait impl or not, so
unless it is marked as `#[inline]`, it will be code-generated in the
defining crate (as it is not generic).

Mark the impl generated by bitfield macro `#[inline]`, so they do not stay
in the binary unless used.

This reduces nova-core.o .text by 17% (from 151922 bytes to 125676 bytes).

Signed-off-by: Gary Guo <gary@garyguo.net>
Fixes: b7b8b4ccdad4 ("rust: extract `bitfield!` macro from `register!`")
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260611190555.2298991-1-gary@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

i2c: qcom: Unify user-visible "Qualcomm" name

Various names for Qualcomm as a company are used in user-visible config
options: QCOM, Qualcomm and Qualcomm Technologies. Switch to unified
"Qualcomm" so it will be easier for users to identify the options when
for example running menuconfig.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260423173550.92317-2-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>

i2c: ls2x-v2: return IRQ_HANDLED after servicing an error

The event ISR reads SR1 and, when an error flag (ARLO/AF/BERR) is set,
calls loongson2_i2c_isr_error() which clears the offending flag, issues
STOP for the AF case, records msg->result, masks every CR2 interrupt
enable and completes the waiter. The handler then returns IRQ_NONE,
declaring to the IRQ core that the device did not interrupt.

That report is wrong. The device did interrupt and the handler fully
serviced it. Because the IRQ is requested with IRQF_SHARED, the genirq
spurious-IRQ tracker counts each error as unhandled. A bus that emits
sporadic NACKs, arbitration losses or bus errors will therefore march
toward the spurious-IRQ threshold and the line can end up disabled,
wedging the controller.

Return IRQ_HANDLED on this path. The other IRQ_NONE site, taken when
neither an event nor an error bit is set, remains correct.

Fixes: 6d1b0785f6d5 ("i2c: ls2x-v2: Add driver for Loongson-2K0300 I2C controller")
Signed-off-by: David Carlier <devnexen@gmail.com>
Assisted-by: Codex:GPT-5.5
Reviewed-by: Binbin Zhou <zhoubinbin@loongson.cn>
Tested-by: Binbin Zhou <zhoubinbin@loongson.cn>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260506154015.94815-1-devnexen@gmail.com

ipv4: fib_rule: Move fib4_rules_exit() to ->exit().

syzbot reported use-after-free of net->ipv4.rules_ops. [0]

It can be reproduced with these commands:

  while true; do
   ip netns add ns1
   ip -n ns1 link set dev lo up
   ip -n ns1 address add 192.0.2.1/24 dev lo
   ip -n ns1 link add name dummy1 up type dummy
   ip -n ns1 address add 198.51.100.1/24 dev dummy1
   ip -n ns1 rule add ipproto tcp sport 12345 table 12345
   ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
   ip netns del ns1
  done

The cited commit moved fib4_rules_exit() earlier to ->exit_rtnl(),
but the kernel socket destroyed in ->exit() could eventually reach
__fib_lookup().

I left fib4_rules_exit() in ->exit_rtnl() because fib4_rule_delete()
calls fib_unmerge(), which requires RTNL.

However, when ->delete() is called, ->configure() has already been
called, thus fib_unmerge() in ->delete() has no effect.

Let's remove fib_unmerge() in fib4_rule_delete() and move
fib4_rules_exit() to ->exit().

Many thanks to Ido Schimmel for providing the nice repro very quickly.

Note that we can make fib_rules_ops.delete() return void once
net-next opens.

[0]:
BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641

CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Workqueue: netns cleanup_net
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description+0x55/0x1e0 mm/kasan/report.c:378
print_report+0x58/0x70 mm/kasan/report.c:482
kasan_report+0x117/0x150 mm/kasan/report.c:595
fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321
__fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96
ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811
ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702
__ip_route_output_key include/net/route.h:169 [inline]
ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929
ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118
release_sock+0x206/0x260 net/core/sock.c:3861
inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950
udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
fou_release net/ipv4/fou_core.c:562 [inline]
fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230
ops_exit_list net/core/net_namespace.c:199 [inline]
ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252
cleanup_net+0x572/0x810 net/core/net_namespace.c:702
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397
worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478
kthread+0x389/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Fixes: 759923cf03b0 ("ipv4: fib: Convert fib_net_exit_batch() to ->exit_rtnl().")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a315824.b0403584.28d0ff.0000.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260616191359.4142661-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: serialize netif_running() check in enqueue_to_backlog()

Syzbot reported a KASAN slab-use-after-free in fib_rules_lookup().

The root cause is a race condition where packets can escape the backlog
flushing during device unregistration (e.g., during netns exit).

Commit e9e4dd3267d0 ("net: do not process device backlog during unregistration")
introduced a lockless netif_running() check in enqueue_to_backlog() to
prevent queuing packets to an unregistering device.

However, this creates a TOCTOU race window.

A lockless transmitter (like veth_xmit) can pass
the check before dev_close() clears IFF_UP. If the transmitter is then
delayed, flush_all_backlogs() can run and finish before the transmitter
grabs the backlog lock and queues the packet. The packet then escapes
the flush and triggers UAF later when processed.

Fix this by moving the netif_running() check inside the backlog lock.
This serializes the check with the flush work (which also grabs the lock).
We then either queue the packet before the flush runs (so it gets flushed),
or check netif_running() after the flush/close completes (so it gets dropped).

Fixes: e9e4dd3267d0 ("net: do not process device backlog during unregistration")
Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a315824.b0403584.28d0ff.0000.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260616141317.407791-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rust: prelude: add `zerocopy{,_derive}::IntoBytes`

In order to easily use `IntoBytes`, add it to the prelude.

This adds both the trait (`zerocopy::IntoBytes`) as well as the derive
macro (`zerocopy_derive::IntoBytes`).

[ This patch will simplify using the feature in several trees next cycle.

  Gary writes:

    This is wanted because I want to convert the upcoming I/O projection
    series to use `zerocopy` traits rather than keep using transmute
    module.

    It is most helpful for derives in doc tests; I do not want to
    explicitly use `#[derive(zerocopy_derive::IntoBytes)]` in the
    doc tests.

  For reference, the upcoming I/O projection series is at:

    https://lore.kernel.org/rust-for-linux/20260611-io_projection-v4-0-1f7224b02dcb@garyguo.net/

      - Miguel ]

Signed-off-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260611134802.2052296-1-gary@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

i2c: atr: annotate i2c_atr_adap_desc->aliases with __counted_by_ptr

Add the __counted_by_ptr() compiler attribute to ->aliases to improve
bounds checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260611215501.464405-3-thorsten.blum@linux.dev

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Merge in late fixes in preparation for the net-next PR.

Conflicts:

net/tls/tls_sw.c
  406e8a651a7b ("net: skmsg: preserve sg.copy across SG transforms")
  79511603a65b ("tls: remove dead sockmap (psock) handling from the SW path")

drivers/net/ethernet/microsoft/mana/mana_en.c
  f8fd56977eeea ("net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check")
  d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
https://lore.kernel.org/ajAPXu-C_PuTgV-a@sirena.org.uk

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

io_uring: Use system_dfl_wq instead of system_unbound_wq

Commit de7341ffe49e ("io_uring: switch normal task_work to a mpscq")
added a use of system_unbound_wq, which is deprecated in favor of
system_dfl_wq added by commit 128ea9f6ccfb ("workqueue: Add
system_percpu_wq and system_dfl_wq"). An upcoming warning in the
workqueue tree flags this with:

workqueue: work func io_tctx_fallback_work enqueued on deprecated workqueue. Use system_{percpu|dfl}_wq instead.

Switch to system_dfl_wq to clear up the warning.

Fixes: de7341ffe49e ("io_uring: switch normal task_work to a mpscq")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260616-io_uring-fix-wq-warning-v1-1-cfc9d934eedb@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

i2c: algo: bit: use str_plural helper in bit_xfer

Replace the manual ternary "s" pluralizations with str_plural() to
simplify the code.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260511104911.183606-3-thorsten.blum@linux.dev

net: skmsg: preserve sg.copy across SG transforms

The sk_msg sg.copy bitmap is part of the scatterlist entry ownership
state. A set bit tells sk_msg_compute_data_pointers() not to expose the
entry through writable BPF ctx->data. This protects entries backed by
pages that are not private to the sk_msg, such as splice-backed file
page-cache pages.

Several sk_msg transform paths move, copy, split, or compact
msg->sg.data[] entries without moving the matching sg.copy bit. This can
make an externally backed entry arrive at a new slot with a clear copy
bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable
ctx->data and BPF stores can modify the original page cache.

Keep sg.copy synchronized with sg.data[] whenever entries are
transferred, shifted, split, or copied into a new sk_msg. Clear the bit
when an entry is replaced by a newly allocated private page or freed.
This covers the BPF pull/push/pop helpers, sk_msg_shift_left/right(),
sk_msg_xfer(), and tls_split_open_record(), including the partial tail
entry created during TLS open-record splitting.

Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Cc: stable@vger.kernel.org
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Signed-off-by: Yiming Qian <yimingqian591@gmail.com>
Link: https://patch.msgid.link/20260610062137.49075-1-yimingqian591@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'appletalk-move-the-protocol-out-of-tree'

Jakub Kicinski says:

====================
appletalk: move the protocol out of tree

This tiny series moves appletalk out of tree, to:

https://github.com/linux-netdev/mod-orphan

Core maintainainers are unable to keep up with the rate of security
bug reports and fixes. Nobody seems to care about appletalk enough
to review the patches.

As Eric pointed out Mac OS dropped AppleTalk over a decade ago.
====================

Link: https://patch.msgid.link/20260615222935.947233-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

appletalk: move the protocol out of tree

AppleTalk has been removed in MacOS X 10.6 (Snow Leopard), in 2009,
according to Wikipedia. We recently got a burst of AI generated
fixes to this protocol which nobody is reviewing.

Let AppleTalk follow AX.25 and hamradio out of the Linux tree.
We we will maintain the code at: github.com/linux-netdev/mod-orphan
for anyone interested in playing with it.

Retain the uAPI for now. No strong reason, simply because I suspect
keeping it will be less controversial.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20260615222935.947233-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

appletalk: stop storing per-interface state in struct net_device

AppleTalk keeps its per-interface control block (struct atalk_iface)
directly in struct netdevice (dev->atalk_ptr). This is the only thing
tying the protocol into the core net_device layout and is the sole
blocker to moving AppleTalk out of tree.

Replace dev->atalk_ptr with a small ifindex-keyed hashtable internal
to ddp.c. The existing atalk_interfaces list stays the owner of the iface
objects; the hashtable is purely a fast dev->iface index and reuses
the same atalk_interfaces_lock.

AFAICT this patch does not make this code any more racy than it already
is, I'm sure Sashiko will point out some basically existing bugs.
AFAICT atalk_interfaces_lock is the innermost lock already.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20260615222935.947233-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

i3c: mipi-i3c-hci: Use named initializers for platform_device_id's .driver_data

The assignment in this driver uses a mixed way to initialize the
platform_device_id array. .name is assigned by name and .driver_data by
position. Unify that to use named assignment for both struct members.
This is needed for a planned change to struct platform_device_id
replacing .driver_data by an anonymous union.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://patch.msgid.link/7c006616f72748fb4deccd197ca2b6427c006f79.1781620397.git.ukleinek@kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

i3c: master: Use unsigned int for dev_nack_retry_count consistently

Use unsigned int for dev_nack_retry_count across the core and
controller drivers to match the type of master->dev_nack_retry_count.

Update the sysfs store path to use kstrtouint() and adjust the
->set_dev_nack_retry() callback prototype and callers accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260616113752.196140-4-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

i3c: master: Add missing runtime PM get in dev_nack_retry_count_store()

Ensure the device is runtime resumed while updating the retry
configuration to avoid accessing the controller while suspended.

Call i3c_master_rpm_get() before accessing the controller in
dev_nack_retry_count_store() and release it with
i3c_master_rpm_put() afterwards.

Fixes: 990c149c61ee4 ("i3c: master: Introduce optional Runtime PM support")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260616113752.196140-3-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

i3c: master: Update dev_nack_retry_count under maintenance lock

Protect master->dev_nack_retry_count against concurrent sysfs updates
by updating it while holding the bus maintenance lock.

Consequently, combine adjacent return statements into one.

For consistency, read dev_nack_retry_count while holding the bus normaluse
lock.

Fixes: b58f47eb39268 ("i3c: add sysfs entry and attribute for Device NACK Retry count")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260616113752.196140-2-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

block: respect iov_iter::nofault flag in bio_iov_iter_bounce_write()

For the incoming usage of IOMAP_DIO_BOUNCE in btrfs, btrfs has set
iov_iter::nofault to prevent deadlock when a page fault is needed to
read out the buffer.

However bio_iov_iter_bounce_write() doesn't respect iov_iter::nofault
flag, and just call a plain copy_from_iter() so it can still trigger
page fault and cause deadlock in btrfs.

Fix it by utilizing copy_folio_from_iter_atomic() if nofault flag is
set, otherwise use copy_folio_from_iter().

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/9c165a314022b61566eb247852eb773ca6c70889.1781597506.git.wqu@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: revert the iov_iter after a short copy in bio_iov_iter_bounce_write()

For the incoming IOMAP_DIO_BOUNCE flag usage inside btrfs, it's pretty
easy to hit short copy inside bio_iov_iter_bounce_write().

This is because btrfs has disabled page fault to avoid certain deadlock
during direct writes, and instead btrfs manually fault in the pages then
retry.

And inside bio_iov_iter_bounce_write(), if we hit a short write, we
didn't revert the iov_iter, which can cause problems like unexpected
garbage for the next retry.

Revert the iov_iter after a short copy.

One thing to note is that, the folio is allocated then immediately
queued into the bio, so the proper revert size should be
(bi_size - this_len + copied).

Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/c400989f227343b134110773d5acaaacf7024574.1781597506.git.wqu@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

spi: dw: fix wrong BAUDR setting after resume

After resuming from suspend to ram, spi transfer stops working. Further
debugging shows that the BAUDR register isn't correctly set, this is
due to dws->current_freq doesn't match the HW BAUDR setting,
specifically, the dws->current_freq equals to speed_hz, but BAUDR is 0.
so the dw_spi_set_clk() in below code won't be called:

        if (dws->current_freq != speed_hz) {
                dw_spi_set_clk(dws, clk_div);
                dws->current_freq = speed_hz;
        }

The mismatch comes from dw_spi_shutdown_chip() when suspending.
Fix this mismatch by setting dws->current_freq to 0 as well when
clearing BAUDR reg in dw_spi_shutdown_chip().

Fixes: e24c74527207 ("spi: controller driver for Designware SPI core")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://patch.msgid.link/20260612002835.5240-1-jszhang@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: uniphier: Fix completion initialization order before devm_request_irq()

The driver calls devm_request_irq() before initializing the completion
used by the interrupt handler. Because the interrupt may occur immediately
after devm_request_irq(), the handler may execute before init_completion().

This may result in calling complete() on an uninitialized completion,
causing undefined behavior. This has been observed with KASAN.

Fix this by initializing the completion before registering the IRQ.

Reported-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
Reported-by: Kyungwook Boo <bookyungwook@gmail.com>
Fixes: 5ba155a4d4cc ("spi: add SPI controller driver for UniPhier SoC")
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260616011223.201357-1-hayashi.kunihiko@socionext.com
Signed-off-by: Mark Brown <broonie@kernel.org>

spi: Add NULL check for spi_get_device_id() in spi_get_device_match_data()

Prevent NULL pointer dereference when spi_get_device_id() returns NULL,
which can happen when using driver_override without matching SPI ID entry.

Signed-off-by: guoqi0226 <guoqi0226@163.com>
Link: https://patch.msgid.link/20260616103018.105612-3-guoqi0226@163.com
Signed-off-by: Mark Brown <broonie@kernel.org>

accel/amdxdna: Use caller client for debug BO sync

amdxdna_drm_sync_bo_ioctl() looks up args->handle in the ioctl caller's
drm_file. For SYNC_DIRECT_FROM_DEVICE, it then calls
amdxdna_hwctx_sync_debug_bo(), but passes abo->client.

amdxdna_hwctx_sync_debug_bo() uses the passed client both as the handle
namespace for debug_bo_hdl and as the owner of the hardware context xarray.
Those must match the file that supplied args->handle. The BO's stored
client pointer is object state, not the ioctl context.

Pass filp->driver_priv instead, matching the original handle lookup.

Fixes: 7ea046838021 ("accel/amdxdna: Support firmware debug buffer")
Cc: stable@vger.kernel.org # v6.19+
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/178155468039.81818.12173237984867749651@gmail.com

Merge branch 'for-7.2/bpf' into for-linus

Merge branch 'for-7.2/cleanup_driver_data' into for-linus

- semantic cleanup fixes for 'hid_device_id::driver_data' (Pawel Zalewski)

Merge branch 'for-7.2/core' into for-linus

Merge branch 'for-7.2/cp2112' into for-linus

- fwnode support for cp2112 (Danny Kaehn)
- fix for cp2112 firmware-based speed configuration, if available (Danny Kaehn)

Merge branch 'for-7.2/i2c-hid' into for-linus

Merge branch 'for-7.2/logitech' into for-linus

- fix for high resolution scrolling for Logitech HID++ 2.0 devices (Lauri Saurus)

Merge branch 'for-7.2/multitouch' into for-linus

- UX improvement fixes for Yoga Book 9 (Dave Carey)

Merge branch 'for-7.2/nintendo' into for-linus

- support for HORI Wireless Switch Pad (Hector Zelaya)

Merge branch 'for-7.2/oxp' into for-linus

- suport for OneXPlayer (Derek J. Clark)

Merge branch 'for-7.2/playstation' into for-linus

Merge branch 'for-7.2/rakk' into for-linus

- support for Rakk Dasig X (Karl Cayme)

Merge branch 'for-7.2/sony' into for-linus

Merge branch 'for-7.2/wacom' into for-linus

- memory corruption and scheduling while atomic and error fixes (Jinmo Yang)
- error handling fix (Myeonghun Pak)

Merge branch 'for-7.2/wiimote' into for-linus

ipmi: Drop unused assignment of platform_device_id driver data

The driver explicitly sets the .driver_data member of struct
platform_device_id to zero without relying on that value. Drop these
unused assignments.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Message-ID: <9afdb7b0894f51fba78c64612428f7bb117901d1.1781620139.git.u.kleine-koenig@baylibre.com>
Signed-off-by: Corey Minyard <corey@minyard.net>

RDMA/irdma: Replace waitqueue and flag with completion

The driver previously used a waitqueue along with an explicit
request_done flag, but without proper barriers around request_done.

An earlier patch by Gui-Dong Han <hanguidong02@gmail.com> attempted
to fix this by adding the missing memory barriers. Rather than
adding the barriers, this patch replaces the waitqueue+flag with
a completion, which is designed for this exact purpose.

Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions")
Link: https://patch.msgid.link/r/20260616155601.1081448-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/hns: Fix memory leak of bonding resources

In a corner case of concurrent driver removal and driver reset,
bonding resource is first released in hns_roce_hw_v2_exit() during
driver removal, and then is allocated again in hns_roce_register_device()
during driver reset. This leads to memory leak because the release
timing has already passed. This may also lead to a kernel panic
as below because of the leaked notifier callback:

Call trace:
  0xffffa20fccc04978 (P)
  raw_notifier_call_chain+0x20/0x38
  call_netdevice_notifiers_info+0x60/0xb8
  netdev_lower_state_changed+0x4c/0xb8

As Sashiko suggested, the teardown order of bonding resources should
be inverted to make sure the resources are released when the driver
is removed.

Fixes: b37ad2e290fc ("RDMA/hns: Initialize bonding resources")
Link: https://patch.msgid.link/r/20260613102045.811623-1-huangjunxian6@hisilicon.com
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sg

When the server answers an RTRS READ, rdma_write_sg() builds the source
scatter/gather entry for the IB_WR_RDMA_WRITE that returns data to the
peer. Its length is taken directly from the wire descriptor:

plist->length = le32_to_cpu(id->rd_msg->desc[0].len);

rd_msg points into the chunk buffer that the remote peer filled via
RDMA-WRITE-WITH-IMM (rtrs_srv_rdma_done() -> process_io_req() ->
process_read()), so desc[0].len is attacker-controlled and, before this
change, was only rejected when zero. The source address is the fixed
chunk start (dma_addr[msg_id]) and the source lkey is the PD-wide
local_dma_lkey, which is not tied to the chunk's MR mapping, so the verbs
layer does not constrain the transfer length to max_chunk_size. msg_id
and off are bounded against queue_depth and max_chunk_size in
rtrs_srv_rdma_done(), but desc[0].len is a separate field that was not
checked against the chunk size.

A peer that advertises desc[0].len larger than max_chunk_size can make
the posted RDMA write read past the chunk's mapped region. The resulting
behaviour depends on the IOMMU configuration: with no IOMMU or in
passthrough mode the read may extend into memory adjacent to the chunk
and be returned to the peer, which can disclose host memory; with a
translating IOMMU the out-of-range access is expected to fault and abort
the connection. In either case the transfer exceeds what the protocol
permits and is driven by a remote peer.

Reject a descriptor length above max_chunk_size, mirroring the existing
off >= max_chunk_size bound in rtrs_srv_rdma_done(). Legitimate clients
do not exceed it: the client sets desc[0].len to its MR length, which is
capped at the negotiated max_io_size (max_chunk_size - MAX_HDR_SIZE).

Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Link: https://patch.msgid.link/r/20260612-master-v1-1-70cde5c6fdc9@gmail.com
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Zhenhao Wan <whi4ed0g@gmail.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

docs: infiniband: correct name of option to enable the ib_uverbs module

The Infiniband documentation states that CONFIG_INFINIBAND_USER_VERBS
should be used to enable the ib_uverbs module. However, this option was
renamed to CONFIG_INFINIBAND_USER_ACCESS in commit 17781cd6186c
("[PATCH] IB: clean up user access config options"). Update the
documentation to reflect this.

Link: https://patch.msgid.link/r/20260616002027.67925-1-enelsonmoore@gmail.com
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocated

If a user calls BNXT_RE_METHOD_GET_TOGGLE_MEM on a device that does not
support the CQ/SRQ toggle feature, uctx_cq_page or uctx_srq_page will
be NULL.

Add an explicit -EOPNOTSUPP return after capturing the address from
uctx_cq_page / uctx_srq_page if the address is zero.

Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace")
Fixes: 181028a0d84c ("RDMA/bnxt_re: Share a page to expose per SRQ info with userspace")
Link: https://patch.msgid.link/r/20260615224751.232802-16-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/bnxt_re: Fail DBR related page allocation UAPIs if the feature is disabled

No need to support the DBR related page allocations if the pacing feature
is disabled. Fail the request if pacing is disabled.

Fixes: ea2224857882 ("RDMA/bnxt_re: Update alloc_page uapi for pacing")
Link: https://patch.msgid.link/r/20260615224751.232802-15-selvin.xavier@broadcom.com
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/bnxt_re: Avoid repeated requests to allocate WC pages

Applications can request multiple WC pages for the same ucontext.
As of now, only 1 WC page per ucontext is supported. Add a lock to
avoid concurrent access and a check to fail repeated requests.
Also, if the mmap entry insert fails for the WC, free the Doorbell
page index mapped for the WC page.

Fixes: eee6268421a2 ("RDMA/bnxt_re: Move the UAPI methods to a dedicated file")
Fixes: 360da60d6c6e ("RDMA/bnxt_re: Enable low latency push")
Link: https://patch.msgid.link/r/20260615224751.232802-12-selvin.xavier@broadcom.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

drm/xe: Add compact-PT and addr mask handling for page reclaim

Current implementation of generate_reclaim_entry() overlooks some
differences between the different page implementations: address masking
and compact 64K page handling.

Address masking of each leaf varies depending on the leaf entry size.
generate_reclaim_entry() is using XE_PTE_ADDR_MASK [51:12] for all leaf
entries. For 2MB PTEs, bit 12 (PAT) is part of the flags so the old mask
corrupts the physical address extraction.

64K pages can be represented as PS64 and a compact PT, which the latter
was not handled. Compact pages aren't walked by the unbind walker, so we
separately walk through the compact PT to ensure none of the leaf 64K
PTEs are dropped. Previously, compact PT were causing an abort since it
was considered covered and not descended into.

v2:
- Update 64K entry/unbind walker for 64K compact PT handling. (Matthew)
- Rework calculations of reclamation and address mask size.
- Add new func abstracting the error handling before generating the
   reclaim entry.

v3:
- Report finer addr granularity in abort debug print for compact.
   (Zongyao)
- Add comments for ADDR_MASK usage. (Zongyao)
- Drop existing phys_addr asserts, the new XE_PAGE_ADDR_MASK clears
   bits checked, so redundant asserts. (Sashiko)
- WARN_ON to verify compact pt and edge pt won't be possible.

Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind")
Assisted-by: Sashiko-Review:gemini-3.1-pro-preview
Cc: stable@vger.kernel.org
Cc: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Zongyao Bai <zongyao.bai@intel.com>
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Zongyao Bai <zongyao.bai@intel.com>
Link: https://patch.msgid.link/20260605224257.2194194-2-brian3.nguyen@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 669252801a4aa4098fbc5dd9dd0bd93f0625abd7)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe/guc: Fix buffer overflow in steered register list allocation

The size calculation for the steered register extarray uses only the
geometry DSS mask (g_dss_mask) to determine the number of entries to
allocate:

total = bitmap_weight(gt->fuse_topo.g_dss_mask, ...) * steer_reg_num;

However, the filling loop uses for_each_dss_steering(), which iterates
over for_each_dss(), defined as the union of g_dss_mask and c_dss_mask
(geometry + compute DSS). On platforms with compute-only DSS bits, the
loop writes past the allocated buffer, corrupting adjacent slab objects.

This manifests as list_del corruption and SLUB redzone overwrites during
drm_managed_release on device unbind, since the overflow corrupts the
drmres list_head of neighboring allocations.

Fix by computing the allocation size using the union of both DSS masks,
matching the iteration pattern of for_each_dss_steering().

--
v2:
- use bitmap_weighted_or() (Zhanjun)

Fixes: b170d696c1e2 ("drm/xe/guc: Add XE_LP steered register lists")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/8049
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: stable@vger.kernel.org
Assisted-by: GitHub-Copilot:claude-opus-4.6
Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com>
Link: https://patch.msgid.link/20260612070401.543305-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 0a78a44f4901aa6c9263e66be7fce02282f1109f)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe: Set TTM device beneficial_order to 9 (2M)

Set the TTM device beneficial_order to 9 (2M), which is the sweet
spot for Xe when attempting reclaim on system memory BOs, as it matches
the large GPU page size. This ensures reclaim is attempted at the most
effective order for the driver.

This fixes an issue where an order-10 (4M) allocation cannot be found
despite an abundance of memory. The 4M allocation triggers reclaim,
unnecessarily evicting the working set and hurting performance. Since
the TTM infrastructure was introduced recently, we are tagging the TTM
patch as the Fixes target, even though this resolves an Xe-side problem.

Fixes: 7e9c548d3709 ("drm/ttm: Allow drivers to specify maximum beneficial TTM pool size")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260611235844.3725147-1-matthew.brost@intel.com
(cherry picked from commit 0d81db90d364cb3d733410829118759f28957c5a)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe: Fix wa_oob codegen recipe for external module builds

When building with 'make M=drivers/gpu/drm/xe modules', kbuild invokes
scripts/Makefile.build with obj=., causing $(obj) to expand to '.'.
Make normalizes './xe_gen_wa_oob' to 'xe_gen_wa_oob' when constructing
the $^ automatic variable (target name normalization), so the recipe
command becomes just 'xe_gen_wa_oob ...' without any path prefix, and
the shell cannot find the tool.

Fix by replacing $^ with explicit $(obj)/xe_gen_wa_oob and
$(src)/<rules-file> references in both wa_oob recipe commands.
In recipe strings, make does not apply target name normalization, so
$(obj)/xe_gen_wa_oob correctly expands to './xe_gen_wa_oob' and the
shell can execute it. This matches the pattern already used by other
DRM drivers (e.g. radeon's mkregtable).

Fixes: f037e0b78e6d ("drm/xe: add xe_device_wa infrastructure")
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-xe@lists.freedesktop.org
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260604074501.172129-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 3a11a63cc16660d514ff584e7551589655337e87)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe: fix job timeout recovery for unstarted jobs and kernel queues

A job that GuC never scheduled (never started) indicates a GuC
scheduling failure; previously such jobs were silently errored out
instead of triggering a GT reset to recover. Trigger a GT reset and
resubmit them, but only when the queue was not already killed or banned:
an unstarted job on an already banned queue is the ban working as
intended and must neither clear the ban nor kick off a reset, otherwise
a banned userspace queue could be resurrected and spam GT resets.

Kernel queues are always recovered this way and wedge the device once
recovery attempts are exhausted, since kernel work must not silently
fail. A started job that times out on a userspace VM bind queue stays
banned rather than being reset and retried.

The queue is banned early in the timeout handler to signal the G2H
scheduling-done handler so it wakes the disable-scheduling waiter;
without it the waiter sleeps the full 5s timeout. When a reset is
warranted the ban is cleared before rearming so that
guc_exec_queue_start() can resubmit jobs after the GT reset - a
still-banned queue would block resubmission and cause an infinite TDR
loop. The already-banned case is gated out before this point via
skip_timeout_check, so it is unaffected.

v2: (Himal) Do it for any queue type, not just kernel/migration
v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already
      killed/banned queues on unstarted-job timeout
    - Update commit message
    - (Matt) Add Fixes tag

Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Assisted-by: GitHub-Copilot:claude-opus-4.8
Tested-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe: fix refcount leak in xe_range_fence_insert()

xe_range_fence_insert() acquires a reference on fence via
dma_fence_get() and stores it in rfence->fence. It then calls
dma_fence_add_callback() and handles two cases: when the callback
is successfully registered (err == 0) the fence is transferred to
the tree for later cleanup; when the fence is already signaled
(err == -ENOENT) it manually drops the extra reference with
dma_fence_put(fence).

However, dma_fence_add_callback() can fail with other errors
(e.g. -EINVAL) and in that case the code falls through to the free:
label without releasing the acquired reference, leaking it.

Fix the leak by adding an else branch that calls dma_fence_put()
before jumping to free: for any error other than -ENOENT.

Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260610172705.3450560-1-matthew.brost@intel.com
(cherry picked from commit 98c4a4201290823c2c5c7ba21692bd9a64b61021)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe: include all registered queues in TLB invalidation

Context-based TLB invalidation currently selects only scheduling-active
exec queues via q->ops->active(). During rebind flows, queues may be
suspended (or transitioning through resume) while still owning valid
translations, causing them to be skipped from invalidation and leading
to missed TLB invalidations on LR rebinds.

The underlying issue is a TOCTOU: q->guc->state bits are flipped lock-free
from enable_scheduling(), disable_scheduling{,_deregister}(), the
suspend/resume sched-msg handlers, handle_sched_done(), and
guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes
against them, so any state-based predicate can race.

Include all the registered queues so that TLB invalidations are not
missed. This is race-free because list membership on vm->exec_queues.list
is stable under vm->exec_queues.lock held by the caller. The performance
impact is expected to be minimal and harmless. If it does turn out to be
a concern, we can come back with a race-safe solution to ignore certain
queues.

Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend")
Assisted-by: Claude:claude-opus-4.6
Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe/hw_error: Use HW_ERR prefix in log

Hardware errors should be logged with HW_ERR prefix. Make them
consistent with existing logs.

Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patch.msgid.link/20260602044919.702209-5-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit ad60a618c49fef07d1860bfb1091140d29f5eddb)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe/drm_ras: Add per node cleanup action

cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Add per node cleanup action which guarantees
cleanup on unwind and also simplifies the cleanup logic.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patch.msgid.link/20260602044919.702209-4-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 67fc5543d8274b2fcbef87734fad0469358f4478)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe/drm_ras: Make counter allocation drm managed

cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Fix this using drm managed allocation, which is
guaranteed to be cleaned up on unwind.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patch.msgid.link/20260602044919.702209-3-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 58d77c77ea0c5cb2b755ebe23e973c8272acd896)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe/multi_queue: skip submit when primary queue is suspended

Return early in submit path when the multi-queue primary exec
queue is suspended to avoid submitting while suspended.

v2: Remove idle_skip_suspend fix as that feature is being
reverted here https://patchwork.freedesktop.org/series/167262/

Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support")
Cc: stable@vger.kernel.org # v7.0+
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/xe: Clear pending_disable before signaling suspend fence

In the schedule-disable done path for suspend, we
signal the suspend fence before clearing pending_disable.

That wakeup can let suspend_wait complete and resume be queued
immediately. The resume path may then reach enable_scheduling()
while pending_disable is still set and hit the
!exec_queue_pending_disable(q) assertion.

Fix this by clearing pending_disable before signaling
the suspend fence, so any resumed transition observes a
consistent state.

Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com
(cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Revert "drm/xe: Skip exec queue schedule toggle if queue is idle during suspend"

This reverts commit 8533051ce92015e9cc6f75e0d52119b9d91610b6.

The idle-skip optimization bypasses GuC suspend, so the GPU may not
perform the context switch that flushes TLB entries for invalidated
userptr VMAs. In LR/preempt-fence VM mode, this can lead to missed TLB
invalidation and page faults during userptr invalidation tests.

Restore unconditional schedule toggling on suspend so the context-switch
TLB flush is always performed.

This optimization will be reintroduced with a fix that does not skip
suspend in LR/preempt-fence VM mode.

Fixes: 8533051ce920 ("drm/xe: Skip exec queue schedule toggle if queue is idle during suspend")
Cc: stable@vger.kernel.org # v7.0+
Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260603065217.3131066-2-tilak.tirumalesh.tangudu@intel.com
(cherry picked from commit 6a1e7934d9a6cf46aecae00a99c2603d1295e170)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Revert "drm/xe/nvls: Define GuC firmware for NVL-S"

This reverts commit 4e88de313ff4d1c67b644b1f39f9fb4089711b71.

The early GuC FW definition meant for our CI branch was accidentally
merged to the drm-xe-next branch instead. This GuC FW will never be
released to linux-firmware, so we do not want the definition to be
available in the mainline Linux codebase.

Fixes: 4e88de313ff4 ("drm/xe/nvls: Define GuC firmware for NVL-S")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org # v7.0+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260529193558.185436-11-daniele.ceraolospurio@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 65b8e0ac86e48cfc9128c04dfc53ea3395d030dd)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>

drm/i915/mtl+: Enable PPS before PLL

Enabling PPS after a display port's PLL is enabled leads to PLL / DDI
BUF timeouts during system resuming after a long (> 45 mins) suspended
state, at least on some ARL and MTL laptops, either all or some of them
also containing an Nvidia GPU. Enabling PPS first and then the PLL fixes
the problem for all the reporters.

A similar issue is seen when enabling an external DP output on PHY B
(vs. PHY A in the above eDP cases), where this change will not have any
effect (since no PPS is used in that case). There isn't any direct
connection between PPS and PLL, so the fix for eDP works by some
side-effect only. However Bspec does seem to require enabling PPS first,
so let's do that. Further investigation continues on the actual root
cause and a cure for external panels.

Fixes: 1a7fad2aea74 ("drm/i915/cx0: Enable dpll framework for MTL+")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16098
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16064
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16042
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: stable@vger.kernel.org # v7.0+
Tested-by: Jouni Högander <jouni.hogander@intel.com>
Tested-by: Marco Nenciarini <mnencia@kcore.it>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260612172617.3427027-1-imre.deak@intel.com
(cherry picked from commit 28783a274e886dd6da61419be6020bd9d0384e9f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: clear CRTC color blob pointers after dropping refs

intel_crtc_put_color_blobs() drops the CRTC color blob references, but
leaves the corresponding pointers unchanged.

This can matter in intel_crtc_prepare_cleared_state(), which frees the
old CRTC hw state before calling intel_dp_tunnel_atomic_clear_stream_bw().
The latter can fail while looking up the DP tunnel group state, for
example with -EDEADLK.

If that happens, the function returns without completing the cleared
state preparation. The failed atomic state will then be cleared by the
atomic core and intel_crtc_free_hw_state() can be called again for the
same state, dropping the same blob references again.

Clear the blob pointers after dropping the references so repeated cleanup
of the same CRTC hw state is safe.

Fixes: 77fcf58df15e ("drm/i915/dp_tunnel: Fix error handling when clearing stream BW in atomic state")
Suggested-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260612035310.3013066-1-lgs201920130244@gmail.com
(cherry picked from commit d5005addb5f68e8a0edce249506757bdc9e3d8c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915/mst: Call intel_pfit_compute_config() for sharpness filter

The sharpness filter property is on the CRTC (as opposed to the
connector) so the expectation is that it's usable on all output
types. Since the sharpness filter is now fully integrateds into
the normal pfit code intel_pfit_compute_config() must be called
from the encoder .compute_config() on all relevant output types.

Sharpness filter is supported on LNL+ so only HDMI and DP SST/MST
outputs are actually relevant. I already took care of HDMI and
DP SST, but (as usual) forgot about DP MST. Add the missing
intel_pfit_compute_config() call to make the sharpness filter
operational on DP MST as well.

Cc: Nemesa Garg <nemesa.garg@intel.com>
Fixes: d4686f34bbeb ("drm/i915/pfit: Call intel_pfit_compute_config() unconditionally on (e)DP/HDMI")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260409100841.1907-1-ville.syrjala@linux.intel.com
Reviewed-by: Nemesa Garg <nemesa.garg@intel.com>
(cherry picked from commit ca97f5546f191bf460b3f4b59ade3ea5e7378796)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

block: invalidate cached plug timestamp after task switch

blk_time_get_ns() caches ktime_get_ns() in current->plug->cur_ktime
and marks the task with PF_BLOCK_TS. That cache is only valid while the
task keeps running; if the task is switched out, wall-clock time
advances and the cached value must not be reused when the task runs again.

The existing invalidation covers explicit plug flushes through
__blk_flush_plug(), and the schedule() / rtmutex paths through
sched_update_worker(). It does not cover in-kernel preemption paths such
as preempt_schedule(), preempt_schedule_notrace(), and
preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and
return without calling sched_update_worker().

As a result, a task preempted while holding a plug with PF_BLOCK_TS set
can reuse a stale plug->cur_ktime after it is scheduled back in. blk-iocost
then consumes that stale timestamp through ioc_now(), producing stale vnow
values for throttle decisions, and through ioc_rqos_done(), inflating
on-queue time and feeding false missed-QoS samples into vrate
adjustment.

Move the schedule-side invalidation to finish_task_switch(), which runs
for the scheduled-in task after every actual context switch regardless
of which schedule entry point was used. Keep __blk_flush_plug() as the
explicit flush/finish-plug invalidation path, and remove only the
PF_BLOCK_TS handling from sched_update_worker().

Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Cc: stable@vger.kernel.org
Signed-off-by: Usama Arif <usama.arif@linux.dev>
Link: https://patch.msgid.link/20260616141604.328820-3-usama.arif@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

kernel/fork: clear PF_BLOCK_TS in copy_process()

PF_BLOCK_TS is only set in blk_time_get_ns() when current->plug is
non-NULL, and blk_finish_plug() clears it via __blk_flush_plug()
before NULLing the plug pointer. copy_process() breaks the
invariant by inheriting PF_BLOCK_TS from the parent while resetting
the child's plug to NULL.

Clear PF_BLOCK_TS alongside that assignment so callers can rely on
"PF_BLOCK_TS set implies current->plug != NULL" and dereference
current->plug unguarded.

Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Cc: stable@vger.kernel.org
Signed-off-by: Usama Arif <usama.arif@linux.dev>
Link: https://patch.msgid.link/20260616141604.328820-2-usama.arif@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/register: preserve SQ array entries on resize

Ring resizing copies pending SQEs from the old SQE array into the new
one so submissions queued before the resize can still be consumed
afterwards.

That copy currently walks the SQ head/tail range directly. This is only
correct when there is no SQ array indirection. With a regular SQ array,
each pending SQ entry contains an index into the SQE array. After resize,
ctx->sq_array is repointed at the newly allocated array, so pending
entries lose their old logical-to-physical mapping and may submit the
wrong SQE.

Remember the old and new SQ arrays while migrating pending SQ entries. For
each pending entry, copy the SQE selected by the old array into the new
destination slot and rebuild the new array entry to point at the copied
SQE. Keep invalid user-provided entries invalid so the normal submission
path still drops them after resize.

Fixes: 79cfe9e59c2a1 ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: guzebing <guzebing1612@gmail.com>
Link: https://patch.msgid.link/20260608133316.3656440-1-guzebing1612@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Remove redundant plug in __submit_bio()

The patch removes the automatic plug/unplug operations from __submit_bio()
that were added to cache nsecs time when no explicit plug is used.

The plug mechanism is most effective when batching multiple I/O
operations together. Creating a plug for every bio submission
provides minimal benefit while adding function call overhead and
stack usage for every I/O operation.

Below is performance comparison with the latest upstream kernel.

Iotype  qd nj  rmix  mpstat busy  mpstat busy without plug
Randrw  1  20  100       53%                 24%
Randrw  1  40  100       70%                 24%
Randrw  1  20  70        40%                 24%
Randrw  1  40  70        60%                 26%
Randrw  1  20  0         14%                 6%
Randrw  1  40  0         20%                 7%

Fixes: 060406c61c7c ("block: add plug while submitting IO")
Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260616143121.878021-1-wenxiong@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix IORING_URING_CMD_REISSUE flags check in blkdev_uring_cmd

blkdev_uring_cmd() checks IORING_URING_CMD_REISSUE to determine whether
this is the first issue. However, this flag lives in cmd->flags instead
of issue_flags.

Coincidentally, IO_URING_F_NONBLOCK shares bit 31 with
IORING_URING_CMD_REISSUE. As a result, the SQE read was never performed,
bic->len remained zero, and every BLOCK_URING_CMD_DISCARD failed with
-EINVAL.

Fix it by checking cmd->flags as intended.

Cc: stable@vger.kernel.org
Fixes: 212ec34e4e72 ("block: only read from sqe on initial invocation of blkdev_uring_cmd")
Signed-off-by: Yitang Yang <yi1tang.yang@gmail.com>
Link: https://patch.msgid.link/20260616155129.406057-1-yi1tang.yang@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'tls-reject-the-combination-of-tls-and-sockmap'

Jakub Kicinski says:

====================
tls: reject the combination of TLS and sockmap

There are no known TLS+sockmap users and it has some known
hard to solve bugs. Let's reject this configuration as we
discussed a number of times.
====================

Link: https://patch.msgid.link/20260614014102.461064-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: test that TLS crypto is rejected on a sockmap socket

TLS and sockmap are mutually exclusive. We already have a test
for the sockmap side rejecting kTLS, add the inverse test matching
patch 1 of this series.

Link: https://patch.msgid.link/20260614014102.461064-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: drop the unused kTLS program from test_sockmap

With the sockmap + kTLS tests gone, the BPF-side support in test_sockmap
is dead: the tls_sock_map map and bpf_prog3 (which redirected skbs into
it) are no longer referenced. Remove them, along with the now-unused
bpf_write_pass() helper.

bpf_prog3 was progs[2], so renumber the progs[] users in test_sockmap.c:
the sockops program drops to progs[2] and the sk_msg tx programs to
progs[3..7]. Shrink the map/prog arrays from 9 to 8 and drop the
tls_sock_map entry (the last one) from map_names[] to match.

Link: https://patch.msgid.link/20260614014102.461064-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/bpf: remove sockmap + ktls tests

The combination of sockmap and TLS is no longer supported - installing
the TLS ULP on a sockmap socket (and vice versa) is now rejected. Remove
the tests that exercise the combination along with their BPF program;
the file covered nothing but sockmap sockets holding kTLS contexts.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260614014102.461064-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: remove dead sockmap (psock) handling from the SW path

TLS and sockmap are now mutually exclusive. Try to delete the code
from sendmsg and recvmsg path which is now obviously dead.

The main goal is to delete enough code for AI security scanners
to no longer bother us with sockmap related bugs. At the same
time retain the code in case someone has the cycles to fix
all of this and make the integration work, again.

If the integration does not get restored we can wipe the rest
of the skmsg code from TLS in two or three releases.

The changes on the Tx side are deeper since that's where most
of the bugs are, Rx side simply takes the data from sockmap
and gives it to the user. On Tx split record handling and
rolling back the iterator were the two problem areas.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260614014102.461064-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: reject the combination of TLS and sockmap

TLS and sockmap (BPF psock) integration hides a lot of latent bugs.
Bugs which may be more or less relevant for real users but they
are definitely exploitable.

We could not find anyone actively using this integration so let's
reject this config. Adding a TLS socket to a sockmap was already
rejected by sk_psock_init() through the inet_csk_has_ulp() check.
We need to reject the attempts to configure the TLS keys (rather
than adding the ULP itself) because checking prior to the ULP
installation is tricky without risking a race with sockmap getting
added in parallel (sockmap does not hold the socket lock).

This patch is a minimal rejection of the feature. Subsequent patch
in the series will do a light dead code removal. Full cleanup would
require a major rewrite of the Tx path, we don't need skmsg any more.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260614014102.461064-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'atm-remove-more-dead-code'

Jakub Kicinski says:

====================
atm: remove more dead code

Commit 6deb53595092 ("net: remove unused ATM protocols and legacy
ATM device drivers") removed a good chunk of old ATM drivers.
Our goal going forward is to limit the ATM support to PPPoATM
used in ADSL deployments.

A recent burst of AI generated fixes for net/atm/signaling.c and
net/atm/svc.c made me look closer at the remaining code. PPPoATM runs
over permanent virtual circuits (PF_ATMPVC) with a statically
configured VPI/VCI. We can drop switched virtual circuits (SVCs)
and user-space signaling (atmsigd) support. While digging around
I noticed a few more obviously dead pieces of code.

Annoyingly, I have applied one "fix" to QoS config which will
now make net conflict with this series :/
====================

Link: https://patch.msgid.link/20260615194416.752559-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove orphaned uAPI for deleted drivers, protocols and SVCs

ATM removals have left a number of uAPI headers and ioctl
definitions with no in-kernel implementation behind them:

- device headers for adapters deleted with the legacy PCI/SBUS drivers:
   atm_eni.h, atm_he.h, atm_idt77105.h, atm_nicstar.h, atm_zatm.h and
   the atmtcp pair atm_tcp.h / <linux/atm_tcp.h>
- protocol headers for the removed CLIP, LANE and MPOA stacks:
   atmarp.h, atmclip.h, atmlec.h, atmmpc.h
- atmsvc.h and the SVC / p2mp / local-address ioctls in atmdev.h
   (ATM_{GET,RST,ADD,DEL}ADDR, ATM_{ADD,DEL,GET}LECSADDR,
   ATM_{ADD,DROP}PARTY) left behind by the SVC and address-registry
   removals

None of these are referenced by any remaining in-tree code.
Let's try to delete all this. Chances are nobody cares about
these headers any more. I'm keeping this separate from the
kernel side code changes for ease of revert, in case I am
proven wrong...

Link: https://patch.msgid.link/20260615194416.752559-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove unused ATM PHY operations

The PHY operations are vestiges of the SAR/framer split used by the
removed PCI/SBUS ATM adapters:

- atmdev_ops::phy_put / ::phy_get (register accessors) are never called
by the core and solos-pci only listed them as NULL
- struct atmphy_ops and atm_dev::phy have no users at all - nothing
assigns or dereferences them

Remove all of them. atm_dev::phy_data is kept: solos-pci repurposes it
to stash its per-port channel index.

Link: https://patch.msgid.link/20260615194416.752559-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove the unused pre_send and send_bh device operations

atmdev_ops::pre_send (a TX pre-processing hook) and ::send_bh (a
bottom-half capable send variant) have no implementation behind them:
no remaining ATM driver sets either, so vcc_sendmsg() always skipped
pre_send and the raw AAL0/AAL5 paths always fell back to ->send().
The drivers that used these hooks were removed with the legacy ATM
adapters.

Drop both operations and the dead branches that tested for them.

Link: https://patch.msgid.link/20260615194416.752559-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove the unused change_qos device operation

atmdev_ops::change_qos() was the hook for renegotiating the traffic
parameters of an already-connected VCC, driven from SO_ATMQOS on a
connected socket (and previously from the SVC as_modify path, now gone).
None of the ATM drivers left in tree implement it - solos-pci only listed
change_qos = NULL - so atm_change_qos() always returned -EOPNOTSUPP.

Drop the operation and return -EOPNOTSUPP directly.

Link: https://patch.msgid.link/20260615194416.752559-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove SVC socket support and the signaling daemon interface

ATM switched virtual circuits (SVCs) are set up and torn down by a
user-space signaling daemon (atmsigd) which the kernel talks to over
a dedicated "sigd" socket: the kernel marshals Q.2931-style requests
(as_connect, as_listen, as_accept, as_close, ...) to the daemon and
applies the results to PF_ATMSVC sockets. This is the machinery behind
classical SVC use and was the foundation for LANE / MPOA, all of which
have been removed.

DSL deployments do not use any of this. PPPoATM and BR2684 run over
permanent virtual circuits (PF_ATMPVC) with a statically configured
VPI/VCI; no atmsigd, no Q.2931. Neither remaining ATM driver
(solos-pci, the USB DSL modems) is reachable through the SVC path.

Remove the SVC socket family and the signaling interface:

- delete net/atm/svc.c, net/atm/signaling.c and signaling.h
- drop atmsvc_init()/atmsvc_exit() and the PF_ATMSVC registration and
module alias
- drop the ATMSIGD_CTRL ioctl (sigd_attach) and the /proc/net/atm/svc
file
- fold the SVC branch out of atm_change_qos(); all sockets are PVCs now

The obsolete ATM_SETSC ioctl stub is left in place (it already just
warns and returns 0), as is the struct atm_vcc SVC bookkeeping shared
with the queueing layer.

Link: https://patch.msgid.link/20260615194416.752559-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove the local ATM (NSAP) address registry

net/atm/addr.c maintained the per-device lists of local NSAP addresses
(dev->local) and ILMI-learned LECS addresses (dev->lecs). These exist
solely to serve SVC signaling: the lists are populated through the
ATM_{ADD,DEL,RST}ADDR / ATM_{ADD,DEL,GET}LECSADDR ioctls used by the
atmsigd / ILMI daemons, and consumed when registering addresses with the
signaling daemon. The LECS list belonged to LAN Emulation, which has
been removed.

With no SVC users in a DSL-only configuration these lists are always
empty, so drop the registry entirely:

- remove the ADDR/LECSADDR/RSTADDR ioctls
- drop the now-always-empty "atmaddress" sysfs attribute
- remove the dev->local / dev->lecs lists, structs and enums
- delete net/atm/addr.c and net/atm/addr.h

The device ESI ("MAC" address) and its ATM_{G,S}ETESI ioctls and
"address" sysfs attribute are retained - the USB DSL modems populate
the ESI.

Link: https://patch.msgid.link/20260615194416.752559-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove dead SONET PHY ioctls

The SONET_* ioctls are SONET/SDH PHY controls that atm_dev_ioctl() and
the compat path only ever forwarded to the driver's ->ioctl() handler.
The PHY drivers that implemented them (the S/UNI library and the framers
on the removed PCI/SBUS adapters) are gone, and neither surviving driver
services them: solos-pci has no ->ioctl, and usbatm handles only
ATM_QUERYLOOP. They now uniformly return an error regardless.

Drop the SONET compat passthrough and the SONET cases in atm_dev_ioctl(),
along with the now-unused linux/sonet.h includes. The SONET_* uAPI
definitions are untouched.

Link: https://patch.msgid.link/20260615194416.752559-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove the unused send_oam / push_oam callbacks

The atmdev_ops::send_oam device operation and the atm_vcc::push_oam
callback were the kernel's interface for raw F4/F5 OAM cell exchange.
Nothing assigns them a non-NULL value and nothing ever invokes them:
the core only ever initialises push_oam to NULL (in vcc_create() and the
AAL init helpers) and the Solos driver only lists send_oam = NULL for
documentation. The drivers that actually drove OAM through these hooks
were removed along with the legacy ATM adapters.

Drop both callbacks and the NULL initialisers.

Link: https://patch.msgid.link/20260615194416.752559-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: remove AAL3/4 transport support

AAL3/4 is an obsolete connection-oriented ATM adaptation layer that has
seen no real use since the SMDS-era hardware it was designed for (90s?).
We are only maintaining ATM support in-tree to keep PPPoATM running,
and PPPoATM runs over AAL5.

Drop the "raw" AAL3/4 transport (atm_init_aal34()) and the ATM_AAL34
cases in the connect and traffic-parameter paths. A vcc_connect() with
qos.aal == ATM_AAL34 now fails with -EPROTOTYPE.

uAPI cleanup is performed later, separately.

Link: https://patch.msgid.link/20260615194416.752559-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

io_uring, audit: don't log IORING_OP_RECV_ZC

IORING_OP_RECV_ZC is a read operation. Audit only tracks file/socket
creation, not subsequent reads. Set audit_skip to align with
audit-userspace uringop_table.h.

Fixes: 11ed914bbf94 ("io_uring/zcrx: add io_recvzc request")
Suggested-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20260616123632.3209545-1-rrobaina@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: get rid of tw_pending for !DEFER task work

The normal task_work path used a tw_pending bit to ensure the callback
was only added once: the mpscq drains incrementally, so a single
tctx_task_work() run can take the queue through empty -> non-empty
several times, and each transition would otherwise re-add the already
pending callback_head. This corrupts the task_work list, and is what
tw_pending protects again.

This can go away, if we stop running the task_work as soon as the queue
empties.

Suggested-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

s390/mm: Complete ptep_get() conversion

Finalize commit c33c794828f2 ("mm: ptep_get() conversion") and
replace direct page table entry dereferencing with the proper
accessors (ptep_get(), pmdp_get(), etc.).

Override the default getter implementations even though they are
currently identical: pud_clear(), p4d_clear(), and pgd_clear()
require corresponding architecture-specific getters, but these
are not yet defined. This avoids a dependency loop.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

RDMA/bnxt_re: Proper rollback if the ioremap fails

bnxt_qplib_alloc_dpi returns success even if ioremap fails.
Add the proper rollback when the ioremap fails and return
-ENOMEM status.

Fixes: 0ac20faf5d83 ("RDMA/bnxt_re: Reorg the bar mapping")
Fixes: 360da60d6c6e ("RDMA/bnxt_re: Enable low latency push")
Link: https://patch.msgid.link/r/20260615224751.232802-11-selvin.xavier@broadcom.com
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/bnxt_re: Add a max slot check for SQ

The variable WQE mode must be validated against
the maximum slots supported by HW. The max supported
value is 64K. Adding a max and min check and fail if user
supplied value is more than the max supported and zero.

Fixes: d8ea645d6984 ("RDMA/bnxt_re: Handle variable WQE support for user applications")
Link: https://patch.msgid.link/r/20260615224751.232802-10-selvin.xavier@broadcom.com
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/bnxt_re: Avoid displaying the kernel pointer

While dumping the info on MR using the rdma tool, we
dump the mr_hwq which is a kernel pointer. There is
no need to expose this value for end user. So avoid
it.

Fixes: 7363eb76b7f3 ("RDMA/bnxt_re: Support driver specific data collection using rdma tool")
Link: https://patch.msgid.link/r/20260615224751.232802-9-selvin.xavier@broadcom.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>