Merge tag 'kernel-7.2-rc1.task_exec_state' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull task_exec_state updates from Christian Brauner:
"This introduces a new per-task task_exec_state structure and relocates
  the dumpable mode and the user namespace captured at execve() from
  mm_struct onto it. It stays attached to the task for its full
  lifetime.

  __ptrace_may_access() and several /proc owner and visibility checks
  need to consult two pieces of state for any observable task, including
  zombies that have already gone through exit_mm(): the dumpable mode
  and the user namespace captured at execve(). Both live on mm_struct
  today, which exit_mm() clears from the task long before the task is
  reaped. A reader that races with do_exit() observes task->mm == NULL
  and either fails the check or falls back to init_user_ns - which
  denies legitimate access to non-dumpable zombies that were running in
  a nested user namespace.

  mm_struct loses ->user_ns and the dumpability bits in ->flags.
  MMF_DUMPABLE_BITS is reserved so the MMF_DUMP_FILTER_* layout exposed
  via /proc/<pid>/coredump_filter stays stable. task->user_dumpable and
  its exit_mm() snapshot are removed.

  task_exec_state is the privilege domain established by an execve().
  Within a thread group it is shared via refcount; across thread groups
  each task has its own:

   - CLONE_VM siblings (thread-group members, io_uring workers)
     refcount-share the parent's exec_state.

   - Non-CLONE_VM clones (fork(), vfork() without CLONE_VM) allocate a
     fresh exec_state inheriting the parent's dumpable mode and user_ns.

   - execve() in the child allocates a fresh instance and installs it
     under task_lock + exec_update_lock via task_exec_state_replace().

   - Credential changes (setresuid, capset, ...) and
     prctl(PR_SET_DUMPABLE) update dumpability on the current task's
     exec_state, i.e., on the thread group's shared instance.

  On top of this exec_mmap() no longer tears down the old mm while
  holding exec_update_lock for writing and cred_guard_mutex. Neither
  lock is needed for that: exec_update_lock only exists to make the mm
  swap atomic with the later commit_creds() and all its readers operate
  on the new mm; none looks at the detached old mm.

  The cost was real: __mmput() runs exit_mmap() over the entire old
  address space and can block in exit_aio() waiting for in-flight AIO,
  so execve() of a large process blocked ptrace_attach() and every
  exec_update_lock reader for the duration of the teardown.

  The old mm is now stashed in bprm->old_mm and released from
  setup_new_exec() after both locks are dropped, with a backstop in
  free_bprm() for the error paths"

* tag 'kernel-7.2-rc1.task_exec_state' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  exec: free the old mm outside the exec locks
  exec_state: relocate dumpable information
  ptrace: add ptracer_access_allowed()
  exec: introduce struct task_exec_state
  sched/coredump: introduce enum task_dumpable
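
The sharing rules listed in the task_exec_state pull message can be
pictured with a small user-space model. This is only a sketch under
stated assumptions: the struct layout, the enumerator names and the
helper functions (exec_state_alloc, clone_shared, clone_private,
exec_replace) are hypothetical and are not the kernel's definitions,
and the refcount is a plain int because the model is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
enum dumpable_mode { DUMP_DISABLE = 0, DUMP_USER = 1, DUMP_ROOT = 2 };

struct exec_state {
	int refcount;                 /* plain int: single-threaded model */
	enum dumpable_mode dumpable;  /* dumpable mode set up at exec */
	const void *user_ns;          /* opaque handle for the captured user_ns */
};

struct task {
	struct exec_state *exec_state;
};

static struct exec_state *exec_state_alloc(enum dumpable_mode mode,
					   const void *ns)
{
	struct exec_state *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refcount = 1;
	es->dumpable = mode;
	es->user_ns = ns;
	return es;
}

static void exec_state_put(struct exec_state *es)
{
	if (es && --es->refcount == 0)
		free(es);
}

/* CLONE_VM sibling: share the parent's instance by taking a reference. */
static int clone_shared(struct task *child, const struct task *parent)
{
	parent->exec_state->refcount++;
	child->exec_state = parent->exec_state;
	return 0;
}

/* fork()-style clone: fresh instance inheriting mode and namespace. */
static int clone_private(struct task *child, const struct task *parent)
{
	child->exec_state = exec_state_alloc(parent->exec_state->dumpable,
					     parent->exec_state->user_ns);
	return child->exec_state ? 0 : -1;
}

/* execve(): install a fresh instance, then drop the old reference. */
static int exec_replace(struct task *t, enum dumpable_mode mode,
			const void *ns)
{
	struct exec_state *fresh = exec_state_alloc(mode, ns);

	if (!fresh)
		return -1;
	exec_state_put(t->exec_state);
	t->exec_state = fresh;
	return 0;
}
```

In this model a dumpability change made through one thread is visible
to every task sharing the instance, while a forked child keeps the
value it inherited. The locking the pull message mentions (task_lock
and exec_update_lock) is deliberately left out.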

Merge tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs casefolding updates from Christian Brauner:
"This exposes the case folding behavior of local filesystems so that
  file servers - nfsd, ksmbd, and user space file servers - can report
  the actual behavior to clients instead of guessing.

  Filesystems report case-insensitive and case-nonpreserving behavior
  via new file_kattr flags in their fileattr_get implementations. fat,
  exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are
  wired up. Local filesystems that are not explicitly handled default to
  the usual POSIX behavior of case-sensitive and case-preserving.

  nfsd uses this to report case folding via NFSv3 PATHCONF and to
  implement the NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
  attributes - both have been part of the NFS protocols for decades to
  support clients on non-POSIX systems - and ksmbd reports it via
  FS_ATTRIBUTE_INFORMATION. Exposing the information through the
  fileattr uapi covers user space file servers.

  The immediate motivation is interoperability: Windows NFS clients
  hard-require servers to report case-insensitivity for Win32
  applications to work correctly, and a client that knows the server is
  case-insensitive can avoid issuing multiple LOOKUP/READDIR requests
  searching for case variants.

  The Linux NFS client already grew support for case-insensitive shares
  years ago in support of the Hammerspace NFS server - negative dentry
  caching must be disabled (a lookup for "FILE.TXT" failing must not
  cache a negative entry when "file.txt" exists) and directory change
  invalidation must drop cached case-folded name variants. Such servers
  often operate in multi-protocol environments where a single file
  service instance caters to both NFS and SMB clients, and nfsd needs to
  report case folding properly to participate as a first-class citizen
  there.

  A follow-up series brings fixes for the initial work: the nfsd
  case-info probe now uses kernel credentials, maps -ESTALE to
  NFS3ERR_STALE, and has its cost capped across READDIR entries; the nfs
  client avoids transiently zeroed case capability bits during the probe
  and skips the pathconf probe when neither field is consumed; the
  FS_CASEFOLD_FL semantics are clarified in the UAPI header; and the
  tools UAPI headers are synced"

* tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  nfsd: Cap case-folding probe cost across READDIR entries
  nfsd: Map -ESTALE from case probe to NFS3ERR_STALE
  nfsd: Use kernel credentials for case-info probe
  fs: Clarify FS_CASEFOLD_FL semantics in UAPI header
  nfs: Skip pathconf probe when neither field is consumed
  nfs: Avoid transient zeroed case capability bits during probe
  tools headers UAPI: Sync case-sensitivity flags from linux/fs.h
  ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
  nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
  nfsd: Report export case-folding via NFSv3 PATHCONF
  isofs: Implement fileattr_get for case sensitivity
  vboxsf: Implement fileattr_get for case sensitivity
  nfs: Implement fileattr_get for case sensitivity
  cifs: Implement fileattr_get for case sensitivity
  xfs: Report case sensitivity in fileattr_get
  hfsplus: Report case sensitivity in fileattr_get
  hfs: Implement fileattr_get for case sensitivity
  ntfs3: Implement fileattr_get for case sensitivity
  exfat: Implement fileattr_get for case sensitivity
  fat: Implement fileattr_get for case sensitivity
  ...

Merge tag 'vfs-7.2-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs directory delegations from Christian Brauner:
"This contains the VFS prerequisites for supporting directory
  delegations in nfsd via CB_NOTIFY callbacks.

  The filelock core gains support for ignoring delegation breaks for
  directory change events together with an inode_lease_ignore_mask()
  helper, and fsnotify gains fsnotify_modify_mark_mask() and a
  FSNOTIFY_EVENT_RENAME data type.

  With this in place nfsd can request delegations on directories and set
  up inotify watches to trigger sending CB_NOTIFY events to clients
  instead of having every directory change break the delegation.

  New tracepoints are added to fsnotify() and to the start of
  break_lease(), and trace_break_lease_block() is passed the currently
  blocking lease instead of the new one.

  A follow-up fix moves the LEASE_BREAK_* flags out of
  #ifdef CONFIG_FILE_LOCKING to fix the build for CONFIG_FILE_LOCKING=n
  configurations"

* tag 'vfs-7.2-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  filelock: move LEASE_BREAK_* flags out of #ifdef CONFIG_FILE_LOCKING
  fsnotify: add FSNOTIFY_EVENT_RENAME data type
  fsnotify: add fsnotify_modify_mark_mask()
  fsnotify: new tracepoint in fsnotify()
  filelock: add an inode_lease_ignore_mask helper
  filelock: add a tracepoint to start of break_lease()
  filelock: add support for ignoring deleg breaks for dir change events
  filelock: pass current blocking lease to trace_break_lease_block() rather than "new_fl"

Merge tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode updates from Christian Brauner:
"This extends the lockless ->i_count handling.

  iput() could already decrement any value greater than one locklessly
  but acquiring a reference always required taking inode->i_lock. Now
  acquiring a reference is lockless as long as the count was already at
  least 1, i.e., only the 0->1 and 1->0 transitions take the lock.

  This avoids the lock for the common cases of nfs calling into the
  inode hash and btrfs using igrab(). Cleanup-wise icount_read_once() is
  added to line up with inode_state_read_once() and the open-coded
  ->i_count loads across the tree are converted, and ihold() is
  relocated and tidied up.

  On top of that some stale lock ordering annotations are retired from
  the inode hash code: iunique() no longer takes the hash lock since the
  inode hash became RCU-searchable and s_inode_list_lock is no longer
  taken under the hash lock either"

* tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: retire stale lock ordering annotations from inode hash
  fs: allow lockless ->i_count bumps as long as it does not transition 0->1
  fs: relocate and tidy up ihold()
  fs: add icount_read_once() and stop open-coding ->i_count loads
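
The "only the 0->1 and 1->0 transitions take the lock" rule from the
inode pull message can be sketched in user space with C11 atomics.
This is an illustrative model, not the kernel code: struct obj, the
spin lock built on atomic_flag, and the slow_paths counter are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int count;   /* models ->i_count */
	atomic_flag lock;   /* models inode->i_lock as a tiny spin lock */
	int slow_paths;     /* how often the lock was taken (illustration) */
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->count, 0);
	atomic_flag_clear(&o->lock);
	o->slow_paths = 0;
}

static void obj_lock(struct obj *o)
{
	while (atomic_flag_test_and_set(&o->lock))
		;   /* spin until the flag was previously clear */
	o->slow_paths++;
}

static void obj_unlock(struct obj *o)
{
	atomic_flag_clear(&o->lock);
}

/* Take a reference; lockless as long as the count is already >= 1. */
static void obj_get(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old >= 1) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return;
	}
	obj_lock(o);                       /* possible 0->1 transition */
	atomic_fetch_add(&o->count, 1);
	obj_unlock(o);
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_load(&o->count);
	bool last;

	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->count, &old, old - 1))
			return false;
	}
	obj_lock(o);                       /* possible 1->0 transition */
	last = atomic_fetch_sub(&o->count, 1) == 1;
	obj_unlock(o);
	return last;
}
```

Taking three references and dropping them again enters the locked
path twice in this model: once for the first get and once for the
final put.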

Merge tag 'vfs-7.2-rc1.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull exportfs updates from Christian Brauner:
"This cleans up the exportfs support for block-style layouts that
  provide direct block device access: the operations for layout-based
  block device access are split out of struct export_operations into a
  separate header, ->commit_blocks() no longer takes a struct iattr
  argument, and the way support for layout-based block device access is
  detected is reworked.

  nfsd's blocklayout code also stops honoring loca_time_modify. This is
  preparation for supporting export of more than a single device per
  file system"

* tag 'vfs-7.2-rc1.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  exportfs,nfsd: rework checking for layout-based block device access support
  exportfs: don't pass struct iattr to ->commit_blocks
  exportfs: split out the ops for layout-based block device access
  nfsd/blocklayout: always ignore loca_time_modify

Merge tag 'vfs-7.2-rc1.kfunc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull bpf filesystem kfunc fix from Christian Brauner:
"The bpf_set_dentry_xattr() and bpf_remove_dentry_xattr() kfuncs locked
  the inode of the supplied dentry without checking whether the dentry
  is negative.

  Passing a negative dentry (e.g., from security_inode_create) caused a
  NULL pointer dereference. Negative dentries now fail with EINVAL. The
  WARN_ON(!inode) in the bpf xattr permission helpers is dropped as well
  since it could be triggered the same way, amounting to a denial of
  service on systems with panic_on_warn enabled"

* tag 'vfs-7.2-rc1.kfunc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  bpf: fix crash in bpf_[set|remove]_dentry_xattr for negative dentries

bpf: Raise maximum call chain depth to 16 frames

Bump MAX_CALL_FRAMES from 8 to 16 to allow deeper call chains
that Rust-BPF requires and update selftests.

Link: https://lore.kernel.org/r/20260613180755.29671-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

smb: client: Use more common code in SMB2_tcon()

Use an additional label so that a bit of common code can be better reused
at the end of this function implementation.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: Use more common error handling code in smb3_reconfigure()

Use an additional label so that a bit of exception handling can be better
reused at the end of this function implementation.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: Fix error code in smb2_aead_req_alloc()

The "*num_sgs" variable is a u32 so "ERR_PTR(*num_sgs)" doesn't work.
We would have to do something similar to the previous line where it's
cast to int and then long. However, it's simpler to store the return in
an int ret variable.

This bug would eventually result in a crash when dereferencing the
invalid error pointer.
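
The following user-space sketch shows why an errno held in a u32 does
not survive the conversion to an error pointer. err_ptr() and is_err()
are local stand-ins written for this example that mirror the shape of
the kernel's ERR_PTR()/IS_ERR(); alloc_buggy() and alloc_fixed() are
hypothetical and only imitate the pattern described above. It assumes
an LP64 target where long is 64 bits wide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Local stand-ins for ERR_PTR() and IS_ERR(). */
static void *err_ptr(long error)
{
	return (void *)error;
}

static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Buggy shape: the negative errno lives in an unsigned 32-bit variable. */
static void *alloc_buggy(void)
{
	uint32_t num_sgs = (uint32_t)-ENOMEM;

	/* Zero-extends to a small positive long, outside the error range. */
	return err_ptr(num_sgs);
}

/* Fixed shape: keep the return value in a plain int. */
static void *alloc_fixed(void)
{
	int ret = -ENOMEM;

	/* Sign-extends, so the pointer lands in the top MAX_ERRNO addresses. */
	return err_ptr(ret);
}
```

On such a target is_err(alloc_buggy()) is false, so a caller would
treat the bogus pointer as valid, while is_err(alloc_fixed()) is true.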

Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Cc: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: clean up a type issue in cifs_xattr_get()

The cifs_xattr_get() function returns type int, not ssize_t so
declare "rc" as int as well. This has no effect on runtime.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: allow FS_IOC_SETFLAGS to clear compression

The CIFS FS_IOC_SETFLAGS path can set FS_COMPR_FL now, but it cannot
clear it again. This can be reproduced on a share backed by a filesystem
that supports compression, for example btrfs exported by Samba:

[compress_share]
vfs objects = btrfs

$ touch test.bin
$ chattr +c test.bin
$ lsattr test.bin
$ chattr -c test.bin

The final chattr -c fails with EOPNOTSUPP, and leaves the remote object
with the compressed attribute still set, because the client always sends
FSCTL_SET_COMPRESSION with COMPRESSION_FORMAT_DEFAULT. That is correct
for setting FS_COMPR_FL, but clearing FS_COMPR_FL requires sending
COMPRESSION_FORMAT_NONE.

Fix this by passing the requested compression state through the
set_compression operation.  The SMB1 and SMB2 helpers no longer hard-code
COMPRESSION_FORMAT_DEFAULT.

When FS_COMPR_FL is set, send COMPRESSION_FORMAT_DEFAULT.  When it is
cleared, send COMPRESSION_FORMAT_NONE.  If the server accepts the request,
update the cached FILE_ATTRIBUTE_COMPRESSED bit under i_lock so
FS_IOC_GETFLAGS reports the new state.
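
A minimal user-space sketch of the mapping described above. The
helper names are hypothetical; the constant values are the ones
defined by linux/fs.h (FS_COMPR_FL) and the SMB file system control
protocol (the compression formats and FILE_ATTRIBUTE_COMPRESSED),
repeated locally so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define FS_COMPR_FL                0x00000004  /* as in linux/fs.h */
#define COMPRESSION_FORMAT_NONE    0x0000
#define COMPRESSION_FORMAT_DEFAULT 0x0001
#define FILE_ATTRIBUTE_COMPRESSED  0x00000800

/* Pick the format to send with FSCTL_SET_COMPRESSION (hypothetical helper). */
static uint16_t compression_format_for_flags(unsigned int fs_flags)
{
	return (fs_flags & FS_COMPR_FL) ? COMPRESSION_FORMAT_DEFAULT
					: COMPRESSION_FORMAT_NONE;
}

/* After the server accepts the request, mirror it in the cached attributes. */
static uint32_t update_cached_attrs(uint32_t attrs, uint16_t format)
{
	if (format == COMPRESSION_FORMAT_NONE)
		return attrs & ~(uint32_t)FILE_ATTRIBUTE_COMPRESSED;
	return attrs | FILE_ATTRIBUTE_COMPRESSED;
}
```

The real client performs the cached-attribute update under i_lock;
locking is omitted here.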

Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: use writable handle for FS_IOC_SETFLAGS compression

Setting the compressed flag on a CIFS mount can fail with -EACCES:

[compress_share]
vfs objects = btrfs

        $ touch test.bin
        $ chattr +c test.bin
        chattr: Permission denied while setting flags on test.bin

This can be reproduced against a Samba share backed by a filesystem that
supports compression, such as btrfs.

FS_IOC_SETFLAGS is issued on the file handle opened by userspace.  chattr
opens the target read-only before setting FS_COMPR_FL, so the SMB client
currently sends FSCTL_SET_COMPRESSION on a handle that may not have
FILE_WRITE_DATA access.  Samba requires FILE_WRITE_DATA for
FSCTL_SET_COMPRESSION and rejects the request.

Use the current handle only if it already has FILE_WRITE_DATA.  Otherwise
try an existing writable handle for the inode.  If none is available, open
a temporary FILE_WRITE_DATA handle for the compression request.

After FSCTL_SET_COMPRESSION succeeds, update the cached compressed
attribute immediately, matching how smb2_set_sparse() updates
FILE_ATTRIBUTE_SPARSE_FILE after a successful FSCTL_SET_SPARSE.

Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: always return a value for FS_IOC_GETFLAGS

Currently, repeated lsattr calls on a regular CIFS file without the
compressed attribute may show random flags:

$ touch test.bin
$ lsattr test.bin
s-S-ia-A-EjI---------m test.bin
$ lsattr test.bin
------d-cEjI---------m test.bin

The lsattr reproducer depends on the previous contents of its userspace
buffer, so it may not reproduce on every setup. A deterministic
reproducer is to initialize the ioctl argument before FS_IOC_GETFLAGS
on a file without the compressed attribute:

int flags = 0x7fffffff;
ioctl(fd, FS_IOC_GETFLAGS, &flags);

On an affected kernel, flags remains 0x7fffffff. With the fix, it is
set to 0.
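
The two-line reproducer can be wrapped into a small helper along
these lines. The function name and its return convention are made up
for this sketch; only open(), ioctl(), close() and FS_IOC_GETFLAGS are
real interfaces.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS */

#define SENTINEL 0x7fffffff

/*
 * Returns 1 if FS_IOC_GETFLAGS succeeded but left the sentinel in place
 * (the stale-buffer symptom), 0 if the kernel wrote a value, and -1 if
 * the file could not be opened or the ioctl itself failed.
 */
static int getflags_left_stale(const char *path)
{
	int flags = SENTINEL;
	int fd = open(path, O_RDONLY);
	int rc;

	if (fd < 0)
		return -1;
	rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	close(fd);
	if (rc < 0)
		return -1;
	return flags == SENTINEL;
}
```

Called on a CIFS file without the compressed attribute, the helper
would be expected to return 1 on an affected kernel and 0 with the
fix.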

This happens because when the cached inode does not have the compressed
bit set, the CIFS fallback path in FS_IOC_GETFLAGS returns success
without calling put_user() to write the zero flags value into the user
buffer. As a result, the caller observes stale contents from its own
buffer.

Fix this by always writing the visible flags value back to the user
buffer before returning success, even when the value is zero.

Fixes: 64a5cfa6db94 ("Allow setting per-file compression via SMB2/3")
Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: update i_blocks after contiguous writes

When a lease allows CIFS to use cached inode attributes, getattr may
return the locally cached attributes instead of revalidating them from
the server. After local writes extend a file, the write path updates the
file size, but i_blocks can remain based on the old allocation size.

For example, while the file is still open after two contiguous writes,
the local block count can remain smaller than the written range:

        after first write:   st_size = 4096,  st_blocks = 7
        after second write:  st_size = 12288, st_blocks = 21
        after close:         st_size = 12288, st_blocks = 24

This can make a fully written file look sparse:

        i_blocks * 512 < i_size

and can cause swap activation to reject a valid write-created swapfile
as having holes. This results in xfstests skipping swap-related tests
on CIFS mounts:

generic/472         [not run] swapfiles are not supported
generic/494         [not run] swapfiles are not supported
generic/497         [not run] swapfiles are not supported
generic/569         [not run] swapfiles are not supported
generic/636         [not run] swapfiles are not supported
generic/643         [not run] swapfiles are not supported

Update the local i_blocks estimate after successful writes, but only
when the write starts at or before the currently known allocated range.
This lets sequential writes grow i_blocks while avoiding treating
write-past-EOF holes as allocated.

Skip the local estimate for files that are already marked sparse, since
their allocation needs to come from the server rather than from a
contiguous-write estimate.
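
The rule above can be modelled in a few lines of user-space C. This
is a sketch of the idea only: struct file_state and account_write()
are hypothetical, and rounding the estimate up to whole 512-byte
units is an assumption rather than a statement about the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct file_state {
	uint64_t size;     /* models i_size in bytes */
	uint64_t blocks;   /* models i_blocks in 512-byte units */
	bool sparse;       /* file is marked sparse on the server */
};

/* Account for a successful write of 'len' bytes at offset 'off'. */
static void account_write(struct file_state *f, uint64_t off, uint64_t len)
{
	uint64_t end = off + len;
	uint64_t allocated = f->blocks * 512;

	if (end > f->size)
		f->size = end;

	if (f->sparse)
		return;         /* allocation must come from the server */
	if (off > allocated)
		return;         /* write past the known range leaves a hole */
	if (end > allocated)
		f->blocks = (end + 511) / 512;
}
```

With this model the two contiguous writes from the example above give
8 and then 24 blocks while the file is still open, and a later write
far past the allocated range extends the size without growing the
block estimate.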

Signed-off-by: Huiwen He <hehuiwen@kylinos.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix races in cifsd thread creation

The cifsd demultiplex thread can run and access tcp_ses before the parent
thread has finished populating tcp_ses, which the worker thread accesses
locklessly.

Also, the kthread_run() macro may start the thread before returning the
thread pointer. That pointer is stored in the structure the thread itself
reads, so if the spawning context is preempted after the thread starts but
before the pointer is stored, and the thread then attempts to exit, it
will sleep, waiting for a SIGKILL signal.

Fix this by moving creation of the thread to after all of tcp_ses's
fields are populated, and spawning the thread last, using split
kthread_create()/wake_up_process() logic.

Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: validate full SID length in security descriptors

parse_sid() only verified that the fixed SID header fit in the
returned security descriptor, but did not verify that the full SID
body described by num_subauth was present.

A malicious server can return a truncated owner or group SID whose
header lies within the descriptor buffer while sub_auth[] extends
past the end of the allocation, leading to an out-of-bounds read
when the client later parses or copies that SID.

Validate the full SID body in parse_sid(), centralize owner/group SID
lookup and bounds checking in sid_from_sd(), and use that validation
in parse_sec_desc(), build_sec_desc(), and copy_sec_desc() before
sub_auth[] is accessed.
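
The shape of the full-length check can be sketched in user space as
follows. sid_fits() is a hypothetical helper written for this example,
not the parse_sid() or sid_from_sd() code; it relies only on the wire
layout of a SID: a one-byte revision, a one-byte sub-authority count,
a six-byte authority, then one 32-bit value per sub-authority, with at
most 15 of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SID_HEADER_LEN          8   /* revision + count + 6-byte authority */
#define SID_MAX_SUB_AUTHORITIES 15

/* True if a complete SID starting at sid_off lies within buf[0..buf_len). */
static bool sid_fits(const uint8_t *buf, size_t buf_len, size_t sid_off)
{
	uint8_t num_subauth;
	size_t body;

	/* The fixed header must fit before num_subauth may be read. */
	if (sid_off > buf_len || buf_len - sid_off < SID_HEADER_LEN)
		return false;

	num_subauth = buf[sid_off + 1];
	if (num_subauth > SID_MAX_SUB_AUTHORITIES)
		return false;

	/* The variable-length body must fit in what remains. */
	body = (size_t)num_subauth * sizeof(uint32_t);
	return buf_len - sid_off - SID_HEADER_LEN >= body;
}
```

The subtractions are ordered so that none of them can wrap: each one
runs only after the preceding comparison has shown that the left-hand
side is large enough.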

Signed-off-by: Qihang <q.h.hack.winter@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: resolve SWN tcon from live registrations

cifs_swn_notify() looks up a witness registration by id under
cifs_swnreg_idr_mutex, drops the mutex, and then uses the registration's
cached tcon pointer.  That pointer is not a lifetime reference, and it is
not a stable representative once cifs_get_swn_reg() lets multiple tcons
for the same net/share name share one registration id.

A same-share second mount can keep the cifs_swn_reg alive after the first
tcon unregisters and is freed.  The registration then still points at the
freed first tcon, so taking tc_lock or incrementing tc_count through
swnreg->tcon only moves the use-after-free earlier.  Taking tc_lock while
holding cifs_swnreg_idr_mutex also violates the documented CIFS lock
order.

Fix this by making the registration store only the stable witness
identity: id, net name, share name, and notify flags.  When a notify
arrives, copy that identity under cifs_swnreg_idr_mutex, drop the mutex,
then find and pin a live witness tcon that currently matches the net/share
pair under the normal cifs_tcp_ses_lock -> tc_lock order.  The notification
path uses that pinned tcon directly and drops the reference when done.

Registration and unregister messages now use the live tcon passed by the
caller instead of a cached tcon in the registration.  The final unregister
send is folded into cifs_swn_unregister() while the registration is still
protected by cifs_swnreg_idr_mutex.  This removes the previous
find/drop/reacquire raw-pointer window.  The release path only removes the
idr entry and frees the stable identity strings.

This preserves the intended one-registration/many-tcon behavior: a
registration id represents a net/share pair, and notify handling acts on a
live representative selected at use time.  It also preserves CLIENT_MOVE
ordering for the representative tcon because the old-IP unregister is sent
before cifs_swn_register() sends the new-IP register.

Fixes: fed979a7e082 ("cifs: Set witness notification handler for messages from userspace daemon")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: remove all cifs files before kill super

CIFS files may be put into fileinfo_put_wq while a CIFS mount is being
unmounted. After the umount is done, cifsFileInfo_put_final() is called,
which causes the following BUG:

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[  134.222152]  list_lru_add+0x64/0x1a0
[  134.222399]  ? cifs_put_tcon+0x171/0x340 [cifs]
[  134.222772]  d_lru_add+0x44/0x60
[  134.222997]  dput+0x1fc/0x210
[  134.223213]  cifsFileInfo_put_final+0x11a/0x140 [cifs]
[  134.223576]  process_one_work+0x17c/0x320
[  134.223843]  worker_thread+0x188/0x280
[  134.224084]  ? __pfx_worker_thread+0x10/0x10
[  134.224366]  kthread+0xcc/0x100
[  134.224576]  ? __pfx_kthread+0x10/0x10
[  134.224827]  ret_from_fork+0x30/0x50
[  134.225063]  ? __pfx_kthread+0x10/0x10
[  134.225328]  ret_from_fork_asm+0x1b/0x30

This can be reproduced as follows:
unshare -n bash -c "
mkdir -p ${CIFS_MNT}
ip netns attach root 1
ip link add eth0 type veth peer veth0 netns root
ip link set eth0 up
ip -n root link set veth0 up
ip addr add 192.168.0.2/24 dev eth0
ip -n root addr add 192.168.0.1/24 dev veth0
ip route add default via 192.168.0.1 dev eth0
ip netns exec root sysctl net.ipv4.ip_forward=1
ip netns exec root iptables -t nat -A POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
mount -t cifs ${CIFS_PATH} ${CIFS_MNT} -o vers=3.0,sec=ntlmssp,credentials=${CIFS_CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1
touch ${CIFS_MNT}/a.txt
ip netns exec root iptables -t nat -D POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
"
umount ${CIFS_MNT}

Fixes: 340cea84f691 ("cifs: open files should not hold ref on superblock")
Signed-off-by: Jian Zhang <zhangjian496@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: fix conflicting option validation for new mount API

Apply conflicting option validation consistently across all the new
mount API paths, for both mount and remount.

Some checks were only applied during initial mount validation, while
others were handled during option parsing, causing mount and
remount/reconfigure to behave differently.

Move the conflicting option checks into smb3_handle_conflicting_options()
and call it from the common validation paths, including for
multichannel/max_channels handling.

Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api")
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: invalidate cfid on unlink/rename/rmdir

Today we do not invalidate the cached_dirent or the entire
parent cfid when a dentry in a dir has been removed/moved.

This change invalidates the parent cfid so that we don't serve
directory contents from the cache.

Cc: <stable@vger.kernel.org>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

selftests/landlock: Add tests for invalid use of quiet flag

Make sure that these calls return EINVAL.

Test coverage for security/landlock is 91.6% of 2347 lines according to
LLVM 22.

Assisted-by: GitHub-Copilot:claude-opus-4.8
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://patch.msgid.link/9401d5c6468675863d944d6c26640d97db1a1f31.1781228815.git.m@maowtm.org
[mic: Add test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

selftests/landlock: Add tests for quiet flag with scope

Enhance scoped_audit.connect_to_child and audit_flags.signal to test
interaction with various quiet flag settings.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://patch.msgid.link/032849ca97bd45b2e14f96192b61537ed9405a0d.1781228815.git.m@maowtm.org
[mic: Fix comment formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

selftests/landlock: Add tests for quiet flag with net rules

Tests that:
- Quiet flag works on network rules
- Quiet flag applied to unrelated ports has no effect
- Denied access not in quiet_access_net is still logged

This is not as thorough as the fs tests, but given the shared logic it
should be sufficient. There is also no "optional" access for network
rules.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Assisted-by: GitHub-Copilot:claude-opus-4.7 copilot-review
Link: https://patch.msgid.link/364fbd08081318d64bc23049d3a7721f0a3a3624.1781228815.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>

selftests/landlock: Add tests for quiet flag with fs rules

Test various interactions of the quiet flag with filesystem rules:
- Non-optional access (tested with open and rename).
- Optional access (tested with truncate and ioctl).
- Behaviour around mounts matches with normal Landlock rules.
- Behaviour around disconnected directories matches with normal Landlock
  rules (test expected behaviour of 9a868cdbe66a ("landlock: Fix
  handling of disconnected directories") applied to the collected quiet
  flag).
- Multiple layers work as expected.

Assisted-by: GitHub-Copilot:claude-opus-4.6 copilot-review
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://patch.msgid.link/0f304507dd3ebccc753e1580456bdfc909012357.1781228815.git.m@maowtm.org
[mic: Fix comment formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

selftests/landlock: Replace hard-coded 16 with a constant

The next commit will reuse this number. Make it a shared constant to
future-proof changes.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://patch.msgid.link/eff35caa9b4ac51aa83a88d67c4dd67f4f8b3a4a.1781228815.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>

samples/landlock: Add quiet flag support to sandboxer

Adds ability to set which access bits to quiet via LL_*_QUIET_ACCESS
(FS, NET or SCOPED), and attach quiet flags to individual objects via
LL_*_QUIET for FS and NET.

Assisted-by: GitHub-Copilot:claude-opus-4.8 copilot-review
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://patch.msgid.link/59b94997565032bc9870044f021214a2ed6df213.1781228815.git.m@maowtm.org
[mic: Fix comment formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Suppress logging when quiet flag is present

The quietness behaviour is as documented in the previous patch.

For optional accesses, since the existing deny_masks can only store
2x4bit of layer index, with no way to represent "no layer", we need to
either expand it or have another field to correctly handle quieting of
those. This commit uses the latter approach - we add another field to
store which optional access (of the 2) are covered by quiet rules in
their respective layers as stored in deny_masks.

Assisted-by: GitHub-Copilot:claude-opus-4.8 copilot-review
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://patch.msgid.link/2510a357a94183683eefc49917dcb2240d67be96.1781228815.git.m@maowtm.org
[mic: Cosmetic fixes]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Add API support and docs for the quiet flags

Adds the UAPI for the quiet flags feature (but not the implementation
yet).

Even though currently LANDLOCK_ADD_RULE_QUIET only affects audit
logging, in the future this can also be used as part of a supervisor
mechanism, where it will also suppress denial notifications on a
per-object basis. Thus the name is deliberately generic, as opposed to
e.g. LANDLOCK_ADD_RULE_LOG_QUIET.

According to pahole, even after adding the struct access_masks
quiet_masks in struct landlock_hierarchy, the u32 log_* bitfield still
only has a size of 2 bytes, so there's minimal wasted space.

Assisted-by: GitHub-Copilot:claude-opus-4.8
Signed-off-by: Tingmao Wang <m@maowtm.org>
[mic: Update date, fix comment formatting]
Link: https://patch.msgid.link/031184748a8e74c0bb02f1fa13d7a3f10918c627.1781228815.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Add a place for flags to layer rules

To avoid unnecessarily increasing the size of struct landlock_layer, we
make the layer level a u8 and use the space to store the flags struct.

struct layer_access_masks is renamed to struct layer_masks, and a new
field is added to track whether a quiet flag rule is seen for each
layer. Through use of bitfields, this does not increase the size of the
struct.

Cc: Justin Suess <utilityemal77@gmail.com>
Assisted-by: GitHub-Copilot:claude-opus-4.8 copilot-review
Signed-off-by: Tingmao Wang <m@maowtm.org>
Co-developed-by: Justin Suess <utilityemal77@gmail.com>
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Tested-by: Justin Suess <utilityemal77@gmail.com>
Link: https://patch.msgid.link/be3fec3927bc9faaacd4ce0e7f0d1ff5474e2210.1781228815.git.m@maowtm.org
[mic: Fix comment formatting]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

landlock: Add documentation for UDP support

Add example of UDP usage, without detailing the two access rights.
Slightly change the example used in code blocks: build a ruleset for a
DNS client, so that it uses both TCP and UDP.

Test coverage for security/landlock is 91.3% of 2245 lines according to
LLVM 22.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://patch.msgid.link/20260611162107.49278-7-matthieu@buffet.re
[mic: Fix doc formatting, update audit doc, add test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>

ALSA: timer: Fix racy timeri->timer changes with rwlock

Although we've covered the races around the timer object assignment
and release for timer instances, there are still races at starting or
stopping the timer instance.  They refer to timeri->timer without
lock, hence they can still trigger UAFs.

For addressing it, this patch changes the existing slave_active_lock
spinlock to timeri_lock rwlock.  It's a global rwlock applied as
read-lock when snd_timer_start() & co are called as well as
snd_timeri_timer_get() is called.  In turn, the places where
timeri->timer is assigned or released are covered by the write-lock.

The patch replaces spinlock_irqsave with spinlock in a couple of
places because they are now already protected by timeri_lock, too.

Reported-by: Kyle Zeng <kylebot@openai.com>
Link: https://patch.msgid.link/20260614090714.773216-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: core: Fix unintuitive behavior of snd_power_ref_and_wait()

snd_power_ref_and_wait() takes the power refcount and keeps it held
no matter whether it returns an error or not. However, the majority
of callers don't expect that and just return upon errors without
unreferencing on the caller side.

To address the potential refcount imbalance, correct the behavior of
snd_power_ref_and_wait() so that it unreferences before returning an
error.

Note that the problem above is likely negligible; the function returns
an error only when the sound card is being shutdown, hence it doesn't
matter about the power refcount any longer at such a state.
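
The corrected contract can be sketched in userspace as follows. This is a toy model, not the ALSA code: struct toy_card, toy_power_ref_and_wait(), toy_ioctl() and the -ENODEV error value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a sound card; not the real struct snd_card. */
struct toy_card {
    int power_ref;   /* models the power refcount */
    bool shutdown;   /* models a card that is being shut down */
};

/*
 * Model of the corrected contract: the reference is held on success and
 * already dropped on error, so callers can simply return the error.
 */
int toy_power_ref_and_wait(struct toy_card *card)
{
    assert(card != NULL);
    card->power_ref++;
    if (card->shutdown) {
        card->power_ref--;   /* undo the reference before failing */
        return -ENODEV;      /* error value is an assumption */
    }
    return 0;
}

/* Typical caller: unreference only on the success path. */
int toy_ioctl(struct toy_card *card)
{
    int err = toy_power_ref_and_wait(card);

    if (err < 0)
        return err;          /* nothing to undo here */
    /* ... work done while the reference is held ... */
    card->power_ref--;
    return 0;
}
```

With this shape, a caller that bails out early on error leaves the counter balanced without any extra cleanup.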

Fixes: e94fdbd7b25d ("ALSA: control: Track in-flight control read/write/tlv accesses")
Reported-by: WenTao Liang <vulab@iscas.ac.cn>
Closes: https://lore.kernel.org/20260612022121.14329-1-vulab@iscas.ac.cn
Link: https://patch.msgid.link/20260614090507.772540-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Linux 7.1

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM fixes from Russell King:

- Avoid KASAN instrumentation of half-word IO

- Use a byte load for KASAN shadow stack

- Fix kexec and hibernation with PAN

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9476/1: mm: fix kexec and hibernation with CONFIG_CPU_TTBR0_PAN
  ARM: 9475/1: entry: use byte load for KASAN VMAP stack shadow
  ARM: 9474/1: io: avoid KASAN instrumentation of raw halfword I/O

geneve: Fix off-by-one comparing with GRO_LEGACY_MAX_SIZE

GRO_LEGACY_MAX_SIZE = 65536; total_len being 65536 is too big to fit
into a u16. As can be seen in skb_gro_receive, packets bigger or equal
to gro_max_size (or GRO_LEGACY_MAX_SIZE) are dropped with -E2BIG. Apply
the same boundary to geneve_post_decap_hint to avoid writing 65536 to a
16-bit iph->tot_len field with an overflow.
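
The boundary can be illustrated with a small standalone sketch. The helper names tot_len_fits() and store_tot_len() are hypothetical and are not the geneve code; only the constant value comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the commit message; redefined for a standalone build. */
#define GRO_LEGACY_MAX_SIZE 65536u

/* Hypothetical helper: true if total_len can be stored in a 16-bit field. */
bool tot_len_fits(unsigned int total_len)
{
    /* Lengths greater than or equal to the limit are rejected. */
    return total_len < GRO_LEGACY_MAX_SIZE;
}

/* What a 16-bit field would hold if the value were stored unchecked. */
uint16_t store_tot_len(unsigned int total_len)
{
    return (uint16_t)total_len;
}
```

A check written as `total_len > GRO_LEGACY_MAX_SIZE` would let 65536 through, and the 16-bit store would then wrap to 0.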

Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260611192955.604661-3-alice.kernel@fastmail.im
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net/sched: act_csum: don't mangle UDP tunnel GSO packets

Similar to commit add641e7dee3 ("sched: act_csum: don't mangle TCP and
UDP GSO packets"), UDP tunnel GSO packets going through act_csum
shouldn't have their checksum calculated at this point, because it will
be done after segmentation. Setting the checksum in act_csum modifies
skb->ip_summed and prevents inner IP csum offload from kicking in,
resulting in a packet with a bad checksum.

Add UDP tunnel GSO packets to the exceptions, and also add UDP GSO
(SKB_GSO_UDP_L4), as the same logic as in the commit mentioned above
applies to UDP GSO too.

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260611192955.604661-2-alice.kernel@fastmail.im
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'for-next/sysregs' into for-next/core

* for-next/sysregs:
  arm64/sysreg: Add HDBSS related register information

Merge branch 'for-next/selftests' into for-next/core

* for-next/selftests:
  kselftest/arm64: Add 2025 dpISA coverage to hwcaps
  kselftest/arm64: Add tests for POR_EL0 save/reset/restore
  kselftest/arm64: Move/add POE helpers to test_signals_utils.h
  kselftest/arm64: Add POE as a feature in the signal tests
  selftests/mm: Fix resv_sz when parsing arm64 signal frame

Merge branch 'for-next/perf' into for-next/core

* for-next/perf:
  perf/arm-cmn: Fix DVM node events
  perf: qcom: Unify user-visible "Qualcomm" name
  MAINTAINERS: Update HiSilicon PMU driver maintainer to Yushan Wang

Merge branch 'for-next/mpam' into for-next/core

* for-next/mpam:
  arm_mpam: Update architecture version check for MPAM MSC
  arm64: cpufeature: Add support for the MPAM v0.1 architecture version

Merge branch 'for-next/mm' into for-next/core

* for-next/mm: (24 commits)
  Revert "arm64: mm: Unmap kernel data/bss entirely from the linear map"
  Revert "arm64: mm: Defer remap of linear alias of data/bss"
  arm64/mm: Rename ptdesc_t
  arm64: mm: Defer remap of linear alias of data/bss
  KVM: arm64: Omit tag sync on stage-2 mappings of the zero page
  arm64: Avoid double evaluation of __ptep_get()
  kasan: Move generic KASAN page tables out of BSS too
  arm64: Rename page table BSS section to .bss..pgtbl
  arm64: mm: Unmap kernel data/bss entirely from the linear map
  arm64: mm: Map the kernel data/bss read-only in the linear map
  mm: Make empty_zero_page[] const
  sh: Drop cache flush of the zero page at boot
  powerpc/code-patching: Avoid r/w mapping of the zero page
  arm64: mm: Don't abuse memblock NOMAP to check for overlaps
  arm64: Move fixmap and kasan page tables to end of kernel image
  arm64: mm: Permit contiguous attribute for preliminary mappings
  arm64: kfence: Avoid NOMAP tricks when mapping the early pool
  arm64: mm: Permit contiguous descriptors to be manipulated
  arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
  arm64: mm: Preserve existing table mappings when mapping DRAM
  ...

Merge branch 'for-next/misc' into for-next/core

* for-next/misc:
  arm64: arch_timer: reuse arch_timer_read_cnt{p,v}ct_el0() helpers
  arm64: patching: replace min_t with min in __text_poke
  ARM64: remove unnecessary architecture-specific <asm/device.h>
  arm64: Implement _THIS_IP_ using inline asm
  arm64: panic from init_IRQ if IRQ handler stacks cannot be allocated
  arm64: smp: Do not mark secondary CPUs possible under nosmp
  arm64/daifflags: Make local_daif_*() helpers __always_inline

Merge branch 'for-next/fpsimd-cleanups' into for-next/core

* for-next/fpsimd-cleanups:
  arm64: fpsimd: Remove <asm/fpsimdmacros.h>
  arm64: fpsimd: Move SME save/restore inline
  arm64: fpsimd: Move sve_flush_live() inline
  arm64: fpsimd: Move SVE save/restore inline
  arm64: fpsimd: Use opaque type for SME state
  arm64: fpsimd: Use opaque type for SVE state
  arm64: fpsimd: Move fpsimd save/restore inline
  arm64: fpsimd: Split FPSR/FPCR from SVE save/restore
  arm64: sysreg: Add FPCR and FPSR
  arm64: fpsimd: Move sve_get_vl() and sme_get_vl() inline
  arm64: fpsimd: Use assembler for baseline SME instructions
  arm64: fpsimd: Use assembler for SVE instructions
  arm64: fpsimd: Remove sve_set_vq() and sme_set_vq()
  arm64: fpsimd: Fold sve_init_regs() into do_sve_acc()
  KVM: arm64: pkvm: Remove struct cpu_sve_state
  KVM: arm64: pkvm: Save host FPMR in host cpu context
  KVM: arm64: Don't override FFR save/restore argument
  KVM: arm64: Don't include <asm/fpsimdmacros.h>
  arm64: fpsimd: Fix type mismatch in sme_{save,load}_state()
  arm64: fpsimd: Fix type mismatch in sve_{save,load}_state()

Merge branch 'for-next/errata' into for-next/core

* for-next/errata:
  arm64: errata: Mitigate TLBI errata on Microsoft Azure Cobalt 100 CPU
  arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU
  arm64: errata: Mitigate TLBI errata on various Arm CPUs
  arm64: cputype: Add C1-Premium definitions
  arm64: cputype: Add C1-Ultra definitions
  arm64: kernel: Disable CNP on HiSilicon HIP09
  arm64: cpufeature: Add WORKAROUND_DISABLE_CNP capability
  arm64: proton-pack: use sysfs_emit in sysfs show functions
  arm64: errata: Reformat table for IDs

Merge branch 'for-next/cpufeature' into for-next/core

* for-next/cpufeature:
  arm64: Document SVE constraints on new hwcaps
  arm64/cpufeature: Define hwcaps for 2025 dpISA features

netfilter: nf_dup_netdev: add nf_dev_xmit_recursion*() helpers and use them

Update nft_dup and nft_fwd to use the nf_dev_xmit_recursion() helpers.
This patch also disables BH when transmitting the skb to address a
possible migration to different CPU leading to imbalanced decrementation
of the recursion counters.

This is modeled after Florian Westphal's dev_xmit_recursion*() API
available since commit 97cdcf37b57e ("net: place xmit recursion in
softnet data") according to its current state in the tree.

Fixes: 1d47b55b36d2 ("netfilter: nft_fwd_netdev: use recursion counter in neigh egress path")
Fixes: f37ad9127039 ("netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix doc syntax for conn_max sysctl

Fix the docutils error reported by kernel test robot
for the new conn_max sysctl:

Documentation/networking/ipvs-sysctl.rst:76: WARNING: Block quote ends
without a blank line; unexpected unindent. [docutils]
Documentation/networking/ipvs-sysctl.rst:76: ERROR: Unexpected section
title or transition.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606071851.Dc1H7hOO-lkp@intel.com/
Fixes: 4a15044a2b06 ("ipvs: add conn_max sysctl to limit connections")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: bail out if forward path cannot be discovered

If forward path discovery fails for any reason or netdevice is not
registered for this flowtable, then bail out to classic forwarding path
rather than providing incomplete forwarding path.

Update the existing forward path parser functions to report an error
so the flow_offload expression gives up on setting up the flowtable
entry.

Link: https://sashiko.dev/#/patchset/20260607094954.48892-15-pablo%40netfilter.org?part=14
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: check NULL when retrieving ct extension

nf_ct_ext_find() might return NULL if ct extension is not found.

Also add NULL checks to:

- nfct_help()
- nfct_help_data()
- nfct_seqadj()
- nfct_nat()

This is defensive, for safety reasons.

nf_ct_ext_find() used to return NULL if the extension is stale for
unconfirmed conntracks if the genid validation fails.

Skip NULL check in nf_nat_inet_fn() given this is valid to be NULL
for non-initialized ct nat extensions.

While at it, fetch ct helper area in nf_ct_expect_related_report() only
once and pass it on to other ancillary functions. Replace WARN_ON()
by WARN_ON_ONCE() in nf_ct_unlink_expect_report().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: gc and rcu fixes

Another drive-by AI review:

1) tree_gc_worker fails to wrap around after it can't find more pending
   work.  Update data->gc_tree unconditionally.  If it's 0, start from
   the first pending tree (which can be 0).

2) tree_gc_worker() iterates the rbtree without lock. This is never
   safe.  Move iteration under the spinlock.  If this takes too long
   (resched needed), save key of next node, drop lock, resched, re-lock,
   then search for the key (node).  In very rare cases this node might
   no longer exist, in that case we can just wait for next gc.

3) use disable_work_sync(), we don't want any restarts.

4) module exit function needs rcu_barrier before we zap the kmem cache.
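
The wrap-around in item 1 can be sketched as a search over a pending bitmap. This is a minimal userspace illustration under assumptions: NTREES and next_pending_tree() are hypothetical names, and the real worker operates on a kernel bitmap rather than a plain integer.

```c
#include <assert.h>

#define NTREES 8u  /* hypothetical number of trees */

/*
 * Return the index of the next set bit in 'pending' at or after 'start',
 * wrapping around to bit 0 if nothing is pending in [start, NTREES).
 * Returns NTREES when no bit is set at all.
 */
unsigned int next_pending_tree(unsigned int pending, unsigned int start)
{
    unsigned int i;

    assert(start <= NTREES);
    for (i = 0; i < NTREES; i++) {
        unsigned int idx = (start + i) % NTREES;

        if (pending & (1u << idx))
            return idx;
    }
    return NTREES;
}
```

Without the wrap-around, a worker that has advanced past the last pending index would never revisit lower-numbered trees.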

Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
Closes: https://sashiko.dev/#/patchset/20260525182924.28456-1-fw%40strlen.de
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: add sequence counter to detect tree modifications

There are two issues with traversal:
1. Key lookup (tree search) cannot detect concurrent modifications and may
not find a result in case of parallel modification.

2. Worker does a lockless iteration. This is never safe.

Add a sequence counter and re-do the lookup under lock in case the
tree was modified / seqcount changed.

gc_worker bugs are addressed in the next patch.
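
The retry idea can be sketched with a single-threaded toy model. Everything here is an assumption made for illustration: struct toy_tree, toy_lookup() and the 'racer' hook (which stands in for a concurrent writer) are hypothetical, and a real implementation would use the kernel's seqcount and spinlock primitives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KEYS 8

struct toy_tree {
    unsigned int seq;             /* incremented by every writer */
    int keys[TOY_MAX_KEYS];
    size_t nkeys;
    unsigned int locked_lookups;  /* counts slow-path retries */
};

void toy_insert(struct toy_tree *t, int key)
{
    assert(t->nkeys < TOY_MAX_KEYS);
    t->keys[t->nkeys++] = key;
    t->seq++;
}

static bool toy_search(const struct toy_tree *t, int key)
{
    for (size_t i = 0; i < t->nkeys; i++)
        if (t->keys[i] == key)
            return true;
    return false;
}

/*
 * 'racer' stands in for a concurrent writer that runs between the
 * lockless search and the sequence check; pass NULL if uncontended.
 */
bool toy_lookup(struct toy_tree *t, int key, void (*racer)(struct toy_tree *))
{
    unsigned int start = t->seq;
    bool found = toy_search(t, key);

    if (racer)
        racer(t);
    if (t->seq != start) {
        /* Tree changed: repeat the search, as if under the lock. */
        t->locked_lookups++;
        found = toy_search(t, key);
    }
    return found;
}

/* Example writer used to simulate a parallel insertion of key 7. */
void toy_racer_insert_7(struct toy_tree *t)
{
    toy_insert(t, 7);
}
```

The lockless result is only trusted when the counter is unchanged; otherwise the lookup is repeated on the slow path.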

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: split count_tree_node rbtree walk into helper

Add find_tree_node() helper that fetches a matching rbtree node.

This is used by a followup patch to optionally search the tree again while
preventing concurrent updates via tree lock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: use per nf_conncount_data spinlocks

This change replaces the rb_root with a new container structure.
Instead of an array of locks shared by all nf_conncount_data objects,
each tree gains its own dedicated lock.

Downside: nf_conncount_data increases in size.  Before this change:
struct nf_conncount_data {
        [..]
        /* --- cacheline 33 boundary (2112 bytes) was 16 bytes ago --- */
        unsigned int               gc_tree;              /*  2128     4 */
        /* size: 2136, cachelines: 34, members: 7 */
        /* padding: 4 */

After:
        /* size: 4184, cachelines: 66, members: 7 */
        /* padding: 4 */

On LOCKDEP enabled kernels, this is even worse:
/* size: 18560, cachelines: 290, members: 7 */

(due to lockdep map in each spinlock).

For this reason also switch to kvzalloc.  The zeroing variant is needed
to not start with random heap memory content in the ->pending_trees
bitmap.

Followup patch will add and use a sequence counter.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: callers must hold rcu read lock

rcu_dereference_raw() should not have been used here, it concealed this bug.
It's used because struct rb_node lacks __rcu annotated pointers, so plain
rcu_dereference causes sparse warnings.

The major tradeoff is that rcu_dereference_raw() doesn't warn when the caller
isn't in a rcu read section.

Extend the rcu read lock scope accordingly and cause sparse warnings,
those warnings are the lesser evil.

Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Closes: https://sashiko.dev/#/patchset/20260603230610.7900-1-fw%40strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: use DEBUG_NET_WARN_ON_ONCE in packet and control paths

Replace raw warning macros with DEBUG_NET_WARN_ON_ONCE across the
nf_tables API, core engine, and expression evaluations. This prevents
unnecessary system panics when panic_on_warn=1 is enabled in production
systems.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: Replace use of system_unbound_wq with system_dfl_long_wq

This patch continues the effort to refactor workqueue APIs, which has
begun with the changes introducing new workqueues and a new
alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior
of workqueues to become unbound by default so that their workload
placement is optimized by the scheduler.

Before that can happen, workqueue users must be converted to the better
named new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can
be removed in the future.

This specific work is considered long, so enqueue it using
system_dfl_long_wq instead of system_dfl_wq.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ALSA: seq: avoid stale FIFO cells during resize

snd_seq_fifo_resize() still needs to publish the replacement pool
before it waits for FIFO users. A blocking snd_seq_read() holds
f->use_lock while it sleeps, so concurrent senders must be able to
queue to the new pool and wake that reader instead of failing against a
closing old pool.

However, snd_seq_fifo_event_in() duplicates an event before it takes
f->lock, and snd_seq_read() can dequeue a cell and later call
snd_seq_fifo_cell_putback() if copy_to_user() or
snd_seq_expand_var_event() fails. If resize swaps f->pool and detaches
oldhead in between, either path can relink an old-pool cell after the
snapshot. That stale cell sits outside the drained oldhead list, keeps
oldpool->counter elevated, and can leave snd_seq_pool_delete() waiting
for the retired pool to drain.

Keep the existing swap-before-wait ordering in snd_seq_fifo_resize(),
but reject stale cells before any FIFO relink. Revalidate event-in cells
under f->lock and retry them against the published replacement pool, and
free stale putback cells instead of linking them back into the FIFO.
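
The stale-cell rejection can be sketched as a pool identity check before relinking. This is a toy model under assumptions: the toy_* types and functions are hypothetical, and in the kernel the comparison would be made under f->lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pool { int closing; };
struct toy_cell { struct toy_pool *pool; };   /* pool the cell came from */
struct toy_fifo { struct toy_pool *pool; unsigned int cells; };

/*
 * Link a cell only if it belongs to the currently published pool.
 * Returns false for a stale cell, which the caller must retry against
 * the new pool (event-in) or free (putback).
 */
bool toy_fifo_relink(struct toy_fifo *f, const struct toy_cell *cell)
{
    assert(f->pool != NULL);
    if (cell->pool != f->pool)
        return false;
    f->cells++;
    return true;
}

/* Swap-before-wait ordering: publish the new pool, then retire the old. */
void toy_fifo_resize(struct toy_fifo *f, struct toy_pool *newpool)
{
    struct toy_pool *oldpool = f->pool;

    f->pool = newpool;
    f->cells = 0;          /* models detaching oldhead */
    oldpool->closing = 1;
}
```

A cell taken before the swap fails the check afterwards, so it can no longer end up outside the drained list.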

The buggy scenario involves two paths, with each column showing the
order within that path:

resize path:                    relink path:
1. Allocate newpool.             1. Take f->use_lock.
2. Swap f->pool to newpool and   2. Duplicate or dequeue an old-pool
   detach oldhead.                  cell before oldpool closes.
3. Mark oldpool closing and      3. Reach a later relink point after
   wait for FIFO users.             resize published newpool.
4. Free oldhead and delete       4. Relink the old-pool cell after
   oldpool.                         resize detached oldhead.
                                 5. Drop f->use_lock.

The reproducer reports a resize ioctl blocked in the expected pool
teardown path:

signal: resize iteration=98 target_pool=4 exceeded 250ms
        (elapsed=251ms)
diagnostic: resize_tid=651 wchan=snd_seq_pool_done
diagnostic: resize_tid=651 stack=
  snd_seq_pool_done+0x5b/0x140
  snd_seq_pool_delete+0x7a/0x90
  snd_seq_fifo_resize+0x193/0x1e0
  snd_seq_ioctl_set_client_pool+0x214/0x260
  snd_seq_ioctl+0x119/0x540
  __x64_sys_ioctl+0xd1/0x120
  do_syscall_64+0xbb/0x2f0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

A second run with larger pools hit the same target path:

signal: resize iteration=32 target_pool=64 exceeded 250ms
        (elapsed=251ms)
diagnostic: resize_tid=663 wchan=snd_seq_pool_done
diagnostic: resize_tid=663 stack=
  snd_seq_pool_done+0x5b/0x140
  snd_seq_pool_delete+0x7a/0x90
  snd_seq_fifo_resize+0x193/0x1e0
  snd_seq_ioctl_set_client_pool+0x214/0x260
  snd_seq_ioctl+0x119/0x540
  __x64_sys_ioctl+0xd1/0x120
  do_syscall_64+0xbb/0x2f0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 2d7d54002e39 ("ALSA: seq: Fix race during FIFO resize")
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Link: https://patch.msgid.link/20260614004801.3507773-2-zzzccc427@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: seq: oss: Serialize readq reset state with q->lock

snd_seq_oss_readq_clear() resets qlen, head, and tail without
q->lock even though the normal reader and producer paths serialize the
same ring state under that spinlock. A reset can therefore race
snd_seq_oss_readq_free() or snd_seq_oss_readq_put_event() and leave
stale records in the queue, drop freshly queued ones, or report the
wrong readiness after wakeup. KCSAN reports a data race between
snd_seq_oss_readq_clear() and snd_seq_oss_readq_free().

Take q->lock while clearing the ring and resetting input_time. Factor
the enqueue logic into a caller-locked helper so
snd_seq_oss_readq_put_timestamp() updates its suppression state under
the same lock instead of racing the reset path.

The buggy scenario involves two paths, with each column showing the
order within that path:

reset path:                      locked readq updater:
1. snd_seq_oss_reset() or        1. A reader or callback producer
   release reaches                  takes q->lock on the same queue.
   snd_seq_oss_readq_clear().
2. snd_seq_oss_readq_clear()     2. The updater tests or modifies
   resets qlen, head, tail,         qlen, head, and tail.
   and input_time.
3. snd_seq_oss_readq_clear()     3. The updater completes its
   wakes sleepers on                read-modify-write sequence.
   q->midi_sleep.
4. Without q->lock, the reset    4. The resulting ring state drives
   can overlap the locked           later reads and readiness.
   update.

KCSAN reports:

BUG: KCSAN: data-race in snd_seq_oss_readq_clear /
snd_seq_oss_readq_free

write to 0xffff8881069fe608 of 4 bytes by task 120516 on cpu 0:
  snd_seq_oss_readq_free+0x6c/0x80
  snd_seq_oss_read+0xcb/0x250
  odev_read+0x38/0x60
  vfs_read+0xff/0x600
  ksys_read+0xb4/0x140
  __x64_sys_read+0x46/0x60
  do_syscall_64+0xbb/0x2f0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff8881069fe608 of 4 bytes by task 120517 on cpu 1:
  snd_seq_oss_readq_clear+0x1f/0x90
  snd_seq_oss_reset+0xa7/0xf0
  snd_seq_oss_ioctl+0x6f6/0x7e0
  odev_ioctl+0x56/0xc0
  __x64_sys_ioctl+0xd1/0x120
  do_syscall_64+0xbb/0x2f0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000001 -> 0x00000000

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Link: https://patch.msgid.link/20260614004801.3507773-1-zzzccc427@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

kcm: use WRITE_ONCE() when changing lower socket callbacks

kcm_attach() replaces a live lower TCP socket's sk_data_ready and
sk_write_space callbacks with KCM handlers, and kcm_unattach() restores
them later. Those callback-pointer updates are still plain stores even
though the same fields can be read and invoked concurrently on other
CPUs.

If another CPU observes an older callback snapshot after the live field
has already been restored, callback execution can run with a mismatched
target and sk_user_data state, leading to stale or misdirected wakeups.

Use WRITE_ONCE() for the callback replacement and restore operations so
these shared callback fields follow the same visibility contract already
established by the earlier 4022 fixes.
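
The store/load pairing can be sketched in userspace as below. TOY_WRITE_ONCE() and TOY_READ_ONCE() are simplified approximations of the kernel macros built on a volatile access (they rely on the GCC/Clang __typeof__ extension), and struct toy_sock and the toy_* functions are hypothetical stand-ins, not the KCM code.

```c
#include <assert.h>

/* Userspace approximations of the kernel macros, for illustration only. */
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

struct toy_sock;
typedef void (*toy_cb_t)(struct toy_sock *sk);

struct toy_sock {
    toy_cb_t sk_data_ready;
    int wakeups;
};

void tcp_ready(struct toy_sock *sk) { sk->wakeups += 1; }
void kcm_ready(struct toy_sock *sk) { sk->wakeups += 100; }

/* Install the replacement callback, remembering the original. */
void toy_attach(struct toy_sock *sk, toy_cb_t *saved)
{
    *saved = sk->sk_data_ready;
    TOY_WRITE_ONCE(sk->sk_data_ready, kcm_ready);
}

/* Restore the original callback. */
void toy_unattach(struct toy_sock *sk, toy_cb_t saved)
{
    TOY_WRITE_ONCE(sk->sk_data_ready, saved);
}

/* Reader side: take one snapshot of the pointer, then call it. */
void toy_deliver(struct toy_sock *sk)
{
    toy_cb_t ready = TOY_READ_ONCE(sk->sk_data_ready);

    assert(ready);
    ready(sk);
}
```

The point of the marked accesses is that the pointer is written and read exactly once each, so the compiler cannot tear or re-fetch it.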

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Link: https://patch.msgid.link/20260611053543.2429462-1-runyu.xiao@seu.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

riscv: kvm: Use endian-specific __lelong for NACL shared memory

When compiling with sparse enabled (C=2), bitwise type warnings are
triggered in the RISC-V KVM implementation. This occurs because the
user-space data unboxing macro '__get_user_asm' performs implicit
casting on restricted types without forcing the compiler's compliance.

Additionally, raw 'unsigned long *' pointers are used to access the
SBI NACL shared memory, whereas the RISC-V SBI specification mandates
that these structures must follow little-endian byte ordering.

Fix these by:
1. Adding a '__force' cast to '__get_user_asm()' to safely suppress
   implicit cast warnings during user-space data fetching.
2. Introducing the '__lelong' type macro, which dynamically resolves to
   '__le32' or '__le64' depending on XLEN, and replacing 'unsigned long *'
   with '__lelong *' to enforce proper compile-time endianness checks.
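
The effect of a dedicated little-endian word type can be approximated in userspace with a wrapper struct. This is only an analogy: toy_lelong and its helpers are hypothetical, and the kernel relies on sparse's bitwise annotations rather than a struct.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A distinct wrapper type sized like 'unsigned long': assigning a plain
 * integer to it no longer compiles, so every access must go through an
 * explicit conversion helper.
 */
typedef struct {
    unsigned char b[sizeof(unsigned long)];
} toy_lelong;

/* Encode a native word as little-endian bytes. */
toy_lelong toy_cpu_to_lelong(unsigned long v)
{
    toy_lelong out;

    for (size_t i = 0; i < sizeof(out.b); i++)
        out.b[i] = (unsigned char)(v >> (8 * i));
    return out;
}

/* Decode little-endian bytes into a native word on any host endianness. */
unsigned long toy_lelong_to_cpu(toy_lelong v)
{
    unsigned long out = 0;

    assert(sizeof(v.b) == sizeof(out));
    for (size_t i = 0; i < sizeof(v.b); i++)
        out |= (unsigned long)v.b[i] << (8 * i);
    return out;
}
```

Because the width follows sizeof(unsigned long), the same source yields a 32-bit or 64-bit word depending on the target.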

Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260608155252.4292-1-seanwascoding@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"Fixes for the Qualcomm and Google GS101 clk drivers:

   - Skip parking clks on some Qualcomm platforms so that the recovery
     console keeps working

   - Fix Google GS101 resume by using the correct div register"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: dispcc-sc8280xp: Don't park mdp_clk_src at registration time
  clk: samsung: gs101: Fix missing USI7_USI DIV clock in peric0_clk_regs
  clk: qcom: x1e80100-dispcc: Stop disp_cc_mdss_mdp_clk_src from getting parked

Merge branch 'net-hns3-enhance-tc-flow-offload-support'

Jijie Shao says:

====================
net: hns3: enhance tc flow offload support

This patchset enhances the tc flow offload support for hns3 driver:

- Patch 1: Refactor hclge_add_cls_flower() to support more actions
- Patch 2: Improve unused_tuple parameter setting for separate src/dst configuration
- Patch 3: Add support for HCLGE_FD_ACTION_SELECT_QUEUE and HCLGE_FD_ACTION_DROP_PACKET actions
- Patch 4: Add support for FLOW_DISSECTOR_KEY_IP and FLOW_DISSECTOR_KEY_ENC_KEYID dissectors
- Patch 5: Add debugfs support for dumping FD rules
- Patch 6: Move FD code to a separate file (hclge_fd.c) for better code organization
====================

Link: https://patch.msgid.link/20260610060618.834987-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: move fd code to a separate file

The hclge_main.c file has become very large,
so the fd code has been moved to a separate hclge_fd.c file.
This patch only moves the code and does not modify any functionality.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: debugfs support for dumping fd rules

Currently, the tc tool only supports adding and deleting rules from
the driver but does not support querying rules from the driver.

This patch adds a rule dump file in debugfs to check whether the driver's
configuration matches the configuration issued by tc flow.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-6-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: support IP and tunnel VNI dissectors for tc flow

Currently, the driver does not support FLOW_DISSECTOR_KEY_IP and
FLOW_DISSECTOR_KEY_ENC_KEYID. But the hardware supports
ip_tos (FLOW_DISSECTOR_KEY_IP) and
outer_tun_vni (FLOW_DISSECTOR_KEY_ENC_KEYID).

This patch adds support for FLOW_DISSECTOR_KEY_IP and
FLOW_DISSECTOR_KEY_ENC_KEYID.

Additionally, since tc flow cannot effectively support
l2_user_def, l3_user_def, and l4_user_def,
this patch explicitly sets them to not be used.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: support two more actions for tc flow

Currently, the driver supports only one action: HCLGE_FD_ACTION_SELECT_TC.

This patch adds support for HCLGE_FD_ACTION_SELECT_QUEUE and
HCLGE_FD_ACTION_DROP_PACKET.

A rule can have only one action. Therefore, the driver intercepts rules
that have multiple actions or no action.

Note: The driver considers cls_flower->classid as an action:
HCLGE_FD_ACTION_SELECT_TC.
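
The "exactly one action" rule can be sketched as follows. The struct toy_rule layout, the toy_* names and the -EINVAL return value are assumptions for illustration and do not reflect the driver's data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of a tc flower rule. */
struct toy_rule {
    bool has_classid;  /* treated as SELECT_TC, per the note above */
    bool has_queue;    /* SELECT_QUEUE */
    bool has_drop;     /* DROP_PACKET */
};

enum toy_action { TOY_SELECT_TC, TOY_SELECT_QUEUE, TOY_DROP_PACKET };

/* Accept a rule only if it carries exactly one action. */
int toy_pick_action(const struct toy_rule *r, enum toy_action *out)
{
    int n = r->has_classid + r->has_queue + r->has_drop;

    assert(out != NULL);
    if (n != 1)
        return -EINVAL;  /* multiple actions or none; value is assumed */

    if (r->has_classid)
        *out = TOY_SELECT_TC;
    else if (r->has_queue)
        *out = TOY_SELECT_QUEUE;
    else
        *out = TOY_DROP_PACKET;
    return 0;
}
```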

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: improve the unused_tuple parameter setting

Currently, when the tc tool is used to set flow table rules, the IP address
and MAC address can be configured separately, for example, src_xx or dst_xx
can be configured separately.

Therefore, the driver needs to check whether the mask is all zero in
keys, such as FLOW_DISSECTOR_KEY_IPV4_ADDRS, FLOW_DISSECTOR_KEY_IPV6_ADDRS,
and FLOW_DISSECTOR_KEY_ETH_ADDRS.
If the mask is all zero, the tuple is not configured.
In this case, the driver adds the tuple to unused_tuple.
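
The all-zero mask check can be sketched for the IPv4 source/destination pair. The TOY_* bits and function names are hypothetical and are not the driver's tuple definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SRC_IP (1u << 0)  /* hypothetical tuple bits */
#define TOY_DST_IP (1u << 1)

/* True if every byte of the mask is zero, i.e. the field is not matched. */
static bool mask_is_zero(const void *mask, size_t len)
{
    const unsigned char *p = mask;

    assert(p != NULL);
    for (size_t i = 0; i < len; i++)
        if (p[i])
            return false;
    return true;
}

/* Build the unused-tuple bitmap from separately configured masks. */
uint32_t toy_unused_ipv4(uint32_t src_mask, uint32_t dst_mask)
{
    uint32_t unused = 0;

    if (mask_is_zero(&src_mask, sizeof(src_mask)))
        unused |= TOY_SRC_IP;
    if (mask_is_zero(&dst_mask, sizeof(dst_mask)))
        unused |= TOY_DST_IP;
    return unused;
}
```

A rule that matches only on the source address then marks the destination tuple as unused instead of matching it against zero.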

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: refactor add_cls_flower to prepare for multiple actions

Remove the tc parameter from the add_cls_flower() ops callback and
refactor action parsing to support future extensions for SELECT_QUEUE
and DROP_PACKET actions.

Changes:
* Remove the tc parameter from the add_cls_flower() callback signature.
* Extract TC-based action parsing into hclge_get_tc_flower_action().
* Move the dissector->used_keys check from hclge_parse_cls_flower() to
  hclge_check_cls_flower(), and restrict ETH_ADDRS to
  HCLGE_FD_MODE_DEPTH_2K_WIDTH_400B_STAGE_1 mode since hardware only
  supports MAC matching there.
* Migrate error reporting from dev_err() to netlink extended ACK (extack).

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'dpaa2-switch-fdb-management-refactoring'

Ioana Ciornei says:

====================
dpaa2-switch: FDB management refactoring

The FDB management done by the dpaa2_switch_port_set_fdb() function is
hard to follow even by trained eyes. This series tries to make it easier
to read and understand it by factoring out some code blocks into helper
functions and unifying the join and leave paths in terms of FDB
management.
====================

Link: https://patch.msgid.link/20260610150912.1788482-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-switch: unify the FDB update logic in dpaa2_switch_port_set_fdb()

For both the join and leave paths, the logic goes through the following
steps: determine which FDB should be used on a port after the current
changeupper change, populate the private port structures with the new
FDB and, if necessary, mark the old FDB as not used.
Instead of having two distinct paths inside the
dpaa2_switch_port_set_fdb() for linking=true and linking=false, unify
them. This will hopefully help in making this function easier to read.

No behavior changes are expected.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-6-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-switch: move FDB selection for leave path into a helper

Move the FDB selection for when a port leaves bridge into a new helper -
dpaa2_switch_fdb_for_leave(). This will hopefully make the
dpaa2_switch_port_set_fdb() function easier to read and follow. The new
helper only determines the FDB to be used, any updates into the private
port structure still get done in the set_fdb() function.

No changes in the actual behavior are intended.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-5-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-switch: move FDB selection for join path into a helper

The dpaa2_switch_port_set_fdb() function handles the setup of the FDB
for both changeupper cases: join and leave. Move the code block which
handles the join path into a new helper - dpaa2_switch_fdb_for_join() -
with the hope that the entire function will become easier to read and
extend with other use cases in the future.

This new helper just determines and returns what FDB should be used for
a specific port, the cleanup of the old FDB and the actual setup in the
per port structure remains in the dpaa2_switch_port_set_fdb() function.

No changes in the actual behavior are intended.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-4-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-switch: factor out the FDB in-use check into a helper

The dpaa2_switch_port_set_fdb() function is hard to follow and
open-coding the in-use check into it makes it even harder to read.
Factor out that code block into a new helper -
dpaa2_switch_fdb_in_use_by_others().

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-3-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dpaa2-switch: change dpaa2_switch_port_set_fdb() function prototype

Since dpaa2_switch_port_set_fdb() never fails and its return value
was never checked, change its prototype to return void.

Also, instead of determining if the DPAA2 port is joining or leaving an
upper based on the value of the 'bridge_dev' parameter, add the
'linking' parameter to explicitly specify the action.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-2-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/tc-testing: Verify IFE can handle truncated inner Ethernet header

Add a tdc test that exercises the act_ife decode path with a malformed
IFE packet whose encapsulated inner Ethernet header is truncated.

The injected frame has a valid outer Ethernet header (ethertype 0xED3E)
and a minimal IFE header (metalen 2, i.e. no metadata TLVs), but the
payload that should hold the original frame is a single byte instead of
a full Ethernet header. Once ife_decode() strips the outer header and
the IFE metadata, fewer than ETH_HLEN bytes are left, which previously
let eth_type_trans() read past the end of the linear data.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260610183814.1648888-3-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ife: require ETH_HLEN to be pullable in ife_decode()

ife_decode() may return after making only the outer IFE header and
metadata pullable. The caller then passes the decapsulated packet to
eth_type_trans(), which expects the inner Ethernet header to be
accessible from the linear data area.

With a malformed IFE frame, the inner Ethernet header may still be
shorter than ETH_HLEN in the linear area, which can lead to a crash in
the original code.

Fix this by extending the pull check in ife_decode() so that the inner
Ethernet header is also guaranteed to be pullable before returning.
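
The extended check can be illustrated with a small userspace model. This
is only a sketch under stated assumptions, not the kernel code: the
function name ife_inner_offset() and the flat byte-buffer layout are
hypothetical, and the model assumes a 14-byte outer Ethernet header
followed by a 2-byte big-endian metalen field that counts itself (so
metalen 2 means no metadata TLVs, as in the test frame described in the
selftest commit).

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14        /* length of an Ethernet header */
#define IFE_OUTER_HLEN 14  /* outer Ethernet header (assumed layout) */

/*
 * Hypothetical model: return the offset of the inner frame, or -1 if the
 * frame is too short to carry a full inner Ethernet header.
 */
static long ife_inner_offset(const uint8_t *frame, size_t len)
{
	size_t metalen, inner_off;

	/* Need the outer header plus the 2-byte metalen field. */
	if (len < IFE_OUTER_HLEN + 2)
		return -1;

	/* metalen is big-endian and includes its own 2 bytes. */
	metalen = ((size_t)frame[IFE_OUTER_HLEN] << 8) |
		  frame[IFE_OUTER_HLEN + 1];
	if (metalen < 2)
		return -1;

	inner_off = IFE_OUTER_HLEN + metalen;
	if (inner_off > len)
		return -1;

	/* The added requirement: ETH_HLEN bytes must remain for the inner header. */
	if (len - inner_off < ETH_HLEN)
		return -1;

	return (long)inner_off;
}
```

In this model a frame whose payload after the IFE header is a single
byte is rejected instead of being handed on for Ethernet header parsing.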

Fixes: ef6980b6becb ("introduce IFE action")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yong Wang <edragain@163.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Link: https://patch.msgid.link/20260610183814.1648888-2-n05ec@lzu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

spi: Fix mismatched DT property access types

The SPI drivers read properties whose bindings use normal uint32 cells.
Using boolean or u16 helpers makes the access look like a different DT
encoding and causes the property checker to flag the call sites.

Use presence checks for unsupported properties and read numeric cell
properties through u32 helpers before assigning to driver fields.

Assisted-by: Codex:gpt-5-5
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260612215017.1884893-1-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: Fix RT5677 "realtek,gpio-config" type

"realtek,gpio-config" is described as six 8-bit GPIO configuration
values, and the RT5677 driver stores and reads those values as bytes.
The binding incorrectly documented the property as a uint32 array.

Document "realtek,gpio-config" as a uint8-array so the generated
schema matches the hardware definition and the existing driver helper.

Assisted-by: Codex:gpt-5-5
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260612214911.1883234-1-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge branch 'intel-wired-lan-driver-updates-2026-06-09-idpf-ice-i40e-iavf-ixgbe-igc-igb-e1000e-e1000'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2026-06-09 (idpf, ice, i40e, iavf, ixgbe, igc, igb, e1000e, e1000)

Marco Crivellari replaces obsolete use of system_unbound_wq with
system_dfl_wq for idpf.

Natalia removes redundant PTP checks on ice.

Corinna Vinschen removes redundant MAC address check on iavf.

Jakub Raczynski replaces open-coded array size calculations with
ARRAY_SIZE for i40e and iavf drivers.

Piotr removes a couple redundant assignments on ixgbe.

Alex utilizes ktime_get_* helpers for igb and e1000e.

Daiki Harada replaces napi_schedule() with the more appropriate
napi_schedule_irqoff() call in igb and igc.

Matt Vollrath does the same on e1000e.

Agalakov Daniil skips unnecessary endian conversions on e1000e and
e1000.

Maximilian Pezzullo fixes some typos on igb and igc.
====================

Link: https://patch.msgid.link/20260609213559.178657-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igc: fix typos in comments

Fix spelling errors in code comments:
- igc_diag.c: 'autonegotioation' -> 'autonegotiation'
- igc_main.c: 'revisons' -> 'revisions' (two occurrences)

Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-16-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igb: fix typos in comments

Fix spelling errors in code comments:
- e1000_nvm.c: 'likley' -> 'likely'
- e1000_mac.c: 'auto-negotitation' -> 'auto-negotiation'
- e1000_mbx.h: 'exra' -> 'extra'
- e1000_defines.h: 'Aserted' -> 'Asserted'

Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-15-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e1000e: limit endianness conversion to boundary words

[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().

The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.

Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.
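
A userspace sketch of the idea, under stated assumptions: the helper
names read_boundary_words() and le16_to_host() are hypothetical, the
EEPROM is modelled as a little-endian byte array, and the later memcpy()
of the new data is left out. Only the first word (when the offset is
odd) and the last word (when the end is odd) are read and converted;
interior words are never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for le16_to_cpus(): reinterpret the stored bytes as little-endian. */
static void le16_to_host(uint16_t *w)
{
	const uint8_t *b = (const uint8_t *)w;
	uint16_t v = (uint16_t)(b[0] | (b[1] << 8));

	*w = v;
}

/*
 * Fill only the boundary words of buf for a write of len bytes at byte
 * offset. Returns how many words were read and converted (0, 1 or 2).
 */
static int read_boundary_words(const uint8_t *eeprom, uint16_t *buf,
			       size_t offset, size_t len)
{
	size_t first_word, last_word;
	int converted = 0;

	assert(len > 0);
	first_word = offset >> 1;
	last_word = (offset + len - 1) >> 1;

	if (offset & 1) {
		/* Write starts mid-word: preserve the first word. */
		memcpy(&buf[0], eeprom + 2 * first_word, 2);
		le16_to_host(&buf[0]);
		converted++;
	}
	if ((offset + len) & 1) {
		/* Write ends mid-word: preserve the last word. */
		memcpy(&buf[last_word - first_word], eeprom + 2 * last_word, 2);
		le16_to_host(&buf[last_word - first_word]);
		converted++;
	}
	return converted;
}
```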

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Co-developed-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Agalakov Daniil <ade@amicon.ru>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-14-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e1000: limit endianness conversion to boundary words

[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().

The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.

Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Co-developed-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Agalakov Daniil <ade@amicon.ru>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e1000e: Use __napi_schedule_irqoff()

The __napi_schedule_irqoff() macro is intended to bypass saving and
restoring IRQ state when scheduling is requested from an IRQ handler,
where hard interrupts are already disabled. Use this macro in all three
interrupt handlers.

This was tested on a system with an I218-V and MSI interrupts. Because
this is an optimization, I was interested in measuring the impact, so I
added ktime_get() time measurement to e1000_intr_msi and a print of the
last sample in the watchdog task. For each test case I ran a
bi-directional iperf3 to saturate the line. With some help from awk,
here are the statistics.

49 samples each, all units ns
previous: min 678 max 1265 mean 879.429 median 806 stddev 137.188
noirq: min 707 max 1165 mean 811.857 median 790 stddev 89.486

According to this informal comparison, the mean time to handle an
interrupt from start to finish is improved by about 8% under load.
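
The percentage follows from the two means listed above. A minimal
sketch of the arithmetic (relative_reduction() is a hypothetical helper
name, not something from the patch):

```c
/* Relative reduction of `after` with respect to `before`, as a fraction. */
static double relative_reduction(double before, double after)
{
	return (before - after) / before;
}
```

With the means 879.429 and 811.857 this gives roughly 0.077, i.e. the
"about 8%" quoted above; with the medians 806 and 790 the reduction is
roughly 0.020.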

Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Michal Cohen <michalx.cohen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-12-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igc: use napi_schedule_irqoff() instead of napi_schedule()

Replace napi_schedule() with napi_schedule_irqoff()
in the interrupt handler path in the igc driver.
Tested on Intel Corporation Ethernet Controller I226-V.

Suggested-by: Kohei Enju <kohei@enjuk.jp>
Signed-off-by: Daiki Harada <daiky0325@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-11-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igb: use napi_schedule_irqoff() instead of napi_schedule()

Replace napi_schedule() with napi_schedule_irqoff()
in the interrupt handler path in the igb driver.

Tested on QEMU with igb NIC emulation (-nic user,model=igb)

Suggested-by: Kohei Enju <kohei@enjuk.jp>
Signed-off-by: Daiki Harada <daiky0325@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-10-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e1000e: use ktime_get_real_ns() in e1000e_systim_reset()

Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
ktime_get_real_ns() in e1000e_systim_reset(). Using the combined helper
avoids the unnecessary intermediate ktime_t variable and makes the
intent clearer.

Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-9-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igb: use ktime_get_real helpers in igb_ptp_reset()

Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
ktime_get_real_ns() and ktime_to_timespec64(ktime_get_real()) with
ktime_get_real_ts64() in igb_ptp_reset(). Using the combined helpers
makes the intent clearer.

Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-8-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: e610: remove redundant assignment

Remove unnecessary code. No functional impact.

Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/intel: Replace manual array size calculation with ARRAY_SIZE

There are still places in the code where the array size is calculated
manually, but it is better to use a single macro throughout the code, as
it makes the code a bit more readable.
While at it, tidy up the surrounding condition by reversing the check and
remove an unnecessary cast.
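
For illustration, a userspace sketch of the pattern. The macro below is
a simplified stand-in (the kernel's ARRAY_SIZE also rejects non-array
arguments at build time), and the speeds[] table and speed_supported()
are hypothetical examples, not code from the drivers.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int speeds[] = { 10, 100, 1000, 2500 };

/* Return 1 if speed is listed in the table, 0 otherwise. */
static int speed_supported(int speed)
{
	size_t i;

	/* Previously open-coded as sizeof(speeds) / sizeof(speeds[0]). */
	for (i = 0; i < ARRAY_SIZE(speeds); i++)
		if (speeds[i] == speed)
			return 1;
	return 0;
}
```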

Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

iavf: iavf_virtchnl_completion: drop duplicate ether_addr_equal() test

This is just a simple cleanup fix. Commit 35a2443d0910f ("iavf: Add
waiting for response from PF in set mac") introduced a duplicate
ether_addr_equal() check, so the current code tests the new MAC twice
against the former MAC.

Remove the outer ether_addr_equal() test, remnant of commit c5c922b3e09b
("iavf: fix MAC address setting for VFs when filter is rejected")

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: remove redundant checks from PTP init

Remove unnecessary condition checks in ice_ptp_setup_adapter() and
ice_ptp_init(). They are duplicated in ice_pf_src_tmr_owned().

Change ice_ptp_setup_adapter() to return void.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Natalia Wochtman <natalia.wochtman@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: Replace use of system_unbound_wq with system_dfl_wq

This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that can happen, workqueue users must be converted to the better-named
new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'octeontx2-af-npc-enhancements'

Ratheesh Kannoth says:

====================
octeontx2-af: npc: Enhancements.

This series extends Marvell octeontx2-af support for CN20K NPC (MCAM
debuggability, allocation policy, default-rule lifetime, optional KPU
profiles from firmware files, X2/X4 MCAM keyword handling in flows and
defaults, and dynamic CN20K NPC private state), adds a devlink mechanism
for multi-value parameters, and moves devlink_nl_param_fill() temporaries
to the heap so stack usage stays reasonable once union devlink_param_value
grows (patch 3).

Patch 1 enforces a single RVU admin-function PCI device in the kernel.
On Octeon series SoCs, hardware resources such as NPC, NIX and related
blocks are global and coordinated by the AF driver; PFs and VFs request
them through AF mailbox messages. Firmware exposes only one AF PCI
function at boot, so two AF driver instances cannot both own that state.
rvu_probe() rejects a second bind with -EBUSY, logs a warning, clears the
probe gate on early allocation failures, and aligns the driver model with
hardware so reviewers and automation can rely on exactly one bound AF.
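
The single-bind rule can be sketched as a probe gate. This is a
userspace model under assumptions, not the rvu_probe() implementation:
af_probe(), af_remove(), the af_bound flag and the simulated allocation
failure are all hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <stdatomic.h>

/* Set while one AF instance is bound (hypothetical gate). */
static atomic_flag af_bound = ATOMIC_FLAG_INIT;

/* simulate_alloc_failure is a test knob standing in for an early -ENOMEM. */
static int af_probe(int simulate_alloc_failure)
{
	/* A second bind is rejected while the gate is held. */
	if (atomic_flag_test_and_set(&af_bound))
		return -EBUSY;

	if (simulate_alloc_failure) {
		/* Early failure: clear the gate so a later probe can succeed. */
		atomic_flag_clear(&af_bound);
		return -ENOMEM;
	}
	return 0;
}

/* Unbinding releases the gate. */
static void af_remove(void)
{
	atomic_flag_clear(&af_bound);
}
```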

Patch 2 improves CN20K MCAM visibility in debugfs: mcam_layout marks
enabled entries, dstats reports per-entry hit deltas (baseline updated in
software after each read; hardware counters are not cleared), and mismatch
lists enabled entries without a PF mapping.
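
A sketch of how a per-entry hit delta with a software baseline might be
computed. The counter width (COUNTER_BITS) and the helper name
hit_delta() are assumptions made for this example; the patch itself
does not state them here.

```c
#include <stdint.h>

#define COUNTER_BITS 48  /* assumed width of the hardware hit counter */
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

/*
 * Return hits since the previous read and move the software baseline.
 * The hardware counter itself is only read, never cleared.
 */
static uint64_t hit_delta(uint64_t *baseline, uint64_t hw_count)
{
	/* Masked subtraction also covers a counter that wrapped around. */
	uint64_t delta = (hw_count - *baseline) & COUNTER_MASK;

	*baseline = hw_count & COUNTER_MASK;
	return delta;
}
```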

Patch 3 allocates the per-configuration-mode union devlink_param_value
buffers and struct devlink_param_gset_ctx used by devlink_nl_param_fill()
with kcalloc()/kzalloc_obj() and funnels failures through a single cleanup
path so the netlink reply path stays safe as the union grows.

Patch 4 (Saeed) introduces DEVLINK_PARAM_TYPE_U64_ARRAY and nested
DEVLINK_ATTR_PARAM_VALUE_DATA attributes so drivers and user space can
exchange bounded u64 arrays; YAML, uapi, and netlink validation are
updated.

Patch 5 adds a runtime devlink parameter srch_order to reorder CN20K
subbank search during MCAM allocation (the param uses the u64 array type
from patch 4).

Patch 6 ties default MCAM entries to NIX LF alloc/free on CN20K, adds
NIX_LF_DONT_FREE_DFT_IDXS for PF teardown paths that must not drop default
NPC indexes while the driver still owns state, and tightens nix_lf_alloc
error propagation.

Patch 7 allows loading a custom KPU profile from /lib/firmware/kpu via
module parameter kpu_profile, with cam2 / ptype_mask wiring and helpers
that share firmware-sourced vs filesystem-sourced profile layouts.

Patch 8 makes default-rule allocation, AF flow install, and PF-side RSS,
defaults, and ethtool flows respect the active CN20K MCAM keyword width
(X2 vs X4), including X4 reference-index masking and -EOPNOTSUPP when a
flow needs X4 keys on an X2-only profile.

Patch 9 replaces file-scope npc_priv and static dstats with allocation
sized from discovered bank/subbank geometry, threads npc_priv_get()
through CN20K NPC paths, and allocates dstats via devm_kzalloc for the
debugfs helper.

Patch 1 is ordered first so later patches assume a single bound AF.
Heap-backed devlink_nl_param_fill() sits immediately before the U64 array
param work so incremental builds stay stack-safe as the union grows; the
CN20K patches keep srch_order ahead of NIX LF coordination, optional KPU
profile load from firmware files, X2/X4 handling, and the npc_priv refactor
that touches the same files heavily.
====================

Link: https://patch.msgid.link/20260609040453.711932-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.

Replace the file-scope static npc_priv with a kcalloc'd struct filled
from hardware bank/subbank geometry at init (num_banks is no longer a
const compile-time constant; drop init_done and use a non-NULL
npc_priv pointer for liveness). Thread npc_priv_get() / pointer access
through the CN20K NPC code paths, extend teardown to kfree the root
struct on failure and in npc_cn20k_deinit, and adjust MCAM section
setup to use the discovered subbank count.

Allocate MCAM debugfs dstats via devm_kzalloc instead of a static matrix,
and use the allocated backing store consistently when computing deltas
(including the counter rollover compare).

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-10-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc

Default CN20K NPC rule allocation now keys off the active MCAM keyword
width: use X4 with a bank-masked reference index when the silicon uses
X4 keys, and X2 with the raw index otherwise (replacing the previous
always-X2 / eidx + 1 behaviour).

In the AF flow-install path, flows that need more than 256 key bits
query the NPC profile; if the platform is fixed to X2 entries, fail
with -EOPNOTSUPP instead of requesting X4. Otherwise select X4 for the
MCAM alloc.
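
The decision can be sketched as follows. The enum, the function name
select_kw_type() and the profile_is_x2_only flag are hypothetical, and
choosing X2 for flows that fit in 256 key bits is an assumption of this
sketch rather than something stated above.

```c
#include <errno.h>

enum kw_type { KW_X2, KW_X4 };  /* hypothetical names */

#define X2_KEY_BITS 256

/* Pick the MCAM keyword width for a flow; returns 0 or a negative errno. */
static int select_kw_type(unsigned int key_bits, int profile_is_x2_only,
			  enum kw_type *out)
{
	if (key_bits <= X2_KEY_BITS) {
		*out = KW_X2;  /* assumed default for narrow keys */
		return 0;
	}

	/* Wide key on a platform fixed to X2 entries: refuse. */
	if (profile_is_x2_only)
		return -EOPNOTSUPP;

	*out = KW_X4;
	return 0;
}
```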

On the PF, cache and pass the profile kw_type from npc_get_pfl_info
through otx2_mcam_pfl_info_get(), and use it when allocating MCAM
entries for RSS/defaults and when installing ethtool flows on CN20K,
including masking the reference index for X4 slot layout.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-9-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: npc: Support for custom KPU profile from filesystem

Flashing updated firmware on deployed devices is cumbersome. Provide a
mechanism to load a custom KPU (Key Parse Unit) profile directly from
the filesystem at module load time.

When the rvu_af module is loaded with the kpu_profile parameter, the
specified profile is read from /lib/firmware/kpu and programmed into
the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
used by filesystem-loaded profiles and support ptype/ptype_mask in
npc_config_kpucam when profile->from_fs is set.

Usage:
  1. Copy the KPU profile file to /lib/firmware/kpu.
  2. Build OCTEONTX2_AF as a module.
  3. Load: insmod rvu_af.ko kpu_profile=<profile_name>

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-8-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle

Add NIX_LF_DONT_FREE_DFT_IDXS so the PF can send NIX LF free during hw
reinit or teardown without the AF freeing CN20K default NPC rule indexes
while the driver still owns that state (otx2_init_hw_resources and
otx2_free_hw_resources).

On CN20K, allocate default NPC rules from NIX LF alloc before
nix_interface_init, roll back with npc_cn20k_dft_rules_free on failure,
and free from NIX LF free when the new flag is not set. Tighten
rvu_mbox_handler_nix_lf_alloc error handling: use a single rc, propagate
qmem_alloc and other errors, and set -ENOMEM only when kcalloc fails
(remove the blanket -ENOMEM at the free_mem path).

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-7-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-af: npc: cn20k: add subbank search order control

CN20K NPC MCAM is split into 32 subbanks that are searched in a
predefined order during allocation. Lower-numbered subbanks have
higher priority than higher-numbered ones.

Add a runtime devlink parameter, "srch_order", to control the order in
which subbanks are searched during MCAM allocation.
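
A simplified model of an order-driven search, assuming the order is
given as an array of subbank indexes. pick_subbank() and the
free_entries[] bookkeeping are hypothetical; the real allocator is more
involved.

```c
#include <stddef.h>

#define NUM_SUBBANKS 32

/*
 * Walk the subbanks in the configured order and return the first one
 * that still has free entries, or -1 if none does.
 */
static int pick_subbank(const unsigned char order[NUM_SUBBANKS],
			const unsigned int free_entries[NUM_SUBBANKS])
{
	size_t i;

	for (i = 0; i < NUM_SUBBANKS; i++) {
		unsigned char sb = order[i];

		/* Skip out-of-range indexes instead of reading past the array. */
		if (sb < NUM_SUBBANKS && free_entries[sb] > 0)
			return sb;
	}
	return -1;
}
```

With an ascending order the lowest-numbered subbank with free entries
wins, matching the default priority described above; a different order
changes which subbank is tried first.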

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-6-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>