git.ipfire.org Git - thirdparty/kernel/stable.git/log

Merge tag 'gfs2-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

- Partially revert "gfs2: do_xmote fixes" to ignore dlm_lock() errors
   during withdraw; passing on those errors doesn't help

- Change the LM_FLAG_TRY and LM_FLAG_TRY_1CB logic in add_to_queue() to
   check if the holder would actually block

- Move some more dlm specific code from glock.c to lock_dlm.c

- Remove the unused dlm alternate locking mode code

- Add proper locking to make sure that dlm lockspaces are never used
   after being released

- Various other cleanups

* tag 'gfs2-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix unlikely race in gdlm_put_lock
  gfs2: Add proper lockspace locking
  gfs2: Minor run_queue fixes
  gfs2: run_queue cleanup
  gfs2: Simplify do_promote
  gfs2: Get rid of GLF_INVALIDATE_IN_PROGRESS
  gfs2: Fix GLF_INVALIDATE_IN_PROGRESS flag clearing in do_xmote
  gfs2: Remove duplicate check in do_xmote
  gfs2: Fix LM_FLAG_TRY* logic in add_to_queue
  gfs2: Remove DLM_LKF_ALTCW / DLM_LKF_ALTPR code
  gfs2: Further sanitize lock_dlm.c
  gfs2: Do not use atomic operations unnecessarily
  gfs2: Sanitize gfs2_meta_check, gfs2_metatype_check, gfs2_io_error
  gfs2: Turn gfs2_withdraw into a void function
  gfs2: Partially revert "gfs2: do_xmote fixes"
  gfs2: Simplify refcounting in do_xmote
  gfs2: do_xmote cleanup
  gfs2: Remove space before newline
  gfs2: Remove unused sd_withdraw_wait field
  gfs2: Remove unused GIF_FREE_VFS_INODE flag

Remove bcachefs core code

bcachefs was marked 'externally maintained' in 6.17 but the code
remained to make the transition smoother.

It's now a DKMS module, making the in-kernel code stale, so remove
it to avoid any version confusion.

Link: https://lore.kernel.org/linux-bcachefs/yokpt2d2g2lluyomtqrdvmkl3amv3kgnipmenobkpgx537kay7@xgcgjviv3n7x/T/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Unbreak 'make tools/*' for user-space targets

This pattern isn't very documented, and apparently not used much outside
of 'make tools/help', but it has existed for over a decade (since commit
ea01fa9f63ae: "tools: Connect to the kernel build system").

However, it doesn't work very well for most cases, particularly the
useful "tools/all" target, because it overrides the LDFLAGS value with
an empty one.

And once overridden, 'make' will then not honor the tooling makefiles
trying to change it - which then makes any LDFLAGS use in the tooling
directory break, typically causing odd link errors.

Remove that LDFLAGS override, since it seems to be entirely historical.
The core kernel makefiles no longer modify LDFLAGS as part of the build,
and use kernel-specific link flags instead (eg 'KBUILD_LDFLAGS' and
friends).

This allows more of the 'make tools/*' cases to work. I say 'more',
because some of the tooling build rules make various other assumptions
or have other issues, so it's still a bit hit-or-miss. But those issues
tend to show up with the 'make -C tools xyz' pattern too, so now it's no
longer an issue of this particular 'tools/*' build rule being special.

Acked-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs async directory updates from Christian Brauner:
"This contains further preparatory changes for the asynchronous directory
  locking scheme:

   - Add lookup_one_positive_killable() which allows overlayfs to
     perform lookup that won't block on a fatal signal

   - Unify the mount idmap handling in struct renamedata as a rename can
     only happen within a single mount

   - Introduce kern_path_parent() for audit which sets the path to the
     parent and returns a dentry for the target without holding any
     locks on return

   - Rename kern_path_locked() as it is only used to prepare for the
     removal of an object from the filesystem:

kern_path_locked()    => start_removing_path()
kern_path_create()    => start_creating_path()
user_path_create()    => start_creating_user_path()
user_path_locked_at() => start_removing_user_path_at()
done_path_create()    => end_creating_path()
NA                    => end_removing_path()"

* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  debugfs: rename start_creating() to debugfs_start_creating()
  VFS: rename kern_path_locked() and related functions.
  VFS/audit: introduce kern_path_parent() for audit
  VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
  VFS: discard err2 in filename_create()
  VFS/ovl: add lookup_one_positive_killable()

Merge tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs writeback updates from Christian Brauner:
"This contains work adressing lockups reported by users when a systemd
  unit reading lots of files from a filesystem mounted with the lazytime
  mount option exits.

  With the lazytime mount option enabled we can be switching many dirty
  inodes on cgroup exit to the parent cgroup. The numbers observed in
  practice when systemd slice of a large cron job exits can easily reach
  hundreds of thousands or millions.

  The logic in inode_do_switch_wbs() which sorts the inode into
  appropriate place in b_dirty list of the target wb however has linear
  complexity in the number of dirty inodes thus overall time complexity
  of switching all the inodes is quadratic leading to workers being
  pegged for hours consuming 100% of the CPU and switching inodes to the
  parent wb.

  Simple reproducer of the issue:

      FILES=10000
      # Filesystem mounted with lazytime mount option
      MNT=/mnt/
      echo "Creating files and switching timestamps"
      for (( j = 0; j < 50; j ++ )); do
          mkdir $MNT/dir$j
          for (( i = 0; i < $FILES; i++ )); do
              echo "foo" >$MNT/dir$j/file$i
          done
          touch -a -t 202501010000 $MNT/dir$j/file*
      done
      wait
      echo "Syncing and flushing"
      sync
      echo 3 >/proc/sys/vm/drop_caches

      echo "Reading all files from a cgroup"
      mkdir /sys/fs/cgroup/unified/mycg1 || exit
      echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
      for (( j = 0; j < 50; j ++ )); do
          cat /mnt/dir$j/file* >/dev/null &
      done
      wait
      echo "Switching wbs"
      # Now rmdir the cgroup after the script exits

  This can be solved by:

   - Avoiding contention on the wb->list_lock when switching inodes by
     running a single work item per wb and managing a queue of items
     switching to the wb

   - Allowing rescheduling when switching inodes over to a different
     cgroup to avoid softlockups

   - Maintaining the b_dirty list ordering instead of sorting it"

* tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  writeback: Add tracepoint to track pending inode switches
  writeback: Avoid excessively long inode switching times
  writeback: Avoid softlockup when switching many inodes
  writeback: Avoid contention on wb->list_lock when switching inodes

Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
"This contains a larger set of changes around the generic namespace
  infrastructure of the kernel.

  Each specific namespace type (net, cgroup, mnt, ...) embedds a struct
  ns_common which carries the reference count of the namespace and so
  on.

  We open-coded and cargo-culted so many quirks for each namespace type
  that it just wasn't scalable anymore. So given there's a bunch of new
  changes coming in that area I've started cleaning all of this up.

  The core change is to make it possible to correctly initialize every
  namespace uniformly and derive the correct initialization settings
  from the type of the namespace such as namespace operations, namespace
  type and so on. This leaves the new ns_common_init() function with a
  single parameter which is the specific namespace type which derives
  the correct parameters statically. This also means the compiler will
  yell as soon as someone does something remotely fishy.

  The ns_common_init() addition also allows us to remove ns_alloc_inum()
  and drops any special-casing of the initial network namespace in the
  network namespace initialization code that Linus complained about.

  Another part is reworking the reference counting. The reference
  counting was open-coded and copy-pasted for each namespace type even
  though they all followed the same rules. This also removes all open
  accesses to the reference count and makes it private and only uses a
  very small set of dedicated helpers to manipulate them just like we do
  for e.g., files.

  In addition this generalizes the mount namespace iteration
  infrastructure introduced a few cycles ago. As reminder, the vfs makes
  it possible to iterate sequentially and bidirectionally through all
  mount namespaces on the system or all mount namespaces that the caller
  holds privilege over. This allow userspace to iterate over all mounts
  in all mount namespaces using the listmount() and statmount() system
  call.

  Each mount namespace has a unique identifier for the lifetime of the
  systems that is exposed to userspace. The network namespace also has a
  unique identifier working exactly the same way. This extends the
  concept to all other namespace types.

  The new nstree type makes it possible to lookup namespaces purely by
  their identifier and to walk the namespace list sequentially and
  bidirectionally for all namespace types, allowing userspace to iterate
  through all namespaces. Looking up namespaces in the namespace tree
  works completely locklessly.

  This also means we can move the mount namespace onto the generic
  infrastructure and remove a bunch of code and members from struct
  mnt_namespace itself.

  There's a bunch of stuff coming on top of this in the future but for
  now this uses the generic namespace tree to extend a concept
  introduced first for pidfs a few cycles ago. For a while now we have
  supported pidfs file handles for pidfds. This has proven to be very
  useful.

  This extends the concept to cover namespaces as well. It is possible
  to encode and decode namespace file handles using the common
  name_to_handle_at() and open_by_handle_at() apis.

  As with pidfs file handles, namespace file handles are exhaustive,
  meaning it is not required to actually hold a reference to nsfs in
  able to decode aka open_by_handle_at() a namespace file handle.
  Instead the FD_NSFS_ROOT constant can be passed which will let the
  kernel grab a reference to the root of nsfs internally and thus decode
  the file handle.

  Namespaces file descriptors can already be derived from pidfds which
  means they aren't subject to overmount protection bugs. IOW, it's
  irrelevant if the caller would not have access to an appropriate
  /proc/<pid>/ns/ directory as they could always just derive the
  namespace based on a pidfd already.

  It has the same advantage as pidfds. It's possible to reliably and for
  the lifetime of the system refer to a namespace without pinning any
  resources and to compare them trivially.

  Permission checking is kept simple. If the caller is located in the
  namespace the file handle refers to they are able to open it otherwise
  they must hold privilege over the owning namespace of the relevant
  namespace.

  The namespace file handle layout is exposed as uapi and has a stable
  and extensible format. For now it simply contains the namespace
  identifier, the namespace type, and the inode number. The stable
  format means that userspace may construct its own namespace file
  handles without going through name_to_handle_at() as they are already
  allowed for pidfs and cgroup file handles"

* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private
  ns: add ns_debug()
  ns: simplify ns_common_init() further
  cgroup: add missing ns_common include
  ns: use inode initializer for initial namespaces
  selftests/namespaces: verify initial namespace inode numbers
  ns: rename to __ns_ref
  nsfs: port to ns_ref_*() helpers
  net: port to ns_ref_*() helpers
  uts: port to ns_ref_*() helpers
  ipv4: use check_net()
  net: use check_net()
  net-sysfs: use check_net()
  user: port to ns_ref_*() helpers
  time: port to ns_ref_*() helpers
  pid: port to ns_ref_*() helpers
  ipc: port to ns_ref_*() helpers
  cgroup: port to ns_ref_*() helpers
  ...

Merge tag 'vfs-6.18-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull afs updates from Christian Brauner:
"This contains the change to enable afs to support RENAME_NOREPLACE and
RENAME_EXCHANGE if the server supports it"

* tag 'vfs-6.18-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
afs: Add support for RENAME_NOREPLACE and RENAME_EXCHANGE

Merge tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull copy_process updates from Christian Brauner:
"This contains the changes to enable support for clone3() on nios2
  which apparently is still a thing.

  The more exciting part of this is that it cleans up the inconsistency
  in how the 64-bit flag argument is passed from copy_process() into the
  various other copy_*() helpers"

[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]

* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  nios2: implement architecture-specific portion of sys_clone3
  arch: copy_thread: pass clone_flags as u64
  copy_process: pass clone_flags as u64 across calltree
  copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)

Merge tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs workqueue updates from Christian Brauner:
"This contains various workqueue changes affecting the filesystem
  layer.

  Currently if a user enqueue a work item using schedule_delayed_work()
  the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
  WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies
  to schedule_work() that is using system_wq and queue_work(), that
  makes use again of WORK_CPU_UNBOUND.

  This replaces the use of system_wq and system_unbound_wq. system_wq is
  a per-CPU workqueue which isn't very obvious from the name and
  system_unbound_wq is to be used when locality is not required.

  So this renames system_wq to system_percpu_wq, and system_unbound_wq
  to system_dfl_wq.

  This also adds a new WQ_PERCPU flag to allow the fs subsystem users to
  explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and
  WQ_PERCPU flags coexist for one release cycle to allow callers to
  transition their calls. WQ_UNBOUND will be removed in a next release
  cycle"

* tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: WQ_PERCPU added to alloc_workqueue users
  fs: replace use of system_wq with system_percpu_wq
  fs: replace use of system_unbound_wq with system_dfl_wq

Merge tag 'vfs-6.18-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rust updates from Christian Brauner:
"This contains a few minor vfs rust changes:

   - Add the pid namespace Rust wrappers to the correct MAINTAINERS
     entry

   - Use to_result() in the Rust file error handling code

   - Update imports for fs and pid_namespce Rust wrappers"

* tag 'vfs-6.18-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  rust: file: use to_result for error handling
  pid: add Rust files to MAINTAINERS
  rust: fs: update ARef and AlwaysRefCounted imports from sync::aref
  rust: pid_namespace: update AlwaysRefCounted imports from sync::aref

Merge tag 'vfs-6.18-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pidfs updates from Christian Brauner:
"This just contains a few changes to pid_nr_ns() to make it more robust
  and cleans up or improves a few users that ab- or misuse it currently"

* tag 'vfs-6.18-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  pid: change task_state() to use task_ppid_nr_ns()
  pid: change bacct_add_tsk() to use task_ppid_nr_ns()
  pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers
  pid: Add a judgment for ns null in pid_nr_ns

Merge tag 'vfs-6.18-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iomap updates from Christian Brauner:
"This contains minor fixes and cleanups to the iomap code.

  Nothing really stands out here"

* tag 'vfs-6.18-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: error out on file IO when there is no inline_data buffer
  iomap: trace iomap_zero_iter zeroing activities

Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs inode updates from Christian Brauner:
"This contains a series I originally wrote and that Eric brought over
  the finish line. It moves out the i_crypt_info and i_verity_info
  pointers out of 'struct inode' and into the fs-specific part of the
  inode.

  So now the few filesytems that actually make use of this pay the price
  in their own private inode storage instead of forcing it upon every
  user of struct inode.

  The pointer for the crypt and verity info is simply found by storing
  an offset to its address in struct fsverity_operations and struct
  fscrypt_operations. This shrinks struct inode by 16 bytes.

  I hope to move a lot more out of it in the future so that struct inode
  becomes really just about very core stuff that we need, much like
  struct dentry and struct file, instead of the dumping ground it has
  become over the years.

  On top of this are a various changes associated with the ongoing inode
  lifetime handling rework that multiple people are pushing forward:

   - Stop accessing inode->i_count directly in f2fs and gfs2. They
     simply should use the __iget() and iput() helpers

   - Make the i_state flags an enum

   - Rework the iput() logic

     Currently, if we are the last iput, and we have the I_DIRTY_TIME
     bit set, we will grab a reference on the inode again and then mark
     it dirty and then redo the put. This is to make sure we delay the
     time update for as long as possible

     We can rework this logic to simply dec i_count if it is not 1, and
     if it is do the time update while still holding the i_count
     reference

     Then we can replace the atomic_dec_and_lock with locking the
     ->i_lock and doing atomic_dec_and_test, since we did the
     atomic_add_unless above

   - Add an icount_read() helper and convert everyone that accesses
     inode->i_count directly for this purpose to use the helper

   - Expand dump_inode() to dump more information about an inode helping
     in debugging

   - Add some might_sleep() annotations to iput() and associated
     helpers"

* tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: add might_sleep() annotation to iput() and more
  fs: expand dump_inode()
  inode: fix whitespace issues
  fs: add an icount_read helper
  fs: rework iput logic
  fs: make the i_state flags an enum
  fs: stop accessing ->i_count directly in f2fs and gfs2
  fsverity: check IS_VERITY() in fsverity_cleanup_inode()
  fs: remove inode::i_verity_info
  btrfs: move verity info pointer to fs-specific part of inode
  f2fs: move verity info pointer to fs-specific part of inode
  ext4: move verity info pointer to fs-specific part of inode
  fsverity: add support for info in fs-specific part of inode
  fs: remove inode::i_crypt_info
  ceph: move crypt info pointer to fs-specific part of inode
  ubifs: move crypt info pointer to fs-specific part of inode
  f2fs: move crypt info pointer to fs-specific part of inode
  ext4: move crypt info pointer to fs-specific part of inode
  fscrypt: add support for info in fs-specific part of inode
  fscrypt: replace raw loads of info pointer with helper function

Merge tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs mount updates from Christian Brauner:
"This contains some work around mount api handling:

   - Output the warning message for mnt_too_revealing() triggered during
     fsmount() to the fscontext log. This makes it possible for the
     mount tool to output appropriate warnings on the command line.

     For example, with the newest fsopen()-based mount(8) from
     util-linux, the error messages now look like:

       # mount -t proc proc /tmp
       mount: /tmp: fsmount() failed: VFS: Mount too revealing.
              dmesg(1) may have more information after failed mount system call.

   - Do not consume fscontext log entries when returning -EMSGSIZE

     Userspace generally expects APIs that return -EMSGSIZE to allow for
     them to adjust their buffer size and retry the operation.

     However, the fscontext log would previously clear the message even
     in the -EMSGSIZE case.

     Given that it is very cheap for us to check whether the buffer is
     too small before we remove the message from the ring buffer, let's
     just do that instead.

   - Drop an unused argument from do_remount()"

* tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: fs/namespace.c: remove ms_flags argument from do_remount
  selftests/filesystems: add basic fscontext log tests
  fscontext: do not consume log entries when returning -EMSGSIZE
  vfs: output mount_too_revealing() errors to fscontext
  docs/vfs: Remove mentions to the old mount API helpers
  fscontext: add custom-prefix log helpers
  fs: Remove mount_bdev
  fs: Remove mount_nodev

mount: handle NULL values in mnt_ns_release()

When calling in listmount() mnt_ns_release() may be passed a NULL
pointer. Handle that case gracefully.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...

Fix CC_HAS_ASM_GOTO_OUTPUT on non-x86 architectures

There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.

And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).

This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.

Just use an empty asm statement instead.

Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e2ffa15b9baa ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17")
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linux 6.17

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM fix from Russell King:
"Just one fix to the module freeing function that was declared __weak
when it should not have been. Thanks to Petr Pavlu for spotting this"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9458/1: module: Ensure the override of module_arch_freeing_init()

Merge tag 'i2c-for-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- various MAINTAINERS updates

- fix an off-by-one error in riic

- fix k1 DT schema to allow validation

- rtl9300: fix faulty merge conflict resolution

* tag 'i2c-for-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rtl9300: Drop unsupported I2C_FUNC_SMBUS_I2C_BLOCK
  MAINTAINERS: add entry for SpacemiT K1 I2C driver
  MAINTAINERS: Add me as maintainer of Synopsys DesignWare I2C driver
  MAINTAINERS: delete email for Tharun Kumar P
  dt-bindings: i2c: spacemit: extend and validate all properties
  i2c: riic: Allow setting frequencies lower than 50KHz
  MAINTAINERS: Remove myself as Synopsys DesignWare I2C maintainer
  MAINTAINERS: Update email address for Qualcomm's I2C GENI maintainers

Merge tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix buffer overflow in osnoise_cpu_write()

   The allocated buffer to read user space did not add a nul terminating
   byte after copying from user the string. It then reads the string,
   and if user space did not add a nul byte, the read will continue
   beyond the string.

   Add a nul terminating byte after reading the string.

- Fix missing check for lockdown on tracing

   There's a path from kprobe events or uprobe events that can update
   the tracing system even if lockdown on tracing is activate. Add a
   check in the dynamic event path.

- Add a recursion check for the function graph return path

   Now that fprobes can hook to the function graph tracer and call
   different code between the entry and the exit, the exit code may now
   call functions that are not called in entry. This means that the exit
   handler can possibly trigger recursion that is not caught and cause
   the system to crash.

   Add the same recursion checks in the function exit handler as exists
   in the entry handler path.

* tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fgraph: Protect return handler from recursion loop
  tracing: dynevent: Add a missing lockdown check on dynevent
  tracing/osnoise: Fix slab-out-of-bounds in _parse_integer_limit()

Merge tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few final driver specific fixes that have been sitting in -next for
  a bit.

  The OMAP issue is likely to come up very infrequently since mixed
  configuration SPI buses are rare and the Cadence issue is specific to
  SoCFPGA systems"

* tag 'spi-fix-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: omap2-mcspi: drive SPI_CLK on transfer_setup()
  spi: cadence-qspi: defer runtime support on socfpga if reset bit is enabled

Merge tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues
  or aren't considered necessary for -stable kernels. 6 of these fixes
  are for MM.

  All singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines
  mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
  mailmap: add entry for Bence Csókás
  fs/proc/task_mmu: check p->vec_buf for NULL
  kmsan: fix out-of-bounds access to shadow memory
  mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
  mm/hugetlb: fix folio is still mapped when deleted

i2c: rtl9300: Drop unsupported I2C_FUNC_SMBUS_I2C_BLOCK

While applying the patch for commit ede965fd555a ("i2c: rtl9300: remove
broken SMBus Quick operation support"), a conflict was incorrectly solved
by adding the I2C_FUNC_SMBUS_I2C_BLOCK feature flag. But the code to handle
I2C_SMBUS_I2C_BLOCK_DATA requests will be added by a separate commit.

Fixes: ede965fd555a ("i2c: rtl9300: remove broken SMBus Quick operation support")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

MAINTAINERS: add entry for SpacemiT K1 I2C driver

Add a MAINTAINERS entry for the SpacemiT K1 I2C driver and its DT binding.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

MAINTAINERS: Add me as maintainer of Synopsys DesignWare I2C driver

I volunteered as maintainer of the DesignWare I2C driver, so update my
entry from reviewer to maintainer.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

MAINTAINERS: delete email for Tharun Kumar P

The email address bounced. I couldn't find a newer one in recent git history,
so delete this email entry.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Merge tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rtla tool fixes from Steven Rostedt:

- Fix a buffer overflow in actions_parse()

   The "trigger_c" variable did not account for the nul byte when
   determining its size

- Fix a compare that had the values reversed

   actions_destroy() is supposed to reallocate when len is greater than
   the current size, but the compare was testing if size is greater than
   the new length

* tag 'trace-tools-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla/actions: Fix condition for buffer reallocation
  rtla: Fix buffer overflow in actions_parse

tracing: fgraph: Protect return handler from recursion loop

function_graph_enter_regs() prevents itself from recursion by
ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(),
which is called at the exit, does not prevent such recursion.
Therefore, while it can prevent recursive calls from
fgraph_ops::entryfunc(), it is not able to prevent recursive calls
to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop.
This can lead an unexpected recursion bug reported by Menglong.

is_endbr() is called in __ftrace_return_to_handler -> fprobe_return
-> kprobe_multi_link_exit_handler -> is_endbr.

To fix this issue, acquire ftrace_test_recursion_trylock() in the
__ftrace_return_to_handler() after unwind the shadow stack to mark
this section must prevent recursive call of fgraph inside user-defined
fgraph_ops::retfunc().

This is essentially a fix to commit 4346ba160409 ("fprobe: Rewrite
fprobe on function-graph tracer"), because before that fgraph was
only used from the function graph tracer. Fprobe allowed user to run
any callbacks from fgraph after that commit.

Reported-by: Menglong Dong <menglong8.dong@gmail.com>
Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/175852292275.307379.9040117316112640553.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Menglong Dong <menglong8.dong@gmail.com>
Acked-by: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rtla/actions: Fix condition for buffer reallocation

The condition to check if the actions buffer needs to be resized was
incorrect. The check `self->size >= self->len` would evaluate to
true on almost every call to `actions_new()`, causing the buffer to
be reallocated unnecessarily each time an action was added.

Fix the condition to `self->len >= self.size`, ensuring
that the buffer is only resized when it is actually full.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com
Fixes: 6ea082b171e00 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

rtla: Fix buffer overflow in actions_parse

Currently, tests 3 and 13-22 in tests/timerlat.t fail with error:

    *** buffer overflow detected ***: terminated
    timeout: the monitored command dumped core

The result of running `sudo make check` is

    tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11)
      Failed tests:  3, 13-22
    Files=3, Tests=34, 140 wallclock secs ( 0.07 usr  0.01 sys + 27.63 cusr
    27.96 csys = 55.67 CPU)
    Result: FAIL

Fix buffer overflow in actions_parse to avoid this error. After this
change, the tests results are

    tests/hwnoise.t ... ok
    tests/osnoise.t ... ok
    tests/timerlat.t .. ok
    All tests successful.
    Files=3, Tests=34, 186 wallclock secs ( 0.06 usr  0.01 sys + 41.10 cusr
    44.38 csys = 85.55 CPU)
    Result: PASS

Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com
Fixes: 6ea082b171e0 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- A race-free implementation of pudp_huge_get_and_clear() (based on the
   x86 code)

- A MAINTAINERS update to my E-mail address

* tag 'riscv-for-linus-v6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  MAINTAINERS: Update Paul Walmsley's E-mail address
  riscv: Use an atomic xchg in pudp_huge_get_and_clear()

Merge tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Fix a CPU topology code regression that caused the mishandling of
  certain boot command line options, and re-enable CONFIG_PTDUMP on i386
  that was mistakenly turned off in the Kconfig"

* tag 'x86-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology: Implement topology_is_core_online() to address SMT regression
  x86/Kconfig: Reenable PTDUMP on i386

Merge tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Fix two dl_server regressions: a race that can end up leaving the
  dl_server stuck, and a dl_server throttling bug causing lag to fair
  tasks"

* tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix dl_server behaviour
  sched/deadline: Fix dl_server getting stuck

Merge tag 'locking-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
"Fix a PI-futexes race, and fix a copy_process() futex cleanup bug"

* tag 'locking-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Use correct exit on failure from futex_hash_allocate_default()
futex: Prevent use-after-free during requeue-PI

Merge tag 'core-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fix from Ingo Molnar:
"Fix a CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y bug on older Clang versions"

* tag 'core-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17

Merge tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix unlink bug

- Fix potential out of bounds access in processing compound requests

* tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix wrong index reference in smb2_compound_op()
smb: client: handle unlink(2) of files open by different clients

Merge tag 'vfs-6.17-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Prevent double unlock in netfs

- Fix a NULL pointer dereference in afs_put_server()

- Fix a reference leak in netfs

* tag 'vfs-6.17-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: fix reference leak
  afs: Fix potential null pointer dereference in afs_put_server
  netfs: Prevent duplicate unlocking

Merge tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fix from Ulf Hansson:

- mediatek: Make sure MT8195 AUDIO power domain isn't left powered-on

* tag 'pmdomain-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: mediatek: set default off flag for MT8195 AUDIO power domain

Merge tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and New HW Supoort

   - amd/pmc: Use 8042 quirk for Stellaris Slim Gen6 AMD

   - dell: Set USTT mode according to BIOS after reboot

   - dell-lis3lv02d: Add Latitude E6530

   - lg-laptop: Fix setting the fan mode"

* tag 'platform-drivers-x86-v6.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()
  platform/x86: dell-lis3lv02d: Add Latitude E6530
  platform/x86/dell: Set USTT mode according to BIOS after reboot
  platform/x86/amd/pmc: Add Stellaris Slim Gen6 AMD to spurious 8042 quirks list

Merge tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- allow looking up GPIOs by the secondary firmware node too

- fix memory leak in gpio-regmap

* tag 'gpio-fixes-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: regmap: fix memory leak of gpio_regmap structure
gpiolib: Extend software-node support to support secondary software-nodes

Merge tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:
"A regression fix for this series where an attempt to silence an EOD
  error got messed up a bit, and then a change of git trees for the
  block and io_uring trees.

  Switching the git trees to kernel.org now, as I've just about had it
  trying to battle AI bots that bring the box to its knees, continually.
  At least I don't have to maintain the kernel.org side"

* tag 'block-6.17-20250925' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  MAINTAINERS: update io_uring and block tree git trees
  block: fix EOD return for device with nr_sectors == 0

Merge tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, some fbcon font handling fixes, then amdgpu/xe/i915 with
  a few, and a few misc fixes for other drivers. Seems about right for
  this stage, and I don't know of anything outstanding.

  fbcon:
   - fix OOB access in font allocation
   - fix integer overflow in font handling

  amdgpu:
   - Backlight fix
   - DC preblend fix
   - DCN 3.5 fix
   - Cleanup output_tf_change

  xe:
   - Don't expose sysfs attributes not applicable for VFs
   - Fix build with CONFIG_MODULES=n
   - Don't copy pinned kernel bos twice on suspend

  i915:
   - Set O_LARGEFILE in __create_shmem()
   - Guard reg_val against a INVALID_TRANSCODER [ddi]

  ast:
   - sleeps causing cpu stall fix

  panthor:
   - scheduler race condition fix

  gma500:
   - NULL ptr deref in hdmi teardown fix"

* tag 'drm-fixes-2025-09-26' of https://gitlab.freedesktop.org/drm/kernel:
  drm/panthor: Defer scheduler entitiy destruction to queue release
  drm/amd/display: remove output_tf_change flag
  drm/amd/display: Init DCN35 clocks from pre-os HW values
  drm/amd/display: Use mpc.preblend flag to indicate preblend
  drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume
  fbcon: Fix OOB access in font allocation
  drm/i915/ddi: Guard reg_val against a INVALID_TRANSCODER
  drm/i915: set O_LARGEFILE in __create_shmem()
  drm/xe: Don't copy pinned kernel bos twice on suspend
  drm/xe: Fix build with CONFIG_MODULES=n
  drm/xe/vf: Don't expose sysfs attributes not applicable for VFs
  fbcon: fix integer overflow in fbcon_do_set_font
  drm/gma500: Fix null dereference in hdmi teardown
  drm/ast: Use msleep instead of mdelay for edid read

smb: client: fix wrong index reference in smb2_compound_op()

In smb2_compound_op(), the loop that processes each command's response
uses wrong indices when accessing response bufferes.

This incorrect indexing leads to improper handling of command results.
Also, if incorrectly computed index is greather than or equal to
MAX_COMPOUND, it can cause out-of-bounds accesses.

Fixes: 3681c74d342d ("smb: client: handle lack of EA support in smb2_query_path_info()") # 6.14
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

fcntl: trim arguments

Remove superfluous argument from fcntl_{get/set}_rw_hint.
No functional change.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

listmount: don't call path_put() under namespace semaphore

Massage listmount() and make sure we don't call path_put() under the
namespace semaphore. If we put the last reference we're fscked.

Fixes: b4c2bea8ceaa ("add listmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Christian Brauner <brauner@kernel.org>

statmount: don't call path_put() under namespace semaphore

Massage statmount() and make sure we don't call path_put() under the
namespace semaphore. If we put the last reference we're fscked.

Fixes: 46eae99ef733 ("add statmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Christian Brauner <brauner@kernel.org>

netfs: fix reference leak

Commit 20d72b00ca81 ("netfs: Fix the request's work item to not
require a ref") modified netfs_alloc_request() to initialize the
reference counter to 2 instead of 1.  The rationale was that the
requet's "work" would release the second reference after completion
(via netfs_{read,write}_collection_worker()).  That works most of the
time if all goes well.

However, it leaks this additional reference if the request is released
before the I/O operation has been submitted: the error code path only
decrements the reference counter once and the work item will never be
queued because there will never be a completion.

This has caused outages of our whole server cluster today because
tasks were blocked in netfs_wait_for_outstanding_io(), leading to
deadlocks in Ceph (another bug that I will address soon in another
patch).  This was caused by a netfs_pgpriv2_begin_copy_to_cache() call
which failed in fscache_begin_write_operation().  The leaked
netfs_io_request was never completed, leaving `netfs_inode.io_count`
with a positive value forever.

All of this is super-fragile code.  Finding out which code paths will
lead to an eventual completion and which do not is hard to see:

- Some functions like netfs_create_write_req() allocate a request, but
  will never submit any I/O.

- netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read()
  and then netfs_put_request(); however, netfs_unbuffered_read() can
  also fail early before submitting the I/O request, therefore another
  netfs_put_request() call must be added there.

A rule of thumb is that functions that return a `netfs_io_request` do
not submit I/O, and all of their callers must be checked.

For my taste, the whole netfs code needs an overhaul to make reference
counting easier to understand and less fragile & obscure.  But to fix
this bug here and now and produce a patch that is adequate for a
stable backport, I tried a minimal approach that quickly frees the
request object upon early failure.

I decided against adding a second netfs_put_request() each time
because that would cause code duplication which obscures the code
further.  Instead, I added the function netfs_put_failed_request()
which frees such a failed request synchronously under the assumption
that the reference count is exactly 2 (as initially set by
netfs_alloc_request() and never touched), verified by a
WARN_ON_ONCE().  It then deinitializes the request object (without
going through the "cleanup_work" indirection) and frees the allocation
(with RCU protection to protect against concurrent access by
netfs_requests_seq_start()).

All code paths that fail early have been changed to call
netfs_put_failed_request() instead of netfs_put_request().
Additionally, I have added a netfs_put_request() call to
netfs_unbuffered_read() as explained above because the
netfs_put_failed_request() approach does not work there.

Fixes: 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'drm-xe-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Don't expose sysfs attributes not applicable for VFs (Michal)
- Fix build with CONFIG_MODULES=n (Lucas)
- Don't copy pinned kernel bos twice on suspend (Thomas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aNU-FkJEcA3T4aDB@intel.com

Merge tag 'drm-misc-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A CPU stall fix for ast, a NULL pointer dereference fix for gma500, an
OOB and overflow fixes for fbcon, and a race condition fix for panthor.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250925-smilodon-of-luxurious-genius-4ebee7@penduick

Merge tag 'drm-intel-fixes-2025-09-25' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Set O_LARGEFILE in __create_shmem() (Taotao Chen)
- Guard reg_val against a INVALID_TRANSCODER [ddi] (Suraj Kandpal)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://lore.kernel.org/r/aNTxWfhsMkFZ3Q-a@linux

Merge tag 'amd-drm-fixes-6.17-2025-09-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.17-2025-09-24:

amdgpu:
- Backlight fix
- DC preblend fix
- DCN 3.5 fix
- Cleanup output_tf_change

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250924200632.531102-1-alexander.deucher@amd.com

include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines

commit c519c3c0a113 ("mm/kasan: avoid lazy MMU mode hazards") introduced
the use of arch_enter_lazy_mmu_mode(), which results in the compiler
complaining about "statement has no effect", when
__HAVE_ARCH_LAZY_MMU_MODE is not defined in include/linux/pgtable.h

The exact warning/error is:

In file included from ./include/linux/kasan.h:37,
                 from mm/kasan/shadow.c:14:
mm/kasan/shadow.c: In function kasan_populate_vmalloc_pte:
./include/linux/pgtable.h:247:41: error: statement with no effect [-Werror=unused-value]
  247 | #define arch_enter_lazy_mmu_mode()      (LAZY_MMU_DEFAULT)
      |                                         ^
mm/kasan/shadow.c:322:9: note: in expansion of macro arch_enter_lazy_mmu_mode>   322 |         arch_enter_lazy_mmu_mode();
     |         ^~~~~~~~~~~~~~~~~~~~~~~~

switching these "functions" to static inlines fixes this up.

Fixes: c519c3c0a113 ("mm/kasan: avoid lazy MMU mode hazards")
Reported-by: Balbir Singh <balbirs@nvidia.com>
Closes: https://lkml.kernel.org/r/20250912235515.367061-1-balbirs@nvidia.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()

The callback return value is ignored in damon_sysfs_damon_call(), which
means that it is not possible to detect invalid user input when writing
commands such as 'commit' to
/sys/kernel/mm/damon/admin/kdamonds/<K>/state. Fix it.

Link: https://lkml.kernel.org/r/20250920132546.5822-1-akinobu.mita@gmail.com
Fixes: f64539dcdb87 ("mm/damon/sysfs: use damon_call() for update_schemes_stats")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: add entry for Bence Csókás

I will be leaving Prolan this week. You can reach me by my personal email
for now.

Link: https://lkml.kernel.org/r/20250915-mailmap-v1-1-9ebdea93c6a7@prolan.hu
Signed-off-by: Bence Csókás <bence98@sch.bme.hu>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/proc/task_mmu: check p->vec_buf for NULL

When the PAGEMAP_SCAN ioctl is invoked with vec_len = 0 reaches
pagemap_scan_backout_range(), kernel panics with null-ptr-deref:

[   44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[   44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none)
[   44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80

<snip registers, unreliable trace>

[   44.946828] Call Trace:
[   44.947030]  <TASK>
[   44.949219]  pagemap_scan_pmd_entry+0xec/0xfa0
[   44.952593]  walk_pmd_range.isra.0+0x302/0x910
[   44.954069]  walk_pud_range.isra.0+0x419/0x790
[   44.954427]  walk_p4d_range+0x41e/0x620
[   44.954743]  walk_pgd_range+0x31e/0x630
[   44.955057]  __walk_page_range+0x160/0x670
[   44.956883]  walk_page_range_mm+0x408/0x980
[   44.958677]  walk_page_range+0x66/0x90
[   44.958984]  do_pagemap_scan+0x28d/0x9c0
[   44.961833]  do_pagemap_cmd+0x59/0x80
[   44.962484]  __x64_sys_ioctl+0x18d/0x210
[   44.962804]  do_syscall_64+0x5b/0x290
[   44.963111]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are
allocated and p->vec_buf remains set to NULL.

This breaks an assumption made later in pagemap_scan_backout_range(), that
page_region is always allocated for p->vec_buf_index.

Fix it by explicitly checking p->vec_buf for NULL before dereferencing.

Other sites that might run into same deref-issue are already (directly or
transitively) protected by checking p->vec_buf.

Note:
From PAGEMAP_SCAN man page, it seems vec_len = 0 is valid when no output
is requested and it's only the side effects caller is interested in,
hence it passes check in pagemap_scan_get_args().

This issue was found by syzkaller.

Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Penglei Jiang <superman.xpt@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kmsan: fix out-of-bounds access to shadow memory

Running sha224_kunit on a KMSAN-enabled kernel results in a crash in
kmsan_internal_set_shadow_origin():

    BUG: unable to handle page fault for address: ffffbc3840291000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G                 N  6.17.0-rc3 #10 PREEMPT(voluntary)
    Tainted: [N]=TEST
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
    [...]
    Call Trace:
    <TASK>
    __msan_memset+0xee/0x1a0
    sha224_final+0x9e/0x350
    test_hash_buffer_overruns+0x46f/0x5f0
    ? kmsan_get_shadow_origin_ptr+0x46/0xa0
    ? __pfx_test_hash_buffer_overruns+0x10/0x10
    kunit_try_run_case+0x198/0xa00

This occurs when memset() is called on a buffer that is not 4-byte aligned
and extends to the end of a guard page, i.e.  the next page is unmapped.

The bug is that the loop at the end of kmsan_internal_set_shadow_origin()
accesses the wrong shadow memory bytes when the address is not 4-byte
aligned.  Since each 4 bytes are associated with an origin, it rounds the
address and size so that it can access all the origins that contain the
buffer.  However, when it checks the corresponding shadow bytes for a
particular origin, it incorrectly uses the original unrounded shadow
address.  This results in reads from shadow memory beyond the end of the
buffer's shadow memory, which crashes when that memory is not mapped.

To fix this, correctly align the shadow address before accessing the 4
shadow bytes corresponding to each origin.

Link: https://lkml.kernel.org/r/20250911195858.394235-1-ebiggers@kernel.org
Fixes: 2ef3cec44c60 ("kmsan: do not wipe out origin when doing partial unpoisoning")
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count

commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
count") introduced ->pt_share_count dedicated to hugetlb PMD share count
tracking, but omitted fixing copy_hugetlb_page_range(), leaving the
function relying on page_count() for tracking that no longer works.

When lazy page table copy for hugetlb is disabled, that is, revert commit
bcd51a3c679d ("hugetlb: lazy page table copies in fork()") fork()'ing with
hugetlb PMD sharing quickly lockup -

[  239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
[  239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
[  239.446631] Call Trace:
[  239.446633]  <TASK>
[  239.446636]  _raw_spin_lock+0x3f/0x60
[  239.446639]  copy_hugetlb_page_range+0x258/0xb50
[  239.446645]  copy_page_range+0x22b/0x2c0
[  239.446651]  dup_mmap+0x3e2/0x770
[  239.446654]  dup_mm.constprop.0+0x5e/0x230
[  239.446657]  copy_process+0xd17/0x1760
[  239.446660]  kernel_clone+0xc0/0x3e0
[  239.446661]  __do_sys_clone+0x65/0xa0
[  239.446664]  do_syscall_64+0x82/0x930
[  239.446668]  ? count_memcg_events+0xd2/0x190
[  239.446671]  ? syscall_trace_enter+0x14e/0x1f0
[  239.446676]  ? syscall_exit_work+0x118/0x150
[  239.446677]  ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
[  239.446681]  ? clear_bhb_loop+0x30/0x80
[  239.446684]  ? clear_bhb_loop+0x30/0x80
[  239.446686]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

There are two options to resolve the potential latent issue:
  1. warn against PMD sharing in copy_hugetlb_page_range(),
  2. fix it.
This patch opts for the second option.
While at it, simplify the comment, the details are not actually relevant
anymore.

Link: https://lkml.kernel.org/r/20250916004520.1604530-1-jane.chu@oracle.com
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/hugetlb: fix folio is still mapped when deleted

Migration may be raced with fallocating hole.  remove_inode_single_folio
will unmap the folio if the folio is still mapped.  However, it's called
without folio lock.  If the folio is migrated and the mapped pte has been
converted to migration entry, folio_mapped() returns false, and won't
unmap it.  Due to extra refcount held by remove_inode_single_folio,
migration fails, restores migration entry to normal pte, and the folio is
mapped again.  As a result, we triggered BUG in filemap_unaccount_folio.

The log is as follows:
BUG: Bad page cache in process hugetlb  pfn:156c00
page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00
head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0
aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file"
flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: f4(hugetlb)
page dumped because: still mapped when deleted
CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
  <TASK>
  dump_stack_lvl+0x4f/0x70
  filemap_unaccount_folio+0xc4/0x1c0
  __filemap_remove_folio+0x38/0x1c0
  filemap_remove_folio+0x41/0xd0
  remove_inode_hugepages+0x142/0x250
  hugetlbfs_fallocate+0x471/0x5a0
  vfs_fallocate+0x149/0x380

Hold folio lock before checking if the folio is mapped to avold race with
migration.

Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com
Fixes: 4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, IPsec and CAN.

  No known regressions at this point.

  Current release - regressions:

   - xfrm: xfrm_alloc_spi shouldn't use 0 as SPI

  Previous releases - regressions:

   - xfrm: fix offloading of cross-family tunnels

   - bluetooth: fix several races leading to UaFs

   - dsa: lantiq_gswip: fix FDB entries creation for the CPU port

   - eth:
       - tun: update napi->skb after XDP process
       - mlx: fix UAF in flow counter release

  Previous releases - always broken:

   - core: forbid FDB status change while nexthop is in a group

   - smc: fix warning in smc_rx_splice() when calling get_page()

   - can: provide missing ndo_change_mtu(), to prevent buffer overflow.

   - eth:
       - i40e: fix VF config validation
       - broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl"

* tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()
  net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port
  net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()
  libie: fix string names for AQ error codes
  net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD
  net/mlx5: HWS, ignore flow level for multi-dest table
  net/mlx5: fs, fix UAF in flow counter release
  selftests: fib_nexthops: Add test cases for FDB status change
  selftests: fib_nexthops: Fix creation of non-FDB nexthops
  nexthop: Forbid FDB status change while nexthop is in a group
  net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS
  bnxt_en: correct offset handling for IPv6 destination address
  ptp: document behavior of PTP_STRICT_FLAGS
  broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl
  broadcom: fix support for PTP_PEROUT_DUTY_CYCLE
  Bluetooth: MGMT: Fix possible UAFs
  Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync
  Bluetooth: hci_event: Fix UAF in hci_conn_tx_dequeue
  Bluetooth: hci_sync: Fix hci_resume_advertising_sync
  Bluetooth: Fix build after header cleanup
  ...

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"virtio,vhost: last minute fixes

  More small fixes. Most notably this fixes crashes and hangs in
  vhost-net"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS, mailmap: Update address for Peter Hilber
  virtio_config: clarify output parameters
  uapi: vduse: fix typo in comment
  vhost: Take a reference on the task in struct vhost_task.
  vhost-net: flush batched before enabling notifications
  Revert "vhost/net: Defer TX queue re-enable until after sendmsg"
  vhost-net: unbreak busy polling
  vhost-scsi: fix argument order in tport allocation error message

platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()

When WMAB is called to set the fan mode, the new mode is read from either
bits 0-1 or bits 4-5 (depending on the value of some other EC register).
Thus when WMAB is called with bits 4-5 zeroed and called again with
bits 0-1 zeroed, the second call undoes the effect of the first call.
This causes writes to /sys/devices/platform/lg-laptop/fan_mode to have
no effect (and causes reads to always report a status of zero).

Fix this by calling WMAB once, with the mode set in bits 0,1 and 4,5.
When the fan mode is returned from WMAB it always has this form, so
there is no need to preserve the other bits. As a bonus, the driver
now supports the "Performance" fan mode seen in the LG-provided Windows
control app, which provides less aggressive CPU throttling but louder
fan noise and shorter battery life.

Also, correct the documentation to reflect that 0 corresponds to the
default mode (what the Windows app calls "Optimal") and 1 corresponds
to the silent mode.

Fixes: dbf0c5a6b1f8 ("platform/x86: Add LG Gram laptop special features driver")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204913#c4
Signed-off-by: Daniel Lee <dany97@live.ca>
Link: https://patch.msgid.link/MN2PR06MB55989CB10E91C8DA00EE868DDC1CA@MN2PR06MB5598.namprd06.prod.outlook.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()

This code calls kfree_rcu(new_node, rcu) and then dereferences "new_node"
and then dereferences it on the next line. Two lines later, we take
a mutex so I don't think this is an RCU safe region. Re-order it to do
the dereferences before queuing up the free.

Fixes: 68fbff68dbea ("octeontx2-pf: Add police action for TC flower")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/aNKCL1jKwK8GRJHh@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/panthor: Defer scheduler entitiy destruction to queue release

Commit de8548813824 ("drm/panthor: Add the scheduler logical block")
handled destruction of a group's queues' drm scheduler entities early
into the group destruction procedure.

However, that races with the group submit ioctl, because by the time
entities are destroyed (through the group destroy ioctl), the submission
procedure might've already obtained a group handle, and therefore the
ability to push jobs into entities. This is met with a DRM error message
within the drm scheduler core as a situation that should never occur.

Fix by deferring drm scheduler entity destruction to queue release time.

Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250919164436.531930-1-adrian.larumbe@collabora.com

Merge branch 'lantiq_gswip-fixes'

Vladimir Oltean says:

====================
lantiq_gswip fixes

This is a small set of fixes which I believe should be backported for
the lantiq_gswip driver. Daniel Golle asked me to submit them here:
https://lore.kernel.org/netdev/aLiDfrXUbw1O5Vdi@pidgin.makrotopia.org/

As mentioned there, a merge conflict with net-next is expected, due to
the movement of the driver to the 'drivers/net/dsa/lantiq' folder there.
Good luck :-/

Patch 2/2 fixes an old regression and is the minimal fix for that, as
discussed here:
https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/

Patch 1/2 was identified by me through static analysis, and I consider
it to be a serious deficiency. It needs a test tag.
====================

Link: https://patch.msgid.link/20250918072142.894692-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port

The blamed commit and others in that patch set started the trend
of reusing existing DSA driver API for a new purpose: calling
ds->ops->port_fdb_add() on the CPU port.

The lantiq_gswip driver was not prepared to handle that, as can be seen
from the many errors that Daniel presents in the logs:

[  174.050000] gswip 1e108000.switch: port 2 failed to add fa:aa:72:f4:8b:1e vid 1 to fdb: -22
[  174.060000] gswip 1e108000.switch lan2: entered promiscuous mode
[  174.070000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 0 to fdb: -22
[  174.090000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 1 to fdb: -22
[  174.090000] gswip 1e108000.switch: port 2 failed to delete fa:aa:72:f4:8b:1e vid 1 from fdb: -2

The errors are because gswip_port_fdb() wants to get a handle to the
bridge that originated these FDB events, to associate it with a FID.
Absolutely honourable purpose, however this only works for user ports.

To get the bridge that generated an FDB entry for the CPU port, one
would need to look at the db.bridge.dev argument. But this was
introduced in commit c26933639b54 ("net: dsa: request drivers to perform
FDB isolation"), first appeared in v5.18, and when the blamed commit was
introduced in v5.14, no such API existed.

So the core DSA feature was introduced way too soon for lantiq_gswip.
Not acting on these host FDB entries and suppressing any errors has no
other negative effect, and practically returns us to not supporting the
host filtering feature at all - peacefully, this time.

Fixes: 10fae4ac89ce ("net: dsa: include bridge addresses which are local in the host fdb list")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Closes: https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-3-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()

A port added to a "single port bridge" operates as standalone, and this
is mutually exclusive to being part of a Linux bridge. In fact,
gswip_port_bridge_join() calls gswip_add_single_port_br() with
add=false, i.e. removes the port from the "single port bridge" to enable
autonomous forwarding.

The blamed commit seems to have incorrectly thought that ds->ops->port_enable()
is called one time per port, during the setup phase of the switch.

However, it is actually called during the ndo_open() implementation of
DSA user ports, which is to say that this sequence of events:

1. ip link set swp0 down
2. ip link add br0 type bridge
3. ip link set swp0 master br0
4. ip link set swp0 up

would cause swp0 to join back the "single port bridge" which step 3 had
just removed it from.

The correct DSA hook for one-time actions per port at switch init time
is ds->ops->port_setup(). This is what seems to match the coder's
intention; also see the comment at the beginning of the file:

* At the initialization the driver allocates one bridge table entry for
~~~~~~~~~~~~~~~~~~~~~
* each switch port which is used when the port is used without an
* explicit bridge.

Fixes: 8206e0ce96b3 ("net: dsa: lantiq: Add VLAN unaware bridge offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250918072142.894692-2-vladimir.oltean@nxp.com
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

sched/deadline: Fix dl_server behaviour

John reported undesirable behaviour with the dl_server since commit:
cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling").

When starving fair tasks on purpose (starting spinning FIFO tasks),
his fair workload, which often goes (briefly) idle, would delay fair
invocations for a second, running one invocation per second was both
unexpected and terribly slow.

The reason this happens is that when dl_se->server_pick_task() returns
NULL, indicating no runnable tasks, it would yield, pushing any later
jobs out a whole period (1 second).

Instead simply stop the server. This should restore behaviour in that
a later wakeup (which restarts the server) will be able to continue
running (subject to the CBS wakeup rules).

Notably, this does not re-introduce the behaviour cccb45d7c4295 set
out to solve, any start/stop cycle is naturally throttled by the timer
period (no active cancel).

Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>

sched/deadline: Fix dl_server getting stuck

John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7c429
("sched/deadline: Less agressive dl_server handling").

While debugging it seems there is a chance where we end up with the
dl_server dequeued, with dl_se->dl_server_active. This causes
dl_server_start() to return without enqueueing the dl_server, thus it
fails to run when RT tasks starve the cpu.

When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally return
HRTIMER_NO_RESTART.

This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.

What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.

IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.

This removes all dl_se->server_has_tasks() users, so remove the whole
thing.

Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>

Merge patch series "ns: tweak ns common handling"

Christian Brauner <brauner@kernel.org> says:

This contains three minor tweaks for namespace handling:

* Make struct ns_tree private. There's no need for anything to access
  that directly.

* Drop a debug assert that would trigger in conditions that are benign.

* Move the type of the namespace out of struct proc_ns_operations and
  into struct ns_common. This eliminates a pointer dereference and also
  allows assertions to work when the namespace type is disabled and the
  operations field set to NULL.

* patches from https://lore.kernel.org/20250924-work-namespaces-fixes-v1-0-8fb682c8678e@kernel.org:
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private

Signed-off-by: Christian Brauner <brauner@kernel.org>

ns: drop assert

Otherwise we warn when e.g., no namespaces are configured but the
initial namespace for is still around.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

ns: move ns type into struct ns_common

It's misplaced in struct proc_ns_operations and ns->ops might be NULL if
the namespace is compiled out but we still want to know the type of the
namespace for the initial namespace struct.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

nstree: make struct ns_tree private

Don't expose it directly. There's no need to do that.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

afs: Add support for RENAME_NOREPLACE and RENAME_EXCHANGE

Add support for RENAME_NOREPLACE and RENAME_EXCHANGE, if the server
supports them.

The default is translated to YFS.Rename_Replace, falling back to
YFS.Rename; RENAME_NOREPLACE is translated to YFS.Rename_NoReplace and
RENAME_EXCHANGE to YFS.Rename_Exchange, both of which fall back to
reporting EINVAL.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/740476.1758718189@warthog.procyon.org.uk
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Dan Carpenter <dan.carpenter@linaro.org>
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

afs: Fix potential null pointer dereference in afs_put_server

afs_put_server() accessed server->debug_id before the NULL check, which
could lead to a null pointer dereference. Move the debug_id assignment,
ensuring we never dereference a NULL server pointer.

Fixes: 2757a4dc1849 ("afs: Fix access after dec in put functions")
Cc: stable@vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

Merge tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

- fprobe: Even if there is a memory allocation failure, try to remove
   the addresses recorded until then from the filter. Previously we just
   skipped it.

- tracing: dynevent: Add a missing lockdown check on dynevent. This
   dynevent is the interface for all probe events. Thus if there is no
   check, any probe events can be added after lock down the tracefs.

* tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: dynevent: Add a missing lockdown check on dynevent
  tracing: fprobe: Fix to remove recorded module addresses from filter

libie: fix string names for AQ error codes

The LIBIE_AQ_STR macro() introduced by commit 5feaa7a07b85 ("libie: add
adminq helper for converting err to str") is used in order to generate
strings for printing human readable error codes. Its definition is missing
the separating underscore ('_') character which makes the resulting strings
difficult to read. Additionally, the string won't match the source code,
preventing search tools from working properly.

Add the missing underscore character, fixing the error string names.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 5feaa7a07b85 ("libie: add adminq helper for converting err to str")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250923205657.846759-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: af_alg - Fix incorrect boolean values in af_alg_ctx

Commit 1b34cbbf4f01 ("crypto: af_alg - Disallow concurrent writes in
af_alg_sendmsg") changed some fields from bool to 1-bit bitfields of
type u32.

However, some assignments to these fields, specifically 'more' and
'merge', assign values greater than 1. These relied on C's implicit
conversion to bool, such that zero becomes false and nonzero becomes
true.

With a 1-bit bitfields of type u32 instead, mod 2 of the value is taken
instead, resulting in 0 being assigned in some cases when 1 was intended.

Fix this by restoring the bool type.

Fixes: 1b34cbbf4f01 ("crypto: af_alg - Disallow concurrent writes in af_alg_sendmsg")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'soc-fixes-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"There are a few minor code fixes for tegra firmware, i.MX firmware
  and the eyeq reset controller, and a MAINTAINERS update as Alyssa
  Rosenzweig moves on to non-kernel projects.

  The other changes are all for devicetree files:

   - Multiple Marvell Armada SoCs need changes to fix PCIe, audio and
     SATA

   - A socfpga board fails to probe the ethernet phy

   - The two temperature sensors on i.MX8MP are swapped

   - Allwinner devicetree files cause build-time warnings

   - Two Rockchip based boards need corrections for headphone detection
     and SPI flash"

* tag 'soc-fixes-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  MAINTAINERS: remove Alyssa Rosenzweig
  firmware: tegra: Do not warn on missing memory-region property
  arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports
  arm64: dts: marvell: cn9132-clearfog: disable eMMC high-speed modes
  arm64: dts: marvell: cn913x-solidrun: fix sata ports status
  ARM: dts: kirkwood: Fix sound DAI cells for OpenRD clients
  arm64: dts: imx8mp: Correct thermal sensor index
  ARM: imx: Kconfig: Adjust select after renamed config option
  firmware: imx: Add stub functions for SCMI CPU API
  firmware: imx: Add stub functions for SCMI LMM API
  firmware: imx: Add stub functions for SCMI MISC API
  riscv: dts: allwinner: rename devterm i2c-gpio node to comply with binding
  arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
  arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6
  ARM: dts: socfpga: sodia: Fix mdio bus probe and PHY address
  reset: eyeq: fix OF node leak
  ARM64: dts: mcbin: fix SATA ports on Macchiatobin
  ARM: dts: armada-370-db: Fix stereo audio input routing on Armada 370
  ARM: dts: allwinner: Minor whitespace cleanup

Merge tag 'pm-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Rafael:
"Fix a locking issue in the cpufreq core introduced recently and caught
by lockdep (Christian Loehle)"

* tag 'pm-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Initialize cpufreq-based invariance before subsys

Merge tag 'for-6.17-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
"One more regression fix for a problem in zoned mode: mounting would
  fail if the number of open and active zones reached a common limit
  that didn't use to be checked"

* tag 'for-6.17-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: don't fail mount needlessly due to too many active zones

Merge tag '6.17-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- free_transport fix for disconnect races

- minor delayed work fix

* tag '6.17-rc7-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
smb: server: use disable_work_sync in transport_rdma.c
smb: server: don't use delayed_work for post_recv_credits_work

tracing: dynevent: Add a missing lockdown check on dynevent

Since dynamic_events interface on tracefs is compatible with
kprobe_events and uprobe_events, it should also check the lockdown
status and reject if it is set.

Link: https://lore.kernel.org/all/175824455687.45175.3734166065458520748.stgit@devnote2/
Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org

tracing: fprobe: Fix to remove recorded module addresses from filter

Even if there is a memory allocation failure in fprobe_addr_list_add(),
there is a partial list of module addresses. So remove the recorded
addresses from filter if exists.
This also removes the redundant ret local variable.

Fixes: a3dc2983ca7b ("tracing: fprobe: Cleanup fprobe hash when module unloading")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Menglong Dong <menglong8.dong@gmail.com>

kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17

clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:

     {
      __label__ local_lbl;
...
unsafe_get_user(uval, uaddr, local_lbl);
...
return 0;
local_lbl:
return -EFAULT;
     }

when two such scopes exist in the same function:

  error: cannot jump from this asm goto statement to one of its possible targets

There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.

That issue prevents using local labels for a cleanup based user access
mechanism.

After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.

But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.

The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:

void bar(void **);
void* baz(void);

int  foo (void) {
    {
    asm goto("jmp %l0"::::l0);
    return 0;
l0:
    return 1;
    }
    void *x __attribute__((cleanup(bar))) = baz();
    {
    asm goto("jmp %l0"::::l1);
    return 42;
l1:
    return 0xff;
    }
}

Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.

That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.

Thanks to Nathan for pointing to the relevant clang issue!

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: https://github.com/llvm/llvm-project/commit/f023f5cdb2e6c19026f04a15b5a935c041835d14

futex: Use correct exit on failure from futex_hash_allocate_default()

copy_process() uses the wrong error exit path from futex_hash_allocate_default().
After exiting from futex_hash_allocate_default(), neither tasklist_lock
nor siglock has been acquired. The exit label bad_fork_core_free unlocks
both of these locks which is wrong.

The next exit label, bad_fork_cancel_cgroup, is the correct exit.
sched_cgroup_fork() did not allocate any resources that need to freed.

Use bad_fork_cancel_cgroup on error exit from futex_hash_allocate_default().

Fixes: 7c4f75a21f636 ("futex: Allow automatic allocation of process wide futex hash")
Reported-by: syzbot+80cb3cc5c14fad191a10@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/68cb1cbd.050a0220.2ff435.0599.GAE@google.com

MAINTAINERS: Update Paul Walmsley's E-mail address

My experiment with using corporate Gmail for Linux kernel list
interaction has come to an end. For my MAINTAINERS entries that
use that E-mail address, let's switch those to use the k.org E-mail
forwarding.

Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>

riscv: Use an atomic xchg in pudp_huge_get_and_clear()

Make sure we return the right pud value and not a value that could
have been overwritten in between by a different core.

Fixes: c3cc2a4a3a23 ("riscv: Add support for PUD THP")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250814-dev-alex-thp_pud_xchg-v1-1-b4704dfae206@rivosinc.com
[pjw@kernel.org: use xchg rather than atomic_long_xchg; avoid atomic op for !CONFIG_SMP like x86]
Signed-off-by: Paul Walmsley <pjw@kernel.org>

Merge branch 'mlx5-misc-fixes-2025-09-22'

Tariq Toukan says:

====================
mlx5 misc fixes 2025-09-22

This patchset provides misc bug fixes from the team to the mlx5 Eth
and core drivers.
====================

Link: https://patch.msgid.link/1758525094-816583-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD

Include MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD in the FEC RS stats
handling. This addresses a gap introduced when adding support for
200G/lane link modes.

Fixes: 4e343c11efbb ("net/mlx5e: Support FEC settings for 200G per lane link modes")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: HWS, ignore flow level for multi-dest table

When HWS creates multi-dest FW table and adds rules to
forward to other tables, ignore the flow level enforcement
in FW, because HWS is responsible for table levels.

This fixes the following error:

  mlx5_core 0000:08:00.0: mlx5_cmd_out_err:818:(pid 192306):
     SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
     status bad parameter(0x3), syndrome (0x6ae84c), err(-22)

Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: fs, fix UAF in flow counter release

Fix a kernel trace [1] caused by releasing an HWS action of a local flow
counter in mlx5_cmd_hws_delete_fte(), where the HWS action refcount and
mutex were not initialized and the counter struct could already be freed
when deleting the rule.

Fix it by adding the missing initializations and adding refcount for the
local flow counter struct.

[1] Kernel log:
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  mlx5_fs_put_hws_action.part.0.cold+0x21/0x94 [mlx5_core]
  mlx5_fc_put_hws_action+0x96/0xad [mlx5_core]
  mlx5_fs_destroy_fs_actions+0x8b/0x152 [mlx5_core]
  mlx5_cmd_hws_delete_fte+0x5a/0xa0 [mlx5_core]
  del_hw_fte+0x1ce/0x260 [mlx5_core]
  mlx5_del_flow_rules+0x12d/0x240 [mlx5_core]
  ? ttwu_queue_wakelist+0xf4/0x110
  mlx5_ib_destroy_flow+0x103/0x1b0 [mlx5_ib]
  uverbs_free_flow+0x20/0x50 [ib_uverbs]
  destroy_hw_idr_uobject+0x1b/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x34/0x1a0 [ib_uverbs]
  uobj_destroy+0x3c/0x80 [ib_uverbs]
  ib_uverbs_run_method+0x23e/0x360 [ib_uverbs]
  ? uverbs_finalize_object+0x60/0x60 [ib_uverbs]
  ib_uverbs_cmd_verbs+0x14f/0x2c0 [ib_uverbs]
  ? do_tty_write+0x1a9/0x270
  ? file_tty_write.constprop.0+0x98/0xc0
  ? new_sync_write+0xfc/0x190
  ib_uverbs_ioctl+0xd7/0x160 [ib_uverbs]
  __x64_sys_ioctl+0x87/0xc0
  do_syscall_64+0x59/0x90

Fixes: b581f4266928 ("net/mlx5: fs, manage flow counters HWS action sharing by refcount")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758525094-816583-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'nexthop-various-fixes'

Ido Schimmel says:

====================
nexthop: Various fixes

Patch #1 fixes a NPD that was recently reported by syzbot.

Patch #2 fixes an issue in the existing FIB nexthop selftest.

Patch #3 extends the selftest with test cases for the bug that was fixed
in the first patch.
====================

Link: https://patch.msgid.link/20250921150824.149157-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_nexthops: Add test cases for FDB status change

Add the following test cases for both IPv4 and IPv6:

* Can change from FDB nexthop to non-FDB nexthop and vice versa.
* Can change FDB nexthop address while in a group.
* Cannot change from FDB nexthop to non-FDB nexthop and vice versa while
  in a group.

Output without "nexthop: Forbid FDB status change while nexthop is in a
group":

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

IPv6 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
[...]

IPv4 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [FAIL]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [FAIL]
[...]

Tests passed:  36
Tests failed:   4
Tests skipped:  0

Output with "nexthop: Forbid FDB status change while nexthop is in a
group":

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal"

IPv6 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
[...]

IPv4 fdb groups functional
--------------------------
[...]
TEST: Replace FDB nexthop to non-FDB nexthop                        [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop                        [ OK ]
TEST: Replace FDB nexthop address while in a group                  [ OK ]
TEST: Replace FDB nexthop to non-FDB nexthop while in a group       [ OK ]
TEST: Replace non-FDB nexthop to FDB nexthop while in a group       [ OK ]
[...]

Tests passed:  40
Tests failed:   0
Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: fib_nexthops: Fix creation of non-FDB nexthops

The test creates non-FDB nexthops without a nexthop device which leads
to the expected failure, but for the wrong reason:

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 63 via 2001:db8:91::4
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 64 via 2001:db8:91::5
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 63/64 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
[...]

IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-nRsN3E nexthop add id 14 via 172.16.1.2
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 15 via 172.16.1.3
Error: Device attribute required for non-blackhole and non-fdb nexthops.
COMMAND: ip -netns me-nRsN3E nexthop add id 103 group 14/15 fdb
Error: Invalid nexthop id.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

COMMAND: ip -netns me-nRsN3E nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-nRsN3E nexthop add id 104 group 14/15
Error: Invalid nexthop id.
TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
[...]
COMMAND: ip -netns me-0dlhyd ro add 172.16.0.0/22 nhid 15
Error: Nexthop id does not exist.
TEST: Route add with fdb nexthop                                    [ OK ]

In addition, as can be seen in the above output, a couple of IPv4 test
cases used the non-FDB nexthops (14 and 15) when they intended to use
the FDB nexthops (16 and 17). These test cases only passed because
failure was expected, but they failed for the wrong reason.

Fix the test to create the non-FDB nexthops with a nexthop device and
adjust the IPv4 test cases to use the FDB nexthops instead of the
non-FDB nexthops.

Output after the fix:

# ./fib_nexthops.sh -t "ipv6_fdb_grp_fcnal ipv4_fdb_grp_fcnal" -v

IPv6 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 63 via 2001:db8:91::4 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 64 via 2001:db8:91::5 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 63/64 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]
[...]

IPv4 fdb groups functional
--------------------------
[...]
COMMAND: ip -netns me-lNzfHP nexthop add id 14 via 172.16.1.2 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 15 via 172.16.1.3 dev veth1
COMMAND: ip -netns me-lNzfHP nexthop add id 103 group 14/15 fdb
Error: FDB nexthop group can only have fdb nexthops.
TEST: Fdb Nexthop group with non-fdb nexthops                       [ OK ]

COMMAND: ip -netns me-lNzfHP nexthop add id 16 via 172.16.1.2 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 17 via 172.16.1.3 fdb
COMMAND: ip -netns me-lNzfHP nexthop add id 104 group 16/17
Error: Non FDB nexthop group cannot have fdb nexthops.
TEST: Non-Fdb Nexthop group with fdb nexthops                       [ OK ]
[...]
COMMAND: ip -netns me-lNzfHP ro add 172.16.0.0/22 nhid 16
Error: Route cannot point to a fdb nexthop.
TEST: Route add with fdb nexthop                                    [ OK ]
[...]
Tests passed:  30
Tests failed:   0
Tests skipped:  0

Fixes: 0534c5489c11 ("selftests: net: add fdb nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nexthop: Forbid FDB status change while nexthop is in a group

The kernel forbids the creation of non-FDB nexthop groups with FDB
nexthops:

# ip nexthop add id 1 via 192.0.2.1 fdb
# ip nexthop add id 2 group 1
Error: Non FDB nexthop group cannot have fdb nexthops.

And vice versa:

# ip nexthop add id 3 via 192.0.2.2 dev dummy1
# ip nexthop add id 4 group 3 fdb
Error: FDB nexthop group can only have fdb nexthops.

However, as long as no routes are pointing to a non-FDB nexthop group,
the kernel allows changing the type of a nexthop from FDB to non-FDB and
vice versa:

# ip nexthop add id 5 via 192.0.2.2 dev dummy1
# ip nexthop add id 6 group 5
# ip nexthop replace id 5 via 192.0.2.2 fdb
# echo $?
0

This configuration is invalid and can result in a NPD [1] since FDB
nexthops are not associated with a nexthop device:

# ip route add 198.51.100.1/32 nhid 6
# ping 198.51.100.1

Fix by preventing nexthop FDB status change while the nexthop is in a
group:

# ip nexthop add id 7 via 192.0.2.2 dev dummy1
# ip nexthop add id 8 group 7
# ip nexthop replace id 7 via 192.0.2.2 fdb
Error: Cannot change nexthop FDB status while in a group.

[1]
BUG: kernel NULL pointer dereference, address: 00000000000003c0
[...]
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 367 Comm: ping Not tainted 6.17.0-rc6-virtme-gb65678cacc03 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:fib_lookup_good_nhc+0x1e/0x80
[...]
Call Trace:
<TASK>
fib_table_lookup+0x541/0x650
ip_route_output_key_hash_rcu+0x2ea/0x970
ip_route_output_key_hash+0x55/0x80
__ip4_datagram_connect+0x250/0x330
udp_connect+0x2b/0x60
__sys_connect+0x9c/0xd0
__x64_sys_connect+0x18/0x20
do_syscall_64+0xa4/0x2a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops")
Reported-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c9a4d2.050a0220.3c6139.0e63.GAE@google.com/
Tested-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250921150824.149157-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS

Currently, alloc_skb_with_frags() will only fill (MAX_SKB_FRAGS - 1)
slots. I think it should use all MAX_SKB_FRAGS slots, as callers of
alloc_skb_with_frags() will size their allocation of frags based
on MAX_SKB_FRAGS.

This issue was discovered via a test patch that sets 'order' to 0
in alloc_skb_with_frags(), which effectively tests/simulates high
fragmentation. In this case sendmsg() on unix sockets will fail every
time for large allocations. If the PAGE_SIZE is 4K, then data_len will
request 68K or 17 pages, but alloc_skb_with_frags() can only allocate
64K in this case or 16 pages.

Fixes: 09c2c90705bb ("net: allow alloc_skb_with_frags() to allocate bigger packets")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250922191957.2855612-1-jbaron@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'linux-can-fixes-for-6.17-20250923' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-09-23

The 1st patch is by Chen Yufeng and fixes a potential NULL pointer
deref in the hi311x driver.

Duy Nguyen contributes a patch for the rcar_canfd driver to fix the
controller mode setting.

The next 4 patches are by Vincent Mailhol and populate the
ndo_change_mtu(( callback in the etas_es58x, hi311x, sun4i_can and
mcba_usb driver to prevent buffer overflows.

Stéphane Grosjean's patch for the peak_usb driver fixes a
shift-out-of-bounds issue.

* tag 'linux-can-fixes-for-6.17-20250923' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: peak_usb: fix shift-out-of-bounds issue
  can: mcba_usb: populate ndo_change_mtu() to prevent buffer overflow
  can: sun4i_can: populate ndo_change_mtu() to prevent buffer overflow
  can: hi311x: populate ndo_change_mtu() to prevent buffer overflow
  can: etas_es58x: populate ndo_change_mtu() to prevent buffer overflow
  can: rcar_canfd: Fix controller mode setting
  can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled
====================

Link: https://patch.msgid.link/20250923073427.493034-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'tegra-for-6.17-firmware-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes

firmware: tegra: Fixes for v6.17

This contains a simple patch to avoid a warning in the case where the
optional memory-region property is missing.

* tag 'tegra-for-6.17-firmware-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
firmware: tegra: Do not warn on missing memory-region property

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

Another missing supply and a wrong headphone gpio level.

* tag 'v6.17-rockchip-dtsfixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix the headphone detection on the orangepi 5
arm64: dts: rockchip: Add vcc supply for SPI Flash on NanoPC-T6