git.ipfire.org Git - thirdparty/linux.git/log

Merge branch 'iommufd_dmabuf' into k.o-iommufd/for-next

Jason Gunthorpe says:

====================
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:

   https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com/

The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and if the underliyng alignment
requirements are met it will be placed in the iommu_domain.

Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.

The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.

Instead iommufd relies on revokable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd
which will unmap it from the iommu_domain. There is no automatic remap,
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and
map/unmap it around the activity. Eg when QEMU goes to issue FLR it should
do the map/unmap to iommufd.

Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.

The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().

Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().

I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.

The latest series for interconnect negotation to exchange a phys_addr is:
https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com

And the discussion for design of revoke is here:
https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
====================

Based on a shared branch with vfio.

* iommufd_dmabuf:
  iommufd/selftest: Add some tests for the dmabuf flow
  iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
  iommufd: Have iopt_map_file_pages convert the fd to a file
  iommufd: Have pfn_reader process DMABUF iopt_pages
  iommufd: Allow MMIO pages in a batch
  iommufd: Allow a DMABUF to be revoked
  iommufd: Do not map/unmap revoked DMABUFs
  iommufd: Add DMABUF to iopt_pages
  vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
  vfio/nvgrace: Support get_dmabuf_phys
  vfio/pci: Add dma-buf export support for MMIO regions
  vfio/pci: Enable peer-to-peer DMA transactions by default
  vfio/pci: Share the core device pointer while invoking feature functions
  vfio: Export vfio device get and put registration helpers
  dma-buf: provide phys_vec to scatter-gather mapping routine
  PCI/P2PDMA: Document DMABUF model
  PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
  PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
  PCI/P2PDMA: Simplify bus address mapping API
  PCI/P2PDMA: Separate the mmap() support from the core logic

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases

A vDEVICE has been a hard requirement for attaching a nested domain to the
device. This makes sense when installing a guest STE, since a vSID must be
present and given to the kernel during the vDEVICE allocation.

But, when CR0.SMMUEN is disabled, VM doesn't really need a vSID to program
the vSMMU behavior as GBPA will take effect, in which case the vSTE in the
nested domain could have carried the bypass or abort configuration in GBPA
register. Thus, having such a hard requirement doesn't work well for GBPA.

Skip vmaster allocation in arm_smmu_attach_prepare_vmaster() for an abort
or bypass vSTE. Note that device on this attachment won't report vevents.

Update the uAPI doc accordingly.

Link: https://patch.msgid.link/r/20251103172755.2026145-1-nicolinc@nvidia.com
Tested-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd/selftest: Add some tests for the dmabuf flow

Basic tests of establishing a dmabuf and revoking it. The selftest kernel
side provides a basic small dmabuf for this testing.

Link: https://patch.msgid.link/r/9-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE

Finally call iopt_alloc_dmabuf_pages() if the user passed in a DMABUF
through IOMMU_IOAS_MAP_FILE. This makes the feature visible to userspace.

Link: https://patch.msgid.link/r/8-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Have iopt_map_file_pages convert the fd to a file

Since dmabuf only has APIs that work on an int fd and not a struct file *,
pass the fd deeper into the call chain so we can use the dmabuf APIs as
is.

Link: https://patch.msgid.link/r/7-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Have pfn_reader process DMABUF iopt_pages

Make another sub implementation of pfn_reader for DMABUF. This version
will fill the batch using the struct phys_vec recorded during the
attachment.

Link: https://patch.msgid.link/r/6-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Allow MMIO pages in a batch

Addresses intended for MMIO should be propagated through to the iommu with
the IOMMU_MMIO flag set.

Keep track in the batch if all the pfns are cachable or mmio and flush the
batch out of it ever needs to be changed. Switch to IOMMU_MMIO if the
batch is MMIO when mapping the iommu.

Link: https://patch.msgid.link/r/5-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Allow a DMABUF to be revoked

When connected to VFIO, the only DMABUF exporter that is accepted, the
move_notify callback will be made when VFIO wants to remove access to the
MMIO. This is being called revoke.

Wire up revoke to go through all the iommu_domain's that have mapped the
DMABUF and unmap them.

The locking here is unpleasant, since the existing locking scheme was
designed to come from the iopt through the area to the pages we cannot use
pages as starting point for the locking. There is no way to obtain the
domains_rwsem before obtaining the pages mutex to reliably use the
existing domains_itree.

Solve this problem by adding a new tracking structure just for DMABUF
revoke. Record a linked list of areas and domains inside the pages
mutex. Clean the entries on the list during revoke. The map/unmaps are now
all done under a pages mutex while updating the tracking linked list so
nothing can get out of sync. Only one lock is required for revoke
processing.

Link: https://patch.msgid.link/r/4-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Do not map/unmap revoked DMABUFs

Once a DMABUF is revoked the domain will be unmapped under the pages
mutex. Double unmapping will trigger a WARN, and mapping while revoked
will fail.

Check for revoked DMABUFs along all the map and unmap paths to resolve
this. Ensure that map/unmap is always done under the pages mutex so it is
synchronized with the revoke notifier.

If a revoke happens between allocating the iopt_pages and the population
to a domain then the population will succeed, and leave things unmapped as
though revoke had happened immediately after.

Currently there is no way to repopulate the domains. Userspace is expected
to know if it is going to do something that would trigger revoke (eg if it
is about to do a FLR) then it should go and remove the DMABUF mappings
before and put the back after. The revoke is only to protect the kernel
from mis-behaving userspace.

Link: https://patch.msgid.link/r/3-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iommufd: Add DMABUF to iopt_pages

Add IOPT_ADDRESS_DMABUF to the iopt_pages and the basic infrastructure to
create an iopt_pages from a struct dma_buf *.

DMABUF pages are not supported for accesses, and for now can only be used
with the VFIO DMABUF exporter.

The overall flow will be similar to memfd where the user can pass in a
DMABUF file descriptor to IOMMU_IOAS_MAP_FILE and create an area and
pages. Like other areas it can be copied and otherwise manipulated, though
there is little point in doing so.

There is no pinned page accounting done for DMABUF maps.

The DMABUF attachment exists so long as the dmabuf is mapped into an IOAS,
even if the IOAS is not mapped to any domains.

Link: https://patch.msgid.link/r/2-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

vfio/pci: Add vfio_pci_dma_buf_iommufd_map()

This function is used to establish the "private interconnect" between the
VFIO DMABUF exporter and the iommufd DMABUF importer. This is intended to
be a temporary API until the core DMABUF interface is improved to natively
support a private interconnect and revocable negotiation.

This function should only be called by iommufd when trying to map a
DMABUF. For now iommufd will only support VFIO DMABUFs.

The following improvements are needed in the DMABUF API to generically
support more exporters with iommufd/kvm type importers that cannot use the
DMA API:

1) Revoke semantics. VFIO needs to be able to prevent access to the MMIO
    during FLR, and so it will use dma_buf_move_notify() to prevent
    access. iommmufd does not support fault handling so it cannot
    implement the full move_notify. Instead if revoke is negotiated the
    exporter promises not to use move_notify() unless the importer can
    experiance failures. iommufd will unmap the dmabuf from the iommu page
    tables while it is revoked.

2) Private interconnect negotiation. iommufd will only be able to map
    a "private interconnect" that provides a phys_addr_t and a
    struct p2pdma_provider * to describe the memory. It cannot use a DMA
    mapped scatterlist since it is directly calling iommu_map().

3) NULL device during dma_buf_dynamic_attach(). Since iommufd doesn't use
    the DMA API it doesn't have a DMAable struct device to pass here.

Link: https://patch.msgid.link/r/1-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Acked-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Linux 6.18-rc7

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"Fixes for the Allwinner A523 clk driver:

   - Lower the minimum rate for the A523 audio PLL to support
     frequencies required by audio devices

   - Mark a couple clks critical on A523 so that Linux doesn't turn them
     off when they're used by other code like TF-A"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi-ng: sun55i-a523-ccu: Lower audio0 pll minimum rate
  clk: sunxi-ng: sun55i-a523-r-ccu: Mark bus-r-dma as critical
  clk: sunxi-ng: Mark A523 bus-r-cpucfg clock as critical

Merge tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:

- Fix a race in timer->function clearing in timer_shutdown_sync()

- Fix a timekeeper sysfs-setup resource leak in error paths

- Fix the NOHZ report_idle_softirq() syslog rate-limiting
   logic to have no side effects on the return value

* tag 'timers-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix NULL function pointer race in timer_shutdown_sync()
  timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths
  tick/sched: Fix bogus condition in report_idle_softirq()

Merge tag 'perf-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Fix perf CPU-clock counters, and address a static checker warning"

* tag 'perf-urgent-2025-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix 0 count issue of cpu-clock
perf/x86/intel/uncore: Remove superfluous check

timers: Fix NULL function pointer race in timer_shutdown_sync()

There is a race condition between timer_shutdown_sync() and timer
expiration that can lead to hitting a WARN_ON in expire_timers().

The issue occurs when timer_shutdown_sync() clears the timer function
to NULL while the timer is still running on another CPU. The race
scenario looks like this:

CPU0 CPU1
<SOFTIRQ>
lock_timer_base()
expire_timers()
base->running_timer = timer;
unlock_timer_base()
[call_timer_fn enter]
mod_timer()
...
timer_shutdown_sync()
lock_timer_base()
// For now, will not detach the timer but only clear its function to NULL
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()
[call_timer_fn exit]
lock_timer_base()
base->running_timer = NULL;
unlock_timer_base()
...
// Now timer is pending while its function set to NULL.
// next timer trigger
<SOFTIRQ>
expire_timers()
WARN_ON_ONCE(!fn) // hit
...
lock_timer_base()
// Now timer will detach
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
if (shutdown)
timer->function = NULL;
unlock_timer_base()

The problem is that timer_shutdown_sync() clears the timer function
regardless of whether the timer is currently running. This can leave a
pending timer with a NULL function pointer, which triggers the
WARN_ON_ONCE(!fn) check in expire_timers().

Fix this by only clearing the timer function when actually detaching the
timer. If the timer is running, leave the function pointer intact, which is
safe because the timer will be properly detached when it finishes running.

Fixes: 0cc04e80458a ("timers: Add shutdown mechanism to the internal functions")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251122093942.301559-1-zouyipeng@huawei.com

Merge tag 'mips-fixes_6.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- Fix CPU type in DT for econet

- Fix for Malta PCI MMIO breakage for SOC-it

- Fix TLB shutdown caused by iniital uniquification

- Fix random seg faults due to missed vdso storage requirement

* tag 'mips-fixes_6.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kernel: Fix random segmentation faults
  MIPS: mm: Prevent a TLB shutdown on initial uniquification
  mips: dts: econet: fix EN751221 core type
  MIPS: Malta: Fix !EVA SOC-it PCI MMIO

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fix from Eric Biggers:
"Fix another KMSAN warning that made it in while KMSAN wasn't working
reliably"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()

Merge tag 'xfs-fixes-6.18-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Carlos Maiolino:
"A single out-of-bounds fix, nothing special"

* tag 'xfs-fixes-6.18-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix out of bounds memory read error in symlink repair

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"One target driver fix and one scsi-generic one. The latter is 10 lines
  because the problem lock has to be dropped and re-taken around the
  call causing the sleep in atomic"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sg: Do not sleep in atomic context
  scsi: target: tcm_loop: Fix segfault in tcm_loop_tpg_address_show()

Merge tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- INPUT_PROP_HAPTIC_TOUCHPAD definition added early in 6.18 cycle has
   been renamed to INPUT_PROP_PRESSUREPAD to better reflect the kind of
   devices it is supposed to be set for

- a new ID for a touchscreen found in Ayaneo Flip DS in Goodix driver

- Goodix driver no longer tries to set reset pin as "input" as it
   causes issues when there is no pull up resistor installed on the
   board

- fixes for cros_ec_keyb, imx_sc_key, and pegasus-notetaker drivers to
   deal with potential out-of-bounds access and memory corruption issues

* tag 'input-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: rename INPUT_PROP_HAPTIC_TOUCHPAD to INPUT_PROP_PRESSUREPAD
  Input: cros_ec_keyb - fix an invalid memory access
  Input: imx_sc_key - fix memory corruption on unload
  Input: pegasus-notetaker - fix potential out-of-bounds access
  Input: goodix - remove setting of RST pin to input
  Input: goodix - add support for ACPI ID GDIX1003

Merge tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- Correct the MIPS RISC-V/JEDEC vendor ID

- Fix the system shutdown behavior in the legacy case where
   CONFIG_RISCV_SBI_V01 is set, but the firmware implementation
   doesn't support the older v0.1 system shutdown method

- Align some tools/ macro definitions with the corresponding
   kernel headers

* tag 'riscv-for-linus-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  tools: riscv: Fixed misalignment of CSR related definitions
  riscv: sbi: Prefer SRST shutdown over legacy
  riscv: Update MIPS vendor id to 0x127

Merge tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fixes from Paul Moore:
"Three SELinux patches for v6.18 to fix issues around accessing the
  per-task decision cache that we introduced in v6.16 to help reduce
  SELinux overhead on path walks. The problem was that despite the cache
  being located in the SELinux "task_security_struct", the parent struct
  wasn't actually tied to the task, it was tied to a cred.

  Historically SELinux did locate the task_security_struct in the
  task_struct's security blob, but it was later relocated to the cred
  struct when the cred work happened, as it made the most sense at the
  time.

  Unfortunately we never did the task_security_struct to
  cred_security_struct rename work (avoid code churn maybe? who knows)
  because it didn't really matter at the time. However, it suddenly
  became a problem when we added a per-task cache to a per-cred object
  and didn't notice because of the old, no-longer-correct struct naming.

  Thanks to KCSAN for flagging this, as the silly humans running things
  forgot that the task_security_struct was a big lie.

  This contains three patches, only one of which actually fixes the
  problem described above and moves the SELinux decision cache from the
  per-cred struct to a newly (re)created per-task struct.

  The other two patches, which form the bulk of the diffstat, take care
  of the associated renaming tasks so we can hopefully avoid making the
  same stupid mistake in the future.

  For the record, I did contemplate sending just a fix for the cache,
  leaving the renaming patches for the upcoming merge window, but the
  type/variable naming ended up being pretty awful and would have made
  v6.18 an outlier stuck between the "old" names and the "new" names in
  v6.19. The renaming patches are also fairly mechanical/trivial and
  shouldn't pose much risk despite their size.

  TLDR; naming things may be hard, but if you mess it up bad things
  happen"

* tag 'selinux-pr-20251121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: rename the cred_security_struct variables to "crsec"
  selinux: move avdcache to per-task security struct
  selinux: rename task_security_struct to cred_security_struct

Merge tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Use UAPI types in ptrace UAPI header to fix nolibc ptrace.

  Fix CPU name display, NUMA node parsing, kexec/kdump, PCI init and BPF
  trampoline"

* tag 'loongarch-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: BPF: Disable trampoline for kernel module function trace
  LoongArch: Don't panic if no valid cache info for PCI
  LoongArch: Mask all interrupts during kexec/kdump
  LoongArch: Fix NUMA node parsing with numa_memblks
  LoongArch: Consolidate CPU names in /proc/cpuinfo
  LoongArch: Use UAPI types in ptrace UAPI header

Merge tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix potential memory leak in mount

- Add some missing read tracepoints

- Fix locking issue with directory leases

* tag 'v6.18-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Add the smb3_read_* tracepoints to SMB1
  cifs: fix memory leak in smb3_fs_context_parse_param error path
  smb: client: introduce close_cached_dir_locked()

Merge tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fix from Jens Axboe:
"Just a single fix for a mixup of arguments for the skb_queue_splice()
call, in the io_uring timestamp retrieval code"

* tag 'io_uring-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/cmd_net: fix wrong argument types for skb_queue_splice()

Merge tag 'block-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:
"NVMe pull request via Keith:

   - Admin queue use-after-free fix (Keith)

   - Target authentication fix (Alistar)

   - Multipath lockdeup fix (Shin'ichiro)

   - FC transport teardown fixes (Ewan)"

* tag 'block-6.18-20251120' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()
  nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
  nvme-multipath: fix lockdep WARN due to partition scan work
  nvmet-auth: update sc_c in target host hash calculation
  nvme: fix admin request_queue lifetime

Merge tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fixes from Niklas Cassel:

- Add a missing refcount decrement in ata_scsi_dev_rescan() when
   the device or its queue is not running.

   In the case where the device is running, the recount is already
   decremented properly (Yihang Li)

- Generate the proper sense code for a Security locked device.

   There was a regression caused by a recent change of how sense
   data is generated for commands that did not provide any sense
   data. This broke system suspend for Security locked devices.

   Generate the sense data that the SCSI disk driver expects for a
   Security locked device so that system suspend works again (me)

- Set capacity to zero for a Security locked device.

   All I/O commands will be aborted by a Security locked device.
   Thus, the block layer disk partition scanning will result in
   a bunch of, for the user, confusing I/O errors in dmesg during
   boot.

   Since a Security locked device is unusable anyway, set the capacity
   to zero, to avoid the disk partition scanning during boot. We still
   create the block device in /dev such that the user may unlock the
   device using e.g. hdparm (me)

* tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: Set capacity to zero for a security locked drive
  ata: libata-scsi: Fix system suspend for a security locked drive
  ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()

Merge tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- Fix register naming in the Mediatek mt8189 driver

- Select REGMAP_MMIO for the Realtek RTD driver

- Fix the number of items in groups in the Toshiba Visconti driver

- Fix a memory leak in the Cirrus CS42L43 driver

- Fix a deadlock (!) in Qualcomm pinmux configuration

- Fix use of uninitialized memory and list initialization in the S32CC
   pin controller

* tag 'pinctrl-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  dt-bindings: pinctrl: xlnx,versal-pinctrl: Add missing unevaluatedProperties on '^conf' nodes
  pinctrl: s32cc: initialize gpio_pin_config::list after kmalloc()
  pinctrl: s32cc: fix uninitialized memory in s32_pinctrl_desc
  pinctrl: qcom: msm: Fix deadlock in pinmux configuration
  pinctrl: cirrus: Fix fwnode leak in cs42l43_pin_probe()
  dt-bindings: pinctrl: toshiba,visconti: Fix number of items in groups
  pinctrl: realtek: Select REGMAP_MMIO for RTD driver
  pinctrl: mediatek: mt8189: align register base names to dt-bindings ones
  pinctrl: mediatek: mt8196: align register base names to dt-bindings ones

Merge tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- fix a use-after-free bug in GPIO character device code

- update MAINTAINERS

* tag 'gpio-fixes-for-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: update my email address
gpio: cdev: make sure the cdev fd is still active before emitting events

lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()

Fully initialize *ctx, including the buf field which sha256_init()
doesn't initialize, to avoid a KMSAN warning when comparing *ctx to
orig_ctx. This KMSAN warning slipped in while KMSAN was not working
reliably due to a stackdepot bug, which has now been fixed.

Fixes: 6733968be7cb ("lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20251121033431.34406-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>

Merge tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"A range of small fixes across the board, the i915 display
  disambiguation is probably the biggest otherwise amdgpu and xe as
  usual with tegra, nouveau, radeon and a core atomic fix.

  Looks mostly normal.

  atomic:
   - Return error codes on failed blob creation for planes

  nouveau:
   - Fix memory leak

  tegra:
   - Fix device ref counting
   - Fix pid ref counting
   - Revert booting on Pixel C

  xe:
   - Fix out-of-bounds access with BIT()
   - Fix kunit test checking wrong condition
   - Drop duplicate kconfig select
   - Fix guc2host irq handler with MSI-X

  i915:
   - Wildcat Lake and Panther Lake detangled for display fixes

  amdgpu:
   - DTBCLK gating fix
   - EDID fetching retry improvements
   - HDMI HPD debounce filtering
   - DCN 2.0 cursor fix
   - DP MST PBN fix
   - VPE fix
   - GC 11 fix
   - PRT fix
   - MMIO remap page fix
   - SR-IOV fix

  radeon:
   - Fence deadlock fix"

* tag 'drm-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
  drm/amdgpu: Add sriov vf check for VCN per queue reset support.
  drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags
  drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag
  drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled
  drm/amd: Skip power ungate during suspend for VPE
  drm/plane: Fix create_in_format_blob() return value
  drm/xe/irq: Handle msix vector0 interrupt
  drm/xe: Remove duplicate DRM_EXEC selection from Kconfig
  drm/xe/kunit: Fix forcewake assertion in mocs test
  drm/xe: Prevent BIT() overflow when handling invalid prefetch region
  drm/radeon: delete radeon_fence_process in is_signaled, no deadlock
  drm/amd/display: Fix pbn to kbps Conversion
  drm/amd/display: Clear the CUR_ENABLE register on DCN20 on DPP5
  drm/amd/display: Add an HPD filter for HDMI
  drm/amd/display: Increase DPCD read retries
  drm/amd/display: Move sleep into each retry for retrieve_link_cap()
  drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched
  drm/i915/xe3: Restrict PTL intel_encoder_is_c10phy() to only PHY A
  drm/i915/display: Add definition for wcl as subplatform
  drm/pcids: Split PTL pciids group to make wcl subplatform
  ...

samples: work around glibc redefining some of our defines wrong

Apparently as of version 2.42, glibc headers define AT_RENAME_NOREPLACE
and some of the other flags for renameat2() and friends in <stdio.h>.

Which would all be fine, except for inexplicable reasons glibc decided
to define them _differently_ from the kernel definitions, which then
makes some of our sample code that includes both kernel headers and user
space headers unhappy, because the compiler will (correctly) complain
about redefining things.

Now, mixing kernel headers and user space headers is always a somewhat
iffy proposition due to namespacing issues, but it's kind of inevitable
in our sample and selftest code.  And this is just glibc being stupid.

Those defines come from the kernel, glibc is exposing the kernel
interfaces, and glibc shouldn't make up some random new expressions for
these values.

It's not like glibc headers changed the actual result values, but they
arbitrarily just decided to use a different expression to describe those
values.  The kernel just does

    #define AT_RENAME_NOREPLACE  0x0001

while glibc does

    # define RENAME_NOREPLACE (1 << 0)
    # define AT_RENAME_NOREPLACE RENAME_NOREPLACE

instead.  Same value in the end, but very different macro definition.

For absolutely no reason.

This has since been fixed in the glibc development tree, so eventually
we'll end up with the canonical expressions and no clashes.  But in the
meantime the broken headers are in the glibc-2.42 release and have made
it out into distributions.

Do a minimal work-around to make the samples build cleanly by just
undefining the affected macros in between the user space header include
and the kernel header includes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MIPS: kernel: Fix random segmentation faults

Commit 69896119dc9d ("MIPS: vdso: Switch to generic storage
implementation") switches to a generic vdso storage, which increases
the number of data pages from 1 to 4. But there is only one page
reserved, which causes segementation faults depending where the VDSO
area is randomized to. To fix this use the same size of reservation
and allocation of the VDSO data pages.

Fixes: 69896119dc9d ("MIPS: vdso: Switch to generic storage implementation")
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

MIPS: mm: Prevent a TLB shutdown on initial uniquification

Depending on the particular CPU implementation a TLB shutdown may occur
if multiple matching entries are detected upon the execution of a TLBP
or the TLBWI/TLBWR instructions. Given that we don't know what entries
we have been handed we need to be very careful with the initial TLB
setup and avoid all these instructions.

Therefore read all the TLB entries one by one with the TLBR instruction,
bypassing the content addressing logic, and truncate any large pages in
place so as to avoid a case in the second step where an incoming entry
for a large page at a lower address overlaps with a replacement entry
chosen at another index. Then preinitialize the TLB using addresses
outside our usual unique range and avoiding clashes with any entries
received, before making the usual call to local_flush_tlb_all().

This fixes (at least) R4x00 cores if TLBP hits multiple matching TLB
entries (SGI IP22 PROM for examples sets up all TLBs to the same virtual
address).

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: 35ad7e181541 ("MIPS: mm: tlb-r4k: Uniquify TLB entries on init")
Cc: stable@vger.kernel.org
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> # Boston I6400, M5150 sim
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Merge tag 'drm-xe-fixes-2025-11-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Fix out-of-bounds access with BIT() (Shuicheng Lin)
- Fix kunit test checking wrong condition (Matt Roper)
- Drop duplicate kconfig select (Shuicheng Lin)
- Fix guc2host irq handler with MSI-X (Venkata Ramana Nayana)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/uadbrmftcud3wg32c6tje7mmfcr7wgmpnkzxwubk6fletahje2@coek2ciunkvz

Merge tag 'amd-drm-fixes-6.18-2025-11-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.18-2025-11-20:

amdgpu:
- DTBCLK gating fix
- EDID fetching retry improvements
- HDMI HPD debounce filtering
- DCN 2.0 cursor fix
- DP MST PBN fix
- VPE fix
- GC 11 fix
- PRT fix
- MMIO remap page fix
- SR-IOV fix

radeon:
- Fence deadlock fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251120164110.1077973-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2025-11-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

atomic:
- Return error codes on failed blob creation for planes

nouveau:
- Fix memory leak

tegra:
- Fix device ref counting
- Fix pid ref counting
- Revert booting on Pixel C

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251120151308.GA589436@linux.fritz.box

Merge tag 'drm-intel-fixes-2025-11-20' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Wildcat Lake and Panther Lake detangled for display fixes (Dnyaneshwar)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aR8jByCwjIThpnpk@intel.com

vfio/nvgrace: Support get_dmabuf_phys

Call vfio_pci_core_fill_phys_vec() with the proper physical ranges for the
synthetic BAR 2 and BAR 4 regions. Otherwise use the normal flow based on
the PCI bar.

This demonstrates a DMABUF that follows the region info report to only
allow mapping parts of the region that are mmapable. Since the BAR is
power of two sized and the "CXL" region is just page aligned the there can
be a padding region at the end that is not mmaped or passed into the
DMABUF.

The "CXL" ranges that are remapped into BAR 2 and BAR 4 areas are not PCI
MMIO, they actually run over the CXL-like coherent interconnect and for
the purposes of DMA behave identically to DRAM. We don't try to model this
distinction between true PCI BAR memory that takes a real PCI path and the
"CXL" memory that takes a different path in the p2p framework for now.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-11-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/pci: Add dma-buf export support for MMIO regions

Add support for exporting PCI device MMIO regions through dma-buf,
enabling safe sharing of non-struct page memory with controlled
lifetime management. This allows RDMA and other subsystems to import
dma-buf FDs and build them into memory regions for PCI P2P operations.

The implementation provides a revocable attachment mechanism using
dma-buf move operations. MMIO regions are normally pinned as BARs
don't change physical addresses, but access is revoked when the VFIO
device is closed or a PCI reset is issued. This ensures kernel
self-defense against potentially hostile userspace.

Currently VFIO can take MMIO regions from the device's BAR and map
them into a PFNMAP VMA with special PTEs. This mapping type ensures
the memory cannot be used with things like pin_user_pages(), hmm, and
so on. In practice only the user process CPU and KVM can safely make
use of these VMA. When VFIO shuts down these VMAs are cleaned by
unmap_mapping_range() to prevent any UAF of the MMIO beyond driver
unbind.

However, VFIO type 1 has an insecure behavior where it uses
follow_pfnmap_*() to fish a MMIO PFN out of a VMA and program it back
into the IOMMU. This has a long history of enabling P2P DMA inside
VMs, but has serious lifetime problems by allowing a UAF of the MMIO
after the VFIO driver has been unbound.

Introduce DMABUF as a new safe way to export a FD based handle for the
MMIO regions. This can be consumed by existing DMABUF importers like
RDMA or DRM without opening an UAF. A following series will add an
importer to iommufd to obsolete the type 1 code and allow safe
UAF-free MMIO P2P in VM cases.

DMABUF has a built in synchronous invalidation mechanism called
move_notify. VFIO keeps track of all drivers importing its MMIO and
can invoke a synchronous invalidation callback to tell the importing
drivers to DMA unmap and forget about the MMIO pfns. This process is
being called revoke. This synchronous invalidation fully prevents any
lifecycle problems. VFIO will do this before unbinding its driver
ensuring there is no UAF of the MMIO beyond the driver lifecycle.

Further, VFIO has additional behavior to block access to the MMIO
during things like Function Level Reset. This is because some poor
platforms may experience a MCE type crash when touching MMIO of a PCI
device that is undergoing a reset. Today this is done by using
unmap_mapping_range() on the VMAs. Extend that into the DMABUF world
and temporarily revoke the MMIO from the DMABUF importers during FLR
as well. This will more robustly prevent an errant P2P from possibly
upsetting the platform.

A DMABUF FD is a preferred handle for MMIO compared to using something
like a pgmap because:
- VFIO is supported, including its P2P feature, on archs that don't
   support pgmap
- PCI devices have all sorts of BAR sizes, including ones smaller
   than a section so a pgmap cannot always be created
- It is undesirable to waste a lot of memory for struct pages,
   especially for a case like a GPU with ~100GB of BAR size
- We want a synchronous revoke semantic to support FLR with light
   hardware requirements

Use the P2P subsystem to help generate the DMA mapping. This is a
significant upgrade over the abuse of dma_map_resource() that has
historically been used by DMABUF exporters. Experience with an OOT
version of this patch shows that real systems do need this. This
approach deals with all the P2P scenarios:
- Non-zero PCI bus_offset
- ACS flags routing traffic to the IOMMU
- ACS flags that bypass the IOMMU - though vfio noiommu is required
   to hit this.

There will be further work to formalize the revoke semantic in
DMABUF. For now this acts like a move_notify dynamic exporter where
importer fault handling will get a failure when they attempt to map.
This means that only fully restartable fault capable importers can
import the VFIO DMABUFs. A future revoke semantic should open this up
to more HW as the HW only needs to invalidate, not handle restartable
faults.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-10-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

selinux: rename the cred_security_struct variables to "crsec"

Along with the renaming from task_security_struct to cred_security_struct,
rename the local variables to "crsec" from "tsec". This both fits with
existing conventions and helps distinguish between task and cred related
variables.

No functional changes.

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: move avdcache to per-task security struct

The avdcache is meant to be per-task; move it to a new
task_security_struct that is duplicated per-task.

Cc: stable@vger.kernel.org
Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line length fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>

selinux: rename task_security_struct to cred_security_struct

Before Linux had cred structures, the SELinux task_security_struct was
per-task and although the structure was switched to being per-cred
long ago, the name was never updated. This change renames it to
cred_security_struct to avoid confusion and pave the way for the
introduction of an actual per-task security structure for SELinux. No
functional change.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

Merge tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fix from Tejun Heo:
"One low risk and obvious fix: scx_enable() was dereferencing an error
pointer on helper kthread creation failure. Fixed"

* tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_enable() crash on helper kthread creation failure

vfio/pci: Enable peer-to-peer DMA transactions by default

Make sure that all VFIO PCI devices have peer-to-peer capabilities
enables, so we would be able to export their MMIO memory through DMABUF,

VFIO has always supported P2P mappings with itself. VFIO type 1
insecurely reads PFNs directly out of a VMA's PTEs and programs them
into the IOMMU allowing any two VFIO devices to perform P2P to each
other.

All existing VMMs use this capability to export P2P into a VM where
the VM could setup any kind of DMA it likes. Projects like DPDK/SPDK
are also known to make use of this, though less frequently.

As a first step to more properly integrating VFIO with the P2P
subsystem unconditionally enable P2P support for VFIO PCI devices. The
struct p2pdma_provider will act has a handle to the P2P subsystem to
do things like DMA mapping.

While real PCI devices have to support P2P (they can't even tell if an
IOVA is P2P or not) there may be fake PCI devices that may trigger
some kind of catastrophic system failure. To date VFIO has never
tripped up on such a case, but if one is discovered the plan is to add
a PCI quirk and have pcim_p2pdma_init() fail. This will fully block
the broken device throughout any users of the P2P subsystem in the
kernel.

Thus P2P through DMABUF will follow the historical VFIO model and be
unconditionally enabled by vfio-pci.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-9-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/pci: Share the core device pointer while invoking feature functions

There is no need to share the main device pointer (struct vfio_device *)
with all the feature functions as they only need the core device
pointer. Therefore, extract the core device pointer once in the
caller (vfio_pci_core_ioctl_feature) and share it instead.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-8-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio: Export vfio device get and put registration helpers

These helpers are useful for managing additional references taken
on the device from other associated VFIO modules.

Original-patch-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-7-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

dma-buf: provide phys_vec to scatter-gather mapping routine

Add dma_buf_phys_vec_to_sgt() and dma_buf_free_sgt() helpers to convert
an array of MMIO physical address ranges into scatter-gather tables with
proper DMA mapping.

These common functions are a starting point and support any PCI
drivers creating mappings from their BAR's MMIO addresses. VFIO is one
case, as shortly will be RDMA. We can review existing DRM drivers to
refactor them separately. We hope this will evolve into routines to
help common DRM that include mixed CPU and MMIO mappings.

Compared to the dma_map_resource() abuse this implementation handles
the complicated PCI P2P scenarios properly, especially when an IOMMU
is enabled:

- Direct bus address mapping without IOVA allocation for
   PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This
   happens if the IOMMU is enabled but the PCIe switch ACS flags allow
   transactions to avoid the host bridge.

   Further, this handles the slightly obscure, case of MMIO with a
   phys_addr_t that is different from the physical BAR programming
   (bus offset). The phys_addr_t is converted to a dma_addr_t and
   accommodates this effect. This enables certain real systems to
   work, especially on ARM platforms.

- Mapping through host bridge with IOVA allocation and DMA_ATTR_MMIO
   attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE).
   This happens when the IOMMU is enabled and the ACS flags are forcing
   all traffic to the IOMMU - ie for virtualization systems.

- Cases where P2P is not supported through the host bridge/CPU. The
   P2P subsystem is the proper place to detect this and block it.

Helper functions fill_sg_entry() and calc_sg_nents() handle the
scatter-gather table construction, splitting large regions into
UINT_MAX-sized chunks to fit within sg->length field limits.

Since the physical address based DMA API forbids use of the CPU list
of the scatterlist this will produce a mangled scatterlist that has
a fully zero-length and NULL'd CPU list. The list is 0 length,
all the struct page pointers are NULL and zero sized. This is stronger
and more robust than the existing mangle_sg_table() technique. It is
a future project to migrate DMABUF as a subsystem away from using
scatterlist for this data structure.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

PCI/P2PDMA: Document DMABUF model

Reflect latest changes in p2p implementation to support DMABUF lifecycle.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-5-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function

Provide an access to pci_p2pdma_map_type() function to allow subsystems
to determine the appropriate mapping type for P2PDMA transfers between
a provider and target device.

The pci_p2pdma_map_type() function is the core P2P layer version of
the existing public, but struct page focused, pci_p2pdma_state()
function. It returns the same result. It is required to use the p2p
subsystem from drivers that don't use the struct page layer.

Like __pci_p2pdma_update_state() it is not an exported function. The
idea is that only subsystem code will implement mapping helpers for
taking in phys_addr_t lists, this is deliberately not made accessible
to every driver to prevent abuse.

Following patches will use this function to implement a shared DMA
mapping helper for DMABUF.

Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-4-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation

Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA
functionality from the optional memory allocation layer. This creates a
two-tier architecture:

The core layer provides P2P mapping functionality for physical addresses
based on PCI device MMIO BARs and integrates with the DMA API for
mapping operations. This layer is required for all P2PDMA users.

The optional upper layer provides memory allocation capabilities
including gen_pool allocator, struct page support, and sysfs interface
for user space access.

This separation allows subsystems like DMABUF to use only the core P2P
mapping functionality without the overhead of memory allocation features
they don't need. The core functionality is now available through the
new pcim_p2pdma_provider() function that returns a p2pdma_provider
structure.

Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-3-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

PCI/P2PDMA: Simplify bus address mapping API

Update the pci_p2pdma_bus_addr_map() function to take a direct pointer
to the p2pdma_provider structure instead of the pci_p2pdma_map_state.
This simplifies the API by removing the need for callers to extract
the provider from the state structure.

The change updates all callers across the kernel (block layer, IOMMU,
DMA direct, and HMM) to pass the provider pointer directly, making
the code more explicit and reducing unnecessary indirection. This
also removes the runtime warning check since callers now have direct
control over which provider they use.

Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-2-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

PCI/P2PDMA: Separate the mmap() support from the core logic

Currently the P2PDMA code requires a pgmap and a struct page to
function. The was serving three important purposes:

- DMA API compatibility, where scatterlist required a struct page as
input

- Life cycle management, the percpu_ref is used to prevent UAF during
device hot unplug

- A way to get the P2P provider data through the pci_p2pdma_pagemap

The DMA API now has a new flow, and has gained phys_addr_t support, so
it no longer needs struct pages to perform P2P mapping.

Lifecycle management can be delegated to the user, DMABUF for instance
has a suitable invalidation protocol that does not require struct page.

Finding the P2P provider data can also be managed by the caller
without need to look it up from the phys_addr.

Split the P2PDMA code into two layers. The optional upper layer,
effectively, provides a way to mmap() P2P memory into a VMA by
providing struct page, pgmap, a genalloc and sysfs.

The lower layer provides the actual P2P infrastructure and is wrapped
up in a new struct p2pdma_provider. Rework the mmap layer to use new
p2pdma_provider based APIs.

Drivers that do not want to put P2P memory into VMA's can allocate a
struct p2pdma_provider after probe() starts and free it before
remove() completes. When DMA mapping the driver must convey the struct
p2pdma_provider to the DMA mapping code along with a phys_addr of the
MMIO BAR slice to map. The driver must ensure that no DMA mapping
outlives the lifetime of the struct p2pdma_provider.

The intended target of this new API layer is DMABUF. There is usually
only a single p2pdma_provider for a DMABUF exporter. Most drivers can
establish the p2pdma_provider during probe, access the single instance
during DMABUF attach and use that to drive the DMA mapping.

DMABUF provides an invalidation mechanism that can guarantee all DMA
is halted and the DMA mappings are undone prior to destroying the
struct p2pdma_provider. This ensures there is no UAF through DMABUFs
that are lingering past driver removal.

The new p2pdma_provider layer cannot be used to create P2P memory that
can be mapped into VMA's, be used with pin_user_pages(), O_DIRECT, and
so on. These use cases must still use the mmap() layer. The
p2pdma_provider layer is principally for DMABUF-like use cases where
DMABUF natively manages the life cycle and access instead of
vmas/pin_user_pages()/struct page.

In addition, remove the bus_off field from pci_p2pdma_map_state since
it duplicates information already available in the pgmap structure.
The bus_offset is only used in one location (pci_p2pdma_bus_addr_map)
and is always identical to pgmap->bus_offset.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-1-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

Merge tag 'slab-for-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Fix mempool poisoning order>0 pages with CONFIG_HIGHMEM (Vlastimil Babka)

* tag 'slab-for-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/mempool: fix poisoning order>0 pages with HIGHMEM

sched_ext: Fix scx_enable() crash on helper kthread creation failure

A crash was observed when the sched_ext selftests runner was
terminated with Ctrl+\ while test 15 was running:

NIP [c00000000028fa58] scx_enable.constprop.0+0x358/0x12b0
LR [c00000000028fa2c] scx_enable.constprop.0+0x32c/0x12b0
Call Trace:
scx_enable.constprop.0+0x32c/0x12b0 (unreliable)
bpf_struct_ops_link_create+0x18c/0x22c
__sys_bpf+0x23f8/0x3044
sys_bpf+0x2c/0x6c
system_call_exception+0x124/0x320
system_call_vectored_common+0x15c/0x2ec

kthread_run_worker() returns an ERR_PTR() on failure rather than NULL,
but the current code in scx_alloc_and_add_sched() only checks for a NULL
helper. Incase of failure on SIGQUIT, the error is not handled in
scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an
error pointer.

Error handling is fixed in scx_alloc_and_add_sched() to propagate
PTR_ERR() into ret, so that scx_enable() jumps to the existing error
path, avoiding random dereference on failure.

Fixes: bff3b5aec1b7 ("sched_ext: Move disable machinery into scx_sched")
Cc: stable@vger.kernel.org # v6.16+
Reported-and-tested-by: Samir Mulani <samir@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

io_uring/cmd_net: fix wrong argument types for skb_queue_splice()

If timestamp retriving needs to be retried and the local list of
SKB's already has entries, then it's spliced back into the socket
queue. However, the arguments for the splice helper are transposed,
causing exactly the wrong direction of splicing into the on-stack
list. Fix that up.

Cc: stable@vger.kernel.org
Reported-by: Google Big Sleep <big-sleep-vuln-reports+bigsleep-462435176@google.com>
Fixes: 9e4ed359b8ef ("io_uring/netcmd: add tx timestamping cmd support")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a regression introduced during the 6.16 development cycle that may
  cause runtime PM to be enabled by mistake for devices that do not
  support it (which may lead to some serious trouble) if there is a
  system wakeup event during the "late suspend" phase of system suspend"

* tag 'pm-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: core: Fix runtime PM enabling in device_resume_early()

Merge tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"This fixes EINJV2 support introduced during the 6.17 cycle by
  unbreaking the initialization broken by a previous attempted fix,
  adding sanity checks for data coming from the platform firmware, and
  updating the code to handle injecting legacy error types on an EINJV2
  capable systems properly (Tony Luck)"

* tag 'acpi-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: APEI: EINJ: Fix EINJV2 initialization and injection

Merge tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
"This one has lots of new HW entries which adds to the size in diffstat
  but the individual changes are simple.

  Fixes

   - acer-wmi: Ignore backlight event

   - alienware-wmi-wmax: Fix quirk match table order & drop redundant
     entries

   - amd/pmc:
      - Add Xbox Ally to spurious 8042 quirk list
      - Quirk list Lenovo Legion Go 2 NVMe resume

   - msi-wmi-platform:
      - Correct GUID to uppercase
      - GUID is uncleverly copy-pasted from an example so add a DMI
        whitelist

   - intel/speed_select_if: PCIBIOS_* return code conversion

   - intel-uncore-freq & ISST: Fix kernel doc warnings

  New HW support

   - alienware-wmi-wmax:
      - Alienware 16 Aurora support
      - Alienware M support
      - Alienware X support
      - Dell G support

   - amd/pmc:
      - ROG Xbox Ally (non-X) support

   - huaway-wmi: HONOR MagicBoox X16/X14 PrintScreen & YOYO keys

   - hp-wmi:
      - Omen 16-wf1xxx fan support
      - Omen MAX 16-ah0xx fan + thermal profile support
      - Victus 16-r0 and 16-s0 fan + thermal profile support

   - intel/hid: Intel Nova Lake support

   - intel-uncore-freq:
      - Intel Panther Lake support
      - Intel Wildcat Lake support
      - Intel Nova Lake support"

* tag 'platform-drivers-x86-v6.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (21 commits)
  platform/x86: intel-uncore-freq: fix all header kernel-doc warnings
  platform/x86: acer-wmi: Ignore backlight event
  platform/x86/intel/speed_select_if: Convert PCIBIOS_* return codes to errnos
  platform/x86/intel/hid: Add Nova Lake support
  platform/x86: alienware-wmi-wmax: Add AWCC support to Alienware 16 Aurora
  platform/x86: hp-wmi: Add Omen MAX 16-ah0xx fan support and thermal profile
  platform/x86: msi-wmi-platform: Fix typo in WMI GUID
  platform/x86: msi-wmi-platform: Only load on MSI devices
  platform/x86/amd: pmc: Add Lenovo Legion Go 2 to pmc quirk list
  platform/x86/amd/pmc: Add spurious_8042 to Xbox Ally
  platform/x86/amd/pmc: Add support for Van Gogh SoC
  platform/x86: alienware-wmi-wmax: Add support for the whole "G" family
  platform/x86: alienware-wmi-wmax: Add support for the whole "X" family
  platform/x86: alienware-wmi-wmax: Add support for the whole "M" family
  platform/x86: alienware-wmi-wmax: Drop redundant DMI entries
  platform/x86: alienware-wmi-wmax: Fix "Alienware m16 R1 AMD" quirk order
  platform/x86: ISST: isst_if.h: fix all kernel-doc warnings
  platform/x86: intel-uncore-freq: Add additional client processors
  platform/x86: hp-wmi: Add Omen 16-wf1xxx fan support
  platform/x86: huawei-wmi: add keys for HONOR models
  ...

Merge tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from IPsec and wireless.

  Previous releases - regressions:

   - prevent NULL deref in generic_hwtstamp_ioctl_lower(),
     newer APIs don't populate all the pointers in the request

   - phylink: add missing supported link modes for the fixed-link

   - mptcp: fix false positive warning in mptcp_pm_nl_rm_addr

  Previous releases - always broken:

   - openvswitch: remove never-working support for setting NSH fields

   - xfrm: number of fixes for error paths of xfrm_state creation/
     modification/deletion

   - xfrm: fixes for offload
      - fix the determination of the protocol of the inner packet
      - don't push locally generated packets directly to L2 tunnel
        mode offloading, they still need processing from the standard
        xfrm path

   - mptcp: fix a couple of corner cases in fallback and fastclose
     handling

   - wifi: rtw89: hw_scan: prevent connections from getting stuck,
     work around apparent bug in FW by tweaking messages we send

   - af_unix: fix duplicate data if PEEK w/ peek_offset needs to wait

   - veth: more robust handing of race to avoid txq getting stuck

   - eth: ps3_gelic_net: handle skb allocation failures"

* tag 'net-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  vsock: Ignore signal/timeout on connect() if already established
  be2net: pass wrb_params in case of OS2BMC
  l2tp: reset skb control buffer on xmit
  net: dsa: microchip: lan937x: Fix RGMII delay tuning
  selftests: mptcp: add a check for 'add_addr_accepted'
  mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
  selftests: mptcp: join: userspace: longer timeout
  selftests: mptcp: join: endpoints: longer timeout
  selftests: mptcp: join: fastclose: remove flaky marks
  mptcp: fix duplicate reset on fastclose
  mptcp: decouple mptcp fastclose from tcp close
  mptcp: do not fallback when OoO is present
  mptcp: fix premature close in case of fallback
  mptcp: avoid unneeded subflow-level drops
  mptcp: fix ack generation for fallback msk
  wifi: rtw89: hw_scan: Don't let the operating channel be last
  net: phylink: add missing supported link modes for the fixed-link
  selftest: af_unix: Add test for SO_PEEK_OFF.
  af_unix: Read sk_peek_offset() again after sleeping in unix_stream_read_generic().
  net/mlx5: Clean up only new IRQ glue on request_irq() failure
  ...

timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths

tk_aux_sysfs_init() returns immediately on error during the auxiliary clock
initialization loop without cleaning up previously allocated kobjects and
sysfs groups.

If kobject_create_and_add() or sysfs_create_group() fails during loop
iteration, the parent kobjects (tko and auxo) and any previously created
child kobjects are leaked.

Fix this by adding proper error handling with goto labels to ensure all
allocated resources are cleaned up on failure. kobject_put() on the
parent kobjects will handle cleanup of their children.

Fixes: 7b95663a3d96 ("timekeeping: Provide interface to control auxiliary clocks")
Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120150213.246777-1-mrout@redhat.com

vsock: Ignore signal/timeout on connect() if already established

During connect(), acting on a signal/timeout by disconnecting an already
established socket leads to several issues:

1. connect() invoking vsock_transport_cancel_pkt() ->
   virtio_transport_purge_skbs() may race with sendmsg() invoking
   virtio_transport_get_credit(). This results in a permanently elevated
   `vvs->bytes_unsent`. Which, in turn, confuses the SOCK_LINGER handling.

2. connect() resetting a connected socket's state may race with socket
   being placed in a sockmap. A disconnected socket remaining in a sockmap
   breaks sockmap's assumptions. And gives rise to WARNs.

3. connect() transitioning SS_CONNECTED -> SS_UNCONNECTED allows for a
   transport change/drop after TCP_ESTABLISHED. Which poses a problem for
   any simultaneous sendmsg() or connect() and may result in a
   use-after-free/null-ptr-deref.

Do not disconnect socket on signal/timeout. Keep the logic for unconnected
sockets: they don't linger, can't be placed in a sockmap, are rejected by
sendmsg().

[1]: https://lore.kernel.org/netdev/e07fd95c-9a38-4eea-9638-133e38c2ec9b@rbox.co/
[2]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/
[3]: https://lore.kernel.org/netdev/60f1b7db-3099-4f6a-875e-af9f6ef194f6@rbox.co/

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251119-vsock-interrupted-connect-v2-1-70734cf1233f@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

be2net: pass wrb_params in case of OS2BMC

be_insert_vlan_in_pkt() is called with the wrb_params argument being NULL
at be_send_pkt_to_bmc() call site. This may lead to dereferencing a NULL
pointer when processing a workaround for specific packet, as commit
bc0c3405abbb ("be2net: fix a Tx stall bug caused by a specific ipv6
packet") states.

The correct way would be to pass the wrb_params from be_xmit().

Fixes: 760c295e0e8d ("be2net: Support for OS2BMC.")
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Vatoropin <a.vatoropin@crpt.ru>
Link: https://patch.msgid.link/20251119105015.194501-1-a.vatoropin@crpt.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nvme-6.18-2025-11-20' of git://git.infradead.org/nvme into block-6.18

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.18

- Admin queue use-after-free fix (Keith)
- Target authentication fix (Alistar)
- Multipath lockdeup fix (Shin'ichiro)
- FC transport teardown fixes (Ewan)"

* tag 'nvme-6.18-2025-11-20' of git://git.infradead.org/nvme:
  nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()
  nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
  nvme-multipath: fix lockdep WARN due to partition scan work
  nvmet-auth: update sc_c in target host hash calculation
  nvme: fix admin request_queue lifetime

ata: libata-core: Set capacity to zero for a security locked drive

For Security locked drives (drives that have Security enabled, and have
not been Security unlocked by boot firmware), the automatic partition
scanning will result in the user being spammed with errors such as:

  ata5.00: failed command: READ DMA
  ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 7 dma 4096 in
           res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
  ata5.00: status: { DRDY ERR }
  ata5.00: error: { ABRT }
  sd 4:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
  sd 4:0:0:0: [sda] tag#7 Sense Key : Aborted Command [current]
  sd 4:0:0:0: [sda] tag#7 Add. Sense: No additional sense information

during boot, because most commands except for IDENTIFY will be aborted by
a Security locked drive.

For a Security locked drive, set capacity to zero, so that no automatic
partition scanning will happen.

If the user later unlocks the drive using e.g. hdparm, the close() by the
user space application should trigger a revalidation of the drive.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: libata-scsi: Fix system suspend for a security locked drive

Commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status
handling") fixed ata_to_sense_error() to properly generate sense key
ABORTED COMMAND (without any additional sense code), instead of the
previous bogus sense key ILLEGAL REQUEST with the additional sense code
UNALIGNED WRITE COMMAND, for a failed command.

However, this broke suspend for Security locked drives (drives that have
Security enabled, and have not been Security unlocked by boot firmware).

The reason for this is that the SCSI disk driver, for the Synchronize
Cache command only, treats any sense data with sense key ILLEGAL REQUEST
as a successful command (regardless of ASC / ASCQ).

After commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error()
status handling") the code that treats any sense data with sense key
ILLEGAL REQUEST as a successful command is no longer applicable, so the
command fails, which causes the system suspend to be aborted:

  sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5
  sd 1:0:0:0: PM: failed to suspend async: error -5
  PM: Some devices failed to suspend, or early wake event detected

To make suspend work once again, for a Security locked device only,
return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense
data which a real SCSI device would have returned if locked.
The SCSI disk driver treats this sense data as a successful command.

Cc: stable@vger.kernel.org
Reported-by: Ilia Baryshnikov <qwelias@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704
Fixes: cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()

Call scsi_device_put() in ata_scsi_dev_rescan() if the device or its
queue are not running.

Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
Cc: stable@vger.kernel.org
Signed-off-by: Yihang Li <liyihang9@h-partners.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>

Merge tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
wireless-2025-11-20

A single fix for scanning on some rtw89 devices.

* tag 'wireless-2025-11-20' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: rtw89: hw_scan: Don't let the operating channel be last
====================

Link: https://patch.msgid.link/20251120085433.8601-3-johannes@sipsolutions.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

l2tp: reset skb control buffer on xmit

The L2TP stack did not reset the skb control buffer before sending the
encapsulated package.

In a setup with an ath10k radio and batman-adv over an L2TP tunnel
massive fragmentations happen sporadically if the L2TP tunnel is
established over IPv4.

L2TP might reset some of the fields in the IP control buffer, but L2TP
assumes the type of the control buffer to be of an IPv4 packet.

In case the L2TP interface is used as a batadv hardif or the packet is
an IPv6 packet, this assumption breaks.

Clear the entire control buffer to avoid such mishaps altogether.

Fixes: f77ae9390438 ("[PPPOL2TP]: Reset meta-data in xmit function")
Signed-off-by: David Bauer <mail@david-bauer.net>
Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: dsa: microchip: lan937x: Fix RGMII delay tuning

Correct RGMII delay application logic in lan937x_set_tune_adj().

The function was missing `data16 &= ~PORT_TUNE_ADJ` before setting the
new delay value. This caused the new value to be bitwise-OR'd with the
existing PORT_TUNE_ADJ field instead of replacing it.

For example, when setting the RGMII 2 TX delay on port 4, the
intended TUNE_ADJUST value of 0 (RGMII_2_TX_DELAY_2NS) was
incorrectly OR'd with the default 0x1B (from register value 0xDA3),
leaving the delay at the wrong setting.

This patch adds the missing mask to clear the field, ensuring the
correct delay value is written. Physical measurements on the RGMII TX
lines confirm the fix, showing the delay changing from ~1ns (before
change) to ~2ns.

While testing on i.MX 8MP showed this was within the platform's timing
tolerance, it did not match the intended hardware-characterized value.

Fixes: b19ac41faa3f ("net: dsa: microchip: apply rgmii tx and rx delay in phylink mac config")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20251114090951.4057261-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xfs: fix out of bounds memory read error in symlink repair

xfs/286 produced this report on my test fleet:

==================================================================
BUG: KFENCE: out-of-bounds read in memcpy_orig+0x54/0x110

Out-of-bounds read at 0xffff88843fe9e038 (184B right of kfence-#184):
  memcpy_orig+0x54/0x110
  xrep_symlink_salvage_inline+0xb3/0xf0 [xfs]
  xrep_symlink_salvage+0x100/0x110 [xfs]
  xrep_symlink+0x2e/0x80 [xfs]
  xrep_attempt+0x61/0x1f0 [xfs]
  xfs_scrub_metadata+0x34f/0x5c0 [xfs]
  xfs_ioc_scrubv_metadata+0x387/0x560 [xfs]
  xfs_file_ioctl+0xe23/0x10e0 [xfs]
  __x64_sys_ioctl+0x76/0xc0
  do_syscall_64+0x4e/0x1e0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

kfence-#184: 0xffff88843fe9df80-0xffff88843fe9dfea, size=107, cache=kmalloc-128

allocated by task 3470 on cpu 1 at 263329.131592s (192823.508886s ago):
  xfs_init_local_fork+0x79/0xe0 [xfs]
  xfs_iformat_local+0xa4/0x170 [xfs]
  xfs_iformat_data_fork+0x148/0x180 [xfs]
  xfs_inode_from_disk+0x2cd/0x480 [xfs]
  xfs_iget+0x450/0xd60 [xfs]
  xfs_bulkstat_one_int+0x6b/0x510 [xfs]
  xfs_bulkstat_iwalk+0x1e/0x30 [xfs]
  xfs_iwalk_ag_recs+0xdf/0x150 [xfs]
  xfs_iwalk_run_callbacks+0xb9/0x190 [xfs]
  xfs_iwalk_ag+0x1dc/0x2f0 [xfs]
  xfs_iwalk_args.constprop.0+0x6a/0x120 [xfs]
  xfs_iwalk+0xa4/0xd0 [xfs]
  xfs_bulkstat+0xfa/0x170 [xfs]
  xfs_ioc_fsbulkstat.isra.0+0x13a/0x230 [xfs]
  xfs_file_ioctl+0xbf2/0x10e0 [xfs]
  __x64_sys_ioctl+0x76/0xc0
  do_syscall_64+0x4e/0x1e0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

CPU: 1 UID: 0 PID: 1300113 Comm: xfs_scrub Not tainted 6.18.0-rc4-djwx #rc4 PREEMPT(lazy)  3d744dd94e92690f00a04398d2bd8631dcef1954
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-4.module+el8.8.0+21164+ed375313 04/01/2014
==================================================================

On further analysis, I realized that the second parameter to min() is
not correct.  xfs_ifork::if_bytes is the size of the xfs_ifork::if_data
buffer.  if_bytes can be smaller than the data fork size because:

(a) the forkoff code tries to keep the data area as large as possible
(b) for symbolic links, if_bytes is the ondisk file size + 1
(c) forkoff is always a multiple of 8.

Case in point: for a single-byte symlink target, forkoff will be
8 but the buffer will only be 2 bytes long.

In other words, the logic here is wrong and we walk off the end of the
incore buffer.  Fix that.

Cc: stable@vger.kernel.org # v6.10
Fixes: 2651923d8d8db0 ("xfs: online repair of symbolic links")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

perf: Fix 0 count issue of cpu-clock

Currently cpu-clock event always returns 0 count, e.g.,

perf stat -e cpu-clock -- sleep 1

Performance counter stats for 'sleep 1':
0 cpu-clock # 0.000 CPUs utilized
1.002308394 seconds time elapsed

The root cause is the commit 'bc4394e5e79c ("perf: Fix the throttle
error of some clock events")' adds PERF_EF_UPDATE flag check before
calling cpu_clock_event_update() to update the count, however the
PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in
counting mode (pmu->dev() -> cpu_clock_event_del() ->
cpu_clock_event_stop()). This leads to the cpu-clock event count is
never updated.

To fix this issue, force to set PERF_EF_UPDATE flag for cpu-clock event
just like what task-clock does.

Fixes: bc4394e5e79c ("perf: Fix the throttle error of some clock events")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com

cifs: Add the smb3_read_* tracepoints to SMB1

Add the smb3_read_* tracepoints to SMB1's cifs_async_readv() and
cifs_readv_callback().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

cifs: fix memory leak in smb3_fs_context_parse_param error path

Add proper cleanup of ctx->source and fc->source to the
cifs_parse_mount_err error handler. This ensures that memory allocated
for the source strings is correctly freed on all error paths, matching
the cleanup already performed in the success path by
smb3_cleanup_fs_context_contents().
Pointers are also set to NULL after freeing to prevent potential
double-free issues.

This change fixes a memory leak originally detected by syzbot. The
leak occurred when processing Opt_source mount options if an error
happened after ctx->source and fc->source were successfully
allocated but before the function completed.

The specific leak sequence was:
1. ctx->source = smb3_fs_context_fullpath(ctx, '/') allocates memory
2. fc->source = kstrdup(ctx->source, GFP_KERNEL) allocates more memory
3. A subsequent error jumps to cifs_parse_mount_err
4. The old error handler freed passwords but not the source strings,
causing the memory to leak.

This issue was not addressed by commit e8c73eb7db0a ("cifs: client:
fix memory leak in smb3_fs_context_parse_param"), which only fixed
leaks from repeated fsconfig() calls but not this error path.

Patch updated with minor change suggested by kernel test robot

Reported-by: syzbot+87be6809ed9bf6d718e3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=87be6809ed9bf6d718e3
Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: introduce close_cached_dir_locked()

Replace close_cached_dir() calls under cfid_list_lock with a new
close_cached_dir_locked() variant that uses kref_put() instead of
kref_put_lock() to avoid recursive locking when dropping references.

While the existing code works if the refcount >= 2 invariant holds,
this area has proven error-prone. Make deadlocks impossible and WARN
on invariant violations.

Cc: stable@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge tag 'rtw-2025-11-20' of https://github.com/pkshih/rtw

Ping-Ke Shih says:
==================
rtw patches for v6.18-rc7

Fix firmware goes wrong and causes device unusable after scanning. This
issue presents under certain regulatory domain reported from end users.
==================

Link: https://patch.msgid.link/8217bee0-96c4-44c1-9593-2e9ca12eccc5@RTKEXHMBS03.realtek.com.tw
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

LoongArch: BPF: Disable trampoline for kernel module function trace

The current LoongArch BPF trampoline implementation is incompatible
with tracing functions in kernel modules. This causes several severe
and user-visible problems:

* The `bpf_selftests/module_attach` test fails consistently.
* Kernel lockup when a BPF program is attached to a module function [1].
* Critical kernel modules like WireGuard experience traffic disruption
when their functions are traced with fentry [2].

Given the severity and the potential for other unknown side-effects, it
is safest to disable the feature entirely for now. This patch prevents
the BPF subsystem from allowing trampoline attachments to kernel module
functions on LoongArch.

This is a temporary mitigation until the core issues in the trampoline
code for kernel module handling can be identified and fixed.

[root@fedora bpf]# ./test_progs -a module_attach -v
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.

[1]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C13FMT58oqQ@mail.gmail.com/
[2]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcFaK93uoELPg@mail.gmail.com/

Cc: stable@vger.kernel.org
Fixes: f9b6b41f0cf3 ("LoongArch: BPF: Add basic bpf trampoline support")
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Don't panic if no valid cache info for PCI

If there is no valid cache info detected (may happen in virtual machine)
for pci_dfl_cache_line_size, kernel shouldn't panic. Because in the PCI
core it will be evaluated to (L1_CACHE_BYTES >> 2).

Cc: <stable@vger.kernel.org>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Mask all interrupts during kexec/kdump

If the default state of the interrupt controllers in the first kernel
don't mask any interrupts, it may cause the second kernel to potentially
receive interrupts (which were previously allocated by the first kernel)
immediately after a CPU becomes online during its boot process. These
interrupts cannot be properly routed, leading to bad IRQ issues.

This patch calls machine_kexec_mask_interrupts() to mask all interrupts
during the kexec/kdump process.

Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Fix NUMA node parsing with numa_memblks

On physical machine, NUMA node id comes from high bit 44:48 of physical
address. However it is not true on virt machine. With general method, it
comes from ACPI SRAT table.

Here the common function numa_memblks_init() is used to parse NUMA node
information with numa_memblks.

Cc: <stable@vger.kernel.org>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Consolidate CPU names in /proc/cpuinfo

Some processors have no IOCSR.VENDOR and IOCSR.CPUNAME, some processors
have these registers but there is no valid information.

Consolidate CPU names in /proc/cpuinfo:
1. Add "PRID" to display the PRID & Core-Name;
2. Let "Model Name" display "Unknown" if no valid name.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Use UAPI types in ptrace UAPI header

The kernel UAPI headers already contain fixed-width integer types, there
is no need to rely on the libc types. There may not be a libc available
or the libc may not provides the <stdint.h>, like for example on nolibc.

This also aligns the header with the rest of the LoongArch UAPI headers.

Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-11-18 (idpf, ice)

This series contains updates to idpf and ice drivers.

Emil adds a check for NULL vport_config during removal to avoid NULL
pointer dereference in idpf.

Grzegorz fixes PTP teardown paths to account for some missed cleanups
for ice driver.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: fix PTP cleanup on driver removal in error path
idpf: fix possible vport_config NULL pointer deref in remove
====================

Link: https://patch.msgid.link/20251118235207.2165495-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-misc-fixes-for-v6-18-rc7'

Matthieu Baerts says:

====================
mptcp: misc fixes for v6.18-rc7

Here are various unrelated fixes:

- Patch 1: Fix window space computation for fallback connections which
  can affect ACK generation. A fix for v5.11.

- Patch 2: Avoid unneeded subflow-level drops due to unsynced received
  window. A fix for v5.11.

- Patch 3: Avoid premature close for fallback connections with PREEMPT
  kernels. A fix for v5.12.

- Patch 4: Reset instead of fallback in case of data in the MPTCP
  out-of-order queue. A fix for v5.7.

- Patches 5-7: Avoid also sending "plain" TCP reset when closing with an
  MP_FASTCLOSE. A fix for v6.1.

- Patches 8-9: Longer timeout for background connections in MPTCP Join
  selftests. An additional fix for recent patches for v5.13/v6.1.

- Patches 10-11: Fix typo in a check introduce in a recent refactoring.
  A fix for v6.15.
====================

Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-0-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: add a check for 'add_addr_accepted'

The previous patch fixed an issue with the 'add_addr_accepted' counter.
This was not spot by the test suite.

Check this counter and 'add_addr_signal' in MPTCP Join 'delete re-add
signal' test. This should help spotting similar regressions later on.
These counters are crucial for ensuring the MPTCP path manager correctly
handles the subflow creation via 'ADD_ADDR'.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-11-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix address removal logic in mptcp_pm_nl_rm_addr

Fix inverted WARN_ON_ONCE condition that prevented normal address
removal counter updates. The current code only executes decrement
logic when the counter is already 0 (abnormal state), while
normal removals (counter > 0) are ignored.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
Fixes: 636113918508 ("mptcp: pm: remove '_nl' from mptcp_pm_nl_rm_addr_received")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-10-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: join: userspace: longer timeout

In rare cases, when the test environment is very slow, some userspace
tests can fail because some expected events have not been seen.

Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.

To play it safe, all userspace tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.

The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.

Fixes: 290493078b96 ("selftests: mptcp: join: userspace: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-9-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: join: endpoints: longer timeout

In rare cases, when the test environment is very slow, some endpoints
tests can fail because some expected events have not been seen.

Because the tests are expecting a long on-going connection, and they are
not waiting for the end of the transfer, it is fine to have a longer
timeout, and even go over the default one. This connection will be
killed at the end, after the verifications: increasing the timeout
doesn't change anything, apart from avoiding it to end before the end of
the verifications.

To play it safe, all endpoints tests not waiting for the end of the
transfer are now having a longer timeout: 2 minutes.

The Fixes commit was making the connection longer, but still, the
default timeout would have stopped it after 1 minute, which might not be
enough in very slow environments.

Fixes: 6457595db987 ("selftests: mptcp: join: endpoints: longer transfer")
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-8-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: join: fastclose: remove flaky marks

After recent fixes like the parent commit, and "selftests: mptcp:
connect: trunc: read all recv data", the two fastclose subtests no
longer look flaky any more.

It then feels fine to remove these flaky marks, to no longer ignore
these subtests in case of errors.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-7-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix duplicate reset on fastclose

The CI reports sporadic failures of the fastclose self-tests. The root
cause is a duplicate reset, not carrying the relevant MPTCP option.
In the failing scenario the bad reset is received by the peer before
the fastclose one, preventing the reception of the latter.

Indeed there is window of opportunity at fastclose time for the
following race:

  mptcp_do_fastclose
    __mptcp_close_ssk
      __tcp_close()
        tcp_set_state() [1]
        tcp_send_active_reset() [2]

After [1] the stack will send reset to in-flight data reaching the now
closed port. Such reset may race with [2].

Address the issue explicitly sending a single reset on fastclose before
explicitly moving the subflow to close status.

Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/596
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-6-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: decouple mptcp fastclose from tcp close

With the current fastclose implementation, the mptcp_do_fastclose()
helper is in charge of two distinct actions: send the fastclose reset
and cleanup the subflows.

Formally decouple the two steps, ensuring that mptcp explicitly closes
all the subflows after the mentioned helper.

This will make the upcoming fix simpler, and allows dropping the 2nd
argument from mptcp_destroy_common(). The Fixes tag is then the same as
in the next commit to help with the backports.

Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-5-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: do not fallback when OoO is present

In case of DSS corruption, the MPTCP protocol tries to avoid the subflow
reset if fallback is possible. Such corruptions happen in the receive
path; to ensure fallback is possible the stack additionally needs to
check for OoO data, otherwise the fallback will break the data stream.

Fixes: e32d262c89e2 ("mptcp: handle consistently DSS corruption")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/598
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-4-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix premature close in case of fallback

I'm observing very frequent self-tests failures in case of fallback when
running on a CONFIG_PREEMPT kernel.

The root cause is that subflow_sched_work_if_closed() closes any subflow
as soon as it is half-closed and has no incoming data pending.

That works well for regular subflows - MPTCP needs bi-directional
connectivity to operate on a given subflow - but for fallback socket is
race prone.

When TCP peer closes the connection before the MPTCP one,
subflow_sched_work_if_closed() will schedule the MPTCP worker to
gracefully close the subflow, and shortly after will do another schedule
to inject and process a dummy incoming DATA_FIN.

On CONFIG_PREEMPT kernel, the MPTCP worker can kick-in and close the
fallback subflow before subflow_sched_work_if_closed() is able to create
the dummy DATA_FIN, unexpectedly interrupting the transfer.

Address the issue explicitly avoiding closing fallback subflows on when
the peer is only half-closed.

Note that, when the subflow is able to create the DATA_FIN before the
worker invocation, the worker will change the msk state before trying to
close the subflow and will skip the latter operation as the msk will not
match anymore the precondition in __mptcp_close_subflow().

Fixes: f09b0ad55a11 ("mptcp: close subflow when receiving TCP+FIN")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-3-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: avoid unneeded subflow-level drops

The rcv window is shared among all the subflows. Currently, MPTCP sync
the TCP-level rcv window with the MPTCP one at tcp_transmit_skb() time.

The above means that incoming data may sporadically observe outdated
TCP-level rcv window and being wrongly dropped by TCP.

Address the issue checking for the edge condition before queuing the
data at TCP level, and eventually syncing the rcv window as needed.

Note that the issue is actually present from the very first MPTCP
implementation, but backports older than the blamed commit below will
range from impossible to useless.

Before:

  $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow
  TcpExtBeyondWindow              14                 0.0

After:

  $ nstat -n; sleep 1; nstat -z TcpExtBeyondWindow
  TcpExtBeyondWindow              0                  0.0

Fixes: fa3fe2b15031 ("mptcp: track window announced to peer")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-2-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix ack generation for fallback msk

mptcp_cleanup_rbuf() needs to know the last most recent, mptcp-level
rcv_wnd sent, and such information is tracked into the msk->old_wspace
field, updated at ack transmission time by mptcp_write_options().

Fallback socket do not add any mptcp options, such helper is never
invoked, and msk->old_wspace value remain stale. That in turn makes
ack generation at recvmsg() time quite random.

Address the issue ensuring mptcp_write_options() is invoked even for
fallback sockets, and just update the needed info in such a case.

The issue went unnoticed for a long time, as mptcp currently overshots
the fallback socket receive buffer autotune significantly. It is going
to change in the near future.

Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251118-net-mptcp-misc-fixes-6-18-rc6-v1-1-806d3781c95f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

scsi: sg: Do not sleep in atomic context

sg_finish_rem_req() calls blk_rq_unmap_user(). The latter function may
sleep. Hence, call sg_finish_rem_req() with interrupts enabled instead
of disabled.

Reported-by: syzbot+c01f8e6e73f20459912e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-scsi/691560c4.a70a0220.3124cb.001a.GAE@google.com/
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Fixes: 97d27b0dd015 ("scsi: sg: close race condition in sg_remove_sfp_usercontext()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251113181643.1108973-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

wifi: rtw89: hw_scan: Don't let the operating channel be last

Scanning can be offloaded to the firmware. To that end, the driver
prepares a list of channels to scan, including periodic visits back to
the operating channel, and sends the list to the firmware.

When the channel list is too long to fit in a single H2C message, the
driver splits the list, sends the first part, and tells the firmware to
scan. When the scan is complete, the driver sends the next part of the
list and tells the firmware to scan.

When the last channel that fit in the H2C message is the operating
channel something seems to go wrong in the firmware. It will
acknowledge receiving the list of channels but apparently it will not
do anything more. The AP can't be pinged anymore. The driver still
receives beacons, though.

One way to avoid this is to split the list of channels before the
operating channel.

Affected devices:

* RTL8851BU with firmware 0.29.41.3
* RTL8832BU with firmware 0.29.29.8
* RTL8852BE with firmware 0.29.29.8

The commit 57a5fbe39a18 ("wifi: rtw89: refactor flow that hw scan handles channel list")
is found by git blame, but it is actually to refine the scan flow, but not
a culprit, so skip Fixes tag.

Reported-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Closes: https://lore.kernel.org/linux-wireless/0abbda91-c5c2-4007-84c8-215679e652e1@gmail.com/
Cc: stable@vger.kernel.org # 6.16+
Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/c1e61744-8db4-4646-867f-241b47d30386@gmail.com

drm/amdgpu: Add sriov vf check for VCN per queue reset support.

Add SRIOV check when setting VCN ring's supported reset mask.

Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ee9b603ad43f9870eb75184f9fb0a84f8c3cc852)
Cc: stable@vger.kernel.org

drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags

The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt
behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like
normal TT/doorbell/preempt memory and unconditionally accessed
ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a
NULL pointer dereference when computing PDE flags.

Fix this by checking that ttm is non-NULL before reading ttm->caching.
This prevents the crash for MMIO_REMAP and also makes the code more
defensive if other BOs ever come through without a ttm_tt.

Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement")
Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0db94da5a0a1cacda080b9ec8425fcbe4babc141)