Merge tag 'kvm-x86-vfio-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM VFIO changes for 7.2

Use guard() to clean up various KVM+VFIO flows.

Merge tag 'kvm-x86-sev-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM SEV changes for 7.2

- Don't advertise support for unusable VM types, and account for VM types
   that are disabled by firmware, e.g. to mitigate security vulnerabilities.

- Rewrite the SEV {en,de}crypt debug ioctls as they were riddled with bugs and
   unnecessarily complicated, and add comprehensive tests.

- Clean up and deduplicate the SEV page pinning code.

- Fix minor goofs related to writing back CPUID information after firmware
   rejects a CPUID page for an SNP vCPU.

Merge tag 'kvm-x86-selftests-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 7.2

- Randomize the dirty log test's delay when reaping the bitmap on the first
pass, as always waiting only 1ms hid a KVM RISC-V bug as the test reaped the
bitmap before KVM could build up enough state to hit the bug.

- A pile of one-off fixes and cleanups.

Merge tag 'kvm-x86-mmu-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM x86 MMU changes for 7.2

- Use the kernel's "enum pg_level" in the TDX APIs instead of the TDX-Module's
   level definitions (which are 0-based).

- Rework the TDX memory APIs to not require/assume that guest memory is
   backed by "struct page" (in preparation for guest_memfd hugepage support).

- Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific logic as
   possible into the TDX code, and to funnel (almost) all S-EPT updates into
   a single chokepoint.  The motivation is largely to prepare for upcoming
   Dynamic PAMT support, but the cleanups are nice to have on their own.

- Plug a hole in the shadow MMU where KVM fails to recursively zap nested TDP
   shadow when L1 is tearing its TDP page tables from the bottom up, as KVM's
   TDP MMU now does.

Merge tag 'kvm-x86-misc-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM misc x86 changes for 7.2

- Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor code
   gets a chance to handle things like reaping the PML buffer.

- Ensure KVM's copy of CR0 and CR3 are up-to-date on SVM prior to invoking
   fastpath handlers.

- Update KVM's view of PV async enabling if and only if the MSR write fully
   succeeds.

- Fix a variety of issues where the emulator doesn't honor guest-debug state,
   and clean up related code along the way.

- Synthesize EPT Violation and #NPF "error code" bits when injecting faults
   into L1 that didn't originate in hardware (in which case the VMCS/VMCB
   doesn't hold relevant information).

- Add support for virtualizing (well, emulating) AMD's flavor of CPL>0 CPUID
   faulting.

- Clean up the GPR APIs so that KVM's use of "raw" is consistent, and fix a
   variety of minor bugs along the way.

- Fix an OOB memory access due to not checking the VP ID when handling a
   Hyper-V PV TLB flush for L2.

- Fix a bug in the mediated PMU's handling of fixed counters that allowed the
   guest to bypass the PMU event filter.

- Allow userspace to return EAGAIN when handling SNP and TDX hypercalls, so
   that KVM can forward a "retry" status code to the guest, and reserve all
   unused error codes for future usage.

- Misc fixes and cleanups.

Merge tag 'kvm-x86-gmem-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM guest_memfd changes for 7.2

- Return -EEXIST instead of -EINVAL if userspace attempts to bind a gmem
   range to multiple memslots, and fix the test that was supposed to ensure
   KVM returns -EEXIST.

- Treat memslot binding offsets and sizes as unsigned values to fix a bug
   where KVM interprets a large "offset + size" as a negative value and allows
   a nonsensical offset.

- Use the inode number instead of the page offset for the NUMA interleaving
   index to fix a bug where the effective index would jump by two for
   consecutive pages (the caller also adds in the page offset).
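
The unsigned offset/size handling described above can be illustrated with a
small userspace sketch. This is not KVM's code: the helper name
gmem_range_is_valid() and its parameters are hypothetical, and it only shows
the general idea of validating "offset + size" without letting the sum wrap
or turn negative.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): return true if the byte range
 * [offset, offset + size) lies entirely inside a file of file_size bytes.
 * All values are unsigned, and the sum "offset + size" is never computed,
 * so a huge offset cannot wrap around and pass the check.
 */
static bool gmem_range_is_valid(uint64_t offset, uint64_t size,
				uint64_t file_size)
{
	/* Reject empty ranges and offsets past the end of the file. */
	if (size == 0 || offset > file_size)
		return false;

	/* offset <= file_size here, so the subtraction cannot underflow. */
	return size <= file_size - offset;
}
```

With signed arithmetic, a large enough "offset + size" would show up as a
negative number and compare as smaller than the file size; comparing size
against the remaining room avoids that class of bug entirely.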

Merge branch kvm-arm64/vgic-v5-PPI-fixes into kvmarm-master/next

* kvm-arm64/vgic-v5-PPI-fixes:
  : .
  : Substantial cleanup of the vgic-v5 PPI support. From the original
  : cover letter:
  :
  : "With the GICv5 PPI support merged in, it has become obvious that a few
  :  things could be improved, both from the correctness and maintainability
  :  angles."
  : .
  KVM: arm64: Fix arch timer interrupts for GICv3-on-GICv5 guests
  irqchip/gic-v5: Immediately exec priority drop following activate
  Documentation: KVM: Clarify that PMU_V3_IRQ IntID requirements for GICv5
  Documentation: KVM: Fix typos in VGICv5 documentation
  KVM: arm64: selftests: Improve error handling for GICv5 PPI selftest
  KVM: arm64: selftests: Cleanup unused vars in GICv5 PPI selftest
  KVM: arm64: selftests: Add missing GIC CDEN to no-vgic-v5 selftest
  KVM: arm64: vgic-v5: Atomically assign bits to PPI DVI bitmap
  KVM: arm64: vgic-v5: Add missing trap handing for NV triage
  KVM: arm64: vgic-v5: Limit support to 64 PPIs
  KVM: arm64: vgic: Rationalise per-CPU irq accessor
  KVM: arm64: vgic-v5: Drop defensive checks from vgic_v5_ppi_queue_irq_unlock()
  KVM: arm64: vgic: Consolidate vgic_allocate_private_irqs_locked()
  KVM: arm64: vgic: Constify struct irq_ops usage
  KVM: arm64: vgic-v5: Drop pointless ARM64_HAS_GICV5_CPUIF check
  KVM: arm64: vgic-v5: Remove use of __assign_bit() with a constant
  KVM: arm64: vgic-v5: Move PPI caps into kvm_vgic_global_state
  KVM: arm64: vgic-v5: Add for_each_visible_v5_ppi() iterator

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/pkvm-fixes-7.2 into kvmarm-master/next

* kvm-arm64/pkvm-fixes-7.2:
  : .
  : Assorted pKVM fixes for 7.2:
  :
  : - Ensure that the vcpu memcache is filled in a number of cases (donate,
  :   share, selftest)
  :
  : - Fix vmemmap page order handling by resetting it when initialising the
  :   memory pool
  :
  : - Don't leak page references on failed memory donation
  :
  : - Add sanity-check for refcounted pages when donating/sharing pages
  :
  : - Clear __hyp_running_vcpu on state flush
  :
  : - Check LR upper bound against a trusted value
  :
  : - Assorted fixes for the host-side tracking of the pages shared with
  :   EL2 as a result of some Sashiko testing from Fuad
  :
  : - Correctly forward HCR_EL2.VSE from host to guest, so that protected
  :   guests can see SErrors
  : .
  KVM: arm64: Roll back partial shares on kvm_share_hyp() failure
  KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
  KVM: arm64: Free hyp-share tracking node when share hypercall fails
  KVM: arm64: Flush HCR_EL2.VSE to deliver SErrors to pKVM guests
  KVM: arm64: Bound used_lrs when flushing the pKVM hyp vCPU
  KVM: arm64: Clear __hyp_running_vcpu when flushing the pKVM hyp vCPU
  KVM: arm64: Pre-check vcpu memcache for host->guest donate
  KVM: arm64: Pre-check vcpu memcache for host->guest share
  KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
  KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host
  KVM: arm64: Fix __pkvm_init_vm error path
  KVM: arm64: Reset page order in pKVM hyp_pool

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/nv-granule-sizes into kvmarm-master/next

* kvm-arm64/nv-granule-sizes:
  : .
  : Tidying up of the behaviour when the selected page size is not
  : implemented, courtesy of Wei-Lin Chang. From the initial cover
  : letter:
  :
  : "This small series fixes the granule size selection for software stage-1
  :  and stage-2 walks. Previously we treat the guest's TCR/VTCR.TGx as-is
  :  and use the encoded granule size for the walks. However this is
  :  incorrect if the granule sizes are not advertised in the guest's
  :  ID_AA64MMFR0_EL1.TGRAN*. The architecture specifies that when an
  :  unsupported size is programmed in TGx, it must be treated as an
  :  implemented size. Fix this by choosing an available one while
  :  prioritizing PAGE_SIZE."
  : .
  KVM: arm64: Fallback to a supported value for unsupported guest TGx
  KVM: arm64: nv: Use literal granule size in TLBI range calculation
  KVM: arm64: Factor out TG0/1 decoding of VTCR and TCR
  KVM: arm64: nv: Rename vtcr_to_walk_info() to setup_s2_walk()

Signed-off-by: Marc Zyngier <maz@kernel.org>
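
The granule fallback described in the kvm-arm64/nv-granule-sizes cover letter
can be sketched roughly as follows. This is an illustration only: the enum,
the bitmask representation of the advertised granules, and the function name
pick_granule() are all assumptions, and the order used after the PAGE_SIZE
preference (4K, 16K, 64K) is a guess rather than something the cover letter
states.

```c
#include <assert.h>

/* Hypothetical, abstract granule identifiers (not the TGx encodings). */
enum granule { GRANULE_4K, GRANULE_16K, GRANULE_64K, GRANULE_COUNT };

/*
 * Pick the granule to use for a software page table walk.
 *
 * requested:      what the guest programmed in TGx
 * supported_mask: bit N set if granule N is advertised to the guest
 * host_granule:   the granule matching the host's PAGE_SIZE
 */
static enum granule pick_granule(enum granule requested,
				 unsigned int supported_mask,
				 enum granule host_granule)
{
	unsigned int g;

	/* At least one granule must be advertised. */
	assert(supported_mask & ((1u << GRANULE_COUNT) - 1));

	/* An advertised size is used as programmed. */
	if (supported_mask & (1u << requested))
		return requested;

	/* Otherwise prefer the granule matching PAGE_SIZE... */
	if (supported_mask & (1u << host_granule))
		return host_granule;

	/* ...and fall back to any advertised size (order is an assumption). */
	for (g = 0; g < GRANULE_COUNT; g++)
		if (supported_mask & (1u << g))
			return (enum granule)g;

	return host_granule; /* not reached, see the assert above */
}
```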

Merge branch kvm-arm64/nv-fp-elision into kvmarm-master/next

* kvm-arm64/nv-fp-elision:
  : .
  : Significantly reduce the overhead of the context switch between L1 and
  : L2 guests by eliding the save/restore of the FP/SIMD/SVE registers, as
  : this state is shared between the two guests, and therefore can be left
  : live.
  : .
  KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
  KVM: arm64: nv: Track L2 to L1 exception emulation

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/no-lazy-vgic-init into kvmarm-master/next

* kvm-arm64/no-lazy-vgic-init:
  : .
  : Fix an ugly situation where the vgic lazy init could happen in
  : non-preemptible contexts such as vcpu reset, resulting in lockdep
  : splats.
  :
  : This requires revamping the way in-kernel emulated devices
  : (timers, PMU) present their interrupts to the vgic, and making
  : sure there is no need to init the vgic on the back of that.
  : .
  KVM: arm64: vgic-v2: Don't init the vgic on in-kernel interrupt injection
  KVM: arm64: vgic-v2: Force vgic init on injection outside the run loop
  KVM: arm64: pmu: Kill the PMU interrupt level cache
  KVM: arm64: timer: Kill the per-timer irq level cache
  KVM: arm64: Simplify userspace notification of interrupt state
  KVM: arm64: timer: Repaint kvm_timer_{should,irq_can}_fire() to kvm_timer_{pending,enabled}()

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: vgic-its: Make ABI commit helpers return void

The return values of vgic_its_set_abi() and vgic_its_commit_v0() are always
0 and do not carry useful error information. Simplify by changing them to
void.

Suggested-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://patch.msgid.link/20260604075147.53299-1-liu.yun@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>

xfs: shut down the filesystem on a failed mount

A corrupt/crafted XFS image can make mount fail after background inode
inactivation has already been enabled.  xfs_mountfs() turns on inodegc
(xfs_inodegc_start()) right after log recovery, but the quota subsystem
(mp->m_quotainfo) is only allocated much later, in xfs_qm_newmount() /
xfs_qm_mount_quotas().  The quota accounting flags in mp->m_qflags are
parsed from the mount options before xfs_mountfs() even runs.

If the mount then aborts in between - e.g. xfs_rtmount_inodes() failing
with "failed to read RT inodes" - the unwind path flushes the inodegc
queue, which inactivates the inodes that are still queued, and
xfs_inactive() calls xfs_qm_dqattach().  That path trusts
XFS_IS_QUOTA_ON() (the flag is set) and dereferences the not yet
allocated mp->m_quotainfo:

  XFS (loop0): failed to read RT inodes
  Oops: general protection fault, probably for non-canonical address
        0xdffffc000000002a: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000150-0x0000000000000157]
  Workqueue: xfs-inodegc/loop0 xfs_inodegc_worker
  RIP: 0010:__mutex_lock+0xfe/0x930
  Call Trace:
   xfs_qm_dqget_cache_lookup+0x63/0x7f0
   xfs_qm_dqget_inode+0x336/0x860
   xfs_qm_dqattach_one+0x232/0x4e0
   xfs_qm_dqattach_locked+0x2c6/0x470
   xfs_qm_dqattach+0x46/0x70
   xfs_inactive+0x988/0xe80
   xfs_inodegc_worker+0x27c/0x730

The NULL m_quotainfo deref is only one symptom.  The deeper problem is
that a failed mount should not be inactivating inodes at all: it must
not write to the (possibly corrupt, only partially set up) persistent
metadata of a filesystem we just refused to mount, and the subsystems
inactivation relies on may not be initialised.

Mark the filesystem shut down before flushing the inodegc queue in the
xfs_mountfs() failure path.  With the preceding patch a shut down mount
no longer inactivates the queued inodes: xfs_inactive() returns early so
they are dropped straight to reclaim instead.  They are still pulled down
so reclaim can free them (which is why the flush was added in commit
ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")), but
without touching the on-disk structures - matching that comment's own
"pull down all the state and flee" intent.

Use SHUTDOWN_META_IO_ERROR for the shutdown: it is the generic "cannot
safely touch metadata" reason already used elsewhere in this file and in
the xfs_ifree() failure path, and unlike SHUTDOWN_FORCE_UMOUNT it does
not log a misleading "User initiated shutdown received".  A failed mount
is not necessarily on-disk corruption (it can be a transient I/O or
resource error), so SHUTDOWN_CORRUPT_ONDISK would not be accurate either.

Found by fuzzing XFS with syzkaller (corrupt image mount); reproduced and
verified under QEMU/KASAN.

Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: skip inode inactivation on a shut down mount

XFS already declines to inactivate inodes on a shut down mount, but only
at queue time: xfs_inode_mark_reclaimable() calls
xfs_inode_needs_inactive(), which returns false when the mount is shut
down ("If the log isn't running, push inodes straight to reclaim"), and
then drops the dquots and marks the inode reclaimable directly.

An inode that was queued for background inactivation while the mount was
still live is not covered by that check: the inodegc worker still calls
xfs_inactive() on it even after the mount has been shut down in the
meantime. Inactivation modifies persistent metadata and runs
transactions that cannot complete on a shut down mount, and it relies on
subsystems (e.g. quota) that a torn down, or never fully set up, mount
may not have available.

Honour the same invariant in xfs_inactive() itself: if the mount is shut
down, return early before doing any inactivation work. The dquots
attached to the inode are released by the existing xfs_qm_dqdetach() at
the out: label, so references are not leaked, and the caller then makes
the inode reclaimable exactly as before.

On its own this is a consistency fix with the existing queue-time
behaviour; it is also a prerequisite for shutting the mount down in the
xfs_mountfs() failure path in the following patch.

Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'kvm-x86-generic-7.2' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 7.2

- Rename invalidate_begin() to invalidate_start() throughout KVM to follow
the kernel's nomenclature, e.g. for mmu_notifiers.

- Minor cleanups.

Merge tag 'kvm-s390-master-7.1-4' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: A few more misc gmap fixes.

xfs: move XFS_LSN_CMP to xfs_log_format.h

Because CYCLE_LSN/BLOCK_LSN are defined in xfs_log_format.h, XFS_LSN_CMP
forces a xfs_log_format.h dependency in xfs_log.h. Move XFS_LSN_CMP
to xfs_log_format.h and drop the macro/inline indirection to clean up
our header mess a little bit.

This also helps xfsprogs, which doesn't have xfs_log.h, but needs
XFS_LSN_CMP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: shut down zoned file systems on writeback errors

Zoned writeback allocates space from an open zone and advances the
in-memory allocation state before submitting the bio. The completion
path only records the written blocks and updates the mapping on success.
If the write fails, XFS cannot tell how far the device write pointer
advanced and cannot safely roll the open zone accounting back.

This was observed while investigating xfs/643 and xfs/646 on an external
ZNS realtime device. A writeback error after consuming space from an
open zone left later writers waiting for open-zone or GC progress that
could not happen. xfs/643 exposed this through the GC defragmentation
path, while xfs/646 exposed the same failure mode through the
truncate/EOF-zeroing space wait path.

There is no local recovery path in ioend completion that can restore a
consistent zoned allocation state after the device has rejected the
write. Treat writeback errors for zoned inodes as fatal and force a
file system shutdown from the ioend completion path. The existing
shutdown path wakes zoned allocation waiters and makes future space
waits return -EIO instead of leaving tasks stuck waiting for progress.

Signed-off-by: Yao Sang <sangyao@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather

Now that AMD uses iommupt, it is easy to make use of the PDE bit. If
the gather has no free list then no page directory entries were
changed.

Pass GN/PDE through the invalidation call chain in a u32 flags field
that is OR'd into data[2] and set it properly from the gather.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec

The spec in Table 14 defines the "Entire Cache" case as having the low
12 bits as zero. Indeed the command format doesn't even have the low
12 bits. Since there is only one user now, fix the constant to have 0
in the low 12 bits instead of 1 and remove the masking.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/amd: Have amd_iommu_domain_flush_pages() use last

Finish clearing out the size/last/end switching by converting
amd_iommu_domain_flush_pages() to use last-based logic.

This algorithm is simpler than the previous. Ultimately all this wants
to do is select powers of two that are aligned to address and not
longer than the distance to last.

The new version is fully safe for size = U64_MAX and last = U64_MAX.

Finally, the gather can be passed through natively without risking an
overflow in (gather->end - gather->start + 1).
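
As a rough illustration of "select powers of two that are aligned to address
and not longer than the distance to last", here is a userspace sketch. It is
not the driver's code: largest_block_lg2() and split_range() are hypothetical
names, the GCC/Clang builtins stand in for the kernel's bit helpers, and
hardware limits such as the 4K minimum are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * log2 of the largest power-of-two block that starts at addr, is naturally
 * aligned at addr, and ends at or before the inclusive address "last".
 * A result of 64 means "the entire 64-bit space".
 */
static unsigned int largest_block_lg2(uint64_t addr, uint64_t last)
{
	uint64_t span_minus_1;
	unsigned int align_lg2, len_lg2;

	assert(addr <= last);
	span_minus_1 = last - addr;	/* inclusive range, cannot overflow */

	/* Alignment of addr; address 0 is aligned to everything. */
	align_lg2 = addr ? (unsigned int)__builtin_ctzll(addr) : 64;

	/* Largest lg2 with 2^lg2 <= span_minus_1 + 1, without overflowing. */
	if (span_minus_1 == UINT64_MAX)
		len_lg2 = 64;
	else
		len_lg2 = 63 - (unsigned int)__builtin_clzll(span_minus_1 + 1);

	return align_lg2 < len_lg2 ? align_lg2 : len_lg2;
}

/* Split [addr, last] into blocks; store each block's log2 size in out_lg2. */
static size_t split_range(uint64_t addr, uint64_t last,
			  unsigned int *out_lg2, size_t max)
{
	size_t n = 0;

	while (n < max) {
		unsigned int lg2 = largest_block_lg2(addr, last);
		uint64_t block_last;

		out_lg2[n++] = lg2;
		if (lg2 == 64)
			break;		/* one block covers everything */

		block_last = addr + ((UINT64_C(1) << lg2) - 1);
		if (block_last == last)
			break;		/* reached the inclusive end */
		addr = block_last + 1;	/* safe: block_last < last */
	}
	return n;
}
```

Because the loop only ever compares against the inclusive last address, it
never needs to form "end - start + 1", which is the expression that can
overflow when the range covers the whole address space.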

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/amd: Pass last in through to build_inv_address()

This is the trivial call chain below amd_iommu_domain_flush_pages().

Cases that are doing a full invalidate will pass a last of U64_MAX.

This avoids converting between size and last, and type confusion with
size_t, unsigned long and u64 all being used in different places along
the driver's invalidation path. Consistently use u64 in the internals.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/amd: Simplify build_inv_address()

This function is doing more work than it needs to:

- iommu_num_pages() is pointless, the fls() is going to compute the
   required page size already.

- It is easier to understand as sz_lg2, which is 12 if size is 4K,
   than msb_diff which is 11 if size is 4K.

- Simplify the control flow to early exit on the out of range cases.

- Use the usual last instead of end to signify an inclusive last
   address.

- Use GENMASK to compute the 1's mask.

- Use GENMASK to compute the address mask for the command layout,
   not PAGE_MASK.

- Directly reference the spec language that defines the 52 bit
   limit.

No functional change intended.
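
The sz_lg2 versus msb_diff arithmetic can be checked with a small userspace
sketch. The helper names below (fls64_sketch(), genmask64(), range_sz_lg2())
are hypothetical stand-ins for the kernel's fls64() and GENMASK_ULL(), and
deriving sz_lg2 from "address ^ last" is an assumption about one way to
compute it, not a copy of the driver.

```c
#include <stdint.h>

/* Stand-in for the kernel's fls64(): 1-based index of the highest set bit. */
static unsigned int fls64_sketch(uint64_t v)
{
	return v ? 64 - (unsigned int)__builtin_clzll(v) : 0;
}

/* Stand-in for GENMASK_ULL(high, low): bits high..low set, inclusive. */
static uint64_t genmask64(unsigned int high, unsigned int low)
{
	return (~UINT64_C(0) >> (63 - high)) & (~UINT64_C(0) << low);
}

/*
 * log2 of the smallest naturally aligned block covering [address, last],
 * clamped to a 4K minimum. For a 4K range this yields 12, where the older
 * "msb_diff" style value (fls64(...) - 1) would be 11.
 */
static unsigned int range_sz_lg2(uint64_t address, uint64_t last)
{
	unsigned int sz_lg2 = fls64_sketch(address ^ last);

	return sz_lg2 < 12 ? 12 : sz_lg2;
}
```

For example, address 0x1000 with last 0x1fff gives sz_lg2 = 12, and
genmask64(sz_lg2 - 1, 0) is then the 0xfff mask of low "1" bits.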

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Wei Wang <wei.w.wang@hotmail.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

Merge tag 'bst-arm64-emmc-driver-defconfig-for-v7.2' of https://github.com/BlackSesame-SoC/linux into soc/defconfig

arm64: BST C1200 eMMC defconfig for v7.2

Black Sesame Technologies:

Enable eMMC controller on BST C1200 CDCU1.0 board:
- Enable CONFIG_MMC_SDHCI_BST=y in arm64 defconfig

The MMC driver was merged via mmc-next in v7.1-rc1.
This is the remaining defconfig piece.

* tag 'bst-arm64-emmc-driver-defconfig-for-v7.2' of https://github.com/BlackSesame-SoC/linux:
arm64: defconfig: enable BST SDHCI controller

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'bst-arm64-emmc-driver-dts-for-v7.2' of https://github.com/BlackSesame-SoC/linux into soc/dt

arm64: BST C1200 eMMC DTS for v7.2

Black Sesame Technologies:

Enable eMMC controller on BST C1200 CDCU1.0 board:
    - Add mmc0 node in bstc1200.dtsi (DWCMSHC SDHCI controller)
    - Add fixed clock definition and reserved SRAM bounce buffer
    - Enable mmc0 with 8-bit bus on CDCU1.0 ADAS 4C2G board
The MMC driver was merged via mmc-next in v7.1-rc1.
This is the remaining DTS piece.

Signed-off-by: Gordon Ge <gordon.ge@bst.ai>
* tag 'bst-arm64-emmc-driver-dts-for-v7.2' of https://github.com/BlackSesame-SoC/linux:
  arm64: dts: bst: enable eMMC controller in C1200

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

arm64: defconfig: enable BST SDHCI controller

Enable CONFIG_MMC_SDHCI_BST to support eMMC on Black Sesame
Technologies C1200 boards.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Acked-by: Gordon Ge <gordon.ge@bst.ai>
Signed-off-by: Gordon Ge <gordon.ge@bst.ai>

arm64: dts: bst: enable eMMC controller in C1200

Add mmc0 node for the DWCMSHC SDHCI controller with basic configuration
(disabled by default) and fixed clock definition in bstc1200.dtsi.

Enable mmc0 with board-specific configuration including 8-bit bus
width and reserved SRAM bounce buffer on the CDCU1.0 ADAS 4C2G board.

The bounce buffer in reserved SRAM addresses hardware constraints
where the eMMC controller cannot access main system memory through
SMMU due to a hardware bug, and all DRAM is located outside the
4GB boundary.

Signed-off-by: Albert Yang <yangzh0906@thundersoft.com>
Acked-by: Gordon Ge <gordon.ge@bst.ai>
Signed-off-by: Gordon Ge <gordon.ge@bst.ai>

Merge tag 'drm-xe-fixes-2026-06-11' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

Driver Changes:
- fix oops in suspend/shutdown without display (Jani)
- RAS fixes (Raag)
- Use HW_ERR prefix in log (Raag)
- include all registered queues in TLB invalidation (Tangudu)
- Fix refcount leak in xe_range_tree in error paths (Wentao)
- fix job timeout recovery for unstarted jobs and kernel queues (Rodrigo)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/aitt8ZkYmxIT9cdP@gsse-cloud1.jf.intel.com

Merge tag 'drm-intel-fixes-2026-06-11' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Check supported link rates DPCD read [edp] (Nikita Zhandarovich)
- Fix phys BO pread/pwrite with offset [gem] (Joonas Lahtinen)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/aipkcUDnTlzre-8F@linux

ip6_tunnel: annotate data-races around t->err_count and t->err_time

ip6_tnl_xmit() and ipip6_tunnel_xmit() run locklessly (dev->lltx == true).

ip6gre_err() and ipip6_err() also run locklessly.

We need to add READ_ONCE() and WRITE_ONCE() annotations
around t->err_count and t->err_time.
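
A userspace sketch of the annotation pattern follows. The READ_ONCE() and
WRITE_ONCE() macros here are simplified volatile-cast stand-ins for the
kernel's, and struct tunnel_sketch, record_error() and consume_error() are
hypothetical names that only loosely mirror how a tunnel's error counter and
timestamp are used.

```c
/* Simplified stand-ins for the kernel macros (GCC/Clang __typeof__). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct tunnel_sketch {
	int err_count;
	unsigned long err_time;
};

/* Error-handler side: note that an error was seen at time "now". */
static void record_error(struct tunnel_sketch *t, unsigned long now)
{
	WRITE_ONCE(t->err_count, READ_ONCE(t->err_count) + 1);
	WRITE_ONCE(t->err_time, now);
}

/*
 * Transmit side: return 1 and consume one error if the last error is more
 * recent than "timeout" ticks, otherwise reset the counter and return 0.
 */
static int consume_error(struct tunnel_sketch *t, unsigned long now,
			 unsigned long timeout)
{
	int count = READ_ONCE(t->err_count);

	if (count <= 0)
		return 0;

	if (now - READ_ONCE(t->err_time) < timeout) {
		WRITE_ONCE(t->err_count, count - 1);
		return 1;
	}

	WRITE_ONCE(t->err_count, 0);
	return 0;
}
```

The annotations do not make the read-modify-write atomic; they document the
lockless access and keep the compiler from tearing or re-reading the fields.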

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260610171458.1359630-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

crypto: tegra - fix refcount leak in tegra_se_host1x_submit()

The timeout error path in tegra_se_host1x_submit() returns without
calling host1x_job_put(), while all other paths (success, submit
error, pin error) properly release the job reference through the
job_put label. Since host1x_job_alloc() initializes the reference
count and host1x_job_put() is required to drop it, omitting it on
timeout causes a permanent refcount leak.

Fix this by redirecting the timeout return to the existing job_put
label, ensuring the job reference and any associated syncpt
references are consistently released.

Cc: stable@vger.kernel.org
Fixes: 0880bb3b00c8 ("crypto: tegra - Add Tegra Security Engine driver")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: rng - Free default RNG on module exit

When the rng module is removed the default RNG will be leaked.
Call crypto_del_default_rng to free it if possible.

Fixes: 7cecadb7cca8 ("crypto: rng - Do not free default RNG when it becomes unused")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: testmgr - allow authenc(hmac(sha{256,384}),cts(cbc(aes))) in FIPS mode

hmac(sha256), hmac(sha384) and cts(cbc(aes)) algorithms have been
marked as FIPS allowed for years. Mark the respective authenc()
constructions per RFC 8009 ("AES Encryption with HMAC-SHA2 for
Kerberos 5") as such as well.

SP 800-57 Part 3 Rev. 1 from Jan 2015 [1] links the draft of what
became RFC 8009 in Oct 2016 as approved in section 6.3 Procurement
Guidance (item/recommendation 3).

[1] https://csrc.nist.gov/pubs/sp/800/57/pt3/r1/final

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

hwrng: jh7110 - fix refcount leak in starfive_trng_read()

The starfive_trng_read() function acquires a runtime PM reference
via pm_runtime_get_sync() but fails to release it on two error
paths. If starfive_trng_wait_idle() or starfive_trng_cmd() returns
an error, the function exits without calling
pm_runtime_put_sync_autosuspend(), leaving the runtime PM usage
counter permanently elevated and preventing the device from entering
runtime suspend.

Refactor the function to use a unified error path that calls
pm_runtime_put_sync_autosuspend() before returning.
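
The shape of such a unified error path can be sketched in userspace as
below. Everything here is a mock: pm_usage_count, mock_pm_get(),
mock_pm_put() and mock_trng_read() are hypothetical stand-ins for the
runtime PM calls and the driver's read function, used only to show that
every exit goes through the same "put".

```c
#include <errno.h>

static int pm_usage_count;	/* mock runtime PM usage counter */

static void mock_pm_get(void) { pm_usage_count++; }
static void mock_pm_put(void) { pm_usage_count--; }

enum fail_point { FAIL_NONE, FAIL_WAIT_IDLE, FAIL_CMD };

/* Returns the number of bytes "read", or a negative errno on failure. */
static int mock_trng_read(enum fail_point fail, int nbytes)
{
	int ret;

	mock_pm_get();

	/* Step 1: wait for the (mock) device to become idle. */
	ret = (fail == FAIL_WAIT_IDLE) ? -ETIMEDOUT : 0;
	if (ret)
		goto out;

	/* Step 2: issue the (mock) generate command. */
	ret = (fail == FAIL_CMD) ? -EIO : 0;
	if (ret)
		goto out;

	ret = nbytes;
out:
	/* Single exit: the reference is dropped on success and on error. */
	mock_pm_put();
	return ret;
}
```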

Cc: stable@vger.kernel.org
Fixes: c388f458bc34 ("hwrng: starfive - Add TRNG driver for StarFive SoC")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: atmel-ecc - drop dead code in atmel_ecdh_max_size

atmel_ecdh_init_tfm() always allocates ctx->fallback, so it is never
NULL in atmel_ecdh_max_size(). Remove the dead code and return
crypto_kpp_maxsize() directly.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: cavium/cpt - fix DMA cleanup using wrong loop index

The sg_cleanup error path used list[i] instead of list[j] when unmapping
DMA buffers, leaking successfully mapped entries and repeatedly unmapping
the failed one.
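
The bug class is easy to see in a reduced userspace sketch. The types and
helpers below (struct buf_sketch, map_one(), unmap_one(), map_all()) are
hypothetical mocks rather than the driver's DMA code; the point is only
which index the unwind loop uses.

```c
#include <stdbool.h>
#include <stddef.h>

struct buf_sketch {
	bool mapped;
	bool fail;	/* make the mock mapping of this entry fail */
};

static int map_one(struct buf_sketch *b)
{
	if (b->fail)
		return -1;
	b->mapped = true;
	return 0;
}

static void unmap_one(struct buf_sketch *b)
{
	b->mapped = false;
}

static int map_all(struct buf_sketch *list, size_t count)
{
	size_t i, j;

	for (i = 0; i < count; i++) {
		if (map_one(&list[i]))
			goto sg_cleanup;
	}
	return 0;

sg_cleanup:
	/*
	 * Unwind entries 0..i-1. Indexing with list[i] here would touch the
	 * failed entry i times and leave the mapped ones untouched.
	 */
	for (j = 0; j < i; j++)
		unmap_one(&list[j]);
	return -1;
}
```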

Fixes: c694b233295b ("crypto: cavium - Add the Virtual Function driver for CPT")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: marvell/octeontx - fix DMA cleanup using wrong loop index

The sg_cleanup path used list[i] instead of list[j] when unmapping DMA
buffers, leaking successfully mapped entries and repeatedly unmapping
the failed one.

Fixes: 10b4f09491bf ("crypto: marvell - add the Virtual Function driver for CPT")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

MAINTAINERS: make myself the maintainer of the Qualcomm QCE driver

Qualcomm wants to keep supporting and extending the crypto engine driver.
Thara has not been active for many months, so change the maintainer to
myself and upgrade the driver to Supported.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: amcc - convert irq_of_parse_and_map to platform_get_irq

Replace the deprecated irq_of_parse_and_map() call with the modern
platform_get_irq() in the probe function. This also improves error
handling: platform_get_irq() returns a negative errno on failure,
whereas irq_of_parse_and_map() returned 0.

Change the irq field in struct crypto4xx_core_device from u32 to int
to match the return type of platform_get_irq().

Assisted-by: opencode:big-pickle
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

crypto: sun4i-ss - Remove insecure and unused rng_alg

Remove sun4i_ss_rng, as it is insecure and unused:

- It has multiple vulnerabilities.  sun4i_ss_prng_seed() is missing
  locking and has a buffer overflow.  sun4i_ss_prng_generate() fails to
  fill the entire buffer with cryptographic random bytes, because it
  rounds the destination length down and also doesn't actually wait for
  the hardware to be ready before pulling bytes from it.

- No user of this code is known.  It's usable only theoretically via the
  "rng" algorithm type of AF_ALG.  But userspace actually just uses the
  actual Linux RNG (/dev/random etc) instead.  And rng_algs don't
  contribute entropy to the actual Linux RNG either.  (This may have
  been confused with hwrng, which does contribute entropy.)

The sun4i_ss_prng_seed() buffer overflow was reported by Tianchu Chen
and discovered by Atuin - Automated Vulnerability Discovery Engine.

There's no point in fixing all these vulnerabilities individually when
this is unused code, so let's just remove it.

Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG")
Cc: stable@vger.kernel.org
Reported-by: Tianchu Chen <flynnnchen@tencent.com>
Closes: https://lore.kernel.org/r/af749a8447bd7f0e9dd26ca6c87e9c6afecb09d9@linux.dev/
Acked-by: Corentin LABBE <clabbe.montjoie@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

hwrng: xilinx - Move xilinx-rng into drivers/char/hw_random/

Since this file just implements a hwrng driver, move it into
drivers/char/hw_random/. Rename the kconfig option accordingly as well.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cxl/test: Add check after kzalloc() memory in alloc_mock_res()

alloc_mock_res() calls kzalloc() without checking the return value.
Add scope based resource management to deal with the allocated memory
cleanly.
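
For readers unfamiliar with scope-based cleanup, the idea can be sketched in
userspace with the compiler's cleanup attribute (GCC/Clang). This is not the
cxl test code: auto_free, free_ptr(), struct mock_res_sketch and
alloc_res_sketch() are hypothetical names, and the kernel uses its own
helpers for this rather than the macro shown here.

```c
#include <stdlib.h>
#include <string.h>

static int auto_free_calls;	/* counts automatic frees, for illustration */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void free_ptr(void *p)
{
	void *mem;

	memcpy(&mem, p, sizeof(mem));
	if (mem) {
		auto_free_calls++;
		free(mem);
	}
}

#define auto_free __attribute__((cleanup(free_ptr)))

struct mock_res_sketch {
	unsigned long start;
	unsigned long end;
};

static struct mock_res_sketch *alloc_res_sketch(unsigned long start,
						unsigned long size)
{
	struct mock_res_sketch *res auto_free = calloc(1, sizeof(*res));
	struct mock_res_sketch *ret;

	if (!res)
		return NULL;	/* allocation failure is checked */

	if (size == 0)
		return NULL;	/* res is freed automatically here */

	res->start = start;
	res->end = start + size - 1;

	/* Success: hand ownership to the caller and disarm the cleanup. */
	ret = res;
	res = NULL;
	return ret;
}
```

Every early return after the allocation releases the memory without an
explicit free, which is what makes the NULL check and later error paths
cheap to add.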

Reported-by: sashiko-bot
Fixes: 67dcdd4d3b83 ("tools/testing/cxl: Introduce a mocked-up CXL port hierarchy")
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260611230305.197390-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

cxl/test: Unregister cxl_acpi in cxl_test_init() error path

In cxl_test_init(), once cxl_mock_platform_device_add() succeeds, all
subsequent error paths need to call platform_device_unregister() instead
of platform_device_put() to clean up.

Fixes: 67dcdd4d3b83 ("tools/testing/cxl: Introduce a mocked-up CXL port hierarchy")
Reported-by: sashiko-bot
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20260611230355.198912-1-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Merge tag 'for-net-next-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

core:
- hci_sync: Add support for HCI_LE_Set_Host_Feature [v2]
- SMP: Use AES-CMAC library API
- sockets: convert to getsockopt_iter
- Add SPDX id lines to some source files

drivers:
- btintel_pcie: Support Product level reset
- btintel_pcie: Add support for smart trigger dump
- btintel_pcie: Add 50 ms delay before MAC init on BlazarIW
- btintel_pcie: Separate coredump work from RX work
- btmtk: add event filter to filter specific event
- btrtl: fix RTL8761B/BU broken LE extended scan
- btusb: Add Realtek RTL8922AE VID/PID 0bda/d922
- btusb: Add Realtek RTL8922AE VID/PID 0bda/d923
- btusb: MT7922: Add VID/PID 0e8d/223c
- btusb: MT7925: Add VID/PID 0e8d/8c38
- btusb: Add support for TP-Link TL-UB250
- btusb: Add Mercusys MA530 for Realtek RTL8761BUV
- btusb: Add TP-Link UB600 for Realtek 8761BUV
- btusb: Add support for Intel Lizard Peak 2 (0x8087:0x0040)
- btusb: Add USB ID 2c4e:0128 for Mercusys MA60XNB
- btusb: MT7925: Add VID/PID 13d3/3609

* tag 'for-net-next-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (49 commits)
  Bluetooth: btintel_pcie: Separate coredump work from RX work
  Bluetooth: btmtksdio: fix infinite loop in btmtksdio_txrx_work()
  Bluetooth: qca: Add BT FW build version to kernel log
  Bluetooth: vhci: validate devcoredump state before side effects
  Bluetooth: L2CAP: validate connectionless PSM length
  Bluetooth: hci: validate codec capability element length
  Bluetooth: L2CAP: Fix UAF in channel timeout by holding conn ref
  Bluetooth: btintel_pcie: Load IOSF debug regs by controller variant
  Bluetooth: btintel_pcie: Add 50 ms delay before MAC init on BlazarIW
  Bluetooth: Add SPDX id lines to some source files
  Bluetooth: btintel_pcie: Add support for smart trigger dump
  Bluetooth: hci_h5: reset hci_uart::priv in the close() method
  Bluetooth: btusb: clean up probe error handling
  Bluetooth: btusb: fix wakeup irq devres lifetime
  Bluetooth: btusb: fix wakeup source leak on probe failure
  Bluetooth: btusb: fix use-after-free on marvell probe failure
  Bluetooth: btusb: fix use-after-free on registration failure
  Bluetooth: btmtk: fix URB leak in alloc_mtk_intr_urb error path
  Bluetooth: hci_core: Fix UAF in hci_unregister_dev()
  Bluetooth: hci_event: fix simultaneous discovery stuck in FINDING
  ...
====================

Link: https://patch.msgid.link/20260611183358.176776-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nfc-net-next-20260611' of https://codeberg.org/linux-nfc/linux

David Heidelberg says:

====================
NFC updates for net-next 20260611

- nxp-nci: Add ISO15693 support
- nxp-nci: treat -ENXIO in IRQ thread as no data available
- nci: uart: Constify struct tty_ldisc_ops
- trf7970a: fix comment typos
- Use named initializers for struct i2c_device_id
- MAINTAINERS: Update address for David Heidelberg

* tag 'nfc-net-next-20260611' of https://codeberg.org/linux-nfc/linux:
  MAINTAINERS: Update address for David Heidelberg
  nfc: Use named initializers for struct i2c_device_id
  nfc: nxp-nci: treat -ENXIO in IRQ thread as no data available
  nfc: nxp-nci: Add ISO15693 support
  nfc: nci: uart: Constify struct tty_ldisc_ops
  nfc: trf7970a: fix comment typos
====================

Link: https://patch.msgid.link/1aed7555-3d24-413c-b284-bc85fdd33055@ixit.cz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

io_uring/zcrx: kill dead 'sock' member in struct io_zcrx_args

This member is only ever assigned, never read. Kill it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'tipc-fix-netlink-gate-and-receive-path-bugs'

Michael Bommarito says:

====================
tipc: fix netlink gate and receive-path bugs

This is v4 of the public TIPC series. The only change from v3 is in
patch 1: TIPC_NL_MEDIA_SET now uses GENL_UNS_ADMIN_PERM like the other
mutators, instead of GENL_ADMIN_PERM, so the whole series uses the
namespace-aware CAP_NET_ADMIN check that matches the legacy TIPC netlink
path. Patches 2 and 3 are unchanged.

Patch 1 gives the TIPCv2 mutating generic-netlink operations the admin
gate the legacy API already has, so a local unprivileged process can no
longer change TIPC state. Patch 2 drops CONN_ACK messages that
acknowledge more outstanding sends than exist, preventing the
snt_unacked underflow. Patch 3 rejects peer bindings with lower > upper,
which would otherwise leak binding-table memory.
====================

Link: https://patch.msgid.link/20260610124003.3831170-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tipc: reject inverted service ranges from peer bindings

tipc_update_nametbl() inserts a binding advertised by a peer node using
the lower and upper service-range bounds taken directly from the wire,
without checking that lower <= upper. The local bind path validates the
ordering (tipc_uaddr_valid()), but the name-distribution path does not.

A binding with lower > upper is inserted at the far end of the
service-range rbtree (keyed on lower) where no lookup or withdrawal can
ever match it (service_range_foreach_match() requires sr->lower <= end).
The publication, its service_range node and the augmented rbtree entry
are then leaked for the lifetime of the namespace, and there is no
per-peer cap equivalent to TIPC_MAX_PUBL on locally created bindings.

Reject inverted ranges in the network path as well. A peer node can
otherwise leak unbounded binding-table memory by sending PUBLICATION
items with lower > upper.
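
A minimal sketch of the ordering check, assuming a simplified binding
layout: struct peer_binding_sketch and both functions are hypothetical
and do not reflect the real name-table code. It only shows that items
with lower > upper are rejected before anything is inserted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a binding received from a peer. */
struct peer_binding_sketch {
	uint32_t type;
	uint32_t lower;
	uint32_t upper;
};

/* The bounds come straight from the wire, so check their ordering. */
static bool peer_binding_valid_sketch(const struct peer_binding_sketch *b)
{
	return b->lower <= b->upper;
}

/* Returns how many items of a batch would be accepted for insertion. */
unsigned int accept_peer_bindings_sketch(const struct peer_binding_sketch *items,
					 unsigned int n)
{
	unsigned int i, accepted = 0;

	for (i = 0; i < n; i++) {
		if (!peer_binding_valid_sketch(&items[i]))
			continue;	/* inverted range: drop, nothing allocated */
		accepted++;
	}
	return accepted;
}
```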

Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20260610124003.3831170-4-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tipc: prevent snt_unacked underflow on CONN_ACK

tipc_sk_conn_proto_rcv() subtracts the peer-supplied connection ack count
from the unsigned 16-bit send counter snt_unacked without checking that it
does not exceed the number of messages actually outstanding:

tsk->snt_unacked -= msg_conn_ack(hdr);

msg_conn_ack() is read straight from a received CONN_MANAGER/CONN_ACK
message. If the ack count is larger than snt_unacked, the subtraction
wraps to a near-maximum value, leaving tsk_conn_cong() permanently true
and starving the connection of further transmits.

Validate the ACK count at the start of the CONN_ACK block and drop the
message if it acknowledges more messages than are outstanding. A peer (or,
for a local connection, the connected peer socket) can otherwise wedge a
TIPC connection's send side by sending an oversized connection ack.
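
The wrap-around can be reproduced in plain C. This is a sketch only:
struct conn_sketch and the two helpers are hypothetical, and just the
unsigned 16-bit counter and the subtraction are taken from the
description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the 16-bit send counter. */
struct conn_sketch {
	uint16_t snt_unacked;
};

/* Unchecked subtraction: wraps if acked exceeds snt_unacked. */
void conn_ack_unchecked_sketch(struct conn_sketch *c, uint16_t acked)
{
	c->snt_unacked -= acked;
}

/* Checked variant: refuse an ack count larger than what is outstanding. */
bool conn_ack_checked_sketch(struct conn_sketch *c, uint16_t acked)
{
	if (acked > c->snt_unacked)
		return false;	/* caller would drop the message */

	c->snt_unacked -= acked;
	return true;
}
```

With 3 messages outstanding and an ack count of 5, the unchecked helper
leaves the counter at 65534, while the checked one leaves it at 3 and
reports the message as invalid.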

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20260610124003.3831170-3-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tipc: require net admin for TIPCv2 netlink mutators

TIPCv2 registers mutating generic-netlink operations without admin
permission flags. Generic netlink only checks CAP_NET_ADMIN when an
operation sets GENL_ADMIN_PERM or GENL_UNS_ADMIN_PERM, so a local
unprivileged process can currently change TIPC state through commands
such as TIPC_NL_NET_SET, TIPC_NL_KEY_SET, TIPC_NL_KEY_FLUSH, and
bearer enable/disable.

The legacy TIPC netlink API already checks netlink_net_capable(...,
CAP_NET_ADMIN) for administrative commands. Give the TIPCv2 mutators
the equivalent generic-netlink gate. Use GENL_UNS_ADMIN_PERM, which
maps to the same namespace-aware CAP_NET_ADMIN check that
netlink_net_capable() performs, so the behaviour matches the legacy
path and keeps working for CAP_NET_ADMIN holders in a non-initial user
namespace (containers).

A QEMU/KASAN repro run as uid/gid 65534 with zero effective
capabilities previously succeeded in changing the network id and node
identity, setting and flushing key material, and enabling/disabling a
UDP bearer. With this patch applied the same operations fail with
-EPERM.

Fixes: 0655f6a8635b ("tipc: add bearer disable/enable to new netlink api")
Link: https://lore.kernel.org/all/20260604163102.2658553-1-dominik.czarnota@trailofbits.com/
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20260610124003.3831170-2-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: simplify WAN device check in airoha_dev_init()

airoha_register_gdm_devices() iterates eth->ports[] in order, so GDM2's
netdev is always registered before GDM3/GDM4. This means the explicit
check for eth->ports[1] && eth->ports[1]->devs[0] is a redundant
special-case of what airoha_get_wan_gdm_dev() already covers, since
GDM2 is always marked as WAN during its own ndo_init.
Remove the redundant check and rely solely on airoha_get_wan_gdm_dev()
which handles both the GDM2-present and GDM2-absent cases.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260610-airoha-eth-simplify-dev-init-v2-1-8f244e69b0d4@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: sch_hfsc: Don't make class passive twice

update_vf() is called from two places for the same class during a single
dequeue when the class's child qdisc (e.g. codel/fq_codel) drops its last
packets while dequeuing:

1. The child calls qdisc_tree_reduce_backlog(), which, now that the child
   is empty, invokes hfsc_qlen_notify() -> update_vf(cl, 0, 0) and turns
   the class passive (cl_nactive is decremented up the hierarchy).

2. hfsc_dequeue() then calls update_vf(cl, qdisc_pkt_len(skb), cur_time)
   to charge the dequeued bytes.

On the second call the class is already passive, but its child qdisc is
still empty, so update_vf() arms go_passive again:

      if (cl->qdisc->q.qlen == 0 && cl->cl_flags & HFSC_FSC)
              go_passive = 1;

The leaf is then skipped by the cl_nactive == 0 check inside the loop,
which does not clear go_passive, so the stale go_passive propagates to the
parent and decrements its cl_nactive a second time. A parent that still
has other active children is driven to cl_nactive == 0 and removed from
the vttree, even though those siblings are still backlogged. They are
never dequeued again and the qdisc stalls.

Fix this by only arming go_passive when the class is actually active, so an
already-passive class no longer triggers a second passive transition. The
byte accounting (cl->cl_total += len) still runs for every ancestor, so
dequeued bytes continue to be counted exactly once.

Fixes: 51eb3b65544c ("sch_hfsc: make hfsc_qlen_notify() idempotent")
Reported-by: Anirudh Gupta <anirudhrudr@gmail.com>
Closes: https://lore.kernel.org/netdev/CAN2cbVe79oj0O9==m4+4x3v+O+qzRagA=2=wkrp9i9=CqYvyZA@mail.gmail.com/
Tested-by: Anirudh Gupta <anirudhrudr@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260610132824.3027549-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Stop leased rxq before uninstalling its memory provider

netif_rxq_cleanup_unlease() tears down the memory provider that was
installed on a physical RX queue through a netkit queue lease. It
currently revokes the provider's DMA mappings before stopping the
physical queue:

__netif_mp_uninstall_rxq(virt_rxq, p); /* DMA unmap */
__netif_mp_close_rxq(phys_rxq->dev, rxq_idx, p); /* queue stop */

This inverts the ordering used by the regular teardown paths (normal
device unregister and the io_uring zcrx close path), which stop the
queue before revoking the provider's mappings.

With the physical queue still live, its NAPI can keep consuming
net_iov entries from the page_pool alloc cache after the
__netif_mp_uninstall_rxq() has already cleared their dma_addr,
opening a window for the device to DMA to a stale or zero address.

Fix it by swapping the two calls so the queue is stopped (and its
NAPI quiesced) before the provider is uninstalled. No functional
regression was observed across repeated runs of the nk_qlease.py
HW selftest, which exercises the lease teardown path; this was
tested against fbnic QEMU emulation.

Fixes: 5602ad61ebee ("net: Proxy netif_mp_{open,close}_rxq for leased queues")
Reported-by: Ahmed Abdelmoemen <ahmedabdelmoumen05@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Wei <dw@davidwei.uk>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260609212240.677889-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: fix refcount leak in mlxsw_sp_vrs_lpm_tree_replace()

When mlxsw_sp_vrs_lpm_tree_replace() fails after replacing some VRs,
the error rollback loop does not correctly revert the preceding
replacements. The loop decrements the index but fails to update the
vr pointer, which still points to the VR that caused the failure. As
a result, the condition and the rollback call always operate on the
same VR, potentially calling mlxsw_sp_vr_lpm_tree_replace() multiple
times on it while never rolling back the earlier VRs. Those VRs
continue to hold a reference to new_tree acquired via
mlxsw_sp_lpm_tree_hold(), leaking the reference count of new_tree.

Fix by reinitializing vr inside the error loop with the updated index:

vr = &mlxsw_sp->router->vrs[i];

so that the loop correctly iterates over all VRs that were actually
replaced.
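
The shape of the corrected rollback can be modelled in userspace. This is
a simplified sketch, not the driver code: struct vr_sketch, the
fail_on_replace flag and both functions are hypothetical, and the
reference counting on the tree is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a virtual router bound to an LPM tree. */
struct vr_sketch {
	int tree_id;
	bool fail_on_replace;	/* simulate a failing replacement */
};

static int vr_tree_replace_sketch(struct vr_sketch *vr, int tree_id)
{
	if (vr->fail_on_replace)
		return -1;
	vr->tree_id = tree_id;
	return 0;
}

int vrs_tree_replace_sketch(struct vr_sketch *vrs, int n,
			    int old_tree, int new_tree)
{
	int i, err;

	for (i = 0; i < n; i++) {
		err = vr_tree_replace_sketch(&vrs[i], new_tree);
		if (err)
			goto err_rollback;
	}
	return 0;

err_rollback:
	for (i--; i >= 0; i--) {
		/* Re-derive the pointer from the updated index each time. */
		struct vr_sketch *vr = &vrs[i];

		vr_tree_replace_sketch(vr, old_tree);
	}
	return err;
}
```

Without the per-iteration assignment, the loop body would keep acting on
the entry that failed and the earlier entries would stay on the new tree.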

Cc: stable@vger.kernel.org
Fixes: fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260609084730.215732-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: fix refcount leak in mlxsw_sp_port_lag_join()

When mlxsw_sp_port_lag_index_get() fails, mlxsw_sp_port_lag_join()
returns an error without releasing the lag reference obtained by
the earlier mlxsw_sp_lag_get(). All other error paths in the
function jump to the cleanup label that ends with
mlxsw_sp_lag_put(), so this is a single missed release.

Fix the leak by replacing the bare 'return err' with a goto to the
existing error cleanup label, which will drop the reference safely.
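
A small userspace model of the pattern, under stated assumptions:
struct lag_sketch, its plain integer refcount, the index_get_fails flag
and the err_lag_put label are all hypothetical and only mirror the
get/put pairing described above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a LAG object with a simple reference count. */
struct lag_sketch {
	int refcnt;
};

static void lag_get_sketch(struct lag_sketch *lag)
{
	lag->refcnt++;
}

static void lag_put_sketch(struct lag_sketch *lag)
{
	lag->refcnt--;
}

int port_lag_join_sketch(struct lag_sketch *lag, bool index_get_fails)
{
	int err;

	lag_get_sketch(lag);

	if (index_get_fails) {
		err = -EBUSY;
		goto err_lag_put;	/* not a bare "return err" */
	}

	return 0;	/* the reference is kept while the port is joined */

err_lag_put:
	lag_put_sketch(lag);
	return err;
}
```

On the failing path the count returns to its value before the call, which
is the property the bare return was breaking.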

Cc: stable@vger.kernel.org
Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260609083709.209743-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ksz87xx-add-support-for-low-loss-cable-equalizer-errata'

Fidelio Lawson says:

====================
ksz87xx: add support for low-loss cable equalizer errata

This series implements the workaround for the KSZ87xx short cable
erratum described in Microchip document DS80000687C for KSZ87xx switches
and the following support article:

Link: https://support.microchip.com/s/article/Solution-for-Using-CAT-5E-or-CAT-6-Short-Cable-with-a-Link-Issue-for-the-KSZ8795-Family

According to the erratum, the embedded PHY receiver in KSZ87xx switches is
tuned by default for long, high-loss Ethernet cables. When operating with
short or low-loss cables (for example CAT5e or CAT6), the PHY equalizer may
over-amplify the incoming signal, leading to internal distortion and link
establishment failures.

Microchip documents two independent mechanisms to mitigate this issue:
adjusting the receiver low‑pass filter bandwidth and reducing the DSP
equalizer initial value. These registers are located in the switch’s
internal LinkMD table and cannot be accessed directly through a
stand‑alone PHY driver.

To keep the PHY‑facing API clean, this series models the erratum handling
as vendor‑specific Clause 22 PHY registers, virtualized by the KSZ8 DSA
driver. Accesses are intercepted by ksz8_r_phy() / ksz8_w_phy() and
translated into the appropriate indirect LinkMD register writes. The
erratum affects the shared PHY analog front‑end and therefore applies
globally to the switch.

Based on review feedback, the user‑visible interface is kept deliberately
simple and predictable:

- A boolean “short‑cable” PHY tunable applies a documented and
  conservative preset (LPF bandwidth 62MHz, DSP EQ initial value 0).
  This is the recommended KISS interface for the common short‑cable
  scenario.

- Two additional integer PHY tunables allow advanced or experimental
  tuning of the LPF bandwidth and the DSP EQ initial value. These
  controls are orthogonal, have no ordering requirements, and simply
  override the corresponding setting when written.

The tunables act as simple setters with no implicit state machine or
invalid combinations, avoiding surprises for userspace and not relying
on extended error reporting or netlink ethtool support.

This series contains:

  1. Support for the KSZ87xx low‑loss cable erratum in the KSZ8 DSA driver,
     including the short‑cable preset and orthogonal tuning controls.

  2. Addition of vendor‑specific PHY tunable identifiers for the
     short‑cable preset, LPF bandwidth, and DSP EQ initial value.

  3. Exposure of these tunables through the Micrel PHY driver via
     get_tunable / set_tunable callbacks.

This version follows the design agreed upon during v3 review and
reworks the interface accordingly.
====================

Link: https://patch.msgid.link/20260609-ksz87xx_errata_low_loss_connections-v10-0-9ba4418cf3db@exotec.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: micrel: expose KSZ87xx low-loss cable tunables

Add support for the KSZ87xx low-loss cable PHY tunables in the Micrel
PHY driver by implementing get_tunable and set_tunable callbacks.

These callbacks expose vendor-specific PHY tunables used to control the
KSZ87xx embedded PHY receiver behavior when operating with short or
low-loss Ethernet cables. The tunables provide:

- a boolean short-cable preset applying known good settings;
- an integer LPF bandwidth control;
- an integer DSP EQ initial value control.

The Micrel PHY driver forwards these tunables via standard phy_read() /
phy_write() operations, which are virtualized by the KSZ8 DSA driver and
translated into the appropriate indirect switch register accesses.

Reviewed-by: Marek Vasut <marex@nabladev.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
Link: https://patch.msgid.link/20260609-ksz87xx_errata_low_loss_connections-v10-3-9ba4418cf3db@exotec.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: add KSZ87xx low-loss cable PHY tunables

Introduce vendor-specific PHY tunable identifiers to control the
KSZ87xx low-loss cable erratum handling through the ethtool PHY
tunable interface.

The following tunables are added:

- a boolean "short-cable" tunable, applying a documented and
  conservative preset intended for short or low-loss Ethernet cables;

- an integer LPF bandwidth tunable, allowing advanced adjustment of the
  receiver low-pass filter bandwidth;

- an integer DSP EQ initial value tunable, allowing advanced tuning of
  the PHY equalizer initialization.

The actual behavior is implemented by the corresponding PHY and switch
drivers.

Reviewed-by: Marek Vasut <marex@nabladev.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
Link: https://patch.msgid.link/20260609-ksz87xx_errata_low_loss_connections-v10-2-9ba4418cf3db@exotec.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: microchip: implement KSZ87xx Module 3 low-loss cable errata

Implement the KSZ87xx short cable workaround.

The erratum is described in Microchip document DS80000687C for KSZ87xx
switches and in the following support article:

Link: https://support.microchip.com/s/article/Solution-for-Using-CAT-5E-or-CAT-6-Short-Cable-with-a-Link-Issue-for-the-KSZ8795-Family

The issue affects short or low-loss cable links (e.g. CAT5e/CAT6),
where the PHY receiver equalizer may amplify high-amplitude signals
excessively, resulting in internal distortion and link establishment
failures.

KSZ87xx devices require a workaround for the Module 3 low-loss cable
condition, controlled through the switch TABLE_LINK_MD_V indirect
registers.

This change models the erratum handling as vendor-specific Clause 22 PHY
registers, virtualized by the KSZ8 DSA driver and accessed via
ksz8_r_phy() / ksz8_w_phy(). The following controls are provided:

- A boolean “short-cable” preset, which applies a documented and
  conservative configuration (LPF 62 MHz bandwidth and DSP EQ initial
  value 0), and is the recommended interface for typical use cases.

- Separate LPF bandwidth and DSP EQ initial value controls intended for
  advanced or experimental tuning. These are orthogonal and independent,
  and override the corresponding settings without requiring any specific
  ordering.

The preset and tunables act as simple setters with no implicit state
machine or invalid combinations, keeping the API predictable and aligned
with the KISS principle.

The erratum affects the shared PHY analog front-end and therefore applies
globally to the switch.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Marek Vasut <marex@nabladev.com>
Signed-off-by: Fidelio Lawson <fidelio.lawson@exotec.com>
Link: https://patch.msgid.link/20260609-ksz87xx_errata_low_loss_connections-v10-1-9ba4418cf3db@exotec.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net/openvswitch: add flow modify test

Add mod_flow() and the mod-flow CLI command to ovs-dpctl.py, exercising
OVS_FLOW_CMD_SET. Add test_flow_set which first modifies an existing
flow with new actions and verifies the change via traffic, then modifies
the same flow without actions and verifies the kernel handles the
no-actions case gracefully.

The no-actions path is unreachable from userspace OVS tools (dpctl
mod-flow requires actions) but reachable via raw netlink. This is the
code path where Adrian Moreno found a possible kfree_skb of ERR_PTR
when reply allocation fails after locking.

Make parse() skip OVS_FLOW_ATTR_ACTIONS when actstr is None so the
kernel enters the post-lock allocation branch in ovs_flow_cmd_set().
After the no-actions set, verify via dump-flows that the flow retained
its drop action.

Suggested-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Minxi Hou <houminxi@gmail.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20260609165725.107484-1-houminxi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmgenet: convert RX path to page_pool

Replace the per-packet __netdev_alloc_skb() + dma_map_single() in the
RX path with page_pool. SKBs are built from pool pages via
napi_build_skb() with skb_mark_for_recycle() so the network stack
returns pages to the pool, and DMA mapping happens once per page
instead of once per packet.

Reject HW-reported lengths smaller than the RSB so a runt cannot
underflow the SKB build path.

Drop the now-unused priv->rx_buf_len field and the rx_dma_failed soft
MIB counter (nothing increments it after the conversion). This
removes the "rx_dma_failed" entry from ethtool -S, which is a
user-visible change for monitoring tools that key on stat names.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Tested-by: Justin Chen <justin.chen@broadcom.com>
Link: https://patch.msgid.link/20260610114835.2225423-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: move get_sport() callback at the beginning of airoha_enable_gdm2_loopback()

Move the get_sport() callback invocation to the beginning of the
airoha_enable_gdm2_loopback() routine in order to avoid leaving the
hardware in a partially configured state if get_sport() fails.
Previously, get_sport() was called after GDM2 forwarding, loopback,
channel, length, VIP and IFC registers had already been programmed.
A failure at that point would return an error leaving GDM2 with
loopback enabled but WAN port, PPE CPU port and flow control mappings
not configured.
Performing the get_sport() lookup before any register write guarantees
the routine either completes the full configuration sequence or exits
with no side effects on the hardware.
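
The ordering argument can be illustrated with a toy model. Everything
here is hypothetical (struct hw_sketch, the register counter, the lookup
rule that negative port ids fail); it only shows a fallible lookup placed
before the first write.

```c
#include <errno.h>

/* Hypothetical device state: counts how many registers were programmed. */
struct hw_sketch {
	int regs_written;
};

/* Hypothetical lookup that can fail, standing in for get_sport(). */
static int get_sport_sketch(int port_id)
{
	return port_id >= 0 ? port_id : -EINVAL;
}

static void write_reg_sketch(struct hw_sketch *hw)
{
	hw->regs_written++;
}

int enable_loopback_sketch(struct hw_sketch *hw, int port_id)
{
	/* Do the fallible lookup before touching any register. */
	int sport = get_sport_sketch(port_id);

	if (sport < 0)
		return sport;	/* nothing has been written yet */

	write_reg_sketch(hw);	/* forwarding, loopback, ... */
	write_reg_sketch(hw);	/* mappings that depend on sport */
	return 0;
}
```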

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260608-airoha_enable_gdm2_loopback-minor-change-v1-1-1787a0f42b31@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-pm-drop-tcp-ts-with-add_addrv6-port'

Matthieu Baerts says:

====================
mptcp: pm: drop TCP TS with ADD_ADDRv6 + port

Up to this series, it was possible to add a "signal" MPTCP endpoint with
an IPv6 address and a port, or to directly request to send an ADD_ADDR
with a v6 address and a port, but the expected ADD_ADDR wasn't sent when
TCP timestamps were used for the connection.

In fact, such a signalling option cannot be sent when TCP timestamps are
used due to a lack of option space: the limit is 40 bytes, and, with
padding, TCP timestamps take 12 bytes, while an ADD_ADDR IPv6 + port
takes 30 bytes, i.e. 42 bytes in total. The selected solution here is to
simply drop the TCP timestamps option when such an ADD_ADDR of 30 bytes
needs to be sent.
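
The option-space arithmetic can be written out as a short sketch. The
three sizes come from the description above; the macro and function names
are hypothetical and are not the kernel's.

```c
#include <stdbool.h>

#define TCP_OPT_SPACE_MAX_SKETCH	40	/* TCP option space limit */
#define TCP_TS_PADDED_LEN_SKETCH	12	/* timestamps, with padding */
#define ADD_ADDR6_PORT_LEN_SKETCH	30	/* ADD_ADDR IPv6 + port */

/* Does an option of opt_len bytes fit, with or without timestamps? */
bool tcp_opts_fit_sketch(bool with_ts, unsigned int opt_len)
{
	unsigned int used = opt_len;

	if (with_ts)
		used += TCP_TS_PADDED_LEN_SKETCH;

	return used <= TCP_OPT_SPACE_MAX_SKETCH;
}

/* True when the option only fits once the timestamps are dropped. */
bool must_drop_ts_sketch(unsigned int opt_len)
{
	return !tcp_opts_fit_sketch(true, opt_len) &&
	       tcp_opts_fit_sketch(false, opt_len);
}
```

For the 30-byte case, 30 + 12 = 42 exceeds 40, so the option fits only
without the timestamps.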

- Patches 1-3: small cleanups to avoid computing ADD/RM_ADDR twice.

- Patches 4-7: the new feature, controlled by a new sysctl knob.

- Patch 8: extra checks in the MPTCP Join selftests.

- Patches 9-15: A bunch of refactoring: rename confusing helpers and
  variables, and prevent future misuse of some functions.
====================

Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-0-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: options: rst: drop unused skb parameter

It was passed since its introduction in commit dc87efdb1a5c ("mptcp: add
mptcp reset option support"), but never used.

Simply remove it.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-15-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: avoid using del_timer directly

mptcp_pm_announced_del_timer() removes the matched ADD_ADDR entry (if
found) from the ADD_ADDR list only if check_id is false. That's
dangerous, and not clear, because it means the caller should free the
entry only in some cases, and it is easy to miss that.

Instead, make it static, and call it from mptcp_pm_add_addr_echoed,
which is the only other case where mptcp_pm_add_addr_del_timer should be
called with check_id set to true. Bonus with that: a second call to
mptcp_pm_add_addr_lookup_by_addr() can be avoided.

Note that instead of adding the signature above to avoid a compilation
issue because this helper is called before the definition of the
function, the whole helper is moved above where it is first called. Its
content is untouched, except for the addition of the 'static' keyword.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-14-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: make mptcp_pm_add_addr_send_ack static

Only used in pm.c.

Note that the signature is added above: it is easier than moving the
code around, because this helper depends on mptcp_pm_schedule_work which
is declared below.

While at it, explicitly mark it as to be called while pm->lock is held.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-13-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: remove add_ prefix from timer

Similar to the two previous commits, using the 'add' prefix is
confusing, also confirmed by [1].

Now that the structure has been renamed to include 'add_addr' in its
name, it is easier to know the timer is linked to the ADD_ADDR, so there
is no need to add the confusing prefix, or an unneeded longer one.

While at it, also update the ADD_ADDR timer helper to clearly specify it
is linked to ADD_ADDR, and it is not there to add a new timer.

Link: https://lore.kernel.org/20251117100745.1913963-1-edumazet@google.com
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-12-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: uniform announced addresses helpers

Similar to the previous commit, only using the 'add' or 'anno' prefixes
is confusing -- generally associated with the action of adding something,
or the Latin name for "year" -- and lacks uniformity.

This has been causing issues in the past, e.g. del_add_timer seemed to
suggest the goal is to delete a previously added timer.

Instead, use the mptcp_pm_announced_ prefix.

While at it, slightly improve some helpers:

- mptcp_lookup_anno_list_by_saddr: no need to specify what is used to do
  the lookup: mptcp_pm_announced_lookup.

- mptcp_pm_sport_in_anno_list: it doesn't just compare the port, but the
  whole address linked to the subflow: mptcp_pm_announced_has_ssk.

- mptcp_pm_alloc_anno_list: it allocates one item of the list, not a
  whole list: mptcp_pm_announced_alloc.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-11-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: rename add_entry structure to add_addr

Using only the 'add' prefix is confusing: does it refer to a generic
added entry or address, or specifically to ADD_ADDRs? Using add_addr
removes this confusion.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-10-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: use for_each_subflow helper

This is similar to most places in the MPTCP code: instead of passing the
subflow list and using list_for_each_entry(subflow, list, node), pass the
msk and use mptcp_for_each_subflow(msk, subflow).

That's clearer and more uniform with the rest.

While at it, add 'pm_' prefix for the exported one to easily identify
the origin. Plus replace 'lookup' by 'has', because a bool is returned.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-9-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: always check sent/dropped ADD_ADDRs

Before, they were only checked on demand, but it seems better to check
them each time received ADD_ADDRs are checked.

Errors are only reported when the counter exists, and the value is not
the expected one. This is similar to what is done in chk_join_nr: it
reduces the output, and avoids a lot of 'skip' when validating older
kernels. Also here, some tests need to adapt the default expected
counters, e.g. when ADD_ADDR echoes are dropped on the reception side, or
when it is not possible to send an ADD_ADDR due to the limited option
space.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-8-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: validate ADD_ADDRv6 + TS + port

This validates the feature added by the parent commit, where it is now
possible to send an ADD_ADDR with a v6 IP address and a port number,
while the connection is using TCP Timestamps.

This test is simply a copy of the previous one: "signal address with
port", but using IPv6 addresses. This test is only executed if the
add_addr_v6_port_drop_ts sysctl knob is available. If not, it means the
kernel doesn't support this feature.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-7-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: drop TCP TS with ADD_ADDRv6 + port

With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
taking 30 bytes, the 40-byte limit for the TCP options is reached. In
this case, it is then not possible to send the signal.

To be able to send this ADD_ADDR, the TCP timestamps option can now be
dropped. This is done, when needed, by setting the *drop_ts parameter
from mptcp_established_options. This feature is controlled by a new
net.mptcp.add_addr_v6_port_drop_ts sysctl knob, enabled by default.
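
The decision described above can be sketched in plain userspace C. This
is only an illustration under stated assumptions, not the kernel code:
can_send_add_addr(), its parameters and the two constants are
hypothetical names, and 'knob_enabled' stands in for the sysctl value.

```c
#include <stdbool.h>

#define TCP_OPT_SPACE_MAX	40	/* maximum bytes of TCP options */
#define TCP_TS_PADDED_LEN	12	/* timestamps option, padded */

/*
 * Hypothetical helper: decide whether an MPTCP suboption of opt_len bytes
 * can be sent. 'used' is the option space already taken, including the
 * timestamps when has_ts is true. *drop_ts is set only when removing the
 * timestamps is both needed and allowed by the knob.
 */
static bool can_send_add_addr(unsigned int used, bool has_ts,
			      bool knob_enabled, unsigned int opt_len,
			      bool *drop_ts)
{
	*drop_ts = false;

	/* Enough room as-is: nothing to drop. */
	if (used + opt_len <= TCP_OPT_SPACE_MAX)
		return true;

	/* Retry the check with the timestamps option removed. */
	if (has_ts && knob_enabled && used >= TCP_TS_PADDED_LEN &&
	    used - TCP_TS_PADDED_LEN + opt_len <= TCP_OPT_SPACE_MAX) {
		*drop_ts = true;
		return true;
	}

	return false;
}
```

With 12 bytes already used by the timestamps and a 30-byte ADD_ADDR, the
helper only succeeds when the knob is enabled, and then asks for the
timestamps to be dropped.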

It is important to keep in mind that dropping the TCP timestamps option
for one packet of the connection could potentially disrupt some
middleboxes: even if it should be unlikely, they could drop the packet
or even block the connection. That's why this new feature can be
controlled by a sysctl knob.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/448
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-6-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: allow mptcp to drop TS for some packets

With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
taking 30 bytes, the 40-byte limit for the TCP options is reached. In
this case, it is then not possible to send the address signal.
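
The arithmetic can be checked with a small userspace sketch. The field
sizes follow the ADD_ADDR layout from RFC 8684 and ignore any alignment
padding; add_addr_len() and mptcp_space() are illustrative names, not
kernel helpers.

```c
#include <stdbool.h>

#define TCP_OPT_SPACE_MAX	40	/* TCP header options limit, in bytes */
#define TCP_TS_PADDED_LEN	12	/* 10-byte timestamps option + 2 NOPs */

/* ADD_ADDR length in bytes, following the RFC 8684 layout. */
static int add_addr_len(bool ipv6, bool port, bool echo)
{
	int len = 4;		/* kind, length, subtype/flags, address ID */

	len += ipv6 ? 16 : 4;	/* IP address */
	if (port)
		len += 2;	/* optional port */
	if (!echo)
		len += 8;	/* truncated HMAC, absent in echoes */

	return len;
}

/* Space left for MPTCP options once the timestamps are accounted for. */
static int mptcp_space(bool has_ts)
{
	return TCP_OPT_SPACE_MAX - (has_ts ? TCP_TS_PADDED_LEN : 0);
}
```

An IPv6 ADD_ADDR with a port needs 30 bytes while only 28 remain next to
the timestamps; the same option without a port needs 28 and still fits.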

The idea is to let MPTCP drop the TCP-timestamps option for some
specific packets, to be able to send some specific pure ACKs carrying >28
bytes of MPTCP options, like with this specific ADD_ADDR. A new
parameter is passed from tcp_established_options to the MPTCP side to
indicate if the TCP TS option is used, and if it should be dropped. The
next commit implements the MPTCP side: the change is split into two
patches to help TCP maintainers identify the modifications on the TCP
side. This feature will be controlled by a new add_addr_v6_port_drop_ts
MPTCP sysctl knob.

It is important to keep in mind that dropping the TCP timestamps option
for one packet of the connection could potentially disrupt some
middleboxes: even if it should be unlikely, they could drop the packet
or even block the connection. That's why this new feature will be
controlled by a sysctl knob.

Note that it would be technically possible to squeeze both options into
the header if the ADD_ADDR is written first, and then the TCP timestamps
without the NOPs preceding it. But this means more modifications on the
TCP side, plus some middleboxes could still be disrupted by that.

In this implementation, an unused bit is used in mptcp_out_options
structure to avoid passing an address to a local variable. Reading and
setting it needs CONFIG_MPTCP, so the whole block now has this #if
condition: mptcp_established_options() is then no longer used without
CONFIG_MPTCP.

About alternatives, instead of passing a new boolean (has_ts), another
option would be to pass the whole option structure (opts), but
'struct tcp_out_options' is currently defined in tcp_output.c, and it
would need to be exported. Plus that means the removal of the TCP TS
option would be done on the MPTCP side, and not here on the TCP side.
It feels clearer to remove other TCP options from the TCP side, than
hiding that from the MPTCP side.

Yet another alternative would be to pass the size already taken by the
other TCP options, and have a way to drop them all when needed. But it
feels better to target only the timestamps option, where dropping it
should be safe, even if it is currently the only option that would be
set before MPTCP, when MPTCP is used.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-5-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: introduce add_addr_v6_port_drop_ts sysctl knob

This sysctl is going to be used in the next commits to drop TCP
timestamps option, to be able to send an ADD_ADDR with a v6 IP address
and a port number. It is enabled by default.

This knob is explicitly disabled in the MPTCP Join selftest, with the
"signal addr list progresses after tx drop" subtest, to continue
verifying the previous behaviour where the ADD_ADDR is not sent due to a
lack of space.

While at it, move syn_retrans_before_tcp_fallback down from struct
mptcp_pernet, to avoid creating another 3-byte hole.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-4-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: avoid computing add_addr size twice

The mptcp_add_addr_len helper was called twice: in
mptcp_pm_add_addr_signal, then just after in
mptcp_established_options_add_addr, both times to check the remaining
space.

The second call is not needed: if there is not enough space,
mptcp_pm_add_addr_signal will return false, and the caller,
mptcp_established_options_add_addr, will do the same without re-checking
the size. Instead, mptcp_pm_add_addr_signal can directly set the size.

Note that the returned size can be negative when other suboptions are
dropped, e.g. to send an echo ADD_ADDR with a v4 address, and no port.

While at it:

- move mptcp_add_addr_len to pm.c, as it is now only used from there

- use 'int' in mptcp_add_addr_len for the size, instead of having a mix

- use a bool for 'ret' in mptcp_pm_add_addr_signal

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-3-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pm: avoid computing rm_addr size twice

The mptcp_rm_addr_len helper was called twice: in
mptcp_pm_rm_addr_signal, then just after in
mptcp_established_options_rm_addr, both times to check the remaining
space.

The second call is not needed: if there is not enough space,
mptcp_pm_rm_addr_signal will return false, and the caller,
mptcp_established_options_rm_addr, will do the same without re-checking
the size. Instead, mptcp_pm_rm_addr_signal can directly set the size.

While at it, move mptcp_rm_addr_len to pm.c, as it is now only used
there, once.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-2-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: options: suboptions sizes can be negative

Use a signed int for the returned size, because when other options are
dropped, the size can be negative, e.g. to send an echo ADD_ADDR with a
v4 address, and no port.

The behaviour is not changed, because it was working as expected with an
overflow. But it is clearer like this, and it will help later on.
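
Why the unsigned version already worked can be shown with a tiny
userspace sketch. The helper names and the sizes used below are made up
for illustration; only the modular arithmetic is the point.

```c
/*
 * Unsigned flavour: a "negative" size is stored as a huge unsigned value,
 * and the subtraction wraps modulo 2^N back to the intended result.
 */
static unsigned int remaining_unsigned(unsigned int remaining,
				       unsigned int size)
{
	return remaining - size;
}

/* Signed flavour: same result, without relying on the wrap-around. */
static int remaining_signed(int remaining, int size)
{
	return remaining - size;
}
```

For example, with 20 bytes remaining and a size of -12 (12 bytes freed by
dropping another suboption), both helpers give 32 bytes remaining.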

Even if, for the moment, only the ADD_ADDR size can be negative in some
cases, a signed int is now used for all mptcp_established_options_*()
helpers, to avoid mismatched types and for uniformity.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260605-net-next-mptcp-add-addr6-port-ts-v2-1-758e7ca73f4d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipq5018-add-and-enable-gephy-rx-and-tx-clocks'

George Moussalem says:

====================
IPQ5018: Add and enable GEPHY RX and TX clocks

This patch series addresses a missing hardware description issue for
the Qualcomm IPQ5018 Internal Ethernet PHY, where the data paths fail
to function correctly unless their dedicated RX and TX clocks are
explicitly enabled.

Further testing revealed that these clocks, while unmanaged by the
kernel, were inadvertently left enabled by the bootloader / QSDK
platform, which masked the issue. Testing a fresh network configuration
path exposed that the data link fails to work without explicit software
control of these clocks.

To correctly introduce the required multi-clock properties, the IPQ5018
binding definition must first be split away from the shared
qca,ar803x.yaml schema. This isolation is required because ar803x
references the generic ethernet-phy.yaml, which enforces a strict
single-clock limit constraint.

- Patch 1: Moves the clocks property and its restriction out of the
   generic ethernet-phy.yaml schema to individual bindings files
   that need it to allow for PHYs that require multiple clocks.
- Patch 2: Adds the clocks property to qca,ar803x.yaml for the IPQ5018 PHY.
- Patch 3: Updates the Qualcomm AT803x PHY driver framework to acquire,
   enable, and gate these clocks upon link state changes for
   runtime power optimization.
====================

Link: https://patch.msgid.link/20260608-ipq5018-gephy-clocks-v4-0-fb2ccd56894b@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: at803x: add RX and TX clock management for IPQ5018 PHY

Acquire and enable the RX and TX clocks for the IPQ5018 PHY.
These clocks are required for the PHY's datapath to function correctly.

Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Link: https://patch.msgid.link/20260608-ipq5018-gephy-clocks-v4-4-fb2ccd56894b@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: qca,ar803x: Add clocks for IPQ5018 PHY

Further testing revealed that the RX and TX clocks of the IPQ5018 PHY
need to be explicitly enabled. As such, add the required clocks to the
schema.

Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Link: https://patch.msgid.link/20260608-ipq5018-gephy-clocks-v4-2-fb2ccd56894b@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: ethernet-phy: increase max clock count to two

The clocks property is restricted to a maximum of one item.
Yet, some PHYs may require more than one clock, such as the IPQ5018 PHY,
which requires two clocks for RX and TX. As such, increase maxItems to
two.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Link: https://patch.msgid.link/20260608-ipq5018-gephy-clocks-v4-1-fb2ccd56894b@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6lowpan: fix NHC entry use-after-free on error path

lowpan_nhc_do_uncompression() looks up an NHC descriptor while holding
lowpan_nhc_lock.  If the descriptor has no uncompress callback, the error
path drops the lock before printing nhc->name.

lowpan_nhc_del() removes descriptors under the same lock and then relies
on synchronize_net() before the owning module can be unloaded.  That only
waits for net RX RCU readers.  lowpan_header_decompress() is also exported
and can be reached from callers that are not necessarily covered by the net
core RX critical section, for example the Bluetooth 6LoWPAN L2CAP receive
path.

This leaves a race where one task drops lowpan_nhc_lock in the error path,
another task unregisters and frees the matching descriptor after
synchronize_net() returns, and the first task then dereferences nhc->name
for the warning.

With the post-unlock window widened, KASAN reports:

  BUG: KASAN: slab-use-after-free in lowpan_nhc_do_uncompression+0x1f4/0x220
  Read of size 8
  lowpan_nhc_do_uncompression
  lowpan_header_decompress

Fix this by printing the warning before dropping lowpan_nhc_lock, so the
descriptor name is read while unregister is still excluded.  The malformed
packet is still rejected with -ENOTSUPP.
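
The ordering can be modelled in userspace as follows. This is a sketch
only: the spinlock built on atomic_flag, struct nhc_desc and
do_uncompression() are simplified stand-ins, and ERR_NOTSUPP merely
mirrors the value of the kernel-internal ENOTSUPP.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define ERR_NOTSUPP 524	/* value of the kernel-internal ENOTSUPP */

struct nhc_desc {
	const char *name;
	int (*uncompress)(void);	/* may be NULL */
};

static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/*
 * The descriptor name is formatted into 'msg' while the lock is still
 * held, so an unregister path taking the same lock cannot free the
 * descriptor between the lookup and the read of nhc->name.
 */
static int do_uncompression(const struct nhc_desc *nhc, char *msg, size_t len)
{
	int ret;

	table_lock_acquire();
	if (nhc->uncompress) {
		ret = nhc->uncompress();
	} else {
		snprintf(msg, len, "%s: no uncompress callback", nhc->name);
		ret = -ERR_NOTSUPP;
	}
	table_lock_release();

	return ret;
}
```

The buggy ordering would release the lock first and only then read
nhc->name, which is the window described above.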

Fixes: 92aa7c65d295 ("6lowpan: add generic nhc layer interface")
Cc: stable@vger.kernel.org
Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
Reported-by: Ao Wang <wangao@seu.edu.cn>
Reported-by: Xuewei Feng <fengxw06@126.com>
Reported-by: Qi Li <qli01@tsinghua.edu.cn>
Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://patch.msgid.link/20260609080054.4541-1-zhaoyz24@mails.tsinghua.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fec: remove reference to nonexistent CONFIG_GILBARCONAP option

The CONFIG_GILBARCONAP option has never been defined by the kernel, but
is referred to by drivers/net/ethernet/freescale/fec_main.c. Remove this
reference to eliminate dead code.

Discovered while searching for CONFIG_* symbols referenced in code but
not defined in any Kconfig file.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260609045200.32606-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pfcp: allocate per-cpu tstats for PFCP netdevs

PFCP uses dev_get_tstats64() as its ndo_get_stats64 callback, but
pfcp_link_setup() does not request NETDEV_PCPU_STAT_TSTATS. The net
core therefore leaves dev->tstats NULL for PFCP devices.

Creating a PFCP rtnetlink device can immediately ask the new netdev for
stats while building the RTM_NEWLINK notification. That reaches
dev_get_tstats64() and dereferences the NULL dev->tstats pointer.

Set pcpu_stat_type to NETDEV_PCPU_STAT_TSTATS during PFCP link setup so
the net core allocates the storage expected by dev_get_tstats64().

Fixes: 76c8764ef36a ("pfcp: add PFCP module")
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260609232244.1602027.c569f6c530f6.pfcp-missing-tstats-link-create-oops@trailofbits.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: validate embedded address parameter length

sctp_verify_asconf() and sctp_verify_param() only validate ADD_IP, DEL_IP,
and SET_PRIMARY parameters against a fixed minimum size of sizeof(struct
sctp_addip_param) + sizeof(struct sctp_paramhdr). This ensures the outer
parameter is large enough to contain an embedded address parameter header,
but does not verify that the embedded address parameter's declared length
fits within the bounds of the outer parameter.

Later, sctp_process_param() and sctp_process_asconf_param() extract the
embedded address parameter and pass it to af->from_addr_param(), which uses
the address parameter length to parse the variable-length address payload.
A malformed peer can therefore advertise an embedded address parameter
length that exceeds the remaining bytes in the enclosing parameter.

Validate that addr_param->p.length does not exceed the space available
after the sctp_addip_param header before processing the embedded address
parameter. Reject malformed parameters when the embedded address length
extends beyond the enclosing parameter bounds.

This prevents out-of-bounds reads when parsing malformed parameters carried
in INIT or ASCONF processing paths.
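
The added check boils down to a bounds comparison, sketched here in
userspace C. The structures are simplified stand-ins kept in host byte
order, and embedded_addr_fits() is a hypothetical name, not the function
added by this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the wire structures (host byte order here). */
struct param_hdr {
	uint16_t type;
	uint16_t length;	/* whole parameter, header included */
};

struct addip_param {
	struct param_hdr hdr;
	uint32_t crr_id;
	/* an embedded address parameter follows */
};

/*
 * Return true when an embedded address parameter declaring embedded_len
 * bytes lies entirely inside an outer parameter of outer_len bytes.
 */
static bool embedded_addr_fits(size_t outer_len, size_t embedded_len)
{
	size_t avail;

	/* Pre-existing check: room for both headers. */
	if (outer_len < sizeof(struct addip_param) + sizeof(struct param_hdr))
		return false;

	/* Space left after the sctp_addip_param-like header. */
	avail = outer_len - sizeof(struct addip_param);

	if (embedded_len < sizeof(struct param_hdr))
		return false;

	return embedded_len <= avail;
}
```

A 16-byte outer parameter can hold an 8-byte embedded IPv4 address
parameter, but not one that declares 20 bytes.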

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: sashiko <sashiko-bot@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/7838b86b69f52add28808fb59034c8f992e97b2d.1781043268.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Convert mr_table.cache_resolve_queue_len to u32.

mr_table.cache_resolve_queue_len is always updated under
spin_lock_bh(&mfc_unres_lock).

Let's convert it to u32.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260609222013.1550355-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bridge: cfm: reject invalid CCM interval at configuration time

ccm_tx_work_expired() re-arms itself via queue_delayed_work() using
the configured exp_interval converted by interval_to_us(). When
exp_interval is BR_CFM_CCM_INTERVAL_NONE or out of range,
interval_to_us() returns 0, causing the worker to fire immediately in
a tight loop that allocates skbs until OOM.

Fix this by validating exp_interval at configuration time:

- Constrain IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL to the valid range
   [BR_CFM_CCM_INTERVAL_3_3_MS, BR_CFM_CCM_INTERVAL_10_MIN] in the
   netlink policy so userspace cannot set an invalid value.

- Reject starting CCM TX in br_cfm_cc_ccm_tx() when exp_interval has
   not yet been configured (defaults to 0 from kzalloc).
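
The failure mode and the range check can be modelled in userspace. The
enum below is a local copy with illustrative names, the microsecond
values are assumed from the interval names, and interval_is_valid() is a
hypothetical helper standing in for the netlink policy range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local model of the CCM interval values; names are illustrative. */
enum ccm_interval {
	CCM_INTERVAL_NONE,
	CCM_INTERVAL_3_3_MS,
	CCM_INTERVAL_10_MS,
	CCM_INTERVAL_100_MS,
	CCM_INTERVAL_1_SEC,
	CCM_INTERVAL_10_SEC,
	CCM_INTERVAL_1_MIN,
	CCM_INTERVAL_10_MIN,
};

/* Returns 0 for NONE and for anything out of range. */
static uint32_t interval_to_us(int interval)
{
	switch (interval) {
	case CCM_INTERVAL_3_3_MS:	return 3300;
	case CCM_INTERVAL_10_MS:	return 10000;
	case CCM_INTERVAL_100_MS:	return 100000;
	case CCM_INTERVAL_1_SEC:	return 1000000;
	case CCM_INTERVAL_10_SEC:	return 10000000;
	case CCM_INTERVAL_1_MIN:	return 60000000;
	case CCM_INTERVAL_10_MIN:	return 600000000;
	default:			return 0;
	}
}

/* Configuration-time check: accept only intervals with a real period. */
static bool interval_is_valid(int interval)
{
	return interval >= CCM_INTERVAL_3_3_MS &&
	       interval <= CCM_INTERVAL_10_MIN;
}
```

Every interval accepted by the check maps to a nonzero delay, so a worker
re-armed with that delay can no longer fire back-to-back.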

Fixes: 2be665c3940d ("bridge: cfm: Netlink SET configuration Interface.")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260609065116.2818837-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-fib-fix-two-use-after-free-in-drivers-during-rcu-dump'

Kuniyuki Iwashima says:

====================
net: fib: Fix two use-after-free in drivers during RCU dump.

syzbot reported fib_info UAF in netdevsim, and the same bug
exists in rocker and mlxsw.

Patch 1 fixes it, and Patch 2 fixes the same type of bug for
fib_rule.
====================

Link: https://patch.msgid.link/20260610061744.2030996-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: fib_rules: Don't dump dying fib_rule in fib_rules_dump().

rocker_router_fib_event() calls fib_rule_get() during RCU dump.

If the fib_rule is dying, refcount_inc() will complain about it.

Let's call refcount_inc_not_zero() in fib_rules_dump().
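
The semantics relied upon here can be illustrated with C11 atomics. This
is a userspace model, not the kernel's refcount_t implementation, and
ref_inc_not_zero() is a made-up name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still alive (count > 0). */
static bool ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}

	return false;	/* count already hit zero: the object is dying */
}
```

A dumper would skip any entry for which this returns false, instead of
resurrecting an object that is about to be freed.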

Fixes: 5d7bfd141924 ("ipv4: fib_rules: Dump FIB rules when registering FIB notifier")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260610061744.2030996-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: fib: Don't dump dying fib_info in fib_leaf_notify().

syzbot reported use-after-free in nsim_fib4_prepare_event(). [0]

The problem is that the following functions call fib_info_hold() /
refcount_inc() while dumping fib_info under RCU, which is unsafe.

  * mlxsw_sp_router_fib4_event()
  * rocker_router_fib_event()
  * nsim_fib4_prepare_event()

refcount_inc_not_zero() must be used, but it would be too late
there.

Let's guarantee the lifetime of fib_info in fib_leaf_notify().

Note that IPv6 does not need the corresponding change since
fib6_table_dump() holds fib6_table.tb6_lock.

[0]:
refcount_t: addition on 0; use-after-free.
WARNING: lib/refcount.c:25 at refcount_warn_saturate+0x9f/0x110 lib/refcount.c:25, CPU#0: kworker/u8:15/3420
Modules linked in:
CPU: 0 UID: 0 PID: 3420 Comm: kworker/u8:15 Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: netns cleanup_net
RIP: 0010:refcount_warn_saturate+0x9f/0x110 lib/refcount.c:25
Code: eb 66 85 db 74 3e 83 fb 01 75 4c e8 1b f1 22 fd 48 8d 3d 84 cb f1 0a 67 48 0f b9 3a eb 4a e8 08 f1 22 fd 48 8d 3d 81 cb f1 0a <67> 48 0f b9 3a eb 37 e8 f5 f0 22 fd 48 8d 3d 7e cb f1 0a 67 48 0f
RSP: 0018:ffffc9000f2c7270 EFLAGS: 00010293
RAX: ffffffff84a18858 RBX: 0000000000000002 RCX: ffff888032ff9ec0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8f9353e0
RBP: 0000000000000000 R08: ffff888032ff9ec0 R09: 0000000000000005
R10: 0000000000000100 R11: 0000000000000004 R12: ffff8880570cc000
R13: dffffc0000000000 R14: ffff88802b40563c R15: ffff8880570cc000
FS:  0000000000000000(0000) GS:ffff888126173000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb1f4d5d000 CR3: 000000006072a000 CR4: 00000000003526f0
Call Trace:
<TASK>
__refcount_add include/linux/refcount.h:-1 [inline]
__refcount_inc include/linux/refcount.h:366 [inline]
refcount_inc include/linux/refcount.h:383 [inline]
fib_info_hold include/net/ip_fib.h:629 [inline]
nsim_fib4_prepare_event drivers/net/netdevsim/fib.c:930 [inline]
nsim_fib_event_schedule_work drivers/net/netdevsim/fib.c:1000 [inline]
nsim_fib_event_nb+0x1055/0x1240 drivers/net/netdevsim/fib.c:1043
call_fib_notifier+0x45/0x80 net/core/fib_notifier.c:25
call_fib_entry_notifier net/ipv4/fib_trie.c:90 [inline]
fib_leaf_notify net/ipv4/fib_trie.c:2176 [inline]
fib_table_notify net/ipv4/fib_trie.c:2194 [inline]
fib_notify+0x36b/0x5e0 net/ipv4/fib_trie.c:2217
fib_net_dump net/core/fib_notifier.c:70 [inline]
register_fib_notifier+0x184/0x360 net/core/fib_notifier.c:108
nsim_fib_create+0x85d/0x9f0 drivers/net/netdevsim/fib.c:1596
nsim_dev_reload_create drivers/net/netdevsim/dev.c:1604 [inline]
nsim_dev_reload_up+0x374/0x7c0 drivers/net/netdevsim/dev.c:1058
devlink_reload+0x501/0x8d0 net/devlink/dev.c:475
devlink_pernet_pre_exit+0x1ff/0x420 net/devlink/core.c:558
ops_pre_exit_list net/core/net_namespace.c:161 [inline]
ops_undo_list+0x187/0x940 net/core/net_namespace.c:234
cleanup_net+0x56e/0x800 net/core/net_namespace.c:702
process_one_work kernel/workqueue.c:3314 [inline]
process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Fixes: 0ae3eb7b4611 ("netdevsim: fib: Perform the route programming in a non-atomic context")
Fixes: c3852ef7f2f8 ("ipv4: fib: Replay events when registering FIB notifier")
Reported-by: syzbot+cb2aa2390ac024e25f5c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a290011.39669fcc.33b062.00b1.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260610061744.2030996-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnx2x: fix resource leaks in bnx2x_init_one() error paths

bnx2x_init_one() falls through to the common memory cleanup path for
several failures after probe has already acquired additional resources.

If register_netdev() fails after bnx2x_set_int_mode(), MSI/MSI-X remains
enabled. If later failures happen after bnx2x_iov_init_one(), PF SR-IOV
state can be left allocated. Also, failures after bnx2x_vfpf_acquire()
must release the PF resources before freeing the VF-PF mailbox allocated
by bnx2x_vf_pci_alloc().

Add error labels matching the resource acquisition order so probe failure
disables MSI/MSI-X, removes SR-IOV state, releases VF-PF resources,
deallocates VF PCI resources, and then frees the common driver memory.
Also clear PCI drvdata before freeing the netdev on probe failure.
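
The label ordering follows the usual unwind pattern, sketched below in
userspace C. The steps, labels and the probe() function are
hypothetical and much simpler than the driver: 'fail_at' picks the step
that fails, and 'trace' records each release as one letter.

```c
#include <string.h>

/*
 * Hypothetical probe with four acquisition steps. Each failure jumps to
 * the label that undoes everything acquired so far, in reverse order.
 */
static int probe(int fail_at, char *trace)
{
	trace[0] = '\0';

	/* step 1: allocate common memory */
	if (fail_at == 1)
		return -1;

	/* step 2: set the interrupt mode */
	if (fail_at == 2)
		goto err_free_mem;

	/* step 3: set up SR-IOV state */
	if (fail_at == 3)
		goto err_disable_irq;

	/* step 4: register the device */
	if (fail_at == 4)
		goto err_remove_iov;

	return 0;

err_remove_iov:
	strcat(trace, "V");	/* undo step 3 */
err_disable_irq:
	strcat(trace, "I");	/* undo step 2 */
err_free_mem:
	strcat(trace, "M");	/* undo step 1 */
	return -1;
}
```

A failure at step 4 releases V, I and M in that order, while a failure at
step 2 only releases M.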

Cc: stable+noautosel@kernel.org # untested fix to unlikely error path
Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260609074610.1968721-1-lihaoxiang@isrc.iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: cls_flow: Dont expose folded kernel pointers

The flow classifier falls back to addr_fold() for fields that are missing
from packet headers. In map mode, userspace controls mask, xor, rshift,
addend and divisor, and can observe the resulting classid through class
statistics. This allows a tc classifier in a user/network namespace to
recover the 32-bit folded value of skb->sk, skb_dst() or skb_nfct().

Align with standard kernel practices for pointer hashing and replace the
XOR folding with a keyed siphash (which is cryptographically secure).
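
Why an unkeyed fold is observable can be seen in a simplified userspace
model. fold64() and map_key() are illustrative names, and the order of
the map-mode operations is an assumption based on the parameters listed
above, not a copy of the classifier code.

```c
#include <stdint.h>

/* XOR-fold a 64-bit value to 32 bits, with no secret key involved. */
static uint32_t fold64(uint64_t a)
{
	return (uint32_t)(a & 0xFFFFFFFFu) ^ (uint32_t)(a >> 32);
}

/*
 * Simplified map-mode transform with caller-chosen parameters.
 * rshift is expected to be below 32.
 */
static uint32_t map_key(uint32_t key, uint32_t mask, uint32_t xor_val,
			uint32_t rshift, uint32_t addend, uint32_t divisor)
{
	key = (key & mask) ^ xor_val;
	key = (key >> rshift) + addend;
	if (divisor)
		key %= divisor;

	return key;
}
```

With a mask selecting a single bit and a matching shift, the result is
exactly that bit of the folded value, so the fold itself hides nothing. A
keyed hash removes this direct relationship between the pointer bits and
the observable classid.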

Fixes: e5dfb815181f ("[NET_SCHED]: Add flow classifier")
Reported-by: Kyle Zeng <kylebot@openai.com>
Tested-by: Kyle Zeng <kylebot@openai.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260610101839.14135-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: qca8k: fix led devicename when using external mdio bus

The qca8k dsa switch can use either an external or internal mdio bus.
This depends on whether the mdio node is defined under the switch node
itself. Upon registering the internal mdio bus, the internal_mdio_bus
of the dsa switch is assigned to this bus. When an external mdio bus is
used, the driver still uses the internal_mdio_bus id which is used to
create the device names of the leds.
This leads to the leds being prefixed with '(efault)' as the
internal_mdio_bus is null. So let's fix this by adding a null check and
using the devicename of the external bus instead when an external bus is
configured.

Fixes: 1e264f9d2918 ("net: dsa: qca8k: add LEDs basic support")
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260608-qca8k-leds-fix-v3-1-a915bb2f37ae@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.1-rc8).

Conflicts:

drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c
  f67aead16e85 ("net: txgbe: rework service event handling")
  57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices")

net/rds/info.c
  512db8267b73 ("rds: mark snapshot pages dirty in rds_info_getsockopt()")
  6e94eeb2a2a6 ("rds: convert to getsockopt_iter")

Adjacent changes:

include/net/sock.h
  1ee90b77b727 ("net: guard timestamp cmsgs to real error queue skbs")
  f0de88303d5e ("net: make is_skb_wmem() available to modules")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'dma-mapping-7.1-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fix from Marek Szyprowski:
"Three more fixes for the DMA-mapping code, related to PCI P2PDMA, DMA
  debug and DMA link ranges API (Li RongQing and Jason Gunthorpe)"

* tag 'dma-mapping-7.1-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  iommu/dma: Do not try to iommu_map a 0 length region in swiotlb
  dma-debug: fix physical address retrieval in debug_dma_sync_sg_for_device
  dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments

Merge tag 'sunxi-dt-for-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into soc/dt

Allwinner device tree changes for 7.2 - Take 2

Some changes for old chips and some for recent ones.

- A83T gained the MIPI CSI-2 receiver
- overlays enabled for Pine64 boards
- D1s / T113 and H616 gained the high speed timer
- T113s watchdog enabled (for reboot)
- H616 gained proper SRAM regions
- A523 family gained EL2 virtual timer interrupt and GPADC
- A523 pinctrl IRQ fix

* tag 'sunxi-dt-for-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: dts: allwinner: a523: Add missing GPIO interrupt
  arm64: dts: allwinner: a523: add gpadc node
  arm64: dts: allwinner: Add EL2 virtual timer interrupt
  ARM: dts: sun8i: a83t: Add MIPI CSI-2 controller node
  dt-bindings: media: sun6i-a31-isp: Add optional interconnect properties
  dt-bindings: media: sun6i-a31-csi: Add optional interconnect properties
  arm64: dts: allwinner: sun50i-a64: Enable DT overlays
  arm: dts: allwinner: t113s: enable watchdog for reboot
  arm64: dts: allwinner: h616: add hstimer node
  riscv: dts: allwinner: d1s-t113: add hstimer node
  arm64: dts: allwinner: sun50i-h616: Add SRAM nodes

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

MAINTAINERS: add Onur Özkan as Rust reviewer

Onur has been involved with the Rust for Linux project for a year now. He
works on the Tyr driver for Arm Mali GPUs [1] and has been driving the
`ww_mutex` series and the SRCU abstractions, as well as improving the
core Rust support in several areas.

In addition, he is already a reviewer of the `RUST [SYNC]` entry and has
been involved with upstream Rust -- for instance, he led the bootstrap
team for two years.

His expertise with the language and its toolchain will be very useful to
have around in the future. Thus add him to the `RUST` entry as reviewer.

Link: https://rust-for-linux.com/tyr-gpu-driver [1]
Acked-by: Onur Özkan <work@onurozkan.dev>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260611055538.61425-4-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

MAINTAINERS: add Alexandre Courbot as Rust reviewer

Alexandre has been involved with the Rust for Linux project for more
than a year now. He is one of the main contributors to Nova [1], the
Rust driver for NVIDIA GPUs, and has authored core Rust infrastructure
motivated by that work, such as the `num` module with the `Bounded`
integer type, the `register!` and `bitfield!` macros, as well as
improvements to abstractions like DMA.

He maintains the nova-core driver, as well as the `RUST [NUM]`, `RUST
[BITFIELD]` and `RUST [INTEROP]` entries. In addition, he has been very
active reviewing Rust code on the mailing list.

He also proposed and implemented the `int_lowest_highest_one` feature
in the Rust standard library [2], which we should eventually use in
the kernel.

His experience maintaining a major Rust GPU driver and the abstractions
it needs will be very useful to have around in the future. Thus add him
to the `RUST` entry as reviewer.

Link: https://rust-for-linux.com/nova-gpu-driver [1]
Link: https://github.com/rust-lang/rust/issues/145203 [2]
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260611055538.61425-3-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

MAINTAINERS: add Tamir Duberstein as Rust reviewer

Tamir has been involved with the Rust for Linux project for more than
a year and a half now. He has been working on improving the integration
between the kernel and the Rust language and tooling: he led the effort
to replace the kernel's own `CStr` type with the standard library's,
and reworked the rust-analyzer integration, among other things.

He is already the maintainer of the `RUST [RUST-ANALYZER]` and `XARRAY API
[RUST]` entries. In addition, he has been active reviewing Rust code on
the mailing list.

He is also a long-time contributor to the upstream Rust project, including
on topics that matter for the Linux kernel [1].

His expertise with the language and its tooling will be very useful to
have around in the future. Thus add him to the `RUST` entry as reviewer.

Link: https://github.com/rust-lang/rust/pull/139994 [1]
Acked-by: Tamir Duberstein <tamird@kernel.org>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260611055538.61425-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>