git.ipfire.org Git - thirdparty/kernel/stable.git/log

Merge branch 'guest-memfd-mmap' into HEAD

Add support for host userspace mapping of guest_memfd-backed memory for VM
types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't
precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have
no way to detect private vs. shared).

mmap() support paves the way for several evolving KVM use cases:

* Allows VMMs like Firecracker to run guests entirely backed by
  guest_memfd [1]. This provides a unified memory management model for
  both confidential and non-confidential guests, simplifying VMM design.

* Enhanced Security via direct map removal: When combined with Patrick's
  series for direct map removal [2], this provides additional hardening
  against Spectre-like transient execution attacks by eliminating the
  need for host kernel direct maps of guest memory.

* Lays the groundwork for *restricted* mmap() support for guest_memfd-backed
  memory on CoCo platforms [3] that permit in-place sharing of guest memory
   with the host.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory

Add a guest_memfd testcase to verify that a vCPU can fault-in guest_memfd
memory that supports mmap(), but that is not currently mapped into host
userspace and/or has a userspace address (in the memslot) that points at
something other than the target guest_memfd range. Mapping guest_memfd
memory into the guest is supposed to operate completely independently from
any userspace mappings.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: guest_memfd mmap() test when mmap is supported

Expand the guest_memfd selftests to comprehensively test host userspace
mmap functionality for guest_memfd-backed memory when supported by the
VM type.

Introduce new test cases to verify the following:

* Successful mmap operations: Ensure that MAP_SHARED mappings succeed
  when guest_memfd mmap is enabled.

* Data integrity: Validate that data written to the mmap'd region is
  correctly persistent and readable.

* fallocate interaction: Test that fallocate(FALLOC_FL_PUNCH_HOLE)
  correctly zeros out mapped pages.

* Out-of-bounds access: Verify that accessing memory beyond the
  guest_memfd's size correctly triggers a SIGBUS signal.

* Unsupported mmap: Confirm that mmap attempts fail as expected when
  guest_memfd mmap support is not enabled for the specific guest_memfd
  instance or VM type.

* Flag validity: Introduce test_vm_type_gmem_flag_validity() to
  systematically test that only allowed guest_memfd creation flags are
  accepted for different VM types (e.g., GUEST_MEMFD_FLAG_MMAP for
  default VMs, no flags for CoCo VMs).

The existing tests for guest_memfd creation (multiple instances, invalid
sizes), file read/write, file size, and invalid punch hole operations
are integrated into the new test_with_type() framework to allow testing
across different VM types.

Cc: James Houghton <jthoughton@google.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Shivank Garg <shivankg@amd.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-24-seanjc@google.com>
[Fix default vm_types to use BIT() - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Do not use hardcoded page sizes in guest_memfd test

Update the guest_memfd_test selftest to use getpagesize() instead of
hardcoded 4KB page size values.

Using hardcoded page sizes can cause test failures on architectures or
systems configured with larger page sizes, such as arm64 with 64KB
pages. By dynamically querying the system's page size, the test becomes
more portable and robust across different environments.

Additionally, build the guest_memfd_test selftest for arm64.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Allow and advertise support for host mmap() on guest_memfd files

Now that all the x86 and arm64 plumbing for mmap() on guest_memfd is in
place, allow userspace to set GUEST_MEMFD_FLAG_MMAP and advertise support
via a new capability, KVM_CAP_GUEST_MEMFD_MMAP.

The availability of this capability is determined per architecture, and
its enablement for a specific guest_memfd instance is controlled by the
GUEST_MEMFD_FLAG_MMAP flag at creation time.

Update the KVM API documentation to detail the KVM_CAP_GUEST_MEMFD_MMAP
capability, the associated GUEST_MEMFD_FLAG_MMAP, and provide essential
information regarding support for mmap in guest_memfd.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-22-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: Enable support for guest_memfd backed memory

Now that the infrastructure is in place, enable guest_memfd for arm64.

* Select CONFIG_KVM_GUEST_MEMFD in KVM/arm64 Kconfig.

* Enforce KVM_MEMSLOT_GMEM_ONLY for guest_memfd on arm64: Ensure that
  guest_memfd-backed memory slots on arm64 are only supported if they
  are intended for shared memory use cases (i.e.,
  kvm_memslot_is_gmem_only() is true). This design reflects the current
  arm64 KVM ecosystem where guest_memfd is primarily being introduced
  for VMs that support shared memory.

Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfd

Handle faults for memslots backed by guest_memfd in arm64 nested
virtualization triggered by VNCR_EL2.

* Introduce is_gmem output parameter to kvm_translate_vncr(), indicating
  whether the faulted memory slot is backed by guest_memfd.

* Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn().

* Update kvm_handle_vncr_abort() to handle potential guest_memfd errors.
  Some of the guest_memfd errors need to be handled by userspace instead
  of attempting to (implicitly) retry by returning to the guest.

Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: Handle guest_memfd-backed guest page faults

Add arm64 architecture support for handling guest page faults on memory
slots backed by guest_memfd.

This change introduces a new function, gmem_abort(), which encapsulates
the fault handling logic specific to guest_memfd-backed memory. The
kvm_handle_guest_abort() entry point is updated to dispatch to
gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
determined by kvm_slot_has_gmem()).

Until guest_memfd gains support for huge pages, the fault granule for
these memory regions is restricted to PAGE_SIZE.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm64: Refactor user_mem_abort()

Refactor user_mem_abort() to improve code clarity and simplify
assumptions within the function.

Key changes include:

* Immediately set force_pte to true at the beginning of the function if
  logging_active is true. This simplifies the flow and makes the
  condition for forcing a PTE more explicit.

* Remove the misleading comment stating that logging_active is
  guaranteed to never be true for VM_PFNMAP memslots, as this assertion
  is not entirely correct.

* Extract reusable code blocks into new helper functions:
  * prepare_mmu_memcache(): Encapsulates the logic for preparing and
    topping up the MMU page cache.
  * adjust_nested_fault_perms(): Isolates the adjustments to shadow S2
    permissions and the encoding of nested translation levels.

* Update min(a, (long)b) to min_t(long, a, b) for better type safety and
  consistency.

* Perform other minor tidying up of the code.

These changes primarily aim to simplify user_mem_abort() and make its
logic easier to understand and maintain, setting the stage for future
modifications.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Tao Chan <chentao@kylinos.cn>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory

Update the KVM MMU fault handler to service guest page faults
for memory slots backed by guest_memfd with mmap support. For such
slots, the MMU must always fault in pages directly from guest_memfd,
bypassing the host's userspace_addr.

This ensures that guest_memfd-backed memory is always handled through
the guest_memfd specific faulting path, regardless of whether it's for
private or non-private (shared) use cases.

Additionally, rename kvm_mmu_faultin_pfn_private() to
kvm_mmu_faultin_pfn_gmem(), as this function is now used to fault in
pages from guest_memfd for both private and non-private memory,
accommodating the new use cases.

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
[sean: drop the helper]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250729225455.670324-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Extend guest_memfd's max mapping level to shared mappings

Rework kvm_mmu_max_mapping_level() to consult guest_memfd for all mappings,
not just private mappings, so that hugepage support plays nice with the
upcoming support for backing non-private memory with guest_memfd.

In addition to getting the max order from guest_memfd for gmem-only
memslots, update TDX's hook to effectively ignore shared mappings, as TDX's
restrictions on page size only apply to Secure EPT mappings. Do nothing
for SNP, as RMP restrictions apply to both private and shared memory.

Suggested-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250729225455.670324-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Enforce guest_memfd's max order when recovering hugepages

Rework kvm_mmu_max_mapping_level() to provide the plumbing to consult
guest_memfd (and relevant vendor code) when recovering hugepages, e.g.
after disabling live migration.  The flaw has existed since guest_memfd was
originally added, but has gone unnoticed due to lack of guest_memfd support
for hugepages or dirty logging.

Don't actually call into guest_memfd at this time, as it's unclear as to
what the API should be.  Ideally, KVM would simply use kvm_gmem_get_pfn(),
but invoking kvm_gmem_get_pfn() would lead to sleeping in atomic context
if guest_memfd needed to allocate memory (mmu_lock is held).  Luckily,
the path isn't actually reachable, so just add a TODO and WARN to ensure
the functionality is added alongisde guest_memfd hugepage support, and
punt the guest_memfd API design question to the future.

Note, calling kvm_mem_is_private() in the non-fault path is safe, so long
as mmu_lock is held, as hugepage recovery operates on shadow-present SPTEs,
i.e. calling kvm_mmu_max_mapping_level() with @fault=NULL is mutually
exclusive with kvm_vm_set_mem_attributes() changing the PRIVATE attribute
of the gfn.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Hoist guest_memfd max level/order helpers "up" in mmu.c

Move kvm_max_level_for_order() and kvm_max_private_mapping_level() up in
mmu.c so that they can be used by __kvm_mmu_max_mapping_level().

Opportunistically drop the "inline" from kvm_max_level_for_order().

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: Rename .private_max_mapping_level() to .gmem_max_mapping_level()

Rename kvm_x86_ops.private_max_mapping_level() to .gmem_max_mapping_level()
in anticipation of extending guest_memfd support to non-private memory.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: guest_memfd: Track guest_memfd mmap support in memslot

Add a new internal flag, KVM_MEMSLOT_GMEM_ONLY, to the top half of
memslot->flags (which makes it strictly for KVM's internal use). This
flag tracks when a guest_memfd-backed memory slot supports host
userspace mmap operations, which implies that all memory, not just
private memory for CoCo VMs, is consumed through guest_memfd: "gmem
only".

This optimization avoids repeatedly checking the underlying guest_memfd
file for mmap support, which would otherwise require taking and
releasing a reference on the file for each check. By caching this
information directly in the memslot, we reduce overhead and simplify the
logic involved in handling guest_memfd-backed pages for host mappings.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: guest_memfd: Add plumbing to host to map guest_memfd pages

Introduce the core infrastructure to enable host userspace to mmap()
guest_memfd-backed memory. This is needed for several evolving KVM use
cases:

* Non-CoCo VM backing: Allows VMMs like Firecracker to run guests
  entirely backed by guest_memfd, even for non-CoCo VMs [1]. This
  provides a unified memory management model and simplifies guest memory
  handling.

* Direct map removal for enhanced security: This is an important step
  for direct map removal of guest memory [2]. By allowing host userspace
  to fault in guest_memfd pages directly, we can avoid maintaining host
  kernel direct maps of guest memory. This provides additional hardening
  against Spectre-like transient execution attacks by removing a
  potential attack surface within the kernel.

* Future guest_memfd features: This also lays the groundwork for future
  enhancements to guest_memfd, such as supporting huge pages and
  enabling in-place sharing of guest memory with the host for CoCo
  platforms that permit it [3].

Enable the basic mmap and fault handling logic within guest_memfd, but
hold off on allow userspace to actually do mmap() until the architecture
support is also in place.

[1] https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
[2] https://lore.kernel.org/linux-mm/cc1bb8e9bc3e1ab637700a4d3defeec95b55060a.camel@amazon.com
[3] https://lore.kernel.org/all/c1c9591d-218a-495c-957b-ba356c8f8e09@redhat.com/T/#u

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Enable KVM_GUEST_MEMFD for all 64-bit builds

Enable KVM_GUEST_MEMFD for all KVM x86 64-bit builds, i.e. for "default"
VM types when running on 64-bit KVM. This will allow using guest_memfd
to back non-private memory for all VM shapes, by supporting mmap() on
guest_memfd.

Opportunistically clean up various conditionals that become tautologies
once x86 selects KVM_GUEST_MEMFD more broadly. Specifically, because
SW protected VMs, SEV, and TDX are all 64-bit only, private memory no
longer needs to take explicit dependencies on KVM_GUEST_MEMFD, because
it is effectively a prerequisite.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Fix comment that refers to kvm uapi header path

The comment that points to the path where the user-visible memslot flags
are refers to an outdated path and has a typo.

Update the comment to refer to the correct path.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Fix comments that refer to slots_lock

Fix comments so that they refer to slots_lock instead of slots_locks
(remove trailing s).

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Rename kvm_slot_can_be_private() to kvm_slot_has_gmem()

Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() to improve
clarity and accurately reflect its purpose.

The function kvm_slot_can_be_private() was previously used to check if a
given kvm_memory_slot is backed by guest_memfd. However, its name
implied that the memory in such a slot was exclusively "private".

As guest_memfd support expands to include non-private memory (e.g.,
shared host mappings), it's important to remove this association. The
new name, kvm_slot_has_gmem(), states that the slot is backed by
guest_memfd without making assumptions about the memory's privacy
attributes.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Rename CONFIG_KVM_GENERIC_PRIVATE_MEM to CONFIG_HAVE_KVM_ARCH_GMEM_POPULATE

The original name was vague regarding its functionality. This Kconfig
option specifically enables and gates the kvm_gmem_populate() function,
which is responsible for populating a GPA range with guest data.

The new name, HAVE_KVM_ARCH_GMEM_POPULATE, describes the purpose of the
option: to enable arch-specific guest_memfd population mechanisms. It
also follows the same pattern as the other HAVE_KVM_ARCH_* configuration
options.

This improves clarity for developers and ensures the name accurately
reflects the functionality it controls, especially as guest_memfd
support expands beyond purely "private" memory scenarios.

Temporarily keep KVM_GENERIC_PRIVATE_MEM as an x86-only config so as to
minimize churn, and to hopefully make it easier to see what features
require HAVE_KVM_ARCH_GMEM_POPULATE. On that note, omit GMEM_POPULATE
for KVM_X86_SW_PROTECTED_VM, as regular ol' memset() suffices for
software-protected VMs.

As for KVM_GENERIC_PRIVATE_MEM, a future change will select KVM_GUEST_MEMFD
for all 64-bit KVM builds, at which point the intermediate config will
become obsolete and can/will be dropped.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Select TDX's KVM_GENERIC_xxx dependencies iff CONFIG_KVM_INTEL_TDX=y

Select KVM_GENERIC_PRIVATE_MEM and KVM_GENERIC_MEMORY_ATTRIBUTES directly
from KVM_INTEL_TDX, i.e. if and only if TDX support is fully enabled in
KVM. There is no need to enable KVM's private memory support just because
the core kernel's INTEL_TDX_HOST is enabled.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Select KVM_GENERIC_PRIVATE_MEM directly from KVM_SW_PROTECTED_VM

Now that KVM_SW_PROTECTED_VM doesn't have a hidden dependency on KVM_X86,
select KVM_GENERIC_PRIVATE_MEM from within KVM_SW_PROTECTED_VM instead of
conditionally selecting it from KVM_X86.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Have all vendor neutral sub-configs depend on KVM_X86, not just KVM

Make all vendor neutral KVM x86 configs depend on KVM_X86, not just KVM,
i.e. gate them on at least one vendor module being enabled and thus on
kvm.ko actually being built.  Depending on just KVM allows the user to
select the configs even though they won't actually take effect, and more
importantly, makes it all too easy to create unmet dependencies.  E.g.
KVM_GENERIC_PRIVATE_MEM can't be selected by KVM_SW_PROTECTED_VM, because
the KVM_GENERIC_MMU_NOTIFIER dependency is select by KVM_X86.

Hiding all sub-configs when neither KVM_AMD nor KVM_INTEL is selected also
helps communicate to the user that nothing "interesting" is going on, e.g.

  --- Virtualization
  <M>   Kernel-based Virtual Machine (KVM) support
  < >   KVM for Intel (and compatible) processors support
  < >   KVM for AMD processors support

Fixes: ea4290d77bda ("KVM: x86: leave kvm.ko out of the build if no vendor module is requested")
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Message-ID: <20250729225455.670324-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Rename CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GUEST_MEMFD

Rename the Kconfig option CONFIG_KVM_PRIVATE_MEM to
CONFIG_KVM_GUEST_MEMFD. The original name implied that the feature only
supported "private" memory. However, CONFIG_KVM_PRIVATE_MEM enables
guest_memfd in general, which is not exclusively for private memory.
Subsequent patches in this series will add guest_memfd support for
non-CoCo VMs, whose memory is not private.

Renaming the Kconfig option to CONFIG_KVM_GUEST_MEMFD more accurately
reflects its broader scope as the main Kconfig option for all
guest_memfd-backed memory. This provides clearer semantics for the
option and avoids confusion as new features are introduced.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-fixes-6.17-rc7' of https://github.com/kvm-x86/linux into HEAD

* Use array_index_nospec() to sanitize the target vCPU ID when handling PV
  IPIs and yields as the ID is guest-controlled.

* Drop a superfluous cpumask_empty() check when reclaiming SEV memory, as
  the common case, by far, is that at least one CPU will have entered the
  VM, and wbnoinvd_on_cpus_mask() will naturally handle the rare case where
  the set of have_run_cpus is empty.

* Rename the is_signed_type() macro in kselftest_harness.h to is_signed_var()
  to fix a collision with linux/overflow.h.  The collision generates compiler
  warnings due to the two macros having different implementations.

Linux 6.17-rc3

Merge tag 'i2c-for-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- hisi: update maintainership

- fix several issues in rtl9300 xfer:
     - check message length boundaries
     - correct multi-byte value composition on write
     - increase polling timeout
     - fix block transfer protocol

* tag 'i2c-for-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rtl9300: Add missing count byte for SMBus Block Ops
  i2c: rtl9300: Increase timeout for transfer polling
  i2c: rtl9300: Fix multi-byte I2C write
  i2c: rtl9300: Fix out-of-bounds bug in rtl9300_i2c_smbus_xfer
  MAINTAINERS: i2c: Update i2c_hisi entry

Merge tag 'perf_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

- Fix a case where the events throttling logic operates on inactive
events

* tag 'perf_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Avoid undefined behavior from stopping/starting inactive events

Merge tag 'x86_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Fix the GDS mitigation detection on some machines after the recent
   attack vectors conversion

- Filter out the invalid machine reset reason value -1 when running as
   a guest as in such cases the reason why the machine was rebooted does
   not make a whole lot of sense

- Init the resource control machinery on Hygon hw in order to avoid a
   division by zero and to actually enable the feature on hw which
   supports it

* tag 'x86_urgent_for_v6.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Fix GDS mitigation selecting when mitigation is off
  x86/CPU/AMD: Ignore invalid reset reason value
  x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper

Merge tag 'mips-fixes_6.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:
"Fix ethernet on Lantiq boards"

* tag 'mips-fixes_6.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: lantiq: xway: sysctrl: rename the etop node
mips: dts: lantiq: danube: add missing burst length property

Merge tag 'modules-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull modules fix from Daniel Gomez:
"This includes a fix part of the KSPP (Kernel Self Protection Project)
  to replace the deprecated and unsafe strcpy() calls in the kernel
  parameter string handler and sysfs parameters for built-in modules.
  Single commit, no functional changes"

* tag 'modules-6.17-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
  params: Replace deprecated strcpy() with strscpy() and memcpy()

Merge tag 'char-misc-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc/iio fixes from Greg KH:
"Here are a small number of char/misc/iio and other driver fixes for
  6.17-rc3.  Included in here are:

   - IIO driver bugfixes for reported issues

   - bunch of comedi driver fixes

   - most core bugfix

   - fpga driver bugfix

   - cdx driver bugfix

  All of these have been in linux-next this week with no reported
  issues"

* tag 'char-misc-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  most: core: Drop device reference after usage in get_channel()
  comedi: Make insn_rw_emulate_bits() do insn->n samples
  comedi: Fix use of uninitialized memory in do_insn_ioctl() and do_insnlist_ioctl()
  comedi: pcl726: Prevent invalid irq number
  cdx: Fix off-by-one error in cdx_rpmsg_probe()
  fpga: zynq_fpga: Fix the wrong usage of dma_map_sgtable()
  iio: pressure: bmp280: Use IS_ERR() in bmp280_common_probe()
  iio: light: as73211: Ensure buffer holes are zeroed
  iio: adc: rzg2l_adc: Set driver data before enabling runtime PM
  iio: adc: rzg2l: Cleanup suspend/resume path
  iio: adc: ad7380: fix missing max_conversion_rate_hz on adaq4381-4
  iio: adc: bd79124: Add GPIOLIB dependency
  iio: imu: inv_icm42600: change invalid data error to -EBUSY
  iio: adc: ad7124: fix channel lookup in syscalib functions
  iio: temperature: maxim_thermocouple: use DMA-safe buffer for spi_read()
  iio: adc: ad7173: prevent scan if too many setups requested
  iio: proximity: isl29501: fix buffered read on big-endian systems
  iio: accel: sca3300: fix uninitialized iio scan data

Merge tag 'usb-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 6.17-rc3 to resolve a bunch
  of reported issues. Included in here are:

   - typec driver fixes

   - dwc3 new device id

   - dwc3 driver fixes

   - new usb-storage driver quirks

   - xhci driver fixes

   - other tiny USB driver fixes to resolve bugs

  All of these have been in linux-next this week with no reported issues"

* tag 'usb-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: xhci: fix host not responding after suspend and resume
  usb: xhci: Fix slot_id resource race conflict
  usb: typec: fusb302: Revert incorrect threaded irq fix
  USB: core: Update kerneldoc for usb_hcd_giveback_urb()
  usb: typec: maxim_contaminant: re-enable cc toggle if cc is open and port is clean
  usb: typec: maxim_contaminant: disable low power mode when reading comparator values
  usb: dwc3: Remove WARN_ON for device endpoint command timeouts
  USB: storage: Ignore driver CD mode for Realtek multi-mode Wi-Fi dongles
  usb: storage: realtek_cr: Use correct byte order for bcs->Residue
  usb: chipidea: imx: improve usbmisc_imx7d_pullup()
  kcov, usb: Don't disable interrupts in kcov_remote_start_usb_softirq()
  usb: dwc3: pci: add support for the Intel Wildcat Lake
  usb: dwc3: Ignore late xferNotReady event to prevent halt timeout
  USB: storage: Add unusual-devs entry for Novatek NTK96550-based camera
  usb: core: hcd: fix accessing unmapped memory in SINGLE_STEP_SET_FEATURE test
  usb: renesas-xhci: Fix External ROM access timeouts
  usb: gadget: tegra-xudc: fix PM use count underflow
  usb: quirks: Add DELAY_INIT quick for another SanDisk 3.2Gen1 Flash Drive

Merge tag 'trace-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix rtla and latency tooling pkg-config errors

   If libtraceevent and libtracefs is installed, but their corresponding
   '.pc' files are not installed, it reports that the libraries are
   missing and confuses the developer. Instead, report that the
   pkg-config files are missing and should be installed.

- Fix overflow bug of the parser in trace_get_user()

   trace_get_user() uses the parsing functions to parse the user space
   strings. If the parser fails due to incorrect processing, it doesn't
   terminate the buffer with a nul byte. Add a "failed" flag to the
   parser that gets set when parsing fails and is used to know if the
   buffer is fine to use or not.

- Remove a semicolon that was at an end of a comment line

- Fix register_ftrace_graph() to unregister the pm notifier on error

   The register_ftrace_graph() registers a pm notifier but there's an
   error path that can exit the function without unregistering it. Since
   the function returns an error, it will never be unregistered.

- Allocate and copy ftrace hash for reader of ftrace filter files

   When the set_ftrace_filter or set_ftrace_notrace files are open for
   read, an iterator is created and sets its hash pointer to the
   associated hash that represents filtering or notrace filtering to it.
   The issue is that the hash it points to can change while the
   iteration is happening. All the locking used to access the tracer's
   hashes are released which means those hashes can change or even be
   freed. Using the hash pointed to by the iterator can cause UAF bugs
   or similar.

   Have the read of these files allocate and copy the corresponding
   hashes and use that as that will keep them the same while the
   iterator is open. This also simplifies the code as opening it for
   write already does an allocate and copy, and now that the read is
   doing the same, there's no need to check which way it was opened on
   the release of the file, and the iterator hash can always be freed.

- Fix function graph to copy args into temp storage

   The output of the function graph tracer shows both the entry and the
   exit of a function. When the exit is right after the entry, it
   combines the two events into one with the output of "function();",
   instead of showing:

     function() {
     }

   In order to do this, the iterator descriptor that reads the events
   includes storage that saves the entry event while it peaks at the
   next event in the ring buffer. The peek can free the entry event so
   the iterator must store the information to use it after the peek.

   With the addition of function graph tracer recording the args, where
   the args are a dynamic array in the entry event, the temp storage
   does not save them. This causes the args to be corrupted or even
   cause a read of unsafe memory.

   Add space to save the args in the temp storage of the iterator.

- Fix race between ftrace_dump and reading trace_pipe

   ftrace_dump() is used when a crash occurs where the ftrace buffer
   will be printed to the console. But it can also be triggered by
   sysrq-z. If a sysrq-z is triggered while a task is reading trace_pipe
   it can cause a race in the ftrace_dump() where it checks if the
   buffer has content, then it checks if the next event is available,
   and then prints the output (regardless if the next event was
   available or not). Reading trace_pipe at the same time can cause it
   to not be available, and this triggers a WARN_ON in the print. Move
   the printing into the check if the next event exists or not

* tag 'trace-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Also allocate and copy hash for reading of filter files
  ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
  fgraph: Copy args in intermediate storage with entry
  trace/fgraph: Fix the warning caused by missing unregister notifier
  ring-buffer: Remove redundant semicolons
  tracing: Limit access to parser->buffer when trace_get_user failed
  rtla: Check pkg-config install
  tools/latency-collector: Check pkg-config install

Merge tag 'driver-core-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Danilo Krummrich:

- Fix swapped handling of lru_gen and lru_gen_full debugfs files in
   vmscan

- Fix debugfs mount options (uid, gid, mode) being silently ignored

- Fix leak of devres action in the unwind path of Devres::new()

- Documentation:
     - Expand and fix documentation of (outdated) Device, DeviceContext
       and generic driver infrastructure
     - Fix C header link of faux device abstractions
     - Clarify expected interaction with the security team
     - Smooth text flow in the security bug reporting process
       documentation

* tag 'driver-core-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  Documentation: smooth the text flow in the security bug reporting process
  Documentation: clarify the expected collaboration with security bugs reporters
  debugfs: fix mount options not being applied
  rust: devres: fix leaking call to devm_add_action()
  rust: faux: fix C header link
  driver: rust: expand documentation for driver infrastructure
  device: rust: expand documentation for Device
  device: rust: expand documentation for DeviceContext
  mm/vmscan: fix inverted polarity in lru_gen_seq_show()

Merge tag 'i2c-host-fixes-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.17-rc3

- hisi: update maintainership
- rtl9300: fix several issues in xfer
  - check message length boundaries
  - correct multi-byte value composition on write
  - increase polling timeout
  - fix block transfer protocol

ftrace: Also allocate and copy hash for reading of filter files

Currently the reader of set_ftrace_filter and set_ftrace_notrace just adds
the pointer to the global tracer hash to its iterator. Unlike the writer
that allocates a copy of the hash, the reader keeps the pointer to the
filter hashes. This is problematic because this pointer is static across
function calls that release the locks that can update the global tracer
hashes. This can cause UAF and similar bugs.

Allocate and copy the hash for reading the filter files like it is done
for the writers. This not only fixes UAF bugs, but also makes the code a
bit simpler as it doesn't have to differentiate when to free the
iterator's hash between writers and readers.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250822183606.12962cc3@batman.local.home
Fixes: c20489dad156 ("ftrace: Assign iter->hash to filter or notrace hashes on seq read")
Closes: https://lore.kernel.org/all/20250813023044.2121943-1-wutengda@huaweicloud.com/
Closes: https://lore.kernel.org/all/20250822192437.GA458494@ax162/
Reported-by: Tengda Wu <wutengda@huaweicloud.com>
Tested-by: Tengda Wu <wutengda@huaweicloud.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'drm-fixes-2025-08-23-1' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly drm fixes. Looks like things did indeed get busier after rc2,
  nothing seems too major, but stuff scattered all over the place,
  amdgpu, xe, i915, hibmc, rust support code, and other small fixes.

  rust:
   - drm device memory layout and safety fixes

  tests:
   - Endianness fixes

  gpuvm:
   - docs warning fix

  panic:
   - fix division on 32-bit arm

  i915:
   - TypeC DP display Fixes
   - Silence rpm wakeref asserts on GEN11_GU_MISC_IIR access
   - Relocate compression repacking WA for JSL/EHL

  xe:
   - xe_vm_create fixes
   - fix vm bind ioctl double free

  amdgpu:
   - Replay fixes
   - SMU14 fix
   - Null check DC fixes
   - DCE6 DC fixes
   - Misc DC fixes

  bridge:
   - analogix_dp: devm_drm_bridge_alloc() error handling fix

  habanalabs:
   - Memory deallocation fix

  hibmc:
   - modesetting black screen fixes
   - fix UAF on irq
   - fix leak on i2c failure path

  nouveau:
   - memory leak fixes
   - typos

  rockchip:
   - Kconfig fix
   - register caching fix"

* tag 'drm-fixes-2025-08-23-1' of https://gitlab.freedesktop.org/drm/kernel: (49 commits)
  drm/xe: Fix vm_bind_ioctl double free bug
  drm/xe: Move ASID allocation and user PT BO tracking into xe_vm_create
  drm/xe: Assign ioctl xe file handler to vm in xe_vm_create
  drm/i915/gt: Relocate compression repacking WA for JSL/EHL
  drm/i915: silence rpm wakeref asserts on GEN11_GU_MISC_IIR access
  drm/amd/display: Fix DP audio DTO1 clock source on DCE 6.
  drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3
  drm/amd/display: Don't print errors for nonexistent connectors
  drm/amd/display: Don't warn when missing DCE encoder caps
  drm/amd/display: Fill display clock and vblank time in dce110_fill_display_configs
  drm/amd/display: Find first CRTC and its line time in dce110_fill_display_configs
  drm/amd/display: Adjust DCE 8-10 clock, don't overclock by 15%
  drm/amd/display: Don't overclock DCE 6 by 15%
  drm/amd/display: Add null pointer check in mod_hdcp_hdcp1_create_session()
  drm/amd/display: Fix Xorg desktop unresponsive on Replay panel
  drm/amd/display: Avoid a NULL pointer dereference
  drm/amdgpu/swm14: Update power limit logic
  drm/amd/display: Revert Add HPO encoder support to Replay
  drm/i915/icl+/tc: Convert AUX powered WARN to a debug message
  drm/i915/lnl+/tc: Use the cached max lane count value
  ...

ftrace: Fix potential warning in trace_printk_seq during ftrace_dump

When calling ftrace_dump_one() concurrently with reading trace_pipe,
a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race
condition.

The issue occurs because:

CPU0 (ftrace_dump)                              CPU1 (reader)
echo z > /proc/sysrq-trigger

!trace_empty(&iter)
trace_iterator_reset(&iter) <- len = size = 0
                                                cat /sys/kernel/tracing/trace_pipe
trace_find_next_entry_inc(&iter)
  __find_next_entry
    ring_buffer_empty_cpu <- all empty
  return NULL

trace_printk_seq(&iter.seq)
  WARN_ON_ONCE(s->seq.len >= s->seq.size)

In the context between trace_empty() and trace_find_next_entry_inc()
during ftrace_dump, the ring buffer data was consumed by other readers.
This caused trace_find_next_entry_inc to return NULL, failing to populate
`iter.seq`. At this point, due to the prior trace_iterator_reset, both
`iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal,
the WARN_ON_ONCE condition is triggered.

Move the trace_printk_seq() into the if block that checks to make sure the
return value of trace_find_next_entry_inc() is non-NULL in
ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before
subsequent operations.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@elte.hu>
Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com
Fixes: d769041f8653 ("ring_buffer: implement new locking")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

fgraph: Copy args in intermediate storage with entry

The output of the function graph tracer has two ways to display its
entries. One way for leaf functions with no events recorded within them,
and the other is for functions with events recorded inside it. As function
graph has an entry and exit event, to simplify the output of leaf
functions it combines the two, where as non leaf functions are separate:

2)               |              invoke_rcu_core() {
2)               |                raise_softirq() {
2)   0.391 us    |                  __raise_softirq_irqoff();
2)   1.191 us    |                }
2)   2.086 us    |              }

The __raise_softirq_irqoff() function above is really two events that were
merged into one. Otherwise it would have looked like:

2)               |              invoke_rcu_core() {
2)               |                raise_softirq() {
2)               |                  __raise_softirq_irqoff() {
2)   0.391 us    |                  }
2)   1.191 us    |                }
2)   2.086 us    |              }

In order to do this merge, the reading of the trace output file needs to
look at the next event before printing. But since the pointer to the event
is on the ring buffer, it needs to save the entry event before it looks at
the next event as the next event goes out of focus as soon as a new event
is read from the ring buffer. After it reads the next event, it will print
the entry event with either the '{' (non leaf) or ';' and timestamps (leaf).

The iterator used to read the trace file has storage for this event. The
problem happens when the function graph tracer has arguments attached to
the entry event as the entry now has a variable length "args" field. This
field only gets set when funcargs option is used. But the args are not
recorded in this temp data and garbage could be printed. The entry field
is copied via:

  data->ent = *curr;

Where "curr" is the entry field. But this method only saves the non
variable length fields from the structure.

Add a helper structure to the iterator data that adds the max args size to
the data storage in the iterator. Then simply copy the entire entry into
this storage (with size protection).

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20250820195522.51d4a268@gandalf.local.home
Reported-by: Sasha Levin <sashal@kernel.org>
Tested-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/aJaxRVKverIjF4a6@lappy/
Fixes: ff5c9c576e75 ("ftrace: Add support for function argument to graph tracer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Merge tag 'drm-xe-fixes-2025-08-21-1' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- xe_vm_create fixes (Piotr)
- Fix vm_bind_ioctl double free (Christoph)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aKdxiw9hvO6mcyKs@intel.com

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:
"Two very minor fixes:

   - Fix mismatched kvalloc()/kfree()

   - Spelling fixes in documentation"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Fix spelling errors in iommufd.rst
  iommufd: viommu: free memory allocated by kvcalloc() using kvfree()

Merge tag 'drm-misc-fixes-2025-08-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A bunch of fixes for 6.17:
  - analogix_dp: devm_drm_bridge_alloc() error handling fix
  - gaudi: Memory deallocation fix
  - gpuvm: Documentation warning fix
  - hibmc: Various misc fixes
  - nouveau: Memory leak fixes, typos
  - panic: u64 division handling on 32 bits architecture fix
  - rockchip: Kconfig fix, register caching fix
  - rust: memory layout and safety fixes
  - tests: Endianness fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250821-economic-dandelion-rooster-c57fa9@houat

mips: lantiq: xway: sysctrl: rename the etop node

Bindig requires a node name matching ‘^ethernet@[0-9a-f]+$’. This patch
changes the clock name from “etop” to “ethernet”.

This fixes the following warning:
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): $nodename:0: 'etop@e180000' does not match '^ethernet@[0-9a-f]+$'
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#

Fixes: dac0bad93741 ("dt-bindings: net: lantiq,etop-xway: Document Lantiq Xway ETOP bindings")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Jakub Kicinski <kuba@kernel.org>

mips: dts: lantiq: danube: add missing burst length property

The upstream dts lacks the lantiq,{rx/tx}-burst-length property. Other
issues were also fixed:
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): 'interrupt-names' is a required property
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): 'lantiq,tx-burst-length' is a required property
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#
arch/mips/boot/dts/lantiq/danube_easy50712.dtb: etop@e180000 (lantiq,etop-xway): 'lantiq,rx-burst-length' is a required property
from schema $id: http://devicetree.org/schemas/net/lantiq,etop-xway.yaml#

Fixes: 14d4e308e0aa ("net: lantiq: configure the burst length in ethernet drivers")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 's390-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

- When kernel lockdown is active userspace tools that rely on read
   operations only are unnecessarily blocked. Fix that by avoiding ioctl
   registration during lockdown

- Invalid NULL pointer accesses succeed due to the lowcore is always
   mapped the identity mapping pinned to zero. To fix that never map the
   first two pages of physical memory with identity mapping

- Fix invalid SCCB present check in the SCLP interrupt handler

- Update defconfigs

* tag 's390-6.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/hypfs: Enable limited access during lockdown
  s390/hypfs: Avoid unnecessary ioctl registration in debugfs
  s390/mm: Do not map lowcore with identity mapping
  s390/sclp: Fix SCCB present check
  s390/configs: Set HZ=1000
  s390/configs: Update defconfigs

Merge tag 'for-linus-6.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"Two small cleanups which are both relevant only when running as a Xen
  guest"

* tag 'for-linus-6.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drivers/xen/xenbus: remove quirk for Xen 3.x
  compiler: remove __ADDRESSABLE_ASM{_STR,}() again

Merge tag 'platform-drivers-x86-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

- amd/hsmp:
     - Ensure sock->metric_tbl_addr is non-NULL
     - Register driver even if hwmon registration fails

- amd/pmc: Drop SMU F/W match for Cezanne

- dell-smbios-wmi: Separate "priority" from WMI device ID

- hp-wmi: mark Victus 16-r1xxx for Victus s fan and thermal profile
   support

- intel-uncore-freq: Check write blocked for efficiency latency control

* tag 'platform-drivers-x86-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: hp-wmi: mark Victus 16-r1xxx for victus_s fan and thermal profile support
  platform/x86/amd/hsmp: Ensure success even if hwmon registration fails
  platform/x86/amd/hsmp: Ensure sock->metric_tbl_addr is non-NULL
  platform/x86/intel-uncore-freq: Check write blocked for ELC
  platform/x86/amd: pmc: Drop SMU F/W match for Cezanne
  platform/x86: dell-smbios-wmi: Stop touching WMI device ID

Merge tag 'block-6.17-20250822' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"A set of fixes for block that should go into this tree. A bit larger
  than what I usually have at this point in time, a lot of that is the
  continued fixing of the lockdep annotation for queue freezing that we
  recently added, which has highlighted a number of little issues here
  and there. This contains:

   - MD pull request via Yu:

       - Add a legacy_async_del_gendisk mode, to prevent a user tools
         regression. New user tools releases will not use such a mode,
         the old release with a new kernel now will have warning about
         deprecated behavior, and we prepare to remove this legacy mode
         after about a year later

       - The rename in kernel causing user tools build failure, revert
         the rename in mdp_superblock_s

       - Fix a regression that interrupted resync can be shown as
         recover from mdstat or sysfs

   - Improve file size detection for loop, particularly for networked
     file systems, by using getattr to get the size rather than the
     cached inode size.

   - Hotplug CPU lock vs queue freeze fix

   - Lockdep fix while updating the number of hardware queues

   - Fix stacking for PI devices

   - Silence bio_check_eod() for the known case of device removal where
     the size is truncated to 0 sectors"

* tag 'block-6.17-20250822' of git://git.kernel.dk/linux:
  block: avoid cpu_hotplug_lock depedency on freeze_lock
  block: decrement block_rq_qos static key in rq_qos_del()
  block: skip q->rq_qos check in rq_qos_done_bio()
  blk-mq: fix lockdep warning in __blk_mq_update_nr_hw_queues
  block: tone down bio_check_eod
  loop: use vfs_getattr_nosec for accurate file size
  loop: Consolidate size calculation logic into lo_calculate_size()
  block: remove newlines from the warnings in blk_validate_integrity_limits
  block: handle pi_tuple_size in queue_limits_stack_integrity
  selftests: ublk: Use ARRAY_SIZE() macro to improve code
  md: fix sync_action incorrect display during resync
  md: add helper rdev_needs_recovery()
  md: keep recovery_cp in mdp_superblock_s
  md: add legacy_async_del_gendisk mode

Merge tag 'io_uring-6.17-20250822' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one that fixes inconsistent ->async_data vs
  REQ_F_ASYNC_DATA handling in futex, and a followup that just ensures
  that if other opcode handlers mess this up, it won't cause any issues"

* tag 'io_uring-6.17-20250822' of git://git.kernel.dk/linux:
  io_uring: clear ->async_data as part of normal init
  io_uring/futex: ensure io_futex_wait() cleans up properly on failure

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"All fixes in drivers. The largest diffstat in ufs is caused by the doc
  update with the next being the qcom null pointer deref fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: ufs-qcom: Fix ESI null pointer dereference
  scsi: ufs: core: Rename ufshcd_wait_for_doorbell_clr()
  scsi: ufs: core: Fix the return value documentation
  scsi: ufs: core: Remove WARN_ON_ONCE() call from ufshcd_uic_cmd_compl()
  scsi: ufs: core: Fix IRQ lock inversion for the SCSI host lock
  scsi: qla4xxx: Prevent a potential error pointer dereference
  scsi: ufs: ufs-pci: Add support for Intel Wildcat Lake
  scsi: fnic: Remove a useless struct mempool forward declaration

Merge tag 'mmc-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC host:
   - sdhci_am654: Disable HS400 for AM62P SR1.0 and SR1.1
   - sdhci-of-arasan: Ensure CD logic stabilization before power-up
   - sdhci-pci-gli: Mask the replay timer timeout of AER for GL9763e

  MEMSTICK:
   - Fix deadlock by moving removing flag earlier"

* tag 'mmc-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci_am654: Disable HS400 for AM62P SR1.0 and SR1.1
  memstick: Fix deadlock by moving removing flag earlier
  mmc: sdhci-of-arasan: Ensure CD logic stabilization before power-up
  mmc: sdhci-pci-gli: GL9763e: Mask the replay timer timeout of AER
  mmc: sdhci-pci-gli: GL9763e: Rename the gli_set_gl9763e() for consistency
  mmc: sdhci-pci-gli: Add a new function to simplify the code

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

- syzkaller found a WARN_ON in rxe due to poor lifecycle management of
   resources linked to skbs

- Missing error path handling in erdma qp creation

- Initialize the qp number for the GSI QP in erdma

- Mismatching of DIP, SCC and QP numbers in hns

- SRQ bug fixes in bnxt_re

- Memory leak and possibly uninited memory in bnxt_re

- Remove retired irdma maintainer

- Fix kfree() for kvalloc() in ODP

- Fix memory leak in hns

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/hns: Fix dip entries leak on devices newer than hip09
  RDMA/core: Free pfn_list with appropriate kvfree call
  MAINTAINERS: Remove bouncing irdma maintainer
  RDMA/bnxt_re: Fix to initialize the PBL array
  RDMA/bnxt_re: Fix a possible memory leak in the driver
  RDMA/bnxt_re: Fix to remove workload check in SRQ limit path
  RDMA/bnxt_re: Fix to do SRQ armena by default
  RDMA/hns: Fix querying wrong SCC context for DIP algorithm
  RDMA/erdma: Fix unset QPN of GSI QP
  RDMA/erdma: Fix ignored return value of init_kernel_qp
  RDMA/rxe: Flush delayed SKBs while releasing RXE resources

Merge tag 'iommu-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

- AMD-Vi: Fix potential stack buffer overflow via command line

- NVidia-Tegra: Fix endianess sparse warning

- ARM-SMMU: Fix ATS-masters reference count issue

- Virtio-IOMMU: Fix race condition on instance lookup

- RISC-V IOMMU: Fix potential NULL-ptr dereference in
   riscv_iommu_iova_to_phys()

* tag 'iommu-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/riscv: prevent NULL deref in iova_to_phys
  iommu/virtio: Make instance lookup robust
  iommu/arm-smmu-v3: Fix smmu_domain->nr_ats_masters decrement
  iommu/tegra241-cmdqv: Fix missing cpu_to_le64 at lvcmdq_err_map
  iommu/amd: Avoid stack buffer overflow from kernel cmdline

Merge tag 'sound-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Only small fixes.

   - ASoC Cirrus codec fixes

   - A regression fix for the recent TAS2781 codec refactoring

   - A fix for user-timer error handling

   - Fixes for USB-audio descriptor validators

   - Usual HD-audio and ASoC device-specific quirks"

* tag 'sound-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Use correct sub-type for UAC3 feature unit validation
  ALSA: timer: fix ida_free call while not allocated
  ASoC: cs35l56: Remove SoundWire Clock Divider workaround for CS35L63
  ASoC: cs35l56: Handle new algorithms IDs for CS35L63
  ASoC: cs35l56: Update Firmware Addresses for CS35L63 for production silicon
  ALSA: hda: tas2781: Fix wrong reference of tasdevice_priv
  ALSA: hda/realtek: Audio disappears on HP 15-fc000 after warm boot again
  ALSA: hda/realtek: Fix headset mic on ASUS Zenbook 14
  ASoC: codecs: ES9389: Modify the standby configuration
  ALSA: usb-audio: Fix size validation in convert_chmap_v3()
  ALSA: hda/tas2781: Add name prefix tas2781 for tas2781's dvc_tlv and amp_vol_tlv
  ALSA: hda/realtek: Add support for HP EliteBook x360 830 G6 and EliteBook 830 G6

Merge tag '6.17-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:
"Fix for netfs smb3 oops"

* tag '6.17-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops due to uninitialised variable

Merge tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fix from Trond Myklebust:

- NFS: Fix a data corrupting race when updating an existing write

* tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a race when updating an existing write

Merge tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"20 hotfixes. 10 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 17 of these
  fixes are for MM.

  As usual, singletons all over the place, apart from a three-patch
  series of KHO followup work from Pasha which is actually also a bunch
  of singletons"

* tag 'mm-hotfixes-stable-2025-08-21-18-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/mremap: fix WARN with uffd that has remap events disabled
  mm/damon/sysfs-schemes: put damos dests dir after removing its files
  mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m
  mm/damon/core: fix damos_commit_filter not changing allow
  mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn
  MAINTAINERS: mark MGLRU as maintained
  mm: rust: add page.rs to MEMORY MANAGEMENT - RUST
  iov_iter: iterate_folioq: fix handling of offset >= folio size
  selftests/damon: fix selftests by installing drgn related script
  .mailmap: add entry for Easwar Hariharan
  selftests/mm: add test for invalid multi VMA operations
  mm/mremap: catch invalid multi VMA moves earlier
  mm/mremap: allow multi-VMA move when filesystem uses thp_get_unmapped_area
  mm/damon/core: fix commit_ops_filters by using correct nth function
  tools/testing: add linux/args.h header and fix radix, VMA tests
  mm/debug_vm_pgtable: clear page table entries at destroy_args()
  squashfs: fix memory leak in squashfs_fill_super
  kho: warn if KHO is disabled due to an error
  kho: mm: don't allow deferred struct page with KHO
  kho: init new_physxa->phys_bits to fix lockdep

iommu/riscv: prevent NULL deref in iova_to_phys

The riscv_iommu_pte_fetch() function returns either NULL for
unmapped/never-mapped iova, or a valid leaf pte pointer that
requires no further validation.

riscv_iommu_iova_to_phys() failed to handle NULL returns.
Prevent null pointer dereference in
riscv_iommu_iova_to_phys(), and remove the pte validation.

Fixes: 488ffbf18171 ("iommu/riscv: Paging domain support")
Cc: Tomasz Jeznach <tjeznach@rivosinc.com>
Signed-off-by: XianLiang Huang <huangxianliang@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250820072248.312-1-huangxianliang@lanxincomputing.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/virtio: Make instance lookup robust

Much like arm-smmu in commit 7d835134d4e1 ("iommu/arm-smmu: Make
instance lookup robust"), virtio-iommu appears to have the same issue
where iommu_device_register() makes the IOMMU instance visible to other
API callers (including itself) straight away, but internally the
instance isn't ready to recognise itself for viommu_probe_device() to
work correctly until after viommu_probe() has returned. This matters a
lot more now that bus_iommu_probe() has the DT/VIOT knowledge to probe
client devices the way that was always intended. Tweak the lookup and
initialisation in much the same way as for arm-smmu, to ensure that what
we register is functional and ready to go.

Cc: stable@vger.kernel.org
Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/308911aaa1f5be32a3a709996c7bd6cf71d30f33.1755190036.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

iommu/arm-smmu-v3: Fix smmu_domain->nr_ats_masters decrement

The arm_smmu_attach_commit() updates master->ats_enabled before calling
arm_smmu_remove_master_domain() that is supposed to clean up everything
in the old domain, including the old domain's nr_ats_masters. So, it is
supposed to use the old ats_enabled state of the device, not an updated
state.

This isn't a problem if switching between two domains where:
- old ats_enabled = false; new ats_enabled = false
- old ats_enabled = true;  new ats_enabled = true
but can fail cases where:
- old ats_enabled = false; new ats_enabled = true
   (old domain should keep the counter but incorrectly decreased it)
- old ats_enabled = true;  new ats_enabled = false
   (old domain needed to decrease the counter but incorrectly missed it)

Update master->ats_enabled after arm_smmu_remove_master_domain() to fix
this.

Fixes: 7497f4211f4f ("iommu/arm-smmu-v3: Make changing domains be hitless for ATS")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Link: https://lore.kernel.org/r/20250801030127.2006979-1-nicolinc@nvidia.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

Merge tag 'cgroup-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

- Fix NULL de-ref in css_rstat_exit() which could happen after
   allocation failure

- Fix a cpuset partition handling bug and a couple other misc issues

- Doc spelling fix

* tag 'cgroup-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup: fixed spelling mistakes in documentation
  cgroup: avoid null de-ref in css_rstat_exit()
  cgroup/cpuset: Remove the unnecessary css_get/put() in cpuset_partition_write()
  cgroup/cpuset: Fix a partition error with CPU hotplug
  cgroup/cpuset: Use static_branch_enable_cpuslocked() on cpusets_insane_config_key

Merge tag 'spi-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A small collection of fixes that came in during the past week, a few
  driver specifics plus one fix for the spi-mem core where we weren't
  taking account of the frequency capabilities of the system when
  determining if it can support an operation"

* tag 'spi-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: st: fix PM macros to use CONFIG_PM instead of CONFIG_PM_SLEEP
  spi: spi-qpic-snand: fix calculating of ECC OOB regions' properties
  spi: spi-fsl-lpspi: Clamp too high speed_hz
  spi: spi-mem: add spi_mem_adjust_op_freq() in spi_mem_supports_op()
  spi: spi-mem: Add missing kdoc argument
  spi: spi-qpic-snand: use correct CW_PER_PAGE value for OOB write

Merge tag 'regulator-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of fairly minor device specific fixes that came in over the
  past week or so, plus the addition of an actual maintainer for the
  IR38060"

* tag 'regulator-fix-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps65219: regulator: tps65219: Fix error codes in probe()
  regulator: pca9450: Use devm_register_sys_off_handler
  regulator: dt-bindings: infineon,ir38060: Add Guenter as maintainer from IBM

Merge tag 'acpi-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix three new issues in the ACPI APEI error injection code and
  an ACPI platform firmware runtime update interface issue:

   - Make ACPI APEI error injection check the version of the request
     when mapping the EINJ parameter structure in the BIOS reserved
     memory to prevent injecting errors based on an uninitialized
     field (Tony Luck)

   - Fix potential NULL dereference in __einj_error_inject() that may
     occur when memory allocation fails (Charles Han)

   - Remove the __exit annotation from einj_remove(), so it can be
     called on errors during faux device probe (Uwe Kleine-König)

   - Use a security-version-number check instead of a runtime version
     check during ACPI platform firmware runtime driver updates to
     prevent those updates from failing due to false-positive driver
     version check failures (Chen Yu)"

* tag 'acpi-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: pfr_update: Fix the driver update version check
  ACPI: APEI: EINJ: Fix resource leak by remove callback in .exit.text
  ACPI: APEI: EINJ: fix potential NULL dereference in __einj_error_inject()
  ACPI: APEI: EINJ: Check if user asked for EINJV2 injection

Merge tag 'pm-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix a cpuidle menu governor issue and two issues in the cpupower
  utility:

   - Prevent the menu cpuidle governor from selecting idle states with
     exit latency exceeding the current PM QoS limit after stopping the
     scheduler tick (Rafael Wysocki)

   - Make the set subcommand's -t option in the cpupower utility work as
     documented and allow it to control the CPU boost feature of cpufreq
     beyond x86 (Shinji Nomoto)"

* tag 'pm-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: governors: menu: Avoid selecting states with too much latency
  cpupower: Allow control of boost feature on non-x86 based systems with boost support.
  cpupower: Fix a bug where the -t option of the set subcommand was not working.

Merge tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- Fix a subtle bug during SCX enabling where a dead task skips init
   but doesn't skip sched class switch leading to invalid task state
   transition warning

- Cosmetic fix in selftests

* tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  selftests/sched_ext: Remove duplicate sched.h header
  sched/ext: Fix invalid task state transitions on class switch

io_uring: clear ->async_data as part of normal init

Opcode handlers like POLL_ADD will use ->async_data as the pointer for
double poll handling, which is a bit different than the usual case
where it's strictly gated by the REQ_F_ASYNC_DATA flag. Be a bit more
proactive in handling ->async_data, and clear it to NULL as part of
regular init. Init is touching that cacheline anyway, so might as well
clear it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/futex: ensure io_futex_wait() cleans up properly on failure

The io_futex_data is allocated upfront and assigned to the io_kiocb
async_data field, but the request isn't marked with REQ_F_ASYNC_DATA
at that point. Those two should always go together, as the flag tells
io_uring whether the field is valid or not.

Additionally, on failure cleanup, the futex handler frees the data but
does not clear ->async_data. Clear the data and the flag in the error
path as well.

Thanks to Trend Micro Zero Day Initiative and particularly ReDress for
reporting this.

Cc: stable@vger.kernel.org
Fixes: 194bb58c6090 ("io_uring: add support for futex wake and wait")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drm/xe: Fix vm_bind_ioctl double free bug

If the argument check during an array bind fails, the bind_ops are freed
twice as seen below. Fix this by setting bind_ops to NULL after freeing.

==================================================================
BUG: KASAN: double-free in xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
Free of addr ffff88813bb9b800 by task xe_vm/14198

CPU: 5 UID: 0 PID: 14198 Comm: xe_vm Not tainted 6.16.0-xe-eudebug-cmanszew+ #520 PREEMPT(full)
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2411.A02.2110081023 10/08/2021
Call Trace:
<TASK>
dump_stack_lvl+0x82/0xd0
print_report+0xcb/0x610
? __virt_addr_valid+0x19a/0x300
? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
kasan_report_invalid_free+0xc8/0xf0
? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
check_slab_allocation+0x102/0x130
kfree+0x10d/0x440
? should_fail_ex+0x57/0x2f0
? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
xe_vm_bind_ioctl+0x1b2/0x21f0 [xe]
? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
? __lock_acquire+0xab9/0x27f0
? lock_acquire+0x165/0x300
? drm_dev_enter+0x53/0xe0 [drm]
? find_held_lock+0x2b/0x80
? drm_dev_exit+0x30/0x50 [drm]
? drm_ioctl_kernel+0x128/0x1c0 [drm]
drm_ioctl_kernel+0x128/0x1c0 [drm]
? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
? find_held_lock+0x2b/0x80
? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
? should_fail_ex+0x57/0x2f0
? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
drm_ioctl+0x352/0x620 [drm]
? __pfx_drm_ioctl+0x10/0x10 [drm]
? __pfx_rpm_resume+0x10/0x10
? do_raw_spin_lock+0x11a/0x1b0
? find_held_lock+0x2b/0x80
? __pm_runtime_resume+0x61/0xc0
? rcu_is_watching+0x20/0x50
? trace_irq_enable.constprop.0+0xac/0xe0
xe_drm_ioctl+0x91/0xc0 [xe]
__x64_sys_ioctl+0xb2/0x100
? rcu_is_watching+0x20/0x50
do_syscall_64+0x68/0x2e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa9acb24ded

Fixes: b43e864af0d4 ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250813101231.196632-2-christoph.manszewski@intel.com
(cherry picked from commit a01b704527c28a2fd43a17a85f8996b75ec8492a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Move ASID allocation and user PT BO tracking into xe_vm_create

Currently, ASID assignment for user VMs and page-table BO accounting for
client memory tracking are performed in xe_vm_create_ioctl.
To consolidate VM object initialization, move this logic to
xe_vm_create.

v2:
- removed unnecessary duplicate BO tracking code
- using the local variable xef to verify whether the VM is being created
by userspace

Fixes: 658a1c8e0a66 ("drm/xe: Assign ioctl xe file handler to vm in xe_vm_create")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-3-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 30e0c3f43a414616e0b6ca76cf7f7b2cd387e1d4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: Added fixes tag]

Merge branches 'acpi-apei' and 'acpi-pfrut'

Merge ACPI APEI fixes and an ACPI platform firmware runtime update fix
for 6.17-rc3.

* acpi-apei:
  ACPI: APEI: EINJ: Fix resource leak by remove callback in .exit.text
  ACPI: APEI: EINJ: fix potential NULL dereference in __einj_error_inject()
  ACPI: APEI: EINJ: Check if user asked for EINJV2 injection

* acpi-pfrut:
  ACPI: pfr_update: Fix the driver update version check

Merge branch 'pm-cpuidle'

Merge a menu governor fix for 6.17-rc3

* pm-cpuidle:
cpuidle: governors: menu: Avoid selecting states with too much latency

Merge tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth.

  Current release - fix to a fix:

   - usb: asix_devices: fix PHY address mask in MDIO bus initialization

  Current release - regressions:

   - Bluetooth: fixes for the split between BIS_LINK and PA_LINK

   - Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN
     flag", breaks compatibility with some existing device tree blobs

   - dsa: b53: fix reserved register access in b53_fdb_dump()

  Current release - new code bugs:

   - sched: dualpi2: run probability update timer in BH to avoid
     deadlock

   - eth: libwx: fix the size in RSS hash key population

   - pse-pd: pd692x0: improve power budget error paths and handling

  Previous releases - regressions:

   - tls: fix handling of zero-length records on the rx_list

   - hsr: reject HSR frame if skb can't hold tag

   - bonding: fix negotiation flapping in 802.3ad passive mode

  Previous releases - always broken:

   - gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM

   - sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit,
     avoid packet drops with low buffer_limit, remove unnecessary WARN()

   - sched: fix backlog accounting after modifying config of a qdisc in
     the middle of the hierarchy

   - mptcp: improve handling of skb extension allocation failures

   - eth: mlx5:
       - fixes for the "HW Steering" flow management method
       - fixes for QoS and device buffer management"

* tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  netfilter: nf_reject: don't leak dst refcount for loopback packets
  net/mlx5e: Preserve shared buffer capacity during headroom updates
  net/mlx5e: Query FW for buffer ownership
  net/mlx5: Restore missing scheduling node cleanup on vport enable failure
  net/mlx5: Fix QoS reference leak in vport enable error path
  net/mlx5: Destroy vport QoS element when no configuration remains
  net/mlx5e: Preserve tc-bw during parent changes
  net/mlx5: Remove default QoS group and attach vports directly to root TSAR
  net/mlx5: Base ECVF devlink port attrs from 0
  net: pse-pd: pd692x0: Skip power budget configuration when undefined
  net: pse-pd: pd692x0: Fix power budget leak in manager setup error path
  Octeontx2-af: Skip overlap check for SPI field
  selftests: tls: add tests for zero-length records
  tls: fix handling of zero-length records on the rx_list
  net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision
  selftests: bonding: add test for passive LACP mode
  bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU
  bonding: update LACP activity flag after setting lacp_active
  Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag"
  ipv6: sr: Fix MAC comparison to be constant-time
  ...

netfilter: nf_reject: don't leak dst refcount for loopback packets

recent patches to add a WARN() when replacing skb dst entry found an
old bug:

WARNING: include/linux/skbuff.h:1165 skb_dst_check_unset include/linux/skbuff.h:1164 [inline]
WARNING: include/linux/skbuff.h:1165 skb_dst_set include/linux/skbuff.h:1210 [inline]
WARNING: include/linux/skbuff.h:1165 nf_reject_fill_skb_dst+0x2a4/0x330 net/ipv4/netfilter/nf_reject_ipv4.c:234
[..]
Call Trace:
nf_send_unreach+0x17b/0x6e0 net/ipv4/netfilter/nf_reject_ipv4.c:325
nft_reject_inet_eval+0x4bc/0x690 net/netfilter/nft_reject_inet.c:27
expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
..

This is because blamed commit forgot about loopback packets.
Such packets already have a dst_entry attached, even at PRE_ROUTING stage.

Instead of checking hook just check if the skb already has a route
attached to it.

Fixes: f53b9b0bdc59 ("netfilter: introduce support for reject at prerouting stage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

s390/hypfs: Enable limited access during lockdown

When kernel lockdown is active, debugfs_locked_down() blocks access to
hypfs files that register ioctl callbacks, even if the ioctl interface
is not required for a function. This unnecessarily breaks userspace
tools that only rely on read operations.

Resolve this by registering a minimal set of file operations during
lockdown, avoiding ioctl registration and preserving access for affected
tooling.

Note that this change restores hypfs functionality when lockdown is
active from early boot (e.g. via lockdown=integrity kernel parameter),
but does not apply to scenarios where lockdown is enabled dynamically
while Linux is running.

Tested-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 5496197f9b08 ("debugfs: Restrict debugfs when the kernel is locked down")
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/hypfs: Avoid unnecessary ioctl registration in debugfs

Currently, hypfs registers ioctl callbacks for all debugfs files,
despite only one file requiring them. This leads to unintended exposure
of unused interfaces to user space and can trigger side effects such as
restricted access when kernel lockdown is enabled.

Restrict ioctl registration to only those files that implement ioctl
functionality to avoid interface clutter and unnecessary access
restrictions.

Tested-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 5496197f9b08 ("debugfs: Restrict debugfs when the kernel is locked down")
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

ALSA: usb-audio: Use correct sub-type for UAC3 feature unit validation

The entry of the validators table for UAC3 feature unit is defined
with a wrong sub-type UAC_FEATURE (= 0x06) while it should have been
UAC3_FEATURE (= 0x07). This patch corrects the entry value.

Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units")
Link: https://patch.msgid.link/20250821150835.8894-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'mlx5-misx-fixes-2025-08-20'

Mark Bloch says:

====================
mlx5 misx fixes 2025-08-20

This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.

v1: https://lore.kernel.org/1755095476-414026-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20250820133209.389065-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Preserve shared buffer capacity during headroom updates

When port buffer headroom changes, port_update_shared_buffer()
recalculates the shared buffer size and splits it in a 3:1 ratio
(lossy:lossless) - Currently, the calculation is:
lossless = shared / 4;
lossy = (shared / 4) * 3;

Meaning, the calculation dropped the remainder of shared % 4 due to
integer division, unintentionally reducing the total shared buffer
by up to three cells on each update. Over time, this could shrink
the buffer below usable size.

Fix it by changing the calculation to:
lossless = shared / 4;
lossy = shared - lossless;

This retains all buffer cells while still approximating the
intended 3:1 split, preventing capacity loss over time.

While at it, perform headroom calculations in units of cells rather than
in bytes for more accurate calculations avoiding extra divisions.

Fixes: a440030d8946 ("net/mlx5e: Update shared buffer along with device buffer changes")
Signed-off-by: Armen Ratner <armeng@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Query FW for buffer ownership

The SW currently saves local buffer ownership when setting
the buffer.
This means that the SW assumes it has ownership of the buffer
after the command is set.

If setting the buffer fails and we remain in FW ownership,
the local buffer ownership state incorrectly remains as SW-owned.
This leads to incorrect behavior in subsequent PFC commands,
causing failures.

Instead of saving local buffer ownership in SW,
query the FW for buffer ownership when setting the buffer.
This ensures that the buffer ownership state is accurately
reflected, avoiding the issues caused by incorrect ownership
states.

Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Restore missing scheduling node cleanup on vport enable failure

Restore the __esw_qos_free_node() call removed by the offending commit.

Fixes: 97733d1e00a0 ("net/mlx5: Add traffic class scheduling support for vport QoS")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Fix QoS reference leak in vport enable error path

Add missing esw_qos_put() call when __esw_qos_alloc_node() fails in
mlx5_esw_qos_vport_enable().

Fixes: be034baba83e ("net/mlx5: Make vport QoS enablement more flexible for future extensions")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Destroy vport QoS element when no configuration remains

If a VF has been configured and the user later clears all QoS settings,
the vport element remains in the firmware QoS tree. This leads to
inconsistent behavior compared to VFs that were never configured, since
the FW assumes that unconfigured VFs are outside the QoS hierarchy.
As a result, the bandwidth share across VFs may differ, even though
none of them appear to have any configuration.

Align the driver behavior with the FW expectation by destroying the
vport QoS element when all configurations are removed.

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/20250820133209.389065-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Preserve tc-bw during parent changes

When changing parent of a node/leaf with tc-bw configured, the code
saves and restores tc-bw values. However, it was reading the converted
hardware bw_share values (where 0 becomes 1) instead of the original
user values, causing incorrect tc-bw calculations after parent change.

Store original tc-bw values in the node structure and use them directly
for save/restore operations.

Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Remove default QoS group and attach vports directly to root TSAR

Currently, the driver creates a default group (`node0`) and attaches
all vports to it unless the user explicitly sets a parent group. As a
result, when a user configures tx_share on a group and tx_share on
a VF, the expectation is for the group and the VF to share bandwidth
relatively. However, since the VF is not connected to the same parent
(but to the default node), the proportional share logic is not applied
correctly.

To fix this, remove the default group (`node0`) and instead connect
vports directly to the root TSAR when no parent is specified. This
ensures that vports and groups share the same root scheduler and their
tx_share values are compared directly under the same hierarchy.

Fixes: 0fe132eac38c ("net/mlx5: E-switch, Allow to add vports to rate groups")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Base ECVF devlink port attrs from 0

Adjust the vport number by the base ECVF vport number so the port
attributes start at 0. Previously the port attributes would start 1
after the maximum number of host VFs.

Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pse-pd: pd692x0: Skip power budget configuration when undefined

If the power supply's power budget is not defined in the device tree,
the current code still requests power and configures the PSE manager
with a 0W power limit, which is undesirable behavior.

Skip power budget configuration entirely when the budget is zero,
avoiding unnecessary power requests and preventing invalid 0W limits
from being set on the PSE manager.

Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250820133321.841054-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pse-pd: pd692x0: Fix power budget leak in manager setup error path

Fix a resource leak where manager power budgets were freed on both
success and error paths during manager setup. Power budgets should
only be freed on error paths after regulator registration or during
driver removal.

Refactor cleanup logic by extracting OF node cleanup and power budget
freeing into separate helper functions for better maintainability.

Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250820132708.837255-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Octeontx2-af: Skip overlap check for SPI field

Octeontx2/CN10K silicon supports generating a 256-bit key per packet.
The specific fields to be extracted from a packet for key generation
are configurable via a Key Extraction (MKEX) Profile.

The AF driver scans the configured extraction profile to ensure that
fields from upper layers do not overwrite fields from lower layers in
the key.

Example Packet Field Layout:
LA: DMAC + SMAC
LB: VLAN
LC: IPv4/IPv6
LD: TCP/UDP

Valid MKEX Profile Configuration:

LA   -> DMAC   -> key_offset[0-5]
LC   -> SIP    -> key_offset[20-23]
LD   -> SPORT  -> key_offset[30-31]

Invalid MKEX profile configuration:

LA   -> DMAC   -> key_offset[0-5]
LC   -> SIP    -> key_offset[20-23]
LD   -> SPORT  -> key_offset[2-3]  // Overlaps with DMAC field

In another scenario, if the MKEX profile is configured to extract
the SPI field from both AH and ESP headers at the same key offset,
the driver rejecting this configuration. In a regular traffic,
ipsec packet will be having either AH(LD) or ESP (LE). This patch
relaxes the check for the same.

Fixes: 12aa0a3b93f3 ("octeontx2-af: Harden rule validation.")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250820063919.1463518-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: tls: add tests for zero-length records

Test various combinations of zero-length records.
Unfortunately, kernel cannot be coerced into producing those,
so hardcode the ciphertext messages in the test.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tls: fix handling of zero-length records on the rx_list

Each recvmsg() call must process either
- only contiguous DATA records (any number of them)
- one non-DATA record

If the next record has different type than what has already been
processed we break out of the main processing loop. If the record
has already been decrypted (which may be the case for TLS 1.3 where
we don't know type until decryption) we queue the pending record
to the rx_list. Next recvmsg() will pick it up from there.

Queuing the skb to rx_list after zero-copy decrypt is not possible,
since in that case we decrypted directly to the user space buffer,
and we don't have an skb to queue (darg.skb points to the ciphertext
skb for access to metadata like length).

Only data records are allowed zero-copy, and we break the processing
loop after each non-data record. So we should never zero-copy and
then find out that the record type has changed. The corner case
we missed is when the initial record comes from rx_list, and it's
zero length.

Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg>
Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250820021952.143068-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Fix a lot of build warnings for LTO-enabled objtool check, increase
  COMMAND_LINE_SIZE up to 4096, rename a missing GCC_PLUGIN_STACKLEAK to
  KSTACK_ERASE, and fix some bugs about arch timer, module loading, LBT
  and KVM"

* tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Add address alignment check in pch_pic register access
  LoongArch: KVM: Use kvm_get_vcpu_by_id() instead of kvm_get_vcpu()
  LoongArch: KVM: Fix stack protector issue in send_ipi_data()
  LoongArch: KVM: Make function kvm_own_lbt() robust
  LoongArch: Rename GCC_PLUGIN_STACKLEAK to KSTACK_ERASE
  LoongArch: Save LBT before FPU in setup_sigcontext()
  LoongArch: Optimize module load time by optimizing PLT/GOT counting
  LoongArch: Add cpuhotplug hooks to fix high cpu usage of vCPU threads
  LoongArch: Increase COMMAND_LINE_SIZE up to 4096
  LoongArch: Pass annotate-tablejump option if LTO is enabled
  objtool/LoongArch: Get table size correctly if LTO is enabled

block: avoid cpu_hotplug_lock depedency on freeze_lock

A recent lockdep[1] splat observed while running blktest block/005
reveals a potential deadlock caused by the cpu_hotplug_lock dependency
on ->freeze_lock. This dependency was introduced by commit 033b667a823e
("block: blk-rq-qos: guard rq-qos helpers by static key").

That change added a static key to avoid fetching q->rq_qos when
neither blk-wbt nor blk-iolatency is configured. The static key
dynamically patches kernel text to a NOP when disabled, eliminating
overhead of fetching q->rq_qos in the I/O hot path. However, enabling
a static key at runtime requires acquiring both cpu_hotplug_lock and
jump_label_mutex. When this happens after the queue has already been
frozen (i.e., while holding ->freeze_lock), it creates a locking
dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
potential deadlock reported by lockdep [1].

To resolve this, replace the static key mechanism with q->queue_flags:
QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
otherwise, the access is skipped.

Since q->queue_flags is commonly accessed in IO hotpath and resides in
the first cacheline of struct request_queue, checking it imposes minimal
overhead while eliminating the deadlock risk.

This change avoids the lockdep splat without introducing performance
regressions.

[1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250814082612.500845-4-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: decrement block_rq_qos static key in rq_qos_del()

rq_qos_add() increments the block_rq_qos static key when a QoS
policy is attached. When a QoS policy is removed via rq_qos_del(),
we must symmetrically decrement the static key. If this removal drops
the last QoS policy from the queue (q->rq_qos becomes NULL), the
static branch can be disabled and the jump label patched to a NOP,
avoiding overhead on the hot path.

This change ensures rq_qos_add()/rq_qos_del() keep the
block_rq_qos static key balanced and prevents leaving the branch
permanently enabled after the last policy is removed.

Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250814082612.500845-3-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: skip q->rq_qos check in rq_qos_done_bio()

If a bio has BIO_QOS_THROTTLED or BIO_QOS_MERGED set,
it implicitly guarantees that q->rq_qos is present.
Avoid re-checking q->rq_qos in this case and call
__rq_qos_done_bio() directly as a minor optimization.

Suggested-by : Yu Kuai <yukuai1@huaweicloud.com>

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250814082612.500845-2-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

spi: st: fix PM macros to use CONFIG_PM instead of CONFIG_PM_SLEEP

pm_sleep_ptr() depends on CONFIG_PM_SLEEP while pm_ptr() depends on
CONFIG_PM. Since ST SSC4 implements runtime PM it makes sense using
pm_ptr() here.

For the same reason replace PM macros that use CONFIG_PM. Doing so
prevents from using __maybe_unused attribute of runtime PM functions.

Link: https://lore.kernel.org/lkml/CAMuHMdX9nkROkAJJ5odv4qOWe0bFTmaFs=Rfxsfuc9+DT-bsEQ@mail.gmail.com
Fixes: 6f8584a4826f ("spi: st: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()")
Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250820180310.9605-1-rgallaispou@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fixes from Eric Biggers:
"Fix a regression where 'make clean' stopped removing some of the
  generated assembly files on arm and arm64"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: ensure generated *.S files are removed on make clean
  lib/crypto: sha: Update Kconfig help for SHA1 and SHA256

Merge tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- fix refcount issue that can cause memory leak

- rate limit repeated connections from IPv6, not just IPv4 addresses

- fix potential null pointer access of smb direct work queue

* tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix refcount leak causing resource not released
  ksmbd: extend the connection limiting mechanism to support IPv6
  smb: server: split ksmbd_rdma_stop_listening() out of ksmbd_rdma_destroy()