Will Deacon [Thu, 24 Jul 2025 10:18:23 +0000 (11:18 +0100)]
Merge branch 'arm/smmu/bindings' into next
* arm/smmu/bindings:
dt-bindings: arm-smmu: Remove sdm845-cheza specific entry
dt-bindings: arm-smmu: document the support on Milos
iommu/arm-smmu-qcom: Add SM6115 MDSS compatible
Will Deacon [Thu, 24 Jul 2025 10:17:59 +0000 (11:17 +0100)]
Merge branch 'amd/amd-vi' into next
* amd/amd-vi:
iommu/amd: Fix geometry.aperture_end for V2 tables
iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks
iommu/amd: Add documentation for AMD IOMMU debugfs support
iommu/amd: Add debugfs support to dump IRT Table
iommu/amd: Add debugfs support to dump device table
iommu/amd: Add support for device id user input
iommu/amd: Add debugfs support to dump IOMMU command buffer
iommu/amd: Add debugfs support to dump IOMMU Capability registers
iommu/amd: Add debugfs support to dump IOMMU MMIO registers
iommu/amd: Refactor AMD IOMMU debugfs initial setup
iommu/amd: Enable PASID and ATS capabilities in the correct order
iommu/amd: Add efr[HATS] max v1 page table level
iommu/amd: Add HATDis feature support
Will Deacon [Thu, 24 Jul 2025 10:17:52 +0000 (11:17 +0100)]
Merge branch 'intel/vt-d' into next
* intel/vt-d:
iommu/vt-d: Fix UAF on sva unbind with pending IOPFs
iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range
iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all
iommu/vt-d: Split paging_domain_compatible()
iommu/vt-d: Split intel_iommu_enforce_cache_coherency()
iommu/vt-d: Create unique domain ops for each stage
iommu/vt-d: Split intel_iommu_domain_alloc_paging_flags()
iommu/vt-d: Do not wipe out the page table NID when devices detach
iommu/vt-d: Fold domain_exit() into intel_iommu_domain_free()
iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook
Lu Baolu [Wed, 23 Jul 2025 07:20:45 +0000 (15:20 +0800)]
iommu/vt-d: Fix UAF on sva unbind with pending IOPFs
Commit 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach
path") disables IOPF on device by removing the device from its IOMMU's
IOPF queue when the last IOPF-capable domain is detached from the device.
Unfortunately, it did this in a wrong place where there are still pending
IOPFs. As a result, a use-after-free error is potentially triggered and
eventually a kernel panic with a kernel trace similar to the following:
The intel_pasid_tear_down_entry() function is responsible for blocking
hardware from generating new page faults and flushing all in-flight
ones. Therefore, moving iopf_for_domain_remove() after this function
should resolve this.
Fixes: 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach path") Reported-by: Ethan Milon <ethan.milon@eviden.com> Closes: https://lore.kernel.org/r/e8b37f3e-8539-40d4-8993-43a1f3ffe5aa@eviden.com Suggested-by: Ethan Milon <ethan.milon@eviden.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250723072045.1853328-1-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
Lu Baolu [Mon, 21 Jul 2025 05:16:57 +0000 (13:16 +0800)]
iommu/vt-d: Make iotlb_sync_map a static property of dmar_domain
Commit 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for
non-caching/non-RWBF modes") dynamically set iotlb_sync_map. This causes
synchronization issues due to lack of locking on map and attach paths,
racing iommufd userspace operations.
Invalidation changes must precede device attachment to ensure all flushes
complete before hardware walks page tables, preventing coherence issues.
Make domain->iotlb_sync_map static, set once during domain allocation. If
an IOMMU requires iotlb_sync_map but the domain lacks it, attach is
rejected. This won't reduce domain sharing: RWBF and shadowing page table
caching are legacy uses with legacy hardware. Mixed configs (some IOMMUs
in caching mode, others not) are unlikely in real-world scenarios.
Fixes: 12724ce3fe1a ("iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20250721051657.1695788-1-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
Konrad Dybcio [Wed, 16 Jul 2025 10:16:09 +0000 (12:16 +0200)]
dt-bindings: arm-smmu: Remove sdm845-cheza specific entry
The firmware on SDM845-based Cheza boards did not provide the same
level of feature support for SMMUs (particularly around the Adreno
GPU integration).
Now that Cheza is being removed from the kernel (almost none exist at
this point in time), retire the entry as well.
Most notably, it's not being marked as deprecated instead, as there is
no indication that any more of those ~7 year old devboards will be
built.
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20250716-topic-goodnight_cheza-v2-3-6fa8d3261813@oss.qualcomm.com Signed-off-by: Will Deacon <will@kernel.org>
Jason Gunthorpe [Mon, 9 Jun 2025 23:58:05 +0000 (20:58 -0300)]
iommu/amd: Fix geometry.aperture_end for V2 tables
The AMD IOMMU documentation seems pretty clear that the V2 table follows
the normal CPU expectation of sign extension. This is shown in
Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
would have three regions in the page table: lower, non-canonical, upper.
The manual describes that the V1 table does not sign extend in section
2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
Further, Vasant has checked this and indicates the HW has an addtional
behavior that the manual does not yet describe. The AMDv2 table does not
have the sign extended behavior when attached to PASID 0, which may
explain why this has gone unnoticed.
The iommu domain geometry does not directly support sign extended page
tables. The driver should report only one of the lower/upper spaces. Solve
this by removing the top VA bit from the geometry to use only the lower
space.
This will also make the iommu_domain work consistently on all PASID 0 and
PASID != 1.
Adjust dma_max_address() to remove the top VA bit. It now returns:
5 Level:
Before 0x1ffffffffffffff
After 0x0ffffffffffffff
4 Level:
Before 0xffffffffffff
After 0x7fffffffffff
iommu/amd: Wrap debugfs ABI testing symbols snippets in literal code blocks
Commit 39215bb3b0d929 ("iommu/amd: Add documentation for AMD IOMMU
debugfs support") documents debugfs ABI symbols for AMD IOMMU, but
forgets to wrap examples snippets and their output in literal code
blocks, hence Sphinx reports indentation warnings:
Documentation/ABI/testing/debugfs-amd-iommu:31: ERROR: Unexpected indentation. [docutils]
Documentation/ABI/testing/debugfs-amd-iommu:31: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Documentation/ABI/testing/debugfs-amd-iommu:31: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Wrap them to fix the warnings.
Fixes: 39215bb3b0d9 ("iommu/amd: Add documentation for AMD IOMMU debugfs support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250716204207.73869849@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250717010331.8941-1-bagasdotme@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
iommu/amd: Add documentation for AMD IOMMU debugfs support
Add documentation describing how to use AMD IOMMU debugfs support to
dump IOMMU data structures - IRT table, Device table, Registers (MMIO and
Capability) and command buffer.
In cases where we have an issue in the device interrupt path with IOMMU
interrupt remapping enabled, dumping valid IRT table entries for the device
is very useful and good input for debugging the issue.
eg.
-> To dump irte entries for a particular device
#echo "c4:00.0" > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/irqtbl | less
or
#echo "0000:c4:00.0" > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/irqtbl | less
iommu/amd: Add debugfs support to dump device table
IOMMU uses device table data structure to get per-device information for
DMA remapping, interrupt remapping, and other functionalities. It's a
valuable data structure to visualize for debugging issues related to
IOMMU.
eg.
-> To dump device table entry for a particular device
#echo 0000:c4:00.0 > /sys/kernel/debug/iommu/amd/devid
#cat /sys/kernel/debug/iommu/amd/devtbl
Dumping IOMMU data structures like device table, IRT, etc., for all devices
on the system will be a lot of data dumped in a file. Also, user may want
to dump and analyze these data structures just for one or few devices. So
dumping IOMMU data structures like device table, IRT etc for all devices
is not a good approach.
Add "device id" user input to be used for dumping IOMMU data structures
like device table, IRT etc in AMD IOMMU debugfs.
iommu/amd: Add debugfs support to dump IOMMU command buffer
IOMMU driver sends command to IOMMU hardware via command buffer. In cases
where IOMMU hardware fails to process commands in command buffer, dumping
it is a valuable input to debug the issue.
IOMMU hardware processes command buffer entry at offset equals to the head
pointer. Dumping just the entry at the head pointer may not always be
useful. The current head may not be pointing to the entry of the command
buffer which is causing the issue. IOMMU Hardware may have processed the
entry and updated the head pointer. So dumping the entire command buffer
gives a broad understanding of what hardware was/is doing. The command
buffer dump will have all entries from start to end of the command buffer.
Along with that, it will have a head and tail command buffer pointer
register dump to facilitate where the IOMMU driver and hardware are in
the command buffer for injecting and processing the entries respectively.
Command buffer is a per IOMMU data structure. So dumping on per IOMMU
basis.
eg.
-> To get command buffer dump for iommu<x> (say, iommu00)
#cat /sys/kernel/debug/iommu/amd/iommu00/cmdbuf
iommu/amd: Add debugfs support to dump IOMMU Capability registers
IOMMU Capability registers defines capabilities of IOMMU and information
needed for initialising MMIO registers and device table. This is useful
to dump these registers for debugging IOMMU related issues.
e.g.
-> To get capability registers value at offset 0x10 for iommu<x> (say,
iommu00)
# echo "0x10" > /sys/kernel/debug/iommu/amd/iommu00/capability
# cat /sys/kernel/debug/iommu/amd/iommu00/capability
iommu/amd: Add debugfs support to dump IOMMU MMIO registers
Analyzing IOMMU MMIO registers gives a view of what IOMMU is
configured with on the system and is helpful to debug issues
with IOMMU.
eg.
-> To get mmio registers value at offset 0x18 for iommu<x> (say, iommu00)
# echo "0x18" > /sys/kernel/debug/iommu/amd/iommu00/mmio
# cat /sys/kernel/debug/iommu/amd/iommu00/mmio
Rearrange initial setup of AMD IOMMU debugfs to segregate per IOMMU
setup and setup which is common for all IOMMUs. This ensures that common
debugfs paths (introduced in subsequent patches) are created only once
instead of being created for each IOMMU.
With the change, there is no need to use lock as amd_iommu_debugfs_setup()
will be called only once during AMD IOMMU initialization. So remove lock
acquisition in amd_iommu_debugfs_setup().
The bootloader configures a reserved memory region for framebuffer,
which is protected by the IOMMU. The kernel-side driver is oblivious as
of which memory region is set up by the bootloader. In such case, the
IOMMU tries to reference the reserved region - which is not reserved in
the kernel anymore - and it results in an unrecoverable page fault. More
information about it is provided in [1].
Add support for reserved regions using iommu_dma_get_resv_regions().
For OF supported boards, this requires defining the region in the
iommu-addresses property of the IOMMU owner's node.
On SM8250 / QRB5165-RB5 using PRR bits resets the device, most likely
because of the hyp limitations. Disable PRR support on that platform.
Fixes: 7f2ef1bfc758 ("iommu/arm-smmu: Add support for PRR bit setup") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Reviewed-by: Rob Clark <robin.clark@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250705-iommu-fix-prr-v2-1-406fecc37cf8@oss.qualcomm.com Signed-off-by: Will Deacon <will@kernel.org>
Commit 33729a5fc0ca ("iommu/io-pgtable-arm: Remove split on unmap
behavior") removed the last user of the macro iopte_prot. Remove the
macro definition of iopte_prot as well as three other related
definitions.
Fixes: 33729a5fc0ca ("iommu/io-pgtable-arm: Remove split on unmap behavior") Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250708211705.1567787-1-danielmentz@google.com Signed-off-by: Will Deacon <will@kernel.org>
Alexey Klimov [Fri, 13 Jun 2025 17:32:38 +0000 (18:32 +0100)]
iommu/arm-smmu-qcom: Add SM6115 MDSS compatible
Add the SM6115 MDSS compatible to clients compatible list, as it also
needs that workaround.
Without this workaround, for example, QRB4210 RB2 which is based on
SM4250/SM6115 generates a lot of smmu unhandled context faults during
boot:
Jason Gunthorpe [Fri, 11 Jul 2025 13:16:38 +0000 (10:16 -0300)]
iommu/qcom: Fix pgsize_bitmap
qcom uses the ARM_32_LPAE_S1 format which uses the ARM long descriptor
page table. Eventually arm_32_lpae_alloc_pgtable_s1() will adjust
the pgsize_bitmap with:
cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
So the current declaration is nonsensical. Fix it to be just SZ_4K which
is what it has actually been using so far. Most likely the qcom driver
copy and pasted the pgsize_bitmap from something using the ARM_V7S format.
Fixes: db64591de4b2 ("iommu/qcom: Remove iommu_ops pgsize_bitmap") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYvif6kDDFar5ZK4Dff3XThSrhaZaJundjQYujaJW978yg@mail.gmail.com/ Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/0-v1-65a7964d2545+195-qcom_pgsize_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
iommu/vt-d: Deduplicate cache_tag_flush_all by reusing flush_range
The logic in cache_tag_flush_all() to iterate over cache tags and issue
TLB invalidations is largely duplicated in cache_tag_flush_range(), with
the only difference being the range parameters.
Extend cache_tag_flush_range() to handle a full address space flush when
called with start = 0 and end = ULONG_MAX. This allows
cache_tag_flush_all() to simply delegate to cache_tag_flush_range()
iommu/vt-d: Fix missing PASID in dev TLB flush with cache_tag_flush_all
The function cache_tag_flush_all() was originally implemented with
incorrect device TLB invalidation logic that does not handle PASID, in
commit c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers")
This causes regressions where full address space TLB invalidations occur
with a PASID attached, such as during transparent hugepage unmapping in
SVA configurations or when calling iommu_flush_iotlb_all(). In these
cases, the device receives a TLB invalidation that lacks PASID.
This incorrect logic was later extracted into
cache_tag_flush_devtlb_all(), in commit 3297d047cd7f ("iommu/vt-d:
Refactor IOTLB and Dev-IOTLB flush for batching")
The fix replaces the call to cache_tag_flush_devtlb_all() with
cache_tag_flush_devtlb_psi(), which properly handles PASID.
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:26 +0000 (12:50 +0800)]
iommu/vt-d: Split paging_domain_compatible()
Make First/Second stage specific functions that follow the same pattern in
intel_iommu_domain_alloc_first/second_stage() for computing
EOPNOTSUPP. This makes the code easier to understand as if we couldn't
create a domain with the parameters for this IOMMU instance then we
certainly are not compatible with it.
Check superpage support directly against the per-stage cap bits and the
pgsize_bitmap.
Add a note that the force_snooping is read without locking. The locking
needs to cover the compatible check and the add of the device to the list.
First Stage and Second Stage have very different ways to deny
no-snoop. The first stage uses the PGSNP bit which is global per-PASID so
enabling requires loading new PASID entries for all the attached devices.
Second stage uses a bit per PTE, so enabling just requires telling future
maps to set the bit.
Since we now have two domain ops we can have two functions that can
directly code their required actions instead of a bunch of logic dancing
around use_first_level.
Combine domain_set_force_snooping() into the new functions since they are
the only caller.
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:24 +0000 (12:50 +0800)]
iommu/vt-d: Create unique domain ops for each stage
Use the domain ops pointer to tell what kind of domain it is instead of
the internal use_first_level indication. This also protects against
wrongly using a SVA/nested/IDENTITY/BLOCKED domain type in places they
should not be.
The only remaining uses of use_first_level outside the paging domain are in
paging_domain_compatible() and intel_iommu_enforce_cache_coherency().
Thus, remove the useless sets of use_first_level in
intel_svm_domain_alloc() and intel_iommu_domain_alloc_nested(). None of
the unique ops for these domain types ever reference it on their call
chains.
Add a WARN_ON() check in domain_context_mapping_one() as it only works
with second stage.
This is preparation for iommupt which will have different ops for each of
the stages.
Create stage specific functions that check the stage specific conditions
if each stage can be supported.
Have intel_iommu_domain_alloc_paging_flags() call both stages in sequence
until one does not return EOPNOTSUPP and prefer to use the first stage if
available and suitable for the requested flags.
Move second stage only operations like nested_parent and dirty_tracking
into the second stage function for clarity.
Move initialization of the iommu_domain members into paging_domain_alloc().
Drop initialization of domain->owner as the callers all do it.
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:22 +0000 (12:50 +0800)]
iommu/vt-d: Do not wipe out the page table NID when devices detach
The NID is used to control which NUMA node memory for the page table is
allocated it from. It should be a permanent property of the page table
when it was allocated and not change during attach/detach of devices.
Jason Gunthorpe [Mon, 14 Jul 2025 04:50:20 +0000 (12:50 +0800)]
iommu/vt-d: Lift the __pa to domain_setup_first_level/intel_svm_set_dev_pasid()
Pass the phys_addr_t down through the call chain from the top instead of
passing a pgd_t * KVA. This moves the __pa() into
domain_setup_first_level() which is the first function to obtain the pgd
from the IOMMU page table in this call chain.
The SVA flow is also adjusted to get the pa of the mm->pgd.
iommput will move the __pa() into iommupt code, it never shares the KVA of
the page table with the driver.
Lu Baolu [Mon, 14 Jul 2025 04:50:19 +0000 (12:50 +0800)]
iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
The iotlb_sync_map iommu ops allows drivers to perform necessary cache
flushes when new mappings are established. For the Intel iommu driver,
this callback specifically serves two purposes:
- To flush caches when a second-stage page table is attached to a device
whose iommu is operating in caching mode (CAP_REG.CM==1).
- To explicitly flush internal write buffers to ensure updates to memory-
resident remapping structures are visible to hardware (CAP_REG.RWBF==1).
However, in scenarios where neither caching mode nor the RWBF flag is
active, the cache_tag_flush_range_np() helper, which is called in the
iotlb_sync_map path, effectively becomes a no-op.
Despite being a no-op, cache_tag_flush_range_np() involves iterating
through all cache tags of the iommu's attached to the domain, protected
by a spinlock. This unnecessary execution path introduces overhead,
leading to a measurable I/O performance regression. On systems with NVMes
under the same bridge, performance was observed to drop from approximately
~6150 MiB/s down to ~4985 MiB/s.
Introduce a flag in the dmar_domain structure. This flag will only be set
when iotlb_sync_map is required (i.e., when CM or RWBF is set). The
cache_tag_flush_range_np() is called only for domains where this flag is
set. This flag, once set, is immutable, given that there won't be mixed
configurations in real-world scenarios where some IOMMUs in a system
operate in caching mode while others do not. Theoretically, the
immutability of this flag does not impact functionality.
iommu/vt-d: Remove the CONFIG_X86 wrapping from iommu init hook
iommu init hook is wrapped in CONFI_X86 and is a remnant of dmar.c when
it was a common code in "drivers/pci/dmar.c". This was added in commit
(9d5ce73a64be2 x86: intel-iommu: Convert detect_intel_iommu to use
iommu_init hook)
Now this is built only for x86. This config wrap could be removed.
iommu/amd: Enable PASID and ATS capabilities in the correct order
Per the PCIe spec, behavior of the PASID capability is undefined if the
value of the PASID Enable bit changes while the Enable bit of the
function's ATS control register is Set. Unfortunately,
pdev_enable_caps() does exactly that by ordering enabling ATS for the
device before enabling PASID.
Robin Murphy [Fri, 27 Jun 2025 15:07:10 +0000 (16:07 +0100)]
iommu/mediatek-v1: Tidy up probe_finalize
Krzysztof points out that although the driver now supports COMPILE_TEST
for other architectures, it does not build cleanly with W=1 where the
stubbed-out ARM API can lead to an unused variable warning.
Since this is effectively the correct intent of the code in such cases,
mark it as __maybe_unused, tidying up some cruft in the process.
iommu/omap: Use syscon_regmap_lookup_by_phandle_args
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over
syscon_regmap_lookup_by_phandle() combined with getting the syscon
argument. Except simpler code this annotates within one line that given
phandle has arguments, so grepping for code would be easier.
There is also no real benefit in printing errors on missing syscon
argument, because this is done just too late: runtime check on
static/build-time data. Dtschema and Devicetree bindings offer the
static/build-time check for this already.
iommu/omap: Drop redundant check if ti,syscon-mmuconfig exists
The syscon_regmap_lookup_by_phandle() will fail if property does not
exist, so doing of_property_read_bool() earlier is redundant. Drop that
check and move error message to syscon_regmap_lookup_by_phandle() error
case while converting it to dev_err_probe().
Sven Peter [Thu, 12 Jun 2025 21:11:31 +0000 (21:11 +0000)]
iommu/apple-dart: Drop default ARCH_APPLE in Kconfig
When the first driver for Apple Silicon was upstreamed we accidentally
included `default ARCH_APPLE` in its Kconfig which then spread to almost
every subsequent driver. As soon as ARCH_APPLE is set to y this will
pull in many drivers as built-ins which is not what we want.
Thus, drop `default ARCH_APPLE` from Kconfig.
Jason Gunthorpe [Mon, 9 Jun 2025 20:41:27 +0000 (17:41 -0300)]
iommu: Remove ops.pgsize_bitmap from drivers that don't use it
These drivers all set the domain->pgsize_bitmap in their
domain_alloc_paging() functions, so the ops value is never used. Delete
it.
Reviewed-by: Sven Peter <sven@svenpeter.dev> # for Apple DART Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> # for RISC-V Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/3-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Jason Gunthorpe [Mon, 9 Jun 2025 20:41:26 +0000 (17:41 -0300)]
iommu/arm-smmu: Remove iommu_ops pgsize_bitmap
The driver never reads this value, arm_smmu_init_domain_context() always
sets domain.pgsize_bitmap to smmu->pgsize_bitmap, the per-instance value.
Remove the ops version entirely, the related dead code and make
arm_smmu_ops const.
Since this driver does not yet finalize the domain under
arm_smmu_domain_alloc_paging() add a page size initialization to alloc so
the page size is still setup prior to attach.
The driver never reads this value, arm_smmu_domain_finalise() always sets
domain.pgsize_bitmap to pgtbl_cfg, which comes from the per-smmu
calculated value.
Remove the ops version entirely, the related dead code and make
arm_smmu_ops const.
Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/1-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Ankit Soni [Wed, 4 Jun 2025 06:13:25 +0000 (06:13 +0000)]
iommu/amd: Add efr[HATS] max v1 page table level
The EFR[HATS] bits indicate maximum host translation level supported by
IOMMU. Adding support to set the maximum host page table level as indicated
by EFR[HATS]. If the HATS=11b (reserved), the driver will attempt to use
guest page table for DMA API.
Ankit Soni [Wed, 4 Jun 2025 06:13:24 +0000 (06:13 +0000)]
iommu/amd: Add HATDis feature support
Current AMD IOMMU assumes Host Address Translation (HAT) is always
supported, and Linux kernel enables this capability by default. However,
in case of emulated and virtualized IOMMU, this might not be the case.
For example,current QEMU-emulated AMD vIOMMU does not support host
translation for VFIO pass-through device, but the interrupt remapping
support is required for x2APIC (i.e. kvm-msi-ext-dest-id is also not
supported by the guest OS). This would require the guest kernel to boot
with guest kernel option iommu=pt to by-pass the initialization of
host (v1) table.
The AMD I/O Virtualization Technology (IOMMU) Specification Rev 3.10 [1]
introduces a new flag 'HATDis' in the IVHD 11h IOMMU attributes to indicate
that HAT is not supported on a particular IOMMU instance.
Therefore, modifies the AMD IOMMU driver to detect the new HATDis
attributes, and disable host translation and switch to use guest
translation if it is available. Otherwise, the driver will disable DMA
translation.
Linus Torvalds [Sun, 22 Jun 2025 17:50:36 +0000 (10:50 -0700)]
Merge tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- subsystem: convert drivers to use recent callbacks of struct
i2c_algorithm A typical after-rc1 cleanup, which I couldn't send in
time for rc2
- tegra: fix YAML conversion of device tree bindings
- k1: re-add a check which got lost during upstreaming
* tag 'i2c-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: k1: check for transfer error
i2c: use inclusive callbacks in struct i2c_algorithm
dt-bindings: i2c: nvidia,tegra20-i2c: Specify the required properties
Linus Torvalds [Sun, 22 Jun 2025 17:30:44 +0000 (10:30 -0700)]
Merge tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure the array tracking which kernel text positions need to be
alternatives-patched doesn't get mishandled by out-of-order
modifications, leading to it overflowing and causing page faults when
patching
- Avoid an infinite loop when early code does a ranged TLB invalidation
before the broadcast TLB invalidation count of how many pages it can
flush, has been read from CPUID
- Fix a CONFIG_MODULES typo
- Disable broadcast TLB invalidation when PTI is enabled to avoid an
overflow of the bitmap tracking dynamic ASIDs which need to be
flushed when the kernel switches between the user and kernel address
space
- Handle the case of a CPU going offline and thus reporting zeroes when
reading top-level events in the resctrl code
* tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Fix int3 handling failure from broken text_poke array
x86/mm: Fix early boot use of INVPLGB
x86/its: Fix an ifdef typo in its_alloc()
x86/mm: Disable INVLPGB when PTI is enabled
x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
Linus Torvalds [Sun, 22 Jun 2025 17:17:51 +0000 (10:17 -0700)]
Merge tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix missing prototypes warnings
- Properly initialize work context when allocating it
- Remove a method tracking when managed interrupts are suspended during
hotplug, in favor of the code using a IRQ disable depth tracking now,
and have interrupts get properly enabled again on restore
- Make sure multiple CPUs getting hotplugged don't cause wrong tracking
of the managed IRQ disable depth
* tag 'irq_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ath79-misc: Fix missing prototypes warnings
genirq/irq_sim: Initialize work context pointers properly
genirq/cpuhotplug: Restore affinity even for suspended IRQ
genirq/cpuhotplug: Rebalance managed interrupts across multi-CPU hotplug
Linus Torvalds [Sun, 22 Jun 2025 17:11:45 +0000 (10:11 -0700)]
Merge tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Avoid a crash on a heterogeneous machine where not all cores support
the same hw events features
- Avoid a deadlock when throttling events
- Document the perf event states more
- Make sure a number of perf paths switching off or rescheduling events
call perf_cgroup_event_disable()
- Make sure perf does task sampling before its userspace mapping is
torn down, and not after
* tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix crash in icl_update_topdown_event()
perf: Fix the throttle error of some clock events
perf: Add comment to enum perf_event_state
perf/core: Fix WARN in perf_cgroup_switch()
perf: Fix dangling cgroup pointer in cpuctx
perf: Fix cgroup state vs ERROR
perf: Fix sample vs do_exit()
Linus Torvalds [Sun, 22 Jun 2025 17:09:23 +0000 (10:09 -0700)]
Merge tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Borislav Petkov:
- Make sure the switch to the global hash is requested always under a
lock so that two threads requesting that simultaneously cannot get to
inconsistent state
- Reject negative NUMA nodes earlier in the futex NUMA interface
handling code
- Selftests fixes
* tag 'locking_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Verify under the lock if hash can be replaced
futex: Handle invalid node numbers supplied by user
selftests/futex: Set the home_node in futex_numa_mpol
selftests/futex: getopt() requires int as return value.
* tag 'edac_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/igen6: Fix NULL pointer dereference
EDAC/amd64: Correct number of UMCs for family 19h models 70h-7fh
Linus Torvalds [Sun, 22 Jun 2025 16:46:11 +0000 (09:46 -0700)]
Merge tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Multichannel channel allocation fix for Kerberos mounts
- Two reconnect fixes
- Fix netfs_writepages crash with smbdirect/RDMA
- Directory caching fix
- Three minor cleanup fixes
- Log error when close cached dirs fails
* tag 'v6.16-rc2-smb3-client-fixes-v2' of git://git.samba.org/sfrench/cifs-2.6:
smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size
smb: minor fix to use sizeof to initialize flags_string buffer
smb: Use loff_t for directory position in cached_dirents
smb: Log an error when close_all_cached_dirs fails
cifs: Fix prepare_write to negotiate wsize if needed
smb: client: fix max_sge overflow in smb_extract_folioq_to_rdma()
smb: client: fix first command failure during re-negotiation
cifs: Remove duplicate fattr->cf_dtype assignment from wsl_to_fattr() function
smb: fix secondary channel creation issue with kerberos by populating hostname when adding channels
Alex Elder [Mon, 16 Jun 2025 12:51:36 +0000 (07:51 -0500)]
i2c: k1: check for transfer error
If spacemit_i2c_xfer_msg() times out waiting for a message transfer to
complete, or if the hardware reports an error, it returns a negative
error code (-ETIMEDOUT, -EAGAIN, -ENXIO. or -EIO).
The sole caller of spacemit_i2c_xfer_msg() is spacemit_i2c_xfer(),
which is the i2c_algorithm->xfer callback function. It currently
does not save the value returned by spacemit_i2c_xfer_msg().
The result is that transfer errors go unreported, and a caller
has no indication anything is wrong.
When this code was out for review, the return value *was* checked
in early versions. But for some reason, that assignment got dropped
between versions 5 and 6 of the series, perhaps related to reworking
the code to merge spacemit_i2c_xfer_core() into spacemit_i2c_xfer().
Simply assigning the value returned to "ret" fixes the problem.
Fixes: 5ea558473fa31 ("i2c: spacemit: add support for SpacemiT K1 SoC") Signed-off-by: Alex Elder <elder@riscstar.com> Cc: <stable@vger.kernel.org> # v6.15+ Reviewed-by: Troy Mitchell <troymitchell988@gmail.com> Link: https://lore.kernel.org/r/20250616125137.1555453-1-elder@riscstar.com Signed-off-by: Andi Shyti <andi@smida.it> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Linus Torvalds [Sat, 21 Jun 2025 16:20:15 +0000 (09:20 -0700)]
Merge tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two fixes for commits in the nfsd-6.16 merge
- One fix for the recently-added NFSD netlink facility
- One fix for a remote SunRPC crasher
* tag 'nfsd-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
sunrpc: handle SVC_GARBAGE during svc auth processing as auth error
nfsd: use threads array as-is in netlink interface
SUNRPC: Cleanup/fix initial rq_pages allocation
NFSD: Avoid corruption of a referring call list
Bharath SM [Thu, 19 Jun 2025 15:35:34 +0000 (21:05 +0530)]
smb: minor fix to use SMB2_NTLMV2_SESSKEY_SIZE for auth_key size
Replaced hardcoded value 16 with SMB2_NTLMV2_SESSKEY_SIZE
in the auth_key definition and memcpy call.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Bharath SM [Thu, 19 Jun 2025 15:35:33 +0000 (21:05 +0530)]
smb: minor fix to use sizeof to initialize flags_string buffer
Replaced hardcoded length with sizeof(flags_string).
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Bharath SM [Thu, 19 Jun 2025 15:35:32 +0000 (21:05 +0530)]
smb: Use loff_t for directory position in cached_dirents
Change the pos field in struct cached_dirents from int to loff_t
to support large directory offsets. This avoids overflow and
matches kernel conventions for directory positions.
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Paul Aurich [Wed, 20 Nov 2024 16:01:54 +0000 (08:01 -0800)]
smb: Log an error when close_all_cached_dirs fails
Under low-memory conditions, close_all_cached_dirs() can't move the
dentries to a separate list to dput() them once the locks are dropped.
This will result in a "Dentry still in use" error, so add an error
message that makes it clear this is what happened:
[ 495.281119] CIFS: VFS: \\otters.example.com\share Out of memory while dropping dentries
[ 495.281595] ------------[ cut here ]------------
[ 495.281887] BUG: Dentry ffff888115531138{i=78,n=/} still in use (2) [unmount of cifs cifs]
[ 495.282391] WARNING: CPU: 1 PID: 2329 at fs/dcache.c:1536 umount_check+0xc8/0xf0
Also, bail out of looping through all tcons as soon as a single
allocation fails, since we're already in trouble, and kmalloc() attempts
for subseqeuent tcons are likely to fail just like the first one did.
Signed-off-by: Paul Aurich <paul@darkrain42.org> Acked-by: Bharath SM <bharathsm@microsoft.com> Suggested-by: Ruben Devos <rdevos@oxya.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
David Howells [Wed, 18 Jun 2025 15:39:47 +0000 (16:39 +0100)]
cifs: Fix prepare_write to negotiate wsize if needed
Fix cifs_prepare_write() to negotiate the wsize if it is unset.
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Fixes: c45ebd636c32 ("cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
zhangjian [Thu, 19 Jun 2025 01:18:29 +0000 (09:18 +0800)]
smb: client: fix first command failure during re-negotiation
after fabc4ed200f9, server_unresponsive add a condition to check whether client
need to reconnect depending on server->lstrp. When client failed to reconnect
for some time and abort connection, server->lstrp is updated for the last time.
In the following scene, server->lstrp is too old. This cause next command
failure in re-negotiation rather than waiting for re-negotiation done.
1. mount -t cifs -o username=Everyone,echo_internal=10 //$server_ip/export /mnt
2. ssh $server_ip "echo b > /proc/sysrq-trigger &"
3. ls /mnt
4. sleep 21s
5. ssh $server_ip "service firewalld stop"
6. ls # return EHOSTDOWN
If the interval between 5 and 6 is too small, 6 may trigger sending negotiation
request. Before backgrounding cifsd thread try to receive negotiation response
from server in cifs_readv_from_socket, server_unresponsive may trigger
cifs_reconnect which cause 6 to be failed:
ls thread
----------------
smb2_negotiate
server->tcpStatus = CifsInNegotiate
compound_send_recv
wait_for_compound_request
ls thread
----------------
cifs_sync_mid_result return EAGAIN
smb2_negotiate return EHOSTDOWN
Though server->lstrp means last server response time, it is updated in
cifs_abort_connection and cifs_get_tcp_session. We can also update server->lstrp
before switching into CifsInNegotiate state to avoid failure in 6.
Fixes: 7ccc1465465d ("smb: client: fix hang in wait_for_response() for negproto") Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Acked-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: zhangjian <zhangjian496@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sat, 21 Jun 2025 15:27:12 +0000 (08:27 -0700)]
Merge tag 'acpi-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a crash in ACPICA while attempting to evaluate a control method
that expects more arguments than are being passed to it, which was
exposed by a defective firmware update from a prominent OEM on
multiple systems (Rafael Wysocki)"
* tag 'acpi-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Refuse to evaluate a method if arguments are missing
Linus Torvalds [Sat, 21 Jun 2025 15:21:10 +0000 (08:21 -0700)]
Merge tag 'pci-v6.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Set up runtime PM even for devices that lack a PM Capability as we
did before 4d4c10f763d7 ("PCI: Explicitly put devices into D0 when
initializing"), which broke resume in some VFIO scenarios (Mario
Limonciello)
- Ignore pciehp Presence Detect Changed events caused by DPC, even if
they occur after a Data Link Layer State Changed event, to fix a VFIO
GPU passthrough regression in v6.13 (Lukas Wunner)
* tag 'pci-v6.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: pciehp: Ignore belated Presence Detect Changed caused by DPC
PCI/PM: Set up runtime PM even for devices without PCI PM
Linus Torvalds [Sat, 21 Jun 2025 14:59:45 +0000 (07:59 -0700)]
Merge tag 'perf-tools-fixes-for-v6.16-1-2025-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix some file descriptor leaks that stand out with recent changes to
'perf list'
- Fix prctl include to fix building 'perf bench futex' hash with musl
libc
- Restrict 'perf test' uniquifying entry to machines with 'uncore_imc'
PMUs
- Document new output fields (op, cache, mem, dtlb, snoop) used with
'perf mem'
- Synchronize kernel header copies
* tag 'perf-tools-fixes-for-v6.16-1-2025-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools headers x86 cpufeatures: Sync with the kernel sources
perf bench futex: Fix prctl include in musl libc
perf test: Directory file descriptor leak
perf evsel: Missed close() when probing hybrid core PMUs
tools headers: Synchronize linux/bits.h with the kernel sources
tools arch amd ibs: Sync ibs.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers: Syncronize linux/build_bug.h with the kernel sources
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync the drm/drm.h with the kernel sources
perf beauty: Update copy of linux/socket.h with the kernel sources
tools headers UAPI: Sync kvm header with the kernel sources
tools headers x86 svm: Sync svm headers with the kernel sources
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
tools kvm headers arm64: Update KVM header from the kernel sources
tools headers UAPI: Sync linux/prctl.h with the kernel sources to pick FUTEX knob
perf mem: Document new output fields (op, cache, mem, dtlb, snoop)
tools headers: Update the fs headers with the kernel sources
perf test: Restrict uniquifying test to machines with 'uncore_imc'
Linus Torvalds [Sat, 21 Jun 2025 05:36:48 +0000 (22:36 -0700)]
Merge tag 'mtd/fixes-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"The main fix that really needs to get in is the revert of the patch
adding the new mtd_master class, because it entirely fails the
partitioning if a specific Kconfig option is set. We need to think how
to handle that differently, so let's revert it as we need to get back
to the pen and paper situation again.
Otherwise the definition of some Winbond SPI NAND chips are receiving
some fixes (geometry and maximum frequency, mostly).
And finally a small memory leak gets also fixed"
* tag 'mtd/fixes-for-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spinand: fix memory leak of ECC engine conf
mtd: spinand: winbond: Prevent unsupported frequencies on dual/quad I/O variants
mtd: spinand: winbond: Increase maximum frequency on an octal operation
mtd: spinand: winbond: Fix W35N number of planes/LUN
Revert "mtd: core: always create master device"
Currently the call_rcu() API does not check whether a callback
pointer is NULL. If NULL is passed, rcu_core() will try to invoke
it, resulting in NULL pointer dereference and a kernel crash.
To prevent this and improve debuggability, this patch adds a check
for NULL and emits a kernel stack trace to help identify a faulty
caller.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Binbin Wu [Tue, 10 Jun 2025 02:14:21 +0000 (10:14 +0800)]
KVM: TDX: Exit to userspace for GetTdVmCallInfo
Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX,
to allow userspace to provide information about the support of
TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API.
GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>,
<ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>,
<Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and
<Instruction.WRMSR>. They must be supported by VMM to support TDX guests.
For GetTdVmCallInfo
- When leaf (r12) to enumerate TDVMCALL functionality is set to 0,
successful execution indicates all GHCI base TDVMCALLs listed above are
supported.
Update the KVM TDX document with the set of the GHCI base APIs.
- When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it
indicates the TDX guest is querying the supported TDVMCALLs beyond
the GHCI base TDVMCALLs.
Exit to userspace to let userspace set the TDVMCALL sub-function bit(s)
accordingly to the leaf outputs. KVM could set the TDVMCALL bit(s)
supported by itself when the TDVMCALLs don't need support from userspace
after returning from userspace and before entering guest. Currently, no
such TDVMCALLs implemented, KVM just sets the values returned from
userspace.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Binbin Wu [Tue, 10 Jun 2025 02:14:20 +0000 (10:14 +0800)]
KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
Handle TDVMCALL for GetQuote to generate a TD-Quote.
GetQuote is a doorbell-like interface used by TDX guests to request VMM
to generate a TD-Quote signed by a service hosting TD-Quoting Enclave
operating on the host. A TDX guest passes a TD Report (TDREPORT_STRUCT) in
a shared-memory area as parameter. Host VMM can access it and queue the
operation for a service hosting TD-Quoting enclave. When completed, the
Quote is returned via the same shared-memory area.
KVM only checks the GPA from the TDX guest has the shared-bit set and drops
the shared-bit before exiting to userspace to avoid bleeding the shared-bit
into KVM's exit ABI. KVM forwards the request to userspace VMM (e.g. QEMU)
and userspace VMM queues the operation asynchronously. KVM sets the return
code according to the 'ret' field set by userspace to notify the TDX guest
whether the request has been queued successfully or not. When the request
has been queued successfully, the TDX guest can poll the status field in
the shared-memory area to check whether the Quote generation is completed
or not. When completed, the generated Quote is returned via the same
buffer.
Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is
required to handle the KVM exit reason as the initial support for TDX,
by reentering KVM to ensure that the TDVMCALL is complete. While at it,
add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN.
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com>
[Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Binbin Wu [Tue, 10 Jun 2025 02:14:19 +0000 (10:14 +0800)]
KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs
Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and
return it for unimplemented TDVMCALL subfunctions.
Returning TDVMCALL_STATUS_INVALID_OPERAND when a subfunction is not
implemented is vague because TDX guests can't tell the error is due to
the subfunction is not supported or an invalid input of the subfunction.
New GHCI spec adds TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED to avoid the
ambiguity. Use it instead of TDVMCALL_STATUS_INVALID_OPERAND.
Before the change, for common guest implementations, when a TDX guest
receives TDVMCALL_STATUS_INVALID_OPERAND, it has two cases:
1. Some operand is invalid. It could change the operand to another value
retry.
2. The subfunction is not supported.
For case 1, an invalid operand usually means the guest implementation bug.
Since the TDX guest can't tell which case is, the best practice for
handling TDVMCALL_STATUS_INVALID_OPERAND is stopping calling such leaf,
treating the failure as fatal if the TDVMCALL is essential or ignoring
it if the TDVMCALL is optional.
With this change, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED could be sent to
old TDX guest that do not know about it, but it is expected that the
guest will make the same action as TDVMCALL_STATUS_INVALID_OPERAND.
Currently, no known TDX guest checks TDVMCALL_STATUS_INVALID_OPERAND
specifically; for example Linux just checks for success.
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Return it for untrapped KVM_HC_MAP_GPA_RANGE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 20 Jun 2025 16:59:20 +0000 (09:59 -0700)]
Merge tag 'sound-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All changes are device-specific at this
time:
- Fixes for Cirrus codecs with SoundWire, including firmware name
updates
- Fix for i.MX8 SoC DSP
- Usual HD-audio, USB-audio, and ASoC AMD quirks
- Fixes for legendary SoundBlaster AWE32 ISA device (a real one, we
still got a bug report after 25 years)
- Minor build fixes"
* tag 'sound-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda/realtek: Enable headset Mic on Positivo P15X
ASoC: Intel: sof-function-topology-lib: Print out the unsupported dmic count
ASoC: doc: cs35l56: Add CS35L63 to the list of supported devices
ASoC: SOF: imx8: add core shutdown operation for imx8/imx8x
ALSA: hda/realtek: Add quirk for Asus GA605K
ALSA: hda/realtek: enable headset mic on Latitude 5420 Rugged
ASoC: amd: yc: update quirk data for HP Victus
ASoC: apple: mca: Drop default ARCH_APPLE in Kconfig
ALSA: usb-audio: Rename ALSA kcontrol PCM and PCM1 for the KTMicro sound card
ASoC: amd: yc: Add quirk for MSI Bravo 17 D7VF internal mic
ASoC: doc: cs35l56: Update to add new SoundWire firmware filename suffix
ASoC: cs35l56: Use SoundWire address as alternate firmware suffix on L56 B0
ASoC: cs35l56: Use SoundWire address as firmware name suffix for new silicon
ASoC: sdw_utils: Fix potential NULL pointer deref in is_sdca_endpoint_present()
ALSA: sb: Force to disable DMAs once when DMA mode is changed
ALSA: sb: Don't allow changing the DMA mode during operations
ALSA: hda/realtek: Add quirk for Asus GU605C
ALSA: hda/realtek: Fix built-in mic on ASUS VivoBook X513EA
ALSA: hda/realtek - Add mute LED support for HP Victus 16-s1xxx and HP Victus 15-fa1xxx
ALSA: ctxfi: Replace deprecated strcpy() with strscpy()
...
Linus Torvalds [Fri, 20 Jun 2025 16:54:24 +0000 (09:54 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There's nothing major (even the vmalloc one is just suppressing a
potential warning) but all worth having, nonetheless.
- Suppress KASAN false positive in stack unwinding code
- Drop redundant reset of the GCS state on exec()
- Don't try to descend into a !present PMD when creating a huge
vmap() entry at the PUD level
- Fix a small typo in the arm64 booting Documentation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()
arm64/gcs: Don't call gcs_free() during flush_gcs()
arm64: Restrict pagetable teardown to avoid false warning
docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
Jens Axboe [Fri, 20 Jun 2025 13:41:21 +0000 (07:41 -0600)]
io_uring/net: always use current transfer count for buffer put
A previous fix corrected the retry condition for when to continue a
current bundle, but it missed that the current (not the total) transfer
count also applies to the buffer put. If not, then for incrementally
consumed buffer rings repeated completions on the same request may end
up over consuming.
Reported-by: Roy Tang (ErgoniaTrading) <royonia@ergonia.io> Cc: stable@vger.kernel.org Fixes: 3a08988123c8 ("io_uring/net: only retry recv bundle for a full transfer") Link: https://github.com/axboe/liburing/issues/1423 Signed-off-by: Jens Axboe <axboe@kernel.dk>
Takashi Iwai [Fri, 20 Jun 2025 07:58:57 +0000 (09:58 +0200)]
Merge tag 'asoc-fix-v6.16-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.16
A relatively large collection of fixes and updates that came in since
the merge window. Of note are a couple of Cirrus ones which change the
firmware naming for some newly added devices, and a fix from Laurentiu
for issues booting firmwares on the DSPs on i.MX8 SoCs.
Linus Torvalds [Fri, 20 Jun 2025 06:29:35 +0000 (23:29 -0700)]
Merge tag 'block-6.16-20250619' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Two fixes for aoe which fixes issues dating back to when this driver
was converted to blk-mq
- Fix for ublk, checking for valid queue depth and count values before
setting up a device
* tag 'block-6.16-20250619' of git://git.kernel.dk/linux:
ublk: santizize the arguments from userspace when adding a device
aoe: defer rexmit timer downdev work to workqueue
aoe: clean device rq_list in aoedev_downdev()
Linus Torvalds [Fri, 20 Jun 2025 06:25:28 +0000 (23:25 -0700)]
Merge tag 'io_uring-6.16-20250619' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Two fixes for error injection failures. One fixes a task leak issue
introduced in this merge window, the other an older issue with
handling allocation of a mapped buffer.
- Fix for a syzbot issue that triggers a kmalloc warning on attempting
an allocation that's too large
- Fix for an error injection failure causing a double put of a task,
introduced in this merge window
* tag 'io_uring-6.16-20250619' of git://git.kernel.dk/linux:
io_uring: fix potential page leak in io_sqe_buffer_register()
io_uring/sqpoll: don't put task_struct on tctx setup failure
io_uring: remove duplicate io_uring_alloc_task_context() definition
io_uring: fix task leak issue in io_wq_create()
io_uring/rsrc: validate buffer count with offset for cloning
Linus Torvalds [Fri, 20 Jun 2025 06:18:59 +0000 (23:18 -0700)]
Merge tag 'drm-fixes-2025-06-20' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Bit of an uptick in fixes for rc3, msm and amdgpu leading the way,
with i915/xe/nouveau with a few each and then some scattered misc
bits, nothing looks too crazy:
msm:
- Display:
- Fixed DP output on SDM845
- Fixed 10nm DSI PLL init
- GPU:
- SUBMIT ioctl error path leak fixes
- drm half of stall-on-fault fixes
- a7xx: Missing CP_RESET_CONTEXT_STATE
- Skip GPU component bind if GPU is not in the device table
i915:
- Fix MIPI vtotal programming off by one on Broxton
- Fix PMU code for GCOV and AutoFDO enabled build
xe:
- A workaround update
- Fix memset on iomem
- Fix early wedge on GuC Load failure