Ryan Chen [Fri, 12 Jun 2026 08:33:27 +0000 (18:03 +0930)]
arm64: dts: aspeed: Fix duplicate pinctrl labels and address scheme
A report from shashiko-bot highlighted some concerns concurrent to
application of the series[1].
Fix duplicate pinctrl_tach{0-15} and pinctrl_n{cts,dcd,dsr,ri}5 labels
in aspeed-g7-soc1-pinctrl.dtsi. These didn't cause errors from dtc
because dtc accepts duplicate labels for duplicate nodes specified
through a node reference[2].
Drop the cpu-index from secondary/tertiary container nodes: reduce
the "#address-cells" from 2 to 1 and update unit-addresses and reg
accordingly. The 2-cell scheme was proposed in an early mailing list
sketch to prompt discussion[3], but the design evolved in ways that made
it unnecessary.
Also remove URL comments from the DTS. The links were to comments in
the kernel sources with discussion justifying the approach, but are not
necessary to carry forward.
introduced in commit 44e55f1f3088 ("serial: 8250_pci: Consistently
define pci_device_ids using named initializers") has conflicting
assignments. In only some configurations (i.e. W=1 for me) that makes
the compiler unhappy.
So convert the two affected items to PCI_DEVICE which doesn't have that
hidden assigment to .class and .class_mask.
Fixes: 44e55f1f3088 ("serial: 8250_pci: Consistently define pci_device_ids using named initializers") Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Closes: https://lore.kernel.org/linux-serial/ah_5qVKOf8LXG1Xo@ashevche-desk.local/T/#ma6eab90ca801b4292639f5c255a89b4033b33d21 Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260603095616.937968-2-u.kleine-koenig@baylibre.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yi Yang [Thu, 4 Jun 2026 06:07:34 +0000 (06:07 +0000)]
vc_screen: fix null-ptr-deref in vcs_notifier() during concurrent vcs_write
A KASAN null-ptr-deref was observed in vcs_notifier():
BUG: KASAN: null-ptr-deref in vcs_notifier+0x98/0x130
Read of size 2 at addr qmp_cmd_name: qmp_capabilities, arguments: {}
The issue is a race condition in vcs_write(). When the console_lock is
temporarily dropped (to copy data from userspace), the vc_data pointer
obtained from vcs_vc() may become stale. After re-acquiring the lock,
vcs_vc() is called again to re-validate the pointer. If the vc has been
deallocated in the meantime, vcs_vc() returns NULL, and the while loop
breaks (with written > 0). However, after the loop, vcs_scr_updated(vc)
is still called with the now-NULL vc pointer, leading to a null pointer
dereference in the notifier chain (vcs_notifier dereferences param->vc).
Fix this by adding a NULL check for vc before calling vcs_scr_updated().
Fixes: 8fb9ea65c9d1 ("vc_screen: reload load of struct vc_data pointer in vcs_write() to avoid UAF") Cc: stable@vger.kernel.org Signed-off-by: Yi Yang <yiyang13@huawei.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://patch.msgid.link/20260604060734.2914976-1-yiyang13@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Viken Dadhaniya [Thu, 28 May 2026 17:18:07 +0000 (22:48 +0530)]
serial: qcom_geni: Fix RX DMA stall when SE_DMA_RX_LEN_IN is zero
In qcom_geni_serial_handle_rx_dma(), geni_se_rx_dma_unprep() clears
port->rx_dma_addr before SE_DMA_RX_LEN_IN is read. If the register is zero,
for example when the RX stale counter fires on an idle line, the handler
returns without calling geni_se_rx_dma_prep().
The next RX DMA interrupt then hits the !port->rx_dma_addr guard and
returns immediately, so the RX DMA buffer is never rearmed and later input
is lost.
Keep the handler on the rearm path when rx_in is zero. Warn about the
unexpected zero-length DMA completion, skip received-data handling, and
always call geni_se_rx_dma_prep().
Fixes: 2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA") Cc: stable@vger.kernel.org Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com> Link: https://patch.msgid.link/20260528-serial-rx-0-byte-fix-v2-1-b4195cfe342f@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ata: Use named initializers for pci_device_id arrays
While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.
The mentioned robustness is relevant for a planned change to struct
pci_device_id that replaces .driver_data by an anonymous union.
Also drop the comma after a few list terminators.
This patch doesn't modify the compiled array, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
ata: Drop unused assignments of pci_device_id driver data
The drivers explicitly set the .driver_data member of struct
pci_device_id to zero without relying on that value. Drop these unused
assignments.
While touching these arrays, convert the one driver not using PCI_DEVICE
to use that macro and align the array's coding style to what is used
most for these. (i.e. break very long lines, a single space in the list
terminator and no trailing comma.)
This patch doesn't modify the compiled array, only its representation in
source form benefits. The former was confirmed with builds on x86 and
arm64.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Heiko Carstens [Thu, 11 Jun 2026 15:37:46 +0000 (17:37 +0200)]
s390: Revert support for DCACHE_WORD_ACCESS
load_unaligned_zeropad() reads eight bytes from unaligned addresses and may
cross page boundaries. It handles exceptions which may happen if reading
from the second page results in an exception.
For pages which are donated to the Ultravisor for secure execution purposes
the do_secure_storage_access() exception handler however does not handle
such exceptions correctly. Such an exception may result in an endless
exception loop which will never be resolved.
An attempt to fix this [1] turned out to be not sufficient. For now revert
load_unaligned_zeropad() until this problem has been resolved in a proper
way.
Note that the implementation of load_unaligned_zeropad() itself is
correct. The revert is just a temporary workaround until there is complete
fix for secure storage access exceptions.
Merge branch 'slab/for-7.2/alloc_token' into slab/for-next
Merge series "slab: support for compiler-assisted type-based slab cache
partitioning" from Marco Elver. From the cover letter [6]:
Rework the general infrastructure around RANDOM_KMALLOC_CACHES into more
flexible KMALLOC_PARTITION_CACHES, with the former being a partitioning
mode of the latter.
Introduce a new mode, KMALLOC_PARTITION_TYPED, which leverages a feature
available in Clang 22 and later, called "allocation tokens" via
__builtin_infer_alloc_token() [1]. Unlike KMALLOC_PARTITION_RANDOM
(formerly RANDOM_KMALLOC_CACHES), this mode deterministically assigns a
slab cache to an allocation of type T, regardless of allocation site.
The builtin __builtin_infer_alloc_token(<malloc-args>, ...) instructs
the compiler to infer an allocation type from arguments commonly passed
to memory-allocating functions and returns a type-derived token ID. The
implementation passes kmalloc-args to the builtin: the compiler performs
best-effort type inference, and then recognizes common patterns such as
`kmalloc(sizeof(T), ...)`, `kmalloc(sizeof(T) * n, ...)`, but also
`(T *)kmalloc(...)`. Where the compiler fails to infer a type the
fallback token (default: 0) is chosen.
Note: kmalloc_obj(..) APIs fix the pattern how size and result type are
expressed, and therefore ensures there's not much drift in which
patterns the compiler needs to recognize. Specifically, kmalloc_obj()
and friends expand to `(TYPE *)KMALLOC(__obj_size, GFP)`, which the
compiler recognizes via the cast to TYPE*.
Clang's default token ID calculation is described as [1]:
typehashpointersplit: This mode assigns a token ID based on the hash
of the allocated type's name, where the top half ID-space is reserved
for types that contain pointers and the bottom half for types that do
not contain pointers.
Separating pointer-containing objects from pointerless objects and data
allocations can help mitigate certain classes of memory corruption
exploits [2]: attackers who gains a buffer overflow on a primitive
buffer cannot use it to directly corrupt pointers or other critical
metadata in an object residing in a different, isolated heap region.
It is important to note that heap isolation strategies offer a
best-effort approach, and do not provide a 100% security guarantee,
albeit achievable at relatively low performance cost. Note that this
also does not prevent cross-cache attacks: while waiting for future
features like SLAB_VIRTUAL [3] to provide physical page isolation, this
feature should be deployed alongside SHUFFLE_PAGE_ALLOCATOR and
init_on_free=1 to mitigate cross-cache attacks and page-reuse attacks as
much as possible today.
With all that, my kernel (x86 defconfig) shows me a histogram of slab
cache object distribution per /proc/slabinfo (after boot):
The above /proc/slabinfo snapshot shows me there are 6673 allocated
objects (slabs 00 - 07) that the compiler claims contain no pointers or
it was unable to infer the type of, and 12015 objects that contain
pointers (slabs 08 - 15). On a whole, this looks relatively sane.
Additionally, when I compile my kernel with -Rpass=alloc-token, which
provides diagnostics where (after dead-code elimination) type inference
failed, I see 186 allocation sites where the compiler failed to identify
a type (down from 966 when I sent the RFC [4]). Some initial review
confirms these are mostly variable sized buffers, but also include
structs with trailing flexible length arrays.
Merge branch 'slab/for-7.2/alloc_bulk' into slab/for-next
Merge two separately sent but vaguely related patches from Christoph
Hellwig. One changes the kmem_cache_alloc_bulk() API to return bool,
because it was already actiong as all-or-nothing, and that aspect was
not documented. Existing callers are updated.
The second patch simplifies the mempool_alloc_bulk() API to stop
skipping over non-NULL entries in the array, and removes a related
parameter that said how many are non-NULL.
A similar simplification of alloc_pages_bulk() is being discussed as
well and should follow in near future.
mm/slab: do not limit zeroing to orig_size when only red zoning is enabled
When init (zeroing) on allocation is requested, for kmalloc() we
generally have to zero the full object size even if a smaller size is
requested, in order to provide krealloc()'s __GFP_ZERO guarantees.
But if we track the requested size, krealloc() uses that information to
do the right thing, so we can zero only the requested size. With red
zoning also enabled, any extra size became part of the red zone, so it
must not be zeroed and thus we must zero only the requested size.
However the current check is imprecise, and will trigger also when only
SLAB_RED_ZONE is enabled without SLAB_STORE_USER (which enables tracking
the requested size). This means enabling red zoning alone can compromise
krealloc()'s __GFP_ZERO contract.
Fix this by using slub_debug_orig_size() instead, which is the exact
check for whether the requested size is tracked. We don't need to care
if red zoning is also enabled or not. Also update and expand the
comment accordingly.
Fixes: 9ce67395f5a0 ("mm/slub: only zero requested size of buffer for kzalloc when debug enabled") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260610-slab_alloc_flags-v2-1-7190909db118@kernel.org Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Hao Li <hao.li@linux.dev> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Wolfram Sang [Tue, 9 Jun 2026 09:16:14 +0000 (11:16 +0200)]
MAINTAINERS: hand over I2C to Andi Shyti
After 13.5 years of maintaining I2C, it is finally time for me to move
to other areas. So, I hereby transfer I2C maintainership to Andi Shyti.
He has been taking care of the I2C host drivers for a while now and
kindly agreed to look after the whole subsystem. Thank you, Andi! I also
want to thank all contributors, reviewers, and fellow maintainers making
all these years a mostly smooth ride. Happy hacking, everyone!
Cássio Gabriel [Fri, 12 Jun 2026 03:43:55 +0000 (00:43 -0300)]
ALSA: pcxhr: Share PLL frequency register calculation
The PCXHR and HR222 clock paths duplicate the PLL divider calculation and
register encoding. The HR222 variant extends the same format with an
additional range for rates above those supported by the older boards.
Move the complete encoding into pcxhr_pll_freq_register() and pass each
hardware path its existing maximum frequency. The additional encoding
branch is unreachable with the older 110 kHz limit, so this preserves both
paths' accepted ranges and generated register values while removing the
duplicate implementation and its long-standing TODO.
qmi_stop_session() conditionally looks up the cached data and sync
endpoints, but removes each endpoint unconditionally.
The data endpoint is always present for an active offload stream, while
the sync endpoint is optional. When no sync endpoint exists, ep still
refers to the data endpoint and the code attempts to remove that endpoint
a second time. The current sideband implementation rejects the duplicate
removal, but the teardown path should not pass an unrelated endpoint for
an absent sync endpoint.
Only look up and remove an endpoint when its cached pipe exists, check the
lookup result, and clear the cached pipe after handling it. This matches
the normal stream-disable path.
Paolo Bonzini [Fri, 12 Jun 2026 08:51:42 +0000 (10:51 +0200)]
Merge tag 'kvmarm-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 7.2
* New features:
- None. Zilch. Nada. Que dalle.
* Fixes and other improvements:
- Significant cleanup of the vgic-v5 PPI support which was merged in
7.1. This makes the code more maintainable, and squashes a couple
of bugs in the meantime.
- Set of fixes for the handling of the MMU in an NV context,
particularly VNCR-triggered faults. S1POE support is fixed
as well.
- Large set of pKVM fixes, mostly addressing recurring issues
around hypervisor tracking of donated pages in obscure cases
where the donation could fail and leave things in a bizarre
state.
- Fixes for the so-called "lazy vgic init", which resulted in
sleeping operations in non-preemptible sections. This turned
out to be far more invasive than initially expected...
- Reduce the overhead of L1/L2 context switch by not touching
the FP registers.
- Fix the way non-implemented page sizes are dealt with when
a guest insist on using them for S2 translation.
- The usual set of low-impact fixes and cleanups all over the map.
Paolo Bonzini [Fri, 12 Jun 2026 08:47:24 +0000 (10:47 +0200)]
Merge branch 'kvm-single-pdptrs' into HEAD
The non-MMU changes/preliminary cleanups from the "split kvm_mmu in
three" series[1]. The final outcome is to have a single copy of the
PDPTRs (in vcpu->arch) instead of two (in root_mmu and nested_mmu).
Merge tag 'thunderbolt-for-v7.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next
Mika writes:
thunderbolt: Changes for v7.2 merge window
This includes following USB4/Thunderbolt changes for the v7.2 merge
window:
- Make the driver more compliant with the connection manager guide.
- Improvements over Thunderbolt XDomain service handling.
- USB4STREAM driver.
- Split out PCIe bits into pci.c to allow the driver to work on
non-PCIe hosts as well.
- Various fixes and improvements.
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v7.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (41 commits)
thunderbolt: debugfs: Fix sideband write size check
thunderbolt: debugfs: Fix margining error counter buffer leak
thunderbolt: test: Release third DP tunnel
thunderbolt: Prevent XDomain delayed work use-after-free on disconnect
thunderbolt: test: Add KUnit tests for property parser bounds checks
thunderbolt: Add some more descriptive probe error messages
thunderbolt: Require nhi->ops be valid
thunderbolt: Separate out common NHI bits
thunderbolt: Move pci_device out of tb_nhi
thunderbolt: Increase Notification Timeout to 255 ms for USB4 routers
thunderbolt: Increase timeout for Configuration Ready bit
thunderbolt: Verify Router Ready bit is set after router enumeration
thunderbolt: Verify PCIe adapter in detect state before tunnel setup
thunderbolt: Activate path hops from source to destination
thunderbolt: Fix lane bonding log when bonding not possible
thunderbolt: Don't access path config space on Lane 1 adapters in tb_switch_reset_host()
thunderbolt: Improve multi-display DisplayPort tunnel allocation
docs: admin-guide: thunderbolt: Add instructions how to use USB4STREAM
thunderbolt: Add support for USB4STREAM
thunderbolt: Add support for ConfigFS
...
Paolo Bonzini [Sat, 30 May 2026 16:55:45 +0000 (12:55 -0400)]
KVM: x86/mmu: move pdptrs out of the MMU
PDPTRs are part of the CPU state. A bit unconventionally, they are
reached via vcpu->arch.walk_mmu instead of being stored in vcpu->arch
directly. That is nice in principle---it would allow TDP shadow paging
to have its own PDPTRs---but it is not necessary, because EPT has no
PDPTRs and NPT does not cache them.
Since kvm_pdptr_read does not otherwise need the MMU, drop the pdptrs
from the MMU altogether. There is however something to be careful
about, in that PDPTRs are now not stored separately in root_mmu and
nested_mmu for L1 and L2 guests. In practice this was already not
an issue:
- for EPT the VMCS0x has to keep them up to date; and for the purpose
of emulation they are always loaded from the VMCS on vmentry/vmexit,
thanks to the clearing of dirty and available register bitmaps in
vmx_switch_vmcs()
- for NPT, VCPU_EXREG_PDPTR is similarly cleared for nNPT, which does
not cache the PDPTRs; while for non-nNPT the PDPTRs are loaded
together with the load of CR3.
Note that page table PDPTRs are not affected, since they are stored
in pae_root.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260530165545.25599-6-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sat, 30 May 2026 16:55:44 +0000 (12:55 -0400)]
KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging
This is true for both Intel and AMD. On Intel, "enable INVPCID" is
set unconditionally if supported, but the vmexit is triggered by the
"INVLPG exiting" control which is disabled by enable_ept. On AMD, KVM
can intercept INVPCID if NPT is enabled but only in order to inject #UD
in the guest.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260530165545.25599-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sat, 30 May 2026 16:55:43 +0000 (12:55 -0400)]
KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
When L2 runs under nested NPT and uses PAE paging, KVM's cached PDPTRs
in mmu->pdptrs[] can hold stale or wrong values after nested
transitions and across migration restore, because both
nested_svm_load_cr3() and svm_get_nested_state_pages() only refresh
PDPTRs on the !nested_npt path.
The user-visible bug is on migration restore of an L2 running with nested
NPT and 32-bit PAE paging, if userspace uses KVM_SET_SREGS rather than
KVM_SET_SREGS2. In that case, load_pdptrs() leaves VCPU_EXREG_PDPTR
marked as available, and kvm_pdptr_read() will use a stale translation
that used L1 GPAs instead of L2 nGPAs. svm_get_nested_state_pages()
runs on first KVM_RUN but skips the refresh because nested_npt_enabled()
is true. The CPU itself reads L2's PDPTRs correctly from memory via
L1's NPT, but KVM-side walking of guest PAE page tables uses the bogus
cached values.
Unlike Intel's GUEST_PDPTR0..3 fields in the VMCS, SVM has no
VMCB-cached PDPTR state: the in-memory PDPTEs at the current CR3 are
the only source of truth, and svm_cache_reg(VCPU_EXREG_PDPTR) simply
reloads them from memory via load_pdptrs(). Clearing the avail
bit (and the dirty bit because !avail/dirty is invalid) to force
a reload when PDPTRs as needed fixes the bug.
Do the same for nested_svm_load_cr3()'s nested_npt branch, so that
the invariant "PDPTRs need reloading" is handled similarly for both
immediate and deferred loading.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260530165545.25599-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sat, 30 May 2026 16:55:42 +0000 (12:55 -0400)]
KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare
The early vmwrite of the PDPTRs in prepare_vmcs02_rare() is redundant, because
every write it does will be performed by prepare_vmcs02() if it is actually
needed.
In any case where the emulator or the processor need the PDPTR, either
is_pae_paging() is true on vmentry, or a write of CR0, CR4 or EFER will
cause a vmexit to L0. The next vmentry will refresh the PDPTRs in the
vmcs02 from vmcs12.
In fact, the original version[1] of what ended up being commit c7554efc8335 ("KVM: nVMX: Copy PDPTRs to/from vmcs12 only when
necessary"), the writes in what is now prepare_vmcs02_rare() were removed.
When the mega-collection of optimizations was posted[2], the removal of
that code got dropped as a rebase good, so reinstate it.
Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260530165545.25599-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sat, 30 May 2026 16:55:41 +0000 (12:55 -0400)]
KVM: x86: remove nested_mmu from mmu_is_nested()
nested_mmu is always stored into vcpu->arch.walk_mmu at the same time as
guest_mmu is stored into vcpu->arch.mmu. But nested_mmu is not even
a proper MMU, it is only used for page walking; plus the fact that
walk_mmu has to be switched at all is just an implementation detail.
In the end what matters here is whether the guest is using nested
page tables; vmx/nested.c and svm/nested.c check it to see if they
are in nEPT or nNPT context respectively. So switch to checking
root_mmu vs. guest_mmu, which is a more cogent test.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260511150648.685374-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260530165545.25599-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Marc Zyngier [Fri, 12 Jun 2026 08:29:34 +0000 (09:29 +0100)]
Merge branch kvm-arm64/nv-mmu-7.2 into kvmarm-master/next
* kvm-arm64/nv-mmu-7.2:
: .
: Assorted collection of fixes for NV MMU bugs
:
: - Correctly plug AT S1E1A handling in the emulation backend
:
: - Make CPTR_EL2.E0POE depend on FEAT_S1POE
:
: - Drop the reference on the page if the VNCR translation
: races with an MMU notifier
:
: - Correctly synthesise an SEA if a page table walk fails due
: to a guest error
:
: - Fully invalidate the VNCR TLB and fixmap when translating
: for a new VNCR
:
: - Restart S1 walk when the S2 walk fails due to a race condition
:
: - Correctly return -EAGAIN when a S1 walk fails
:
: - Fix block mapping validity check in stage-1 walker for 64kB pages
:
: - Fix potential NULL dereference when performing an EL2 TLBI targeting
: the VNCR page
:
: - Hold kvm->mmu_lock while initialising the vncr_tlb pointer
: .
KVM: arm64: nv: Hold kvm->mmu_lock while initialising vcpu->arch.vncr_tlb
KVM: arm64: nv: Avoid dereferencing NULL VNCR pseudo-TLB
KVM: arm64: Fix block mapping validity check in stage-1 walker
KVM: arm64: nv: Restart stage-1 walk if stage-2 desc update fails
KVM: arm64: Restart instruction upon race in __kvm_at_s12()
KVM: arm64: nv: Inject SEA TTW when desc update can't write to GPA
KVM: arm64: nv: Fully update VNCR fixmap state in kvm_translate_vncr()
KVM: arm64: Don't leak PFN when kvm_translate_vncr() races MMU notifier
arm64: cpufeature: Expose ID_AA64ISAR2_EL1.ATS1A to KVM
KVM: arm64: Wire AT S1E1A in the system instruction handling table
KVM: arm64: Key CPTR_EL2.E0POE propagation on FEAT_S1POE
Marc Zyngier [Fri, 12 Jun 2026 08:29:31 +0000 (09:29 +0100)]
Merge branch kvm-arm64/misc-7.2 into kvmarm-master/next
* kvm-arm64/misc-7.2:
: .
: - Check for a valid vcpu pointer upon deactivating traps when handling
: a HYP panic in VHE mode
:
: - Make the __deactivate_fgt() macro use its arguments instead of the
: surrounding context
:
: - Don't bother with initialising TPIDR_EL2 in the hyp stubs, as this
: is already taken care of in more obvious places
:
: - Drop the unused kvm_arch pointer passed to __load_stage2()
:
: - Return -EOPNOTSUPP when a hypercall fails for some reason, instead of
: returning whatever was in the result structure
:
: - Make the ITS ABI selection helpers return void, which avoids wondering
: about the nature of the return code (always 0)
: .
KVM: arm64: vgic-its: Make ABI commit helpers return void
KVM: arm64: Set a Linux errno on SMCCC error in kvm_call_hyp_nvhe()
KVM: arm64: Remove @arch from __load_stage2()
KVM: arm64: Don't populate TPIDR_EL2 in finalise_el2()
KVM: arm64: Fix __deactivate_fgt macro parameter typo
KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
Paolo Bonzini [Fri, 12 Jun 2026 08:12:22 +0000 (10:12 +0200)]
Merge tag 'kvm-x86-selftests-7.2' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 7.2
- Randomize the dirty log test's delay when reaping the bitmap on the first
pass, as always waiting only 1ms hid a KVM RISC-V bug as the test reaped the
bitmap before KVM could build up enough state to hit the bug.
Paolo Bonzini [Fri, 12 Jun 2026 08:11:59 +0000 (10:11 +0200)]
Merge tag 'kvm-x86-mmu-7.2' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 7.2
- Use the kernel's "enum pg_level" in the TDX APIs instead of the TDX-Module's
level definitions (which are 0-based).
- Rework the TDX memory APIs to not require/assume that guest memory is
backed by "struct page" (in prepartion for guest_memfd hugepage support).
- Overhaul the TDP MMU => S-EPT code to move as much S-EPT specific logic as
possible into the TDX code, and to funnel (almost) all S-EPT updates into
a single chokepoint. The motivation is largely to prepare for upcoming
Dynamic PAMT support, but the cleanups are nice to have on their own.
- Plug a hole in the shadow MMU where KVM fails to recursively zap nested TDP
shadow when L1 is tearing its TDP page tables from the bottom up, as KVM's
TDP MMU now does.
Paolo Bonzini [Fri, 12 Jun 2026 08:11:09 +0000 (10:11 +0200)]
Merge tag 'kvm-x86-misc-7.2' of https://github.com/kvm-x86/linux into HEAD
KVM misc x86 changes for 7.2
- Handle EXIT_FASTPATH_EXIT_USERSPACE in vendor code to ensure vendor code
gets a chance to handle things like reaping the PML buffer.
- Ensure KVM's copy of CR0 and CR3 are up-to-date on SVM prior to invoking
fastpath handlers.
- Update KVM's view of PV async enabling if and only if the MSR write fully
succeeds.
- Fix a variety of issues where the emulator doesn't honor guest-debug state,
and clean up related code along the way.
- Synthesize EPT Violation and #NPF "error code" bits when injecting faults
into L1 that didn't originate in hardware (in which case the VMCS/VMCB
doesn't hold relevant information).
- Add support for virtualizing (well, emulating) AMD's flavor of CPL>0 CPUID
faulting.
- Clean up the GPR APIs so that KVM's use of "raw" is consistent, and fix a
variety of minor bugs along the way.
- Fix an OOB memory access due to not checking the VP ID when handling a
Hyper-V PV TLB flush for L2.
- Fix a bug in the mediated PMU's handling of fixed counters that allowed the
guest to bypass the PMU event filter.
- Allow userspace to return EAGAIN when handling SNP and TDX hypercalls, so
the KVM can forward a "retry" status code to the guest, and reserve all
unused error codes for future usage.
Paolo Bonzini [Fri, 12 Jun 2026 08:08:52 +0000 (10:08 +0200)]
Merge tag 'kvm-x86-gmem-7.2' of https://github.com/kvm-x86/linux into HEAD
KVM guest_memfd changes for 7.2
- Return -EEXIST instead of -EINVAL if userspace attempts to bind a gmem
range to multiple memslots, and fix the test that was supposed to ensure
KVM returns -EEXIST.
- Treat memslot binding offsets and sizes as unsigned values to fix a bug
where KVM interprets a large "offset + size" as a negative value and allows
a nonsensical offset.
- Use the inode number instead of the page offset for the NUMA interleaving
index to fix a bug where the effective index would jump by two for
consecutive pages (the caller also adds in the page offset).
Marc Zyngier [Fri, 12 Jun 2026 08:08:31 +0000 (09:08 +0100)]
Merge branch kvm-arm64/vgic-v5-PPI-fixes into kvmarm-master/next
* kvm-arm64/vgic-v5-PPI-fixes:
: .
: Substantial cleanup of the vgic-v5 PPI support. From the original
: cover letter:
:
: "With the GICv5 PPi support merged in, it has become obvious that a few
: things could be improved, both from the correctness and maintainability
: angles."
: .
KVM: arm64: Fix arch timer interrupts for GICv3-on-GICv5 guests
irqchip/gic-v5: Immediately exec priority drop following activate
Documentation: KVM: Clarify that PMU_V3_IRQ IntID requirements for GICv5
Documentation: KVM: Fix typos in VGICv5 documentation
KVM: arm64: selftests: Improve error handling for GICv5 PPI selftest
KVM: arm64: selftests: Cleanup unused vars in GICv5 PPI selftest
KVM: arm64: selftests: Add missing GIC CDEN to no-vgic-v5 selftest
KVM: arm64: vgic-v5: Atomically assign bits to PPI DVI bitmap
KVM: arm64: vgic-v5: Add missing trap handing for NV triage
KVM: arm64: vgic-v5: Limit support to 64 PPIs
KVM: arm64: vgic: Rationalise per-CPU irq accessor
KVM: arm64: vgic-v5: Drop defensive checks from vgic_v5_ppi_queue_irq_unlock()
KVM: arm64: vgic: Consolidate vgic_allocate_private_irqs_locked()
KVM: arm64: vgic: Constify struct irq_ops usage
KVM: arm64: vgic-v5: Drop pointless ARM64_HAS_GICV5_CPUIF check
KVM: arm64: vgic-v5: Remove use of __assign_bit() with a constant
KVM: arm64: vgic-v5: Move PPI caps into kvm_vgic_global_state
KVM: arm64: vgic-v5: Add for_each_visible_v5_ppi() iterator
Marc Zyngier [Fri, 12 Jun 2026 08:08:25 +0000 (09:08 +0100)]
Merge branch kvm-arm64/pkvm-fixes-7.2 into kvmarm-master/next
* kvm-arm64/pkvm-fixes-7.2:
: .
: Assorted pKVM fixes for 7.2:
:
: - Ensure that the vcpu memcache is filled in a number of cases (donate,
: share, selftest)
:
: - Fix vmemmap page order handling by resetting it when initialising the
: memory pool
:
: - Don't leak page references on failed memory donation
:
: - Add sanity-check for refcounted pages when donating/sharing pages
:
: - Clear __hyp_running_vcpu on state flush
:
: - Check LR upper bound against a trusted value
:
: - Assorted fixes for the host-side tracking of the pages shared with
: EL2 as a result of some Sashiko testing from Fuad
:
: - Correctly forward HCR_EL2.VSE from host to guest, so that protected
: guests can see SErrors
: .
KVM: arm64: Roll back partial shares on kvm_share_hyp() failure
KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure
KVM: arm64: Free hyp-share tracking node when share hypercall fails
KVM: arm64: Flush HCR_EL2.VSE to deliver SErrors to pKVM guests
KVM: arm64: Bound used_lrs when flushing the pKVM hyp vCPU
KVM: arm64: Clear __hyp_running_vcpu when flushing the pKVM hyp vCPU
KVM: arm64: Pre-check vcpu memcache for host->guest donate
KVM: arm64: Pre-check vcpu memcache for host->guest share
KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host
KVM: arm64: Fix __pkvm_init_vm error path
KVM: arm64: Reset page order in pKVM hyp_pool
Marc Zyngier [Fri, 12 Jun 2026 08:04:24 +0000 (09:04 +0100)]
Merge branch kvm-arm64/nv-granule-sizes into kvmarm-master/next
* kvm-arm64/nv-granule-sizes:
: .
: Tidying up of the behaviour when the selected page size in not
: implemented, courtesy of Wei-Lin Chang. From the initial cover
: letter:
:
: "This small series fixes the granule size selection for software stage-1
: and stage-2 walks. Previously we treat the guest's TCR/VTCR.TGx as-is
: and use the encoded granule size for the walks. However this is
: incorrect if the granule sizes are not advertised in the guest's
: ID_AA64MMFR0_EL1.TGRAN*. The architecture specifies that when an
: unsupported size is programed in TGx, it must be treated as an
: implemented size. Fix this by choosing an available one while
: prioritizing PAGE_SIZE."
: .
KVM: arm64: Fallback to a supported value for unsupported guest TGx
KVM: arm64: nv: Use literal granule size in TLBI range calculation
KVM: arm64: Factor out TG0/1 decoding of VTCR and TCR
KVM: arm64: nv: Rename vtcr_to_walk_info() to setup_s2_walk()
Marc Zyngier [Fri, 12 Jun 2026 08:03:57 +0000 (09:03 +0100)]
Merge branch kvm-arm64/nv-fp-elision into kvmarm-master/next
* kvm-arm64/nv-fp-elision:
: .
: Significantly reduce the overhead of the context switch between L1 and
: L2 guests by eliding the save/restore of the FP/SIMD/SVE registers, as
: this state is shared between the two guests, and therefore can be left
: live.
: .
KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
KVM: arm64: nv: Track L2 to L1 exception emulation
Marc Zyngier [Fri, 12 Jun 2026 08:03:24 +0000 (09:03 +0100)]
Merge branch kvm-arm64/no-lazy-vgic-init into kvmarm-master/next
* kvm-arm64/no-lazy-vgic-init:
: .
: Fix an ugly situation where the vgic lazy init could happen in
: non-preemtible contexts such as vcpu reset, resulting in lockdep
: splats.
:
: This requires revamping the way in-kernel emulation of devices
: (timers, PMU) are presenting their interrupt to the vgic, and
: make sure there is no need to init the vgic on the back of that.
: .
KVM: arm64: vgic-v2: Don't init the vgic on in-kernel interrupt injection
KVM: arm64: vgic-v2: Force vgic init on injection outside the run loop
KVM: arm64: pmu: Kill the PMU interrupt level cache
KVM: arm64: timer: Kill the per-timer irq level cache
KVM: arm64: Simplify userspace notification of interrupt state
KVM: arm64: timer: Repaint kvm_timer_{should,irq_can}_fire() to kvm_timer_{pending,enabled}()
Jackie Liu [Thu, 4 Jun 2026 07:51:47 +0000 (15:51 +0800)]
KVM: arm64: vgic-its: Make ABI commit helpers return void
The return values of vgic_its_set_abi() and vgic_its_commit_v0() are always
0 and do not carry useful error information. Simplify by changing them to
void.
Suggested-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Oliver Upton <oupton@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://patch.msgid.link/20260604075147.53299-1-liu.yun@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
Mikhail Lobanov [Wed, 10 Jun 2026 19:19:04 +0000 (22:19 +0300)]
xfs: shut down the filesystem on a failed mount
A corrupt/crafted XFS image can make mount fail after background inode
inactivation has already been enabled. xfs_mountfs() turns on inodegc
(xfs_inodegc_start()) right after log recovery, but the quota subsystem
(mp->m_quotainfo) is only allocated much later, in xfs_qm_newmount() /
xfs_qm_mount_quotas(). The quota accounting flags in mp->m_qflags are
parsed from the mount options before xfs_mountfs() even runs.
If the mount then aborts in between - e.g. xfs_rtmount_inodes() failing
with "failed to read RT inodes" - the unwind path flushes the inodegc
queue, which inactivates the inodes that are still queued, and
xfs_inactive() calls xfs_qm_dqattach(). That path trusts
XFS_IS_QUOTA_ON() (the flag is set) and dereferences the not yet
allocated mp->m_quotainfo:
XFS (loop0): failed to read RT inodes
Oops: general protection fault, probably for non-canonical address
0xdffffc000000002a: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000150-0x0000000000000157]
Workqueue: xfs-inodegc/loop0 xfs_inodegc_worker
RIP: 0010:__mutex_lock+0xfe/0x930
Call Trace:
xfs_qm_dqget_cache_lookup+0x63/0x7f0
xfs_qm_dqget_inode+0x336/0x860
xfs_qm_dqattach_one+0x232/0x4e0
xfs_qm_dqattach_locked+0x2c6/0x470
xfs_qm_dqattach+0x46/0x70
xfs_inactive+0x988/0xe80
xfs_inodegc_worker+0x27c/0x730
The NULL m_quotainfo deref is only one symptom. The deeper problem is
that a failed mount should not be inactivating inodes at all: it must
not write to the (possibly corrupt, only partially set up) persistent
metadata of a filesystem we just refused to mount, and the subsystems
inactivation relies on may not be initialised.
Mark the filesystem shut down before flushing the inodegc queue in the
xfs_mountfs() failure path. With the preceding patch a shut down mount
no longer inactivates the queued inodes: xfs_inactive() returns early so
they are dropped straight to reclaim instead. They are still pulled down
so reclaim can free them (which is why the flush was added in commit ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")), but
without touching the on-disk structures - matching that comment's own
"pull down all the state and flee" intent.
Use SHUTDOWN_META_IO_ERROR for the shutdown: it is the generic "cannot
safely touch metadata" reason already used elsewhere in this file and in
the xfs_ifree() failure path, and unlike SHUTDOWN_FORCE_UMOUNT it does
not log a misleading "User initiated shutdown received". A failed mount
is not necessarily on-disk corruption (it can be a transient I/O or
resource error), so SHUTDOWN_CORRUPT_ONDISK would not be accurate either.
Found by fuzzing XFS with syzkaller (corrupt image mount); reproduced and
verified under QEMU/KASAN.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Mikhail Lobanov [Wed, 10 Jun 2026 19:19:03 +0000 (22:19 +0300)]
xfs: skip inode inactivation on a shut down mount
XFS already declines to inactivate inodes on a shut down mount, but only
at queue time: xfs_inode_mark_reclaimable() calls
xfs_inode_needs_inactive(), which returns false when the mount is shut
down ("If the log isn't running, push inodes straight to reclaim"), and
then drops the dquots and marks the inode reclaimable directly.
An inode that was queued for background inactivation while the mount was
still live is not covered by that check: the inodegc worker still calls
xfs_inactive() on it even after the mount has been shut down in the
meantime. Inactivation modifies persistent metadata and runs
transactions that cannot complete on a shut down mount, and it relies on
subsystems (e.g. quota) that a torn down, or never fully set up, mount
may not have available.
Honour the same invariant in xfs_inactive() itself: if the mount is shut
down, return early before doing any inactivation work. The dquots
attached to the inode are released by the existing xfs_qm_dqdetach() at
the out: label, so references are not leaked, and the caller then makes
the inode reclaimable exactly as before.
On its own this is a consistency fix with the existing queue-time
behaviour; it is also a prerequisite for shutting the mount down in the
xfs_mountfs() failure path in the following patch.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Because CYCLE_LSN/BLOCK_LSN are defined in xfs_log_format.h, XFS_LSN_CMP
forces a xfs_log_format.h dependency in xfs_log.h. Move XFS_LSN_CMP
to xfs_log_format.h and drop the macro/inline indirection to clean up
our header mess a little bit.
This also helps xfsprogs, which doesn't have xfs_log.h, but needs
XFS_LSN_CMP.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Yao Sang [Fri, 12 Jun 2026 02:44:30 +0000 (10:44 +0800)]
xfs: shut down zoned file systems on writeback errors
Zoned writeback allocates space from an open zone and advances the
in-memory allocation state before submitting the bio. The completion
path only records the written blocks and updates the mapping on success.
If the write fails, XFS cannot tell how far the device write pointer
advanced and cannot safely roll the open zone accounting back.
This was observed while investigating xfs/643 and xfs/646 on an external
ZNS realtime device. A writeback error after consuming space from an
open zone left later writers waiting for open-zone or GC progress that
could not happen. xfs/643 exposed this through the GC defragmentation
path, while xfs/646 exposed the same failure mode through the
truncate/EOF-zeroing space wait path.
There is no local recovery path in ioend completion that can restore a
consistent zoned allocation state after the device has rejected the
write. Treat writeback errors for zoned inodes as fatal and force a
file system shutdown from the ioend completion path. The existing
shutdown path wakes zoned allocation waiters and makes future space
waits return -EIO instead of leaving tasks stuck waiting for progress.
Signed-off-by: Yao Sang <sangyao@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Jason Gunthorpe [Fri, 27 Mar 2026 15:23:58 +0000 (12:23 -0300)]
iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
The spec in Table 14 defines the "Entire Cache" case as having the low
12 bits as zero. Indeed the command format doesn't even have the low
12 bits. Since there is only one user now, fix the constant to have 0
in the low 12 bits instead of 1 and remove the masking.
Jason Gunthorpe [Fri, 27 Mar 2026 15:23:57 +0000 (12:23 -0300)]
iommu/amd: Have amd_iommu_domain_flush_pages() use last
Finish clearing out the size/last/end switching by converting
amd_iommu_domain_flush_pages() to use last-based logic.
This algorithm is simpler than the previous. Ultimately all this wants
to do is select powers of two that are aligned to address and not
longer than the distance to last.
The new version is fully safe for size = U64_MAX and last = U64_MAX.
Finally, the gather can be passed through natively without risking an
overflow in (gather->end - gather->start + 1).
Jason Gunthorpe [Fri, 27 Mar 2026 15:23:56 +0000 (12:23 -0300)]
iommu/amd: Pass last in through to build_inv_address()
This is the trivial call chain below amd_iommu_domain_flush_pages().
Cases that are doing a full invalidate will pass a last of U64_MAX.
This avoids converting between size and last, and type confusion with
size_t, unsigned long and u64 all being used in different places along
the driver's invalidation path. Consistently use u64 in the internals.
Arnd Bergmann [Fri, 12 Jun 2026 07:01:25 +0000 (09:01 +0200)]
Merge tag 'bst-arm64-emmc-driver-dts-for-v7.2' of https://github.com/BlackSesame-SoC/linux into soc/dt
arm64: BST C1200 eMMC DTS for v7.2
Black Sesame Technologies:
Enable eMMC controller on BST C1200 CDCU1.0 board:
- Add mmc0 node in bstc1200.dtsi (DWCMSHC SDHCI controller)
- Add fixed clock definition and reserved SRAM bounce buffer
- Enable mmc0 with 8-bit bus on CDCU1.0 ADAS 4C2G board
The MMC driver was merged via mmc-next in v7.1-rc1.
this is the remaining DTS piece.
Signed-off-by: Gordon Ge <gordon.ge@bst.ai>
* tag 'bst-arm64-emmc-driver-dts-for-v7.2' of https://github.com/BlackSesame-SoC/linux:
arm64: dts: bst: enable eMMC controller in C1200
Albert Yang [Fri, 12 Jun 2026 00:40:23 +0000 (08:40 +0800)]
arm64: dts: bst: enable eMMC controller in C1200
Add mmc0 node for the DWCMSHC SDHCI controller with basic configuration
(disabled by default) and fixed clock definition in bstc1200.dtsi.
Enable mmc0 with board-specific configuration including 8-bit bus
width and reserved SRAM bounce buffer on the CDCU1.0 ADAS 4C2G board.
The bounce buffer in reserved SRAM addresses hardware constraints
where the eMMC controller cannot access main system memory through
SMMU due to a hardware bug, and all DRAM is located outside the
4GB boundary.
Signed-off-by: Albert Yang <yangzh0906@thundersoft.com> Acked-by: Gordon Ge <gordon.ge@bst.ai> Signed-off-by: Gordon Ge <gordon.ge@bst.ai>
Dave Airlie [Fri, 12 Jun 2026 03:57:16 +0000 (13:57 +1000)]
Merge tag 'drm-xe-fixes-2026-06-11' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- fix oops in suspend/shutdown without display (Jani)
- RAS fixes (Raag)
- Use HW_ERR prefix in log (Raag)
- include all registered queues in TLB invalidation (Tangudu)
- Fix refcount leak in xe_range_tree in error paths (Wentao)
- fix job timeout recovery for unstarted jobs and kernel queues (Rodrigo)
Wentao Liang [Thu, 4 Jun 2026 10:27:06 +0000 (10:27 +0000)]
crypto: tegra - fix refcount leak in tegra_se_host1x_submit()
The timeout error path in tegra_se_host1x_submit() returns without
calling host1x_job_put(), while all other paths (success, submit
error, pin error) properly release the job reference through the
job_put label. Since host1x_job_alloc() initializes the reference
count and host1x_job_put() is required to drop it, omitting it on
timeout causes a permanent refcount leak.
Fix this by redirecting the timeout return to the existing job_put
label, ensuring the job reference and any associated syncpt
references are consistently released.
Ilya Dryomov [Wed, 3 Jun 2026 15:50:04 +0000 (17:50 +0200)]
crypto: testmgr - allow authenc(hmac(sha{256,384}),cts(cbc(aes))) in FIPS mode
hmac(sha256), hmac(sha384) and cts(cbc(aes)) algorithms have been
marked as FIPS allowed for years. Mark the respective authenc()
constructions per RFC 8009 ("AES Encryption with HMAC-SHA2 for
Kerberos 5") as such as well.
SP 800-57 Part 3 Rev. 1 from Jan 2015 [1] links the draft of what
became RFC 8009 in Oct 2016 as approved in section 6.3 Procurement
Guidance (item/recommendation 3).
Wentao Liang [Wed, 3 Jun 2026 11:03:27 +0000 (11:03 +0000)]
hwrng: jh7110 - fix refcount leak in starfive_trng_read()
The starfive_trng_read() function acquires a runtime PM reference
via pm_runtime_get_sync() but fails to release it on two error
paths. If starfive_trng_wait_idle() or starfive_trng_cmd() returns
an error, the function exits without calling
pm_runtime_put_sync_autosuspend(), leaving the runtime PM usage
counter permanently elevated and preventing the device from entering
runtime suspend.
Refactor the function to use a unified error path that calls
pm_runtime_put_sync_autosuspend() before returning.
Thorsten Blum [Tue, 2 Jun 2026 22:25:19 +0000 (00:25 +0200)]
crypto: atmel-ecc - drop dead code in atmel_ecdh_max_size
atmel_ecdh_init_tfm() always allocates ctx->fallback, so it is never
NULL in atmel_ecdh_max_size(). Remove the dead code and return
crypto_kpp_maxsize() directly.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Felix Gu [Tue, 2 Jun 2026 14:55:35 +0000 (22:55 +0800)]
crypto: cavium/cpt - fix DMA cleanup using wrong loop index
The sg_cleanup error path used list[i] instead of list[j] when unmapping
DMA buffers, leaking successfully mapped entries and repeatedly unmapping
the failed one.
Fixes: c694b233295b ("crypto: cavium - Add the Virtual Function driver for CPT") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Felix Gu [Tue, 2 Jun 2026 14:38:26 +0000 (22:38 +0800)]
crypto: marvell/octeontx - fix DMA cleanup using wrong loop index
The sg_cleanup path used list[i] instead of list[j] when unmapping DMA
buffers, leaking successfully mapped entries and repeatedly unmapping
the failed one.
Fixes: 10b4f09491bf ("crypto: marvell - add the Virtual Function driver for CPT") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
MAINTAINERS: make myself the maintainer of the Qualcomm QCE driver
Qualcomm wants to keep supporting and extending the crypto engine driver.
Thara has not been active for many months, so change the maintainer to
myself and upgrade the driver to Supported.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rosen Penev [Tue, 2 Jun 2026 01:46:45 +0000 (18:46 -0700)]
crypto: amcc - convert irq_of_parse_and_map to platform_get_irq
Replace the deprecated irq_of_parse_and_map() call with the modern
platform_get_irq() in the probe function. This also improves error
handling: platform_get_irq() returns a negative errno on failure,
whereas irq_of_parse_and_map() returned 0.
Change the irq field in struct crypto4xx_core_device from u32 to int
to match the return type of platform_get_irq().
Assisted-by: opencode:big-pickle Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Mon, 1 Jun 2026 16:07:57 +0000 (16:07 +0000)]
crypto: sun4i-ss - Remove insecure and unused rng_alg
Remove sun4i_ss_rng, as it is insecure and unused:
- It has multiple vulnerabilities. sun4i_ss_prng_seed() is missing
locking and has a buffer overflow. sun4i_ss_prng_generate() fails to
fill the entire buffer with cryptographic random bytes, because it
rounds the destination length down and also doesn't actually wait for
the hardware to be ready before pulling bytes from it.
- No user of this code is known. It's usable only theoretically via the
"rng" algorithm type of AF_ALG. But userspace actually just uses the
actual Linux RNG (/dev/random etc) instead. And rng_algs don't
contribute entropy to the actual Linux RNG either. (This may have
been confused with hwrng, which does contribute entropy.)
The sun4i_ss_prng_seed() buffer overflow was reported by Tianchu Chen
and discovered by Atuin - Automated Vulnerability Discovery Engine
There's no point in fixing all these vulnerabilities individually when
this is unused code, so let's just remove it.
Fixes: b8ae5c7387ad ("crypto: sun4i-ss - support the Security System PRNG") Cc: stable@vger.kernel.org Reported-by: Tianchu Chen <flynnnchen@tencent.com> Closes: https://lore.kernel.org/r/af749a8447bd7f0e9dd26ca6c87e9c6afecb09d9@linux.dev/ Acked-by: Corentin LABBE <clabbe.montjoie@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dave Jiang [Thu, 11 Jun 2026 23:03:55 +0000 (16:03 -0700)]
cxl/test: Unregister cxl_acpi in cxl_test_init() error path
In cxl_test_init(), Once cxl_mock_platform_device_add() succeeds, all
error paths after needs to call platform_device_unregister() instead of
platform_device_put() to clean up.
Fixes: 67dcdd4d3b83 ("tools/testing/cxl: Introduce a mocked-up CXL port hierarchy") Reported-by: sashiko-bot Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260611230355.198912-1-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Chun-Tse Shao [Thu, 11 Jun 2026 21:56:32 +0000 (14:56 -0700)]
perf stat: Fix false NMI watchdog warning in aggregation modes
In aggregation modes (e.g. --per-socket, --per-die, etc.), a counter
might not be scheduled or counted on specific aggregate groups if it was
not assigned to the CPUs belonging to those groups. However, the
printout() check triggers the "print_free_counters_hint" logic
unconditionally for any supported counter with a missing count. This
results in a false "Some events weren't counted. Try disabling the NMI
watchdog" warning.
Furthermore, the NMI watchdog only reserves performance counters on core
PMUs. Uncore PMU events (e.g. CHA, IMC) are not affected by the NMI
watchdog, but their failures also falsely triggered this warning.
This warning was originally introduced in commit 02d492e5dcb72c00 ("perf
stat: Issue a HW watchdog disable hint")
To fix this, restrict setting of print_free_counters_hint to only
trigger for core PMU events by checking counter->pmu and
counter->pmu->is_core.
Example before/after:
$ perf stat -M lpm_miss_lat --metric-only --per-socket -a -- sleep 1
James Clark [Thu, 11 Jun 2026 11:13:46 +0000 (12:13 +0100)]
perf test: Compile named_threads workload with -O0
The work loop relies on the compiler not optimizing it away, although
named_threads_work is not static for that reason, the compiler could
still do it.
Fix it by compiling without optimization. Also add -fno-inline for
consistency and in case anyone wants to look at callstacks.
Fixes: b5dd510be55e8670 ("perf test: Add named_threads workload") Closes: https://lore.kernel.org/all/20260609160001.2739E1F00893@smtp.kernel.org Reported-by: sashiko-bot <sashiko-bot@kernel.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jakub Kicinski [Fri, 12 Jun 2026 00:08:03 +0000 (17:08 -0700)]
Merge tag 'for-net-next-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
core:
- hci_sync: Add support for HCI_LE_Set_Host_Feature [v2]
- SMP: Use AES-CMAC library API
- sockets: convert to getsockopt_iter
- Add SPDX id lines to some source files
drivers:
- btintel_pcie: Support Product level reset
- btintel_pcie: Add support for smart trigger dump
- btintel_pcie: Add 50 ms delay before MAC init on BlazarIW
- btintel_pcie: Separate coredump work from RX work
- btmtk: add event filter to filter specific event
- btrtl: fix RTL8761B/BU broken LE extended scan
- btusb: Add Realtek RTL8922AE VID/PID 0bda/d922
- btusb: Add Realtek RTL8922AE VID/PID 0bda/d923
- btusb: MT7922: Add VID/PID 0e8d/223c
- btusb: MT7925: Add VID/PID 0e8d/8c38
- btusb: Add support for TP-Link TL-UB250
- btusb: Add Mercusys MA530 for Realtek RTL8761BUV
- btusb: Add TP-Link UB600 for Realtek 8761BUV
- btusb: Add support for Intel Lizard Peak 2 (0x8087:0x0040)
- btusb: Add USB ID 2c4e:0128 for Mercusys MA60XNB
- btusb: MT7925: Add VID/PID 13d3/3609
* tag 'for-net-next-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (49 commits)
Bluetooth: btintel_pcie: Separate coredump work from RX work
Bluetooth: btmtksdio: fix infinite loop in btmtksdio_txrx_work()
Bluetooth: qca: Add BT FW build version to kernel log
Bluetooth: vhci: validate devcoredump state before side effects
Bluetooth: L2CAP: validate connectionless PSM length
Bluetooth: hci: validate codec capability element length
Bluetooth: L2CAP: Fix UAF in channel timeout by holding conn ref
Bluetooth: btintel_pcie: Load IOSF debug regs by controller variant
Bluetooth: btintel_pcie: Add 50 ms delay before MAC init on BlazarIW
Bluetooth: Add SPDX id lines to some source files
Bluetooth: btintel_pcie: Add support for smart trigger dump
Bluetooth: hci_h5: reset hci_uart::priv in the close() method
Bluetooth: btusb: clean up probe error handling
Bluetooth: btusb: fix wakeup irq devres lifetime
Bluetooth: btusb: fix wakeup source leak on probe failure
Bluetooth: btusb: fix use-after-free on marvell probe failure
Bluetooth: btusb: fix use-after-free on registration failure
Bluetooth: btmtk: fix URB leak in alloc_mtk_intr_urb error path
Bluetooth: hci_core: Fix UAF in hci_unregister_dev()
Bluetooth: hci_event: fix simultaneous discovery stuck in FINDING
...
====================
Jakub Kicinski [Fri, 12 Jun 2026 00:06:55 +0000 (17:06 -0700)]
Merge tag 'nfc-net-next-20260611' of https://codeberg.org/linux-nfc/linux
David Heidelberg says:
====================
NFC updates for net-next 20260611
- nxp-nci: Add ISO15693 support
- nxp-nci: treat -ENXIO in IRQ thread as no data available
- nci: uart: Constify struct tty_ldisc_ops
- trf7970a: fix comment typos
- Use named initializers for struct i2c_device_id
- MAINTAINERS: Update address for David Heidelberg
* tag 'nfc-net-next-20260611' of https://codeberg.org/linux-nfc/linux:
MAINTAINERS: Update address for David Heidelberg
nfc: Use named initializers for struct i2c_device_id
nfc: nxp-nci: treat -ENXIO in IRQ thread as no data available
nfc: nxp-nci: Add ISO15693 support
nfc: nci: uart: Constify struct tty_ldisc_ops
nfc: trf7970a: fix comment typos
====================
Chunkai Deng [Tue, 26 May 2026 03:38:03 +0000 (11:38 +0800)]
dt-bindings: mailbox: qcom: Add IPCC support for Maili Platform
Document the Inter-Processor Communication Controller on the Qualcomm
Maili Platform, which will be used to route interrupts across various
subsystems found on the SoC.
====================
tipc: fix netlink gate and receive-path bugs
This is v4 of the public TIPC series. The only change from v3 is in
patch 1: TIPC_NL_MEDIA_SET now uses GENL_UNS_ADMIN_PERM like the other
mutators, instead of GENL_ADMIN_PERM, so the whole series uses the
namespace-aware CAP_NET_ADMIN check that matches the legacy TIPC netlink
path. Patches 2 and 3 are unchanged.
Patch 1 gives the TIPCv2 mutating generic-netlink operations the admin
gate the legacy API already has, so a local unprivileged process can no
longer change TIPC state. Patch 2 drops CONN_ACK messages that
acknowledge more outstanding sends than exist, preventing the
snt_unacked underflow. Patch 3 rejects peer bindings with lower > upper,
which would otherwise leak binding-table memory.
====================
tipc: reject inverted service ranges from peer bindings
tipc_update_nametbl() inserts a binding advertised by a peer node using
the lower and upper service-range bounds taken directly from the wire,
without checking that lower <= upper. The local bind path validates the
ordering (tipc_uaddr_valid()), but the name-distribution path does not.
A binding with lower > upper is inserted at the far end of the
service-range rbtree (keyed on lower) where no lookup or withdrawal can
ever match it (service_range_foreach_match() requires sr->lower <= end).
The publication, its service_range node and the augmented rbtree entry
are then leaked for the lifetime of the namespace, and there is no
per-peer cap equivalent to TIPC_MAX_PUBL on locally created bindings.
Reject inverted ranges in the network path as well. A peer node can
otherwise leak unbounded binding-table memory by sending PUBLICATION
items with lower > upper.
Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table") Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20260610124003.3831170-4-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tipc_sk_conn_proto_rcv() subtracts the peer-supplied connection ack count
from the unsigned 16-bit send counter snt_unacked without checking that it
does not exceed the number of messages actually outstanding:
tsk->snt_unacked -= msg_conn_ack(hdr);
msg_conn_ack() is read straight from a received CONN_MANAGER/CONN_ACK
message. If the ack count is larger than snt_unacked, the subtraction
wraps to a near-maximum value, leaving tsk_conn_cong() permanently true
and starving the connection of further transmits.
Validate the ACK count at the start of the CONN_ACK block and drop the
message if it acknowledges more messages than are outstanding. A peer (or,
for a local connection, the connected peer socket) can otherwise wedge a
TIPC connection's send side by sending an oversized connection ack.
tipc: require net admin for TIPCv2 netlink mutators
TIPCv2 registers mutating generic-netlink operations without admin
permission flags. Generic netlink only checks CAP_NET_ADMIN when an
operation sets GENL_ADMIN_PERM or GENL_UNS_ADMIN_PERM, so a local
unprivileged process can currently change TIPC state through commands
such as TIPC_NL_NET_SET, TIPC_NL_KEY_SET, TIPC_NL_KEY_FLUSH, and
bearer enable/disable.
The legacy TIPC netlink API already checks netlink_net_capable(...,
CAP_NET_ADMIN) for administrative commands. Give the TIPCv2 mutators
the equivalent generic-netlink gate. Use GENL_UNS_ADMIN_PERM, which
maps to the same namespace-aware CAP_NET_ADMIN check that
netlink_net_capable() performs, so the behaviour matches the legacy
path and keeps working for CAP_NET_ADMIN holders in a non-initial user
namespace (containers).
A QEMU/KASAN repro run as uid/gid 65534 with zero effective
capabilities previously succeeded in changing the network id and node
identity, setting and flushing key material, and enabling/disabling a
UDP bearer. With this patch applied the same operations fail with
-EPERM.
Lorenzo Bianconi [Wed, 10 Jun 2026 13:25:13 +0000 (15:25 +0200)]
net: airoha: simplify WAN device check in airoha_dev_init()
airoha_register_gdm_devices() iterates eth->ports[] in order, so GDM2's
netdev is always registered before GDM3/GDM4. This means the explicit
check for eth->ports[1] && eth->ports[1]->devs[0] is a redundant
special-case of what airoha_get_wan_gdm_dev() already covers, since
GDM2 is always marked as WAN during its own ndo_init.
Remove the redundant check and rely solely on airoha_get_wan_gdm_dev()
which handles both the GDM2-present and GDM2-absent cases.
Victor Nogueira [Wed, 10 Jun 2026 13:28:24 +0000 (10:28 -0300)]
net/sched: sch_hfsc: Don't make class passive twice
update_vf() is called from two places for the same class during a single
dequeue when the class's child qdisc (e.g. codel/fq_codel) drops its last
packets while dequeuing:
1. The child calls qdisc_tree_reduce_backlog(), which, now that the child
is empty, invokes hfsc_qlen_notify() -> update_vf(cl, 0, 0) and turns
the class passive (cl_nactive is decremented up the hierarchy).
2. hfsc_dequeue() then calls update_vf(cl, qdisc_pkt_len(skb), cur_time)
to charge the dequeued bytes.
On the second call the class is already passive, but its child qdisc is
still empty, so update_vf() arms go_passive again:
The leaf is then skipped by the cl_nactive == 0 check inside the loop,
which does not clear go_passive, so the stale go_passive propagates to the
parent and decrements its cl_nactive a second time. A parent that still
has other active children is driven to cl_nactive == 0 and removed from
the vttree, even though those siblings are still backlogged. They are
never dequeued again and the qdisc stalls.
Fix this by only arming go_passive when the class is actually active, so an
already-passive class no longer triggers a second passive transition. The
byte accounting (cl->cl_total += len) still runs for every ancestor, so
dequeued bytes continue to be counted exactly once.
Fixes: 51eb3b65544c ("sch_hfsc: make hfsc_qlen_notify() idempotent") Reported-by: Anirudh Gupta <anirudhrudr@gmail.com> Closes: https://lore.kernel.org/netdev/CAN2cbVe79oj0O9==m4+4x3v+O+qzRagA=2=wkrp9i9=CqYvyZA@mail.gmail.com/ Tested-by: Anirudh Gupta <anirudhrudr@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260610132824.3027549-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann [Tue, 9 Jun 2026 21:22:40 +0000 (23:22 +0200)]
net: Stop leased rxq before uninstalling its memory provider
netif_rxq_cleanup_unlease() tears down the memory provider that was
installed on a physical RX queue through a netkit queue lease. It
currently revokes the provider's DMA mappings before stopping the
physical queue:
This inverts the ordering used by the regular teardown paths (normal
device unregister and the io_uring zcrx close path), which stop the
queue before revoking the provider's mappings.
With the physical queue still live, its NAPI can keep consuming
net_iov entries from the page_pool alloc cache after the
__netif_mp_uninstall_rxq() has already cleared their dma_addr,
opening a window for the device to DMA to a stale or zero address.
Fix it by swapping the two calls so the queue is stopped (and its
NAPI quiesced) before the provider is uninstalled. No functional
regression was observed across repeated runs of the nk_qlease.py
HW selftest, which exercises the lease teardown path; this was
tested against fbnic QEMU emulation.
Fixes: 5602ad61ebee ("net: Proxy netif_mp_{open,close}_rxq for leased queues") Reported-by: Ahmed Abdelmoemen <ahmedabdelmoumen05@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Wei <dw@davidwei.uk> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260609212240.677889-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wentao Liang [Tue, 9 Jun 2026 08:47:30 +0000 (08:47 +0000)]
mlxsw: fix refcount leak in mlxsw_sp_vrs_lpm_tree_replace()
When mlxsw_sp_vrs_lpm_tree_replace() fails after replacing some VRs,
the error rollback loop does not correctly revert the preceding
replacements. The loop decrements the index but fails to update the
vr pointer, which still points to the VR that caused the failure. As
a result, the condition and the rollback call always operate on the
same VR, potentially calling mlxsw_sp_vr_lpm_tree_replace() multiple
times on it while never rolling back the earlier VRs. Those VRs
continue to hold a reference to new_tree acquired via
mlxsw_sp_lpm_tree_hold(), leaking the reference count of new_tree.
Fix by reinitializing vr inside the error loop with the updated index:
vr = &mlxsw_sp->router->vrs[i];
so that the loop correctly iterates over all VRs that were actually
replaced.
Cc: stable@vger.kernel.org Fixes: fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260609084730.215732-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wentao Liang [Tue, 9 Jun 2026 08:37:09 +0000 (08:37 +0000)]
mlxsw: fix refcount leak in mlxsw_sp_port_lag_join()
When mlxsw_sp_port_lag_index_get() fails, mlxsw_sp_port_lag_join()
returns an error without releasing the lag reference obtained by
the earlier mlxsw_sp_lag_get(). All other error paths in the
function jump to the cleanup label that ends with
mlxsw_sp_lag_put(), so this is a single missed release.
Fix the leak by replacing the bare 'return err' with a goto to the
existing error cleanup label, which will drop the reference safely.
Cc: stable@vger.kernel.org Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260609083709.209743-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
ksz87xx: add support for low-loss cable equalizer errata
This patch implements the KSZ87xx short cable erratum
described in Microchip document DS80000687C for KSZ87xx switches
and the following support article:
Microchip documents two independent mechanisms to mitigate this issue:
adjusting the receiver low‑pass filter bandwidth and reducing the DSP
equalizer initial value. These registers are located in the switch’s
internal LinkMD table and cannot be accessed directly through a
stand‑alone PHY driver.
To keep the PHY‑facing API clean, this series models the erratum handling
as vendor‑specific Clause 22 PHY registers, virtualized by the KSZ8 DSA
driver. Accesses are intercepted by ksz8_r_phy() / ksz8_w_phy() and
translated into the appropriate indirect LinkMD register writes. The
erratum affects the shared PHY analog front‑end and therefore applies
globally to the switch.
Based on review feedback, the user‑visible interface is kept deliberately
simple and predictable:
- A boolean “short‑cable” PHY tunable applies a documented and
conservative preset (LPF bandwidth 62MHz, DSP EQ initial value 0).
This is the recommended KISS interface for the common short‑cable
scenario.
- Two additional integer PHY tunables allow advanced or experimental
tuning of the LPF bandwidth and the DSP EQ initial value. These
controls are orthogonal, have no ordering requirements, and simply
override the corresponding setting when written.
The tunables act as simple setters with no implicit state machine or
invalid combinations, avoiding surprises for userspace and not relying
on extended error reporting or netlink ethtool support.
This series contains:
1. Support for the KSZ87xx low‑loss cable erratum in the KSZ8 DSA driver,
including the short‑cable preset and orthogonal tuning controls.
2. Addition of vendor‑specific PHY tunable identifiers for the
short‑cable preset, LPF bandwidth, and DSP EQ initial value.
3. Exposure of these tunables through the Micrel PHY driver via
get_tunable / set_tunable callbacks.
This version follows the design agreed upon during v3 review and
reworks the interface accordingly.
====================
Add support for the KSZ87xx low-loss cable PHY tunables in the Micrel
PHY driver by implementing get_tunable and set_tunable callbacks.
These callbacks expose vendor-specific PHY tunables used to control the
KSZ87xx embedded PHY receiver behavior when operating with short or
low-loss Ethernet cables. The tunables provide:
- a boolean short-cable preset applying known good settings;
- an integer LPF bandwidth control;
- an integer DSP EQ initial value control.
The Micrel PHY driver forwards these tunables via standard phy_read() /
phy_write() operations, which are virtualized by the KSZ8 DSA driver and
translated into the appropriate indirect switch register accesses.
This patch implements the KSZ87xx short cable erratum
described in Microchip document DS80000687C for KSZ87xx switches
and the following support article:
KSZ87xx devices require a workaround for the Module 3 low-loss cable
condition, controlled through the switch TABLE_LINK_MD_V indirect
registers.
This change models the erratum handling as vendor-specific Clause 22 PHY
registers, virtualized by the KSZ8 DSA driver and accessed via
ksz8_r_phy() / ksz8_w_phy(). The following controls are provided:
- A boolean “short-cable” preset, which applies a documented and
conservative configuration (LPF 62 MHz bandwidth and DSP EQ initial
value 0), and is the recommended interface for typical use cases.
- Separate LPF bandwidth and DSP EQ initial value controls intended for
advanced or experimental tuning. These are orthogonal and independent,
and override the corresponding settings without requiring any specific
ordering.
The preset and tunables act as simple setters with no implicit state
machine or invalid combinations, keeping the API predictable and aligned
with the KISS principle.
The erratum affects the shared PHY analog front-end and therefore applies
globally to the switch.
Minxi Hou [Tue, 9 Jun 2026 16:57:25 +0000 (00:57 +0800)]
selftests/net/openvswitch: add flow modify test
Add mod_flow() and the mod-flow CLI command to ovs-dpctl.py, exercising
OVS_FLOW_CMD_SET. Add test_flow_set which first modifies an existing
flow with new actions and verifies the change via traffic, then modifies
the same flow without actions and verifies the kernel handles the
no-actions case gracefully.
The no-actions path is unreachable from userspace OVS tools (dpctl
mod-flow requires actions) but reachable via raw netlink. This is the
code path where Adrian Moreno found a possible kfree_skb of ERR_PTR
when reply allocation fails after locking.
Make parse() skip OVS_FLOW_ATTR_ACTIONS when actstr is None so the
kernel enters the post-lock allocation branch in ovs_flow_cmd_set().
After the no-actions set, verify via dump-flows that the flow retained
its drop action.
Nicolai Buchwitz [Wed, 10 Jun 2026 11:48:35 +0000 (13:48 +0200)]
net: bcmgenet: convert RX path to page_pool
Replace the per-packet __netdev_alloc_skb() + dma_map_single() in the
RX path with page_pool. SKBs are built from pool pages via
napi_build_skb() with skb_mark_for_recycle() so the network stack
returns pages to the pool, and DMA mapping happens once per page
instead of once per packet.
Reject HW-reported lengths smaller than the RSB so a runt cannot
underflow the SKB build path.
Drop the now-unused priv->rx_buf_len field and the rx_dma_failed soft
MIB counter (nothing increments it after the conversion). This
removes the "rx_dma_failed" entry from ethtool -S, which is a
user-visible change for monitoring tools that key on stat names.
net: airoha: move get_sport() callback at the beginning of airoha_enable_gdm2_loopback()
Move the get_sport() callback invocation at the beginning of
airoha_enable_gdm2_loopback() routine in order to avoid leaving the
hardware in a partially configured state if get_sport() fails.
Previously, get_sport() was called after GDM2 forwarding, loopback,
channel, length, VIP and IFC registers had already been programmed.
A failure at that point would return an error leaving GDM2 with
loopback enabled but WAN port, PPE CPU port and flow control mappings
not configured.
Performing the get_sport() lookup before any register write guarantees
the routine either completes the full configuration sequence or exits
with no side effects on the hardware.
====================
mptcp: pm: drop TCP TS with ADD_ADDRv6 + port
Up to this series, it was possible to add a "signal" MPTCP endpoint with
an IPv6 address and a port, or to directly request to send an ADD_ADDR
with a v6 address and a port, but the expected ADD_ADDR wasn't sent when
TCP timestamps was used for the connection.
In fact, such signalling option cannot be sent when TCP timestamps is
used due to a lack of option space: the limit is at 40 bytes, and, with
padding, TCP timestamps is taking 12 bytes, while an ADD_ADDR IPv6 +
port is taking 30 bytes. The selected solution here is to simply drop
the TCP timestamps option when such ADD_ADDR of 30 bytes needs to be
sent.
- Patches 1-3: small cleanups to avoid computing ADD/RM_ADDR twice.
- Patches 4-7: the new feature, controlled by a new sysctl knob.
- Patch 8: extra checks in the MPTCP Join selftests.
- Patches 9-15: A bunch of refactoring: renamed confusing helpers and
variables, and prevent future misused functions.
====================
mptcp_pm_announced_del_timer() removes the matched ADD_ADDR entry (if
found) from the ADD_ADDR list only if check_id is false. That's
dangerous, and not clear, because it means the caller should be free the
entry only in some cases, and it easy to miss that.
Instead, make it static, and call it from mptcp_pm_add_addr_echoed,
which is the only other case where mptcp_pm_add_addr_del_timer should be
called with check_id set to true. Bonus with that: a second call to
mptcp_pm_add_addr_lookup_by_addr() can be avoided.
Note that instead of adding the signature above to avoid a compilation
issue because this helper is called before the definition of the
function, the whole helper is moved above where it is first called. Its
content is untouched, except the addition of the 'static' keyboard.
Note that the signature is added above: it is easier than moving the
code around, because this helper depends on mptcp_pm_schedule_work which
is declared below.
While at it, explicitly mark it as to be called while pm->lock is held.
Similar to the two previous commits, using the 'add' prefix is
confusing, also confirmed by [1].
Now that the structure has been renamed to include 'add_addr' in its
name, easier to know the timer is linked to the ADD_ADDR, no need to
add the confusing prefix, or an unneeded longer one.
While at it, also update the ADD_ADDR timer helper to clearly specify it
is linked to ADD_ADDR, and it is not there to add a new timer.
Similar to the previous commit, only using the 'add' or 'anno' prefixes
is confusing -- generally associated to the action of adding something,
or the Latin name for "year" -- and lack of uniformity.
This has been causing issues in the past, e.g. del_add_timer seemed to
suggest the goal is to delete a previously added timer.
Instead, use the mptcp_pm_announced_ prefix.
While at it, slightly improves some helpers:
- mptcp_lookup_anno_list_by_saddr: no need to specify what is used to do
the lookup: mptcp_pm_announced_lookup.
- mptcp_pm_sport_in_anno_list: it doesn't just compare the port, but the
whole address linked to the sublow: mptcp_pm_announced_has_ssk.
- mptcp_pm_alloc_anno_list: it allocates one item of the list, not a
whole list: mptcp_pm_announced_alloc.
Using only the 'add' prefix is confusing: does it refer to a generic
added entry or address, or specifically to ADD_ADDRs. Using add_addr
removes this confusion.
Similar to most places in the MPTCP code. So instead of passing the
subflow list and use list_for_each_entry(subflow, list, node), pass the
msk and use mptcp_for_each_subflow(msk, subflow).
That's clearer and more uniform with the rest.
While at it, add 'pm_' prefix for the exported one to easily identify
the origin. Plus replace 'lookup' by 'has', because a bool is returned.
Before, they were only checked on demand, but it seems better to check
them each time received ADD_ADDRs are checked.
Errors are only reported when the counter exists, and the value is not
the expected one. This is similar to what is done in chk_join_nr: it
reduces the output, and avoids a lot of 'skip' when validating older
kernels. Also here, some tests need to adapt the default expected
counters, e.g. when ADD_ADDR echo are dropped on the reception side, or
it is not possible to send an ADD_ADDR due to the limited option space.