git.ipfire.org Git - thirdparty/linux.git/log

s390/percpu: Use new percpu code section for arch_this_cpu_add_return()

Convert arch_this_cpu_add_return() to make use of the new percpu code
section infrastructure.

With this the text size of the kernel image is reduced by ~4k
(defconfig). Also 66 generated preempt_schedule_notrace() function
calls within the kernel image (modules not counted) are removed.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/percpu: Use new percpu code section for arch_this_cpu_add()

Convert arch_this_cpu_add() to make use of the new percpu code section
infrastructure.

With this the text size of the kernel image is reduced by ~76kb
(defconfig). Also more than 5300 generated preempt_schedule_notrace()
function calls within the kernel image (modules not counted) are removed.

With:

DEFINE_PER_CPU(long, foo);
void bar(long a) { this_cpu_add(foo, a); }

Old arch_this_cpu_add() looks like this:

00000000000000c0 <bar>:
  c0:   c0 04 00 00 00 00       jgnop   c0 <bar>
  c6:   eb 01 03 a8 00 6a       asi     936,1
  cc:   c4 18 00 00 00 00       lgrl    %r1,cc <bar+0xc>
                        ce: R_390_GOTENT        foo+0x2
  d2:   e3 10 03 b8 00 08       ag      %r1,952
  d8:   eb 22 10 00 00 e8       laag    %r2,%r2,0(%r1)
  de:   eb ff 03 a8 00 6e       alsi    936,-1
  e4:   a7 a4 00 05             jhe     ee <bar+0x2e>
  e8:   c0 f4 00 00 00 00       jg      e8 <bar+0x28>
                        ea: R_390_PC32DBL       __s390_indirect_jump_r14+0x2
  ee:   c0 f4 00 00 00 00       jg      ee <bar+0x2e>
                        f0: R_390_PLT32DBL      preempt_schedule_notrace+0x2

New arch_this_cpu_add() looks like this:

00000000000000c0 <bar>:
  c0:   c0 04 00 00 00 00       jgnop   c0 <bar>
  c6:   c4 38 00 00 00 00       lgrl    %r3,c6 <bar+0x6>
                        c8: R_390_GOTENT        foo+0x2
  cc:   b9 04 00 43             lgr     %r4,%r3
  d0:   eb 00 43 c0 00 52       mviy    960(%r0),4
  d6:   e3 40 03 b8 00 08       ag      %r4,952
  dc:   eb 52 40 00 00 e8       laag    %r5,%r2,0(%r4)
  e2:   eb 00 03 c0 00 52       mviy    960,0
  e8:   c0 f4 00 00 00 00       jg      e8 <bar+0x28>
                        ea: R_390_PC32DBL       __s390_indirect_jump_r14+0x2

Note that the conditional function call is removed.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/percpu: Add missing do { } while (0) constructs

Add missing do { } while (0) constructs in order to avoid potential
build failures.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260319120503.4046659-1-hca%40linux.ibm.com
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/percpu: Infrastructure for more efficient this_cpu operations

With the intended removal of PREEMPT_NONE, this_cpu operations that are
based on atomic instructions and guarded with preempt_disable() /
preempt_enable() pairs become more expensive: the pairs are no longer
optimized away at compile time.

In particular the conditional call to preempt_schedule_notrace() after
preempt_enable() adds additional code and register pressure.

E.g. this simple C code sequence

DEFINE_PER_CPU(long, foo);
long bar(long a) { return this_cpu_add_return(foo, a); }

generates this code:

  11a976:       eb af f0 68 00 24       stmg    %r10,%r15,104(%r15)
  11a97c:       b9 04 00 ef             lgr     %r14,%r15
  11a980:       b9 04 00 b2             lgr     %r11,%r2
  11a984:       e3 f0 ff c8 ff 71       lay     %r15,-56(%r15)
  11a98a:       e3 e0 f0 98 00 24       stg     %r14,152(%r15)
  11a990:       eb 01 03 a8 00 6a       asi     936,1            <- __preempt_count_add(1)
  11a996:       c0 10 00 d2 ac b5       larl    %r1,1b70300      <- address of percpu var
  11a9a0:       e3 10 23 b8 00 08       ag      %r1,952          <- add percpu offset
  11a9a6:       eb ab 10 00 00 e8       laag    %r10,%r11,0(%r1) <- atomic op
  11a9ac:       eb ff 03 a8 00 6e       alsi    936,-1           <- __preempt_count_dec_and_test()
  11a9b2:       a7 54 00 05             jnhe    11a9bc <bar+0x4c>
  11a9b6:       c0 e5 00 76 d1 bd       brasl   %r14,ff4d30 <preempt_schedule_notrace>
  11a9bc:       b9 e8 b0 2a             agrk    %r2,%r10,%r11
  11a9c0:       eb af f0 a0 00 04       lmg     %r10,%r15,160(%r15)
  11a9c6:       07 fe                   br      %r14

Even though the above example is more or less the worst case, since the
branch to preempt_schedule_notrace() requires a stackframe, which
otherwise wouldn't be necessary, there is also the conditional jnhe branch
instruction.

Get rid of the conditional branch with the following code sequence:

  11a8e6:       c0 30 00 d0 c5 0d       larl    %r3,1b33300
  11a8ec:       b9 04 00 43             lgr     %r4,%r3
  11a8f0:       eb 00 43 c0 00 52       mviy    960,4
  11a8f6:       e3 40 03 b8 00 08       ag      %r4,952
  11a8fc:       eb 52 40 00 00 e8       laag    %r5,%r2,0(%r4)
  11a902:       eb 00 03 c0 00 52       mviy    960,0
  11a908:       b9 08 00 25             agr     %r2,%r5
  11a90c:       07 fe                   br      %r14

The general idea is that this_cpu operations based on atomic instructions
are guarded with mviy instructions:

- The first mviy instruction writes the number of the register that
  contains the address of the percpu variable to lowcore. This also
  indicates that a percpu code section is executed.

- The first instruction following the mviy instruction must be the ag
  instruction which adds the percpu offset to the percpu address register.

- Afterwards the atomic percpu operation follows.

- Then a second mviy instruction writes a zero to lowcore, which indicates
  the end of the percpu code section.

- In case of an interrupt/exception/nmi the register number which was
  written to lowcore is copied to the exception frame (pt_regs), and a zero
  is written to lowcore.

- On return to the previous context it is checked whether a percpu code
  section was executed (saved register number not zero), and whether the
  process was migrated to a different cpu. If the percpu offset was already
  added to the percpu address register (instruction address does _not_
  point to the ag instruction), the content of the percpu address register
  is adjusted so that it points to the percpu variable of the new cpu.
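
The fix-up decision described in the last bullet can be modeled in plain C.
This is a user-space sketch under stated assumptions, not the kernel
implementation: struct frame, percpu_fixup() and the ag_addr, old_offset and
new_offset parameters are hypothetical names chosen for illustration. Only
the decision logic follows the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_REGS 16

/* Hypothetical stand-in for the saved exception frame (pt_regs). */
struct frame {
	uint64_t gprs[NR_REGS];	/* saved general purpose registers */
	uint64_t insn_addr;	/* address of the interrupted instruction */
	uint8_t percpu_reg;	/* register number copied from lowcore, 0 = none */
};

/*
 * Adjust the percpu address register if the task was migrated while
 * inside a percpu code section. Returns true if the register was changed.
 */
bool percpu_fixup(struct frame *f, uint64_t ag_addr,
		  uint64_t old_offset, uint64_t new_offset)
{
	/* Not inside a percpu code section (or a bogus register number). */
	if (f->percpu_reg == 0 || f->percpu_reg >= NR_REGS)
		return false;
	/* Still on the same cpu: nothing to adjust. */
	if (old_offset == new_offset)
		return false;
	/* The ag instruction has not run yet; it will add the new offset. */
	if (f->insn_addr == ag_addr)
		return false;
	/* The old offset was already added: rebase onto the new cpu. */
	f->gprs[f->percpu_reg] += new_offset - old_offset;
	return true;
}
```

In this model the register is only touched when all three conditions hold:
a section was active, the cpu changed, and the offset had already been added.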

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/zcrypt: Replace get_zeroed_page() with kzalloc()

zcrypt_rng_device_add() allocates a buffer for the software random
number generator data cache.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/trng: Replace __get_free_page() with kmalloc()

trng_read() allocates a temporary staging buffer for CPACF TRNG
random data before copying it to userspace.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/qeth: Replace get_zeroed_page() with kzalloc()

qeth_get_trap_id() allocates a temporary buffer for STSI system
information queries used to build trap identification strings.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/hvc_iucv: Replace get_zeroed_page() with kzalloc()

hvc_iucv_alloc() allocates a send staging buffer for accumulating
outbound terminal characters before they are copied into a separate
IUCV message buffer for transmission to the hypervisor. The staging
buffer itself is never passed to any IUCV function.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/dasd: Replace get_zeroed_page() with kzalloc()

DASD driver uses get_zeroed_page() to allocate pages for the Extended Error
Reporting software ring buffer and for a scratch buffer for formatting
sense dump diagnostic text.

These buffers can be allocated with kmalloc() as there's nothing special
about them to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of get_zeroed_page() with kzalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/con3270: Replace __get_free_page() with kmalloc()

con3270_alloc_view() allocates a staging buffer used to assemble
3270 datastream content before it is copied into channel program
requests.

This buffer can be allocated with kmalloc() as there's nothing special
about it to go directly to the page allocator.

kmalloc() provides a better API that does not require ugly casts and
kfree() does not need to know the size of the freed object.

Performance difference between kmalloc() and __get_free_pages() is not
measurable as both allocators take an object/page from a per-CPU list for
fast path allocations.

For the slow path the performance is anyway determined by the amount of
reclaim involved rather than by what allocator is used.

Replace use of __get_free_page() with kmalloc() and free_page() with
kfree().

Link: https://lore.kernel.org/all/635405e4-9423-4a25-a6e7-e03c8ea0bcbe@redhat.com
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/fpu: Move GR_NUM / VX_NUM macros to separate header file

Move GR_NUM / VX_NUM macros to separate insn-common-asm.h header file
so they can be reused for non-fpu insn constructs.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/fpu: Shorten GR_NUM / VX_NUM macros

Use the ".irp" directive to get rid of all the repeated ".ifc" usages
in fpu-insn-asm.h.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

s390/ap/zcrypt: Rearrange fields within AP and zcrypt structs

Rearrange some fields within AP and zcrypt structs to reduce
memory consumption and unused holes with the help of pahole
analysis of the code.
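
As an illustration of the kind of hole that pahole reports, the sketch below
reorders the fields of a made-up struct. The struct and field names are
hypothetical and are not taken from the AP or zcrypt code.

```c
#include <stddef.h>

/* Hypothetical layout with holes: padding follows each small field. */
struct card_before {
	char online;	/* 1 byte, then padding up to the alignment of long */
	long requests;
	char speed;	/* 1 byte, then tail padding */
};

/* Same fields, largest first: the two small fields share one slot. */
struct card_after {
	long requests;
	char online;
	char speed;
};

/* Number of bytes saved per object by the reordering. */
size_t bytes_saved(void)
{
	return sizeof(struct card_before) - sizeof(struct card_after);
}
```

On an LP64 ABI such as s390x the first layout occupies 24 bytes and the
second 16 bytes.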

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Finn Callies <fcallies@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>

gpu: nova-core: Hopper/Blackwell: select FSP Chain of Trust version

The FSP Chain of Trust handshake is versioned: Hopper speaks version 1
and Blackwell speaks version 2. Provide the version through the FSP HAL
so the boot message carries the value FSP expects, and so chipsets that
do not use FSP need not express a version at all.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260603-b4-blackwell-v13-5-d9f3a06939e0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging

FSP exchanges are request/response: the driver sends an MCTP/NVDM
message and must match the reply against the request before acting on
it. Add the synchronous send-and-wait path that validates the response
transport and message headers and confirms the reply corresponds to the
request that was sent.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260603-b4-blackwell-v13-4-d9f3a06939e0@nvidia.com
[acourbot: make `MessageToFsp` private.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>

gpu: nova-core: add MCTP/NVDM protocol types for firmware communication

Add the MCTP (Management Component Transport Protocol) and NVDM (NVIDIA
Data Model) wire-format types used for communication between the kernel
driver and GPU firmware processors.

This includes typed MCTP transport headers, NVDM message headers, and
NVDM message type identifiers. Both the FSP boot path and the upcoming
GSP RPC message queue share this protocol layer.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260603-b4-blackwell-v13-3-d9f3a06939e0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure

FSP communication uses a pair of non-circular queues in the FSP
falcon's EMEM, one for messages from the driver to FSP and one for
replies, with the driver polling for response data. Add the queue
registers and the low-level helpers used by the higher-level FSP
message layer.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260603-b4-blackwell-v13-2-d9f3a06939e0@nvidia.com
[acourbot: align register fields names with OpenRM.]
[acourbot: represent registers as arrays of 8 instances, as per OpenRM.]
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>

gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations

Add external memory (EMEM) read/write operations to the GPU's FSP falcon
engine. These operations use Falcon PIO (Programmed I/O) to communicate
with the FSP through indirect memory access.

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Link: https://patch.msgid.link/20260603-b4-blackwell-v13-1-d9f3a06939e0@nvidia.com
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>

selftests: livepatch: set LC_ALL=C to fix locale-dependent test failure

When executing the command
"make -C tools/testing/selftests TARGETS=livepatch run_tests",
the following error message was reported.

TEST: livepatch interaction with ftrace_enabled sysctl ... not ok
...
livepatch: sysctl: setting key "kernel.ftrace_enabled": Device or resource busy
livepatch: sysctl: setting key "kernel.ftrace_enabled": 设备或资源忙
...
ERROR: livepatch kselftest(s) failed
not ok 5 selftests: livepatch: test-ftrace.sh # exit=1

To fix it, set LC_ALL=C.

Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260527095929.1504032-1-maqianga@uniontech.com
Signed-off-by: Petr Mladek <pmladek@suse.com>

KVM: riscv: Fast-path dirty logging write faults

With dirty logging enabled, guest writes often fault on an existing 4K
G-stage leaf that was write-protected only for dirty tracking. The slow
path still performs the full fault handling flow and takes mmu_lock for
write, even though the page-table shape does not change.

x86 handles the analogous case in its fast page fault path by atomically
making a write-protected SPTE writable again when the fault is only a
write-protection fault. Add the same style of fast path for RISC-V. If a
write fault hits an existing 4K leaf in a writable dirty-log memslot,
mark the page dirty and atomically set the PTE writable and dirty under
the read side of mmu_lock.

The dirty bitmap is updated before the PTE becomes writable again. The
PTE D bit is also set so systems that trap on a clear D bit do not fall
back to the slow path for a writable but clean PTE.
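
The ordering described above (dirty bitmap first, then W and D in one atomic
update) can be sketched with C11 atomics. This is a user-space model, not
the KVM code: fast_write_fault(), the bitmap layout and the rel_gfn
parameter are assumptions made for illustration. The PTE bit positions are
those of the RISC-V privileged specification.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* RISC-V PTE bits: valid, writable, dirty. */
#define PTE_V (1ULL << 0)
#define PTE_W (1ULL << 2)
#define PTE_D (1ULL << 7)

/*
 * Try to make a write-protected leaf writable again.
 * Returns false if the fast path does not apply or the PTE changed
 * underneath us; a caller would then fall back to the slow path.
 */
bool fast_write_fault(_Atomic uint64_t *ptep, atomic_ulong *dirty_bitmap,
		      unsigned long rel_gfn)
{
	const unsigned long bits = 8 * sizeof(unsigned long);
	uint64_t old = atomic_load(ptep);

	if (!(old & PTE_V) || (old & PTE_W))
		return false;

	/* Record the page as dirty before it can be written. */
	atomic_fetch_or(&dirty_bitmap[rel_gfn / bits], 1UL << (rel_gfn % bits));

	/* Publish W and D together, only if nobody modified the PTE meanwhile. */
	return atomic_compare_exchange_strong(ptep, &old, old | PTE_W | PTE_D);
}
```

If the compare-and-exchange fails, the page is marked dirty without having
become writable. That is harmless in this model: a spurious dirty bit only
costs an extra copy of the page.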

Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-6-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>

KVM: riscv: Update G-stage PTE permissions atomically

When a fault hits an existing G-stage leaf with the same PFN, KVM only
needs to update the PTE permissions. This path will be used by read-side
fault handling, so it must not overwrite a concurrent PTE update.

Use the cmpxchg helper when relaxing permissions on an existing leaf,
following the same concurrency model used by x86 for atomic SPTE
permission updates. Retry if another CPU changed the PTE first, and use
cpu_relax() while spinning.

Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-5-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>

KVM: riscv: Add a G-stage PTE cmpxchg helper

Permission-only G-stage PTE updates can run in parallel once they are
moved to the read side of mmu_lock. Plain set_pte() is not enough for
that case because another CPU may update the same PTE first.

x86 handles the same class of SPTE races with cmpxchg-based updates in
its fast page fault and TDP MMU paths. Add a small RISC-V helper for
atomic G-stage PTE updates. The helper reports contention to the caller
and flushes the target range only when the PTE value actually changes.
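
A helper with these properties might look roughly like the following
user-space model. The function name, the three-state result and the
nr_flushes counter (standing in for a TLB flush of the target range) are
hypothetical; the actual helper in the patch may differ.

```c
#include <stdatomic.h>
#include <stdint.h>

enum pte_update { PTE_UNCHANGED, PTE_UPDATED, PTE_CONTENDED };

/* Hypothetical counter standing in for a flush of the target range. */
unsigned int nr_flushes;

enum pte_update gstage_pte_cmpxchg_model(_Atomic uint64_t *ptep,
					 uint64_t old, uint64_t new)
{
	if (old == new)
		return PTE_UNCHANGED;	/* nothing to write, nothing to flush */

	if (!atomic_compare_exchange_strong(ptep, &old, new))
		return PTE_CONTENDED;	/* another CPU updated the PTE first */

	nr_flushes++;			/* flush only after a real change */
	return PTE_UPDATED;
}
```

Reporting contention instead of retrying internally leaves the retry policy
to the caller.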

Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-4-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>

KVM: riscv: Use an rwlock for mmu_lock

RISC-V KVM currently uses a spinlock for mmu_lock. That serializes all
G-stage MMU operations, including permission-only updates that do not
allocate or free page-table pages.

Use KVM's rwlock form of mmu_lock, as x86 and arm64 already do. Keep the
existing map, unmap and teardown paths on the write side. This prepares
RISC-V for read-side handling of G-stage permission updates.

Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-3-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>

KVM: riscv: Rely on common MMU notifier locking

The common KVM invalidation paths call kvm_unmap_gfn_range() with
mmu_lock already held for write.

For the standard MMU notifier path, the call chain is:

  kvm_mmu_notifier_invalidate_range_start()
    kvm_handle_hva_range()
      kvm_unmap_gfn_range()

kvm_mmu_notifier_invalidate_range_start() leaves range.lockless clear.
kvm_handle_hva_range() therefore takes KVM_MMU_LOCK(kvm) before invoking
the handler.

The guest_memfd path has the same locking contract:

  __kvm_gmem_invalidate_begin()
    kvm_mmu_unmap_gfn_range()
      kvm_unmap_gfn_range()

__kvm_gmem_invalidate_begin() explicitly takes KVM_MMU_LOCK(kvm) before
calling kvm_mmu_unmap_gfn_range().

So remove the local trylock and make the common locking contract explicit
with lockdep_assert_held_write() like x86.

Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-2-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>

KVM: selftests: Add a test for gPAT handling in L2

When KVM_X86_QUIRK_NESTED_SVM_SHARED_PAT is disabled, verify that KVM
correctly virtualizes the host PAT MSR and the guest PAT register for
nested SVM guests.

With nested NPT disabled:
* L1 and L2 share the same PAT
* The vmcb12.g_pat is ignored

With nested NPT enabled:
* An invalid g_pat in vmcb12 causes VMEXIT_INVALID
* RDMSR(IA32_PAT) from L2 returns the value of the guest PAT register
* WRMSR(IA32_PAT) from L2 is reflected in vmcb12's g_pat on VMEXIT
* RDMSR(IA32_PAT) from L1 returns the value of the host PAT MSR

Verify that save/restore with the vCPU in guest mode behaves as expected in
both cases, e.g. preserves both hPAT and gPAT when NPT is enabled.

Originally-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
[sean: use even fancier macro shenanigans]
Link: https://patch.msgid.link/20260528231052.404737-1-seanjc@google.com
[sean: avoid use of goto, print skips]
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Add guest_memfd regression test for signed offset+size bug

Add a regression (and proof-of-bug) testcase to ensure KVM rejects an
offset+size that would result in a negative value when computed as a signed
64-bit value. KVM had a flaw where it would allow binding a memslot to a
guest_memfd instance even with a wildly out-of-range offset, if the offset
and size were both positive values, but the combined offset+size was
negative.

Use "0x7fffffffffffffffull - page_size", i.e. "INT64_MAX - page_size", for
the offset as the size of the guest_memfd file must be at least page_size
(KVM requires memslots and gmem files to be host page-size aligned). I.e.
"INT64_MAX - page_size + size" is guaranteed to generate an offset+size
that is negative when converted to a signed 64-bit value *and* honors KVM's
alignment requirements.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://patch.msgid.link/20260602170921.1304394-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: selftests: Expand the guest_memfd test macros to allow passing the VM

Expand the gmem test macros to allow passing the VM to testcases, without
needing to plumb the VM into _every_ testcase, as the vast majority of
testcases only need the fd and size.

No functional change intended.

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Link: https://patch.msgid.link/20260602170921.1304394-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: guest_memfd: Treat memslot binding offset+size as unsigned values

When binding a memslot to a guest_memfd file, treat the offset and size as
unsigned values to fix a bug where the sum of the two can result in a false
negative when checking for overflow against the size of the file.  Passing
unsigned values also avoids relying on somewhat obscure checks in other
flows for safety, and tracks the offset and size as they are intended to be
tracked, as unsigned values.

On 64-bit kernels, the number of pages a memslot contains and thus the size
(and offset) of its guest_memfd binding are unsigned 64-bit values.  Taking
the offset+size as an loff_t instead of a uoff_t inadvertently converts
the unsigned value to a signed value if the offset and/or size is massive.

Locally storing the offset and size as signed values is benign in and of
itself (though even that is *extremely* difficult to discern), but
operating on their sum is not.

For the offset, KVM explicitly checks against a negative value, which might
seem like a bug as KVM could incorrectly reject a legitimate binding, but
that's not actually the case as KVM_CREATE_GUEST_MEMFD takes a signed value
for its size, i.e. a would-be-negative offset is also greater than the
maximum possible size of any guest_memfd file.

Regarding the size, while KVM lacks an explicit check for a negative value,
i.e. seemingly has a flawed overflow check, KVM restricts the number of
pages in a single memslot to the largest positive signed 32-bit value:

        if (id < KVM_USER_MEM_SLOTS &&
            (mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
                return -EINVAL;

and so the largest that "size" will ever be is 0x7fffffff000.

The sum of the two is, however, problematic.  While the size is restricted
by KVM's memslot logic, the offset is not, i.e. the offset is completely
unchecked until the "offset + size > i_size_read(inode)" check.  If the
offset is the (nearly) largest possible _positive_ value, then adding size
to the offset can result in a signed, negative 64-bit value.  When compared
against the size of the file (guaranteed to be positive), the negative sum
is always smaller, and KVM incorrectly allows the absurd offset.
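
The effect can be reproduced in user space. The two functions below are a
simplified model of the check, not the KVM code; their names and parameters
are hypothetical. The addition is done on unsigned values and then
reinterpreted, because signed overflow is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the old check: the sum is interpreted as a signed value. */
bool binding_rejected_signed(uint64_t offset, uint64_t size, int64_t file_size)
{
	/* Wraps to a negative value on two's complement targets. */
	int64_t end = (int64_t)(offset + size);

	return end > file_size;
}

/* Model of the fixed check: offset, size and their sum stay unsigned. */
bool binding_rejected_unsigned(uint64_t offset, uint64_t size, int64_t file_size)
{
	uint64_t end = offset + size;

	return end > (uint64_t)file_size;
}
```

With offset 0x7ffffffffffff000, size 0x2000 and a 0x4000 byte file, the sum
is 0x8000000000001000. The signed model sees a negative value and accepts
the binding, while the unsigned model rejects it.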

Opportunistically add missing includes in kvm_mm.h (instead of relying on
its parents).

Fixes: a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
Cc: stable@vger.kernel.org
Cc: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Link: https://patch.msgid.link/20260602170921.1304394-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration.

Remove a dead kvm_load_segment_descriptor() declaration, no functional
change intended.

Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260529222223.870923-30-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

RDMA/umem: Fix truncation for block sizes >= 4G

When the iommu is used, the linearization of the mapping can produce a
single very large block that is split across multiple SG entries.

When __rdma_block_iter_next() reassembles the split SG entries, it
overflows the 32-bit stack variables and computes the wrong DMA addresses
for blocks after the truncation.

Use the right types to hold DMA addresses.
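
The class of bug can be shown with a small user-space model that sums
32-bit segment lengths. The function names and the sg_len array are
hypothetical and do not mirror the variables in __rdma_block_iter_next().

```c
#include <stddef.h>
#include <stdint.h>

/* Buggy model: a 32-bit accumulator silently wraps at 4 GiB. */
uint32_t block_len_u32(const uint32_t *sg_len, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sg_len[i];
	return total;
}

/* Fixed model: accumulate in a type as wide as a DMA address. */
uint64_t block_len_u64(const uint32_t *sg_len, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sg_len[i];
	return total;
}
```

For two segments of 3 GiB and 2 GiB the 32-bit version returns 1 GiB, while
the 64-bit version returns the expected 5 GiB.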

Link: https://patch.msgid.link/r/1-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: a808273a495c ("RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

KVM: x86: Drop defunct vcpu_tsc_khz() declaration

Remove a dead vcpu_tsc_khz() declaration. No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260529222223.870923-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Move async #PF helpers to x86.h (as inlines)

Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
anticipation of extracting the majority of register and MSR specific code
out of x86.c.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260529222223.870923-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Move update_cr8_intercept() to lapic.c

Move update_cr8_intercept() to lapic.c so that it's globally visible
in anticipation of extracting most of the register-specific code out of
x86.c and into a new compilation unit. Opportunistically prefix the
helper kvm_lapic_ to make its role/scope more obvious.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260529222223.870923-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels

Unconditionally return %false for is_64_bit_hypercall() on 32-bit kernels
to guard against incorrectly setting guest_state_protected, and because
in a (very) hypothetical world where 32-bit KVM supports protected guests,
assuming a hypercall was made in 64-bit mode is flat out wrong.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode"

Now that kvm_<reg>_read() are mode aware, i.e. are functionally equivalent
to kvm_register_read(), revert back to the less verbose versions.

No functional change intended.

This reverts commit 60919eccf6764c71cef31a1afeaa1a36b8e5ab85.

Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: nSVM: Use kvm_rax_read() now that it's mode-aware

Now that kvm_rax_read() truncates the output value to 32 bits if the
vCPU isn't in 64-bit mode, use it instead of the more verbose (and very
technically slower) kvm_register_read().

Note! VMLOAD, VMSAVE, and VMRUN emulation are still technically buggy,
as they can use EAX (versus RAX) in 64-bit mode via an operand size
prefix. Don't bother trying to handle that case, as it would require
decoding the code stream, which would open an entirely different can of
worms, and in practice no sane guest would shove garbage into RAX[63:32]
and then execute VMLOAD/VMSAVE/VMRUN with just EAX.

No functional change intended.

Cc: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Link: https://patch.msgid.link/20260529222223.870923-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Drop non-raw kvm_<reg>_write() helpers

Drop the non-raw, mode-aware kvm_<reg>_write() helpers as there is no
usage in KVM, and in all likelihood there will never be usage in KVM as
use of hardcoded registers in instructions is uncommon, and *modifying*
hardcoded registers is practically unheard of. While there are a few
instructions that modify registers in mode-aware ways, e.g. REP string
and some ENCLS varieties, the odds of KVM needing to emulate such
instructions (outside of the full emulator) are vanishingly small.

Drop kvm_<reg>_write() to prevent incorrect usage; _if_ a new instruction
comes along that needs to modify a hardcoded register, this can be
reverted.

No functional change intended.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers

Make kvm_<reg>_{read,write}() mode-aware (where the value is truncated to
32 bits if the vCPU isn't in 64-bit mode), and convert all the intentional
"raw" accesses to kvm_<reg>_{read,write}_raw() versions. To avoid
confusion and bikeshedding over whether or not explicit 32-bit accesses
should use the "raw" or mode-aware variants, add and use "e" versions, e.g.
for things like RDMSR, WRMSR, and CPUID, where the instruction uses only
bits 31:0, regardless of mode.

No functional change intended (all use of "e" versions is for cases where
the value is already truncated due to bouncing through a u32).
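
As a rough illustration of the three flavors described above, here is a
minimal userspace sketch.  The struct and the model_*() names are
hypothetical and are not KVM's actual types or helpers; only the
truncation rule is taken from the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU model; not KVM's actual struct kvm_vcpu. */
struct vcpu_model {
	uint64_t rax;		/* full 64-bit register contents */
	bool is_64bit_mode;	/* true only when the vCPU is in 64-bit mode */
};

/* "Raw" read: always returns all 64 bits, regardless of mode. */
static inline uint64_t model_rax_read_raw(const struct vcpu_model *v)
{
	return v->rax;
}

/* Mode-aware read: drops bits 63:32 unless the vCPU is in 64-bit mode. */
static inline uint64_t model_rax_read(const struct vcpu_model *v)
{
	return v->is_64bit_mode ? v->rax : (uint32_t)v->rax;
}

/* "e" read: always bits 31:0, for operands that are 32 bits in any mode. */
static inline uint32_t model_eax_read(const struct vcpu_model *v)
{
	return (uint32_t)v->rax;
}
```

Giving the "e" variant a u32 return type makes the intent visible at the
call site, so the reader does not have to guess whether a truncation is
mode-dependent or unconditional.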

Cc: Binbin Wu <binbin.wu@linux.intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20260529222223.870923-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Move inlined GPR, CR, and DR helpers from x86.h to regs.h

Move inlined General Purpose Register, Control Register, and Debug
Register helpers from x86.h to the aptly named regs.h, to help trim
down x86.h (and x86.c in the future).

Move *very* select EFER functionality as well, but leave behind the bulk of
EFER handling and all other MSR handling.  There is more than enough MSR
code to carve out msrs.{c,h} in the future.  Give is_long_bit_mode()
special treatment as it's more along the lines of a CR4 bit check, but just
happens to be accessed through an MSR interface.  And more importantly,
because giving regs.h access to is_long_bit_mode() greatly simplifies
dependency chains.

No functional change intended.

Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Rename kvm_cache_regs.h => regs.h

Rename kvm_cache_regs.h to simply regs.h, as the "cache" nomenclature is
already a lie (the file deals with state/registers that aren't cached per
se), and so that more code/functionality can be landed in the header
without making it a truly horrible misnomer.

Deliberately drop the kvm_ prefix/namespace to align with other "local"
headers, and to further differentiate regs.h from the public/global
arch/x86/include/asm/kvm_vcpu_regs.h, which sadly needs to stay in asm/
so that the number of registers can be referenced by kvm_vcpu_arch.

No functional change intended.

Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86: Trace hypercall register *after* truncating values for 32-bit

When tracing hypercalls, invoke the tracepoint *after* truncating the
register values for 32-bit guests so as not to record unused garbage (in
the extremely unlikely scenario that the guest left garbage in a register
after transitioning from 64-bit mode to 32-bit mode).

Fixes: 229456fc34b1 ("KVM: convert custom marker based tracing to event traces")
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode

When getting register values for ENCLS emulation, use kvm_register_read()
instead of kvm_<reg>_read() so that bits 63:32 of the register are dropped
if the guest is in 32-bit mode.

Note, the misleading/surprising behavior of kvm_<reg>_read() being "raw"
variants under the hood will be addressed once all non-benign bugs are
fixed.

Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Fixes: b6f084ca5538 ("KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)")
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest

Don't truncate RAX when handling a Xen hypercall for a guest with protected
state, as KVM's ABI is to assume the guest is in 64-bit for such cases
(the guest leaving garbage in 63:32 after a transition to 32-bit mode is
far less likely than 63:32 being necessary to complete the hypercall).

Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/20260529222223.870923-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall

Bug the VM if 32-bit KVM attempts to handle a 64-bit hypercall, primarily
so that a future change to set "input" in mode-specific code doesn't
trigger a false positive warn=>error:

  arch/x86/kvm/xen.c:1687:6: error: variable 'input' is used uninitialized
                                    whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
   1687 |         if (!longmode) {
        |             ^~~~~~~~~
  arch/x86/kvm/xen.c:1708:31: note: uninitialized use occurs here
   1708 |         trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
        |                                      ^~~~~
  arch/x86/kvm/xen.c:1687:2: note: remove the 'if' if its condition is always true
   1687 |         if (!longmode) {
        |         ^~~~~~~~~~~~~~
  arch/x86/kvm/xen.c:1677:11: note: initialize the variable 'input' to silence this warning
   1677 |         u64 input, params[6], r = -ENOSYS;
        |                  ^
  1 error generated.

Note, params[] also has the same flaw, but -Wsometimes-uninitialized
doesn't seem to be enforced for arrays, presumably because it's difficult
to avoid false positives on specific entries.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: SVM: Truncate INVLPGA address in compatibility mode

Check for full 64-bit mode, not just long mode, when truncating the
virtual address as part of INVLPGA emulation.  Compatibility mode doesn't
support 64-bit addressing.
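
The distinction between long mode and 64-bit mode can be sketched as
follows.  This is a simplified userspace model, not the SVM code itself:
the struct and function names are hypothetical, and it assumes the
architectural rule that 64-bit mode requires both EFER.LMA and CS.L,
while compatibility mode is EFER.LMA with CS.L clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified guest state; not KVM's actual data structures. */
struct guest_mode {
	bool efer_lma;	/* EFER.LMA: long mode is active */
	bool cs_l;	/* CS.L: current code segment is a 64-bit segment */
};

/* Long mode covers both 64-bit mode and compatibility mode. */
static inline bool model_in_long_mode(const struct guest_mode *m)
{
	return m->efer_lma;
}

/* Full 64-bit mode: long mode active and a 64-bit code segment loaded. */
static inline bool model_in_64bit_mode(const struct guest_mode *m)
{
	return m->efer_lma && m->cs_l;
}

/* Keep all 64 address bits only in 64-bit mode, else use bits 31:0. */
static inline uint64_t model_invlpga_address(const struct guest_mode *m,
					     uint64_t rax)
{
	return model_in_64bit_mode(m) ? rax : (uint32_t)rax;
}
```

A check based only on model_in_long_mode() would keep the upper bits for
a compatibility mode guest, which is the case this change addresses.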

Note, the FIXME still applies, e.g. if the guest deliberately targeted
EAX while in 64-bit via an address size override.  That flaw isn't worth
fixing as it would require decoding the code stream, which would open an
entirely different can of worms, and in practice no sane guest would shove
garbage into RAX[63:32] and execute INVLPGA.

Note #2, VMSAVE, VMLOAD, and VMRUN all suffer from the same architectural
flaw of not providing the full linear address in a VMCB exit information
field, because, quoting the APM verbatim:

  the linear address is available directly from the guest rAX register

(VMSAVE, VMLOAD, and VMRUN take a physical address, but their behavior
with respect to rAX is otherwise identical).

Fixes: bc9eff67fc35 ("KVM: SVM: Use default rAX size for INVLPGA emulation")
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://patch.msgid.link/20260529222223.870923-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

Merge tag 'samsung-soc-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/arm

Samsung mach/soc changes for v7.2

Remove raw GPIO number usage from S3C6410-based crag6410 board.

* tag 'samsung-soc-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  ARM: s3c: use gpio lookup table for LEDs

Signed-off-by: Linus Walleij <linusw@kernel.org>

HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()

hid_go_cfg_probe() initialises drvdata.go_cfg_setup and schedules it
to run 2 ms later:

    INIT_DELAYED_WORK(&drvdata.go_cfg_setup, &cfg_setup);
    schedule_delayed_work(&drvdata.go_cfg_setup, msecs_to_jiffies(2));

cfg_setup() dereferences drvdata.hdev to issue MCU command requests.
hid_go_cfg_remove() tears down sysfs and stops the HID device, but
never drains the delayed work.  If the device is unbound within the
2 ms scheduling delay (a probe failure rolling back via remove, or a
fast rmmod after probe), the work fires after hid_destroy_device()
has dropped its reference and released the underlying hdev struct,
leaving cfg_setup() with a stale drvdata.hdev pointer.

Mirror the sibling driver hid-lenovo-go-s.c, whose hid_gos_cfg_remove()
already calls cancel_delayed_work_sync() on its analogous work, and
drain go_cfg_setup at the top of hid_go_cfg_remove().  The cancel
must come before guard(mutex)(&drvdata.cfg_mutex) because cfg_setup()
acquires that mutex; reversing the order would deadlock.

Fixes: d69ccfcbc955 ("HID: hid-lenovo-go: Add Lenovo Legion Go Series HID Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Manish Khadka <maskmemanish@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>

Merge tag 'stm32-dt-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into soc/dt

STM32 DT for v7.2, round 1

Highlights:
----------

- MPU:
  - STM32MP13:
    - Enable PHY SSC (Spread Spectrum) on DHCOR DHSBC board.
    - Add board pin documentation for stm32mp135f-dk to help users.

  - STM32MP15:
    - Protonic:
      - Update MECIOR0 and MECIOR1 boards:
        - Define ADC channels and GPIO line definitions in board files
          and no longer in the common file.
        - Fix ADC sampling.

  - STM32MP25:
    - Fix SAI addresses.

* tag 'stm32-dt-for-7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32:
  arm64: dts: st: Fix SAI addresses on stm32mp251
  ARM: dts: stm32: stm32mp15x-mecio1-io: Move expander gpio-line-names to board files
  ARM: dts: stm32: stm32mp15x-mecio1-io: Fix expander gpio line typo
  ARM: dts: stm32: stm32mp15x-mecio1-io: Move gpio-line-names to board files
  ARM: dts: stm32: stm32mp15x-mecio1-io: Fix GPIO names typo
  ARM: dts: stm32: stm32mp15x-mecio1-io: Move divergent mecio1 ADC channels to board files
  ARM: dts: stm32: stm32mp15x-mecio1-io: Fix ADC sampling times
  ARM: dts: stm32: stm32mp15x-mecio1-io: Enable internal ADC reference
  ARM: dts: stm32: add board pin documentation stm32mp135f-dk
  ARM: dts: stm32: Enable PHY SSC on DH STM32MP13xx DHCOR DHSBC board

Signed-off-by: Linus Walleij <linusw@kernel.org>

Merge tag 'samsung-dt64-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt

Samsung DTS ARM64 changes for v7.2

1. Exynos850: Implement proper power off to fully shutdown the board and
   reduce drawn current.

2. ExynosAutov920: Add UFS storage.

3. Add Peter Griffin as a co-maintainer.
   I have multiple subsystems to take care of and limited time. Also, on
   joining the SoC team I figured it would be good to plan my succession.
   Or backup.

   Peter has shown both time and interest in keeping Samsung Exynos code
   working.  He already works on and maintains Google Tensor SoC, which
   shares a lot with Samsung Exynos processors.
   Considering all this, I proposed that Peter become a co-maintainer here
   (same for pinctrl, which went via a different tree).

   I will still be the one handling patches for this and (probably) next
   cycle, but in a further timeframe the roles could reverse with me
   only providing acks or reviews. If this works then depending on other
   duties and amount of work, I might be slowly transitioning to leave
   Samsung SoC maintainership.

* tag 'samsung-dt64-7.2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
  MAINTAINERS: Add Peter Griffin as a co-maintainer of Samsung Exynos SoCs
  arm64: dts: exynos: Add EL2 virtual timer interrupt
  arm64: dts: exynosautov920: enable support for ufs controller
  arm64: dts: exynosautov920: Add syscon hsi2 node
  arm64: dts: exynos850: Add syscon-poweroff node

Signed-off-by: Linus Walleij <linusw@kernel.org>

Merge tag 'tegra-for-7.2-arm-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt

ARM: tegra: Device tree changes for v7.2-rc1

The bulk of this is various improvements for some of the older ASUS and
LG devices, but there's also support for interconnects on Tegra114 to
help improve memory frequency scaling.

* tag 'tegra-for-7.2-arm-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  ARM: tegra: tf600t: Invert accelerometer calibration matrix
  ARM: tegra: tf600t: Drop backlight regulator
  ARM: tegra: tf600t: Configure panel
  ARM: tegra: transformers: Add connector node for common trees
  ARM: tegra: transformer: Add support for front camera
  ARM: tegra: grouper: Add support for front camera
  ARM: tegra: p880: Lower CPU thermal limit
  ARM: tegra: lg-x3: Set PMIC's RTC address
  ARM: tegra: lg-x3: Complete video device graph
  ARM: tegra: Configure Tegra114 power domains
  ARM: tegra: Add DC interconnections for Tegra114
  ARM: tegra: Add EMC OPP and ICC properties to Tegra114 EMC and ACTMON device-tree nodes
  ARM: tegra: Add #{address,size}-cells to Chromium-based /firmware
  dt-bindings: memory: Document Tegra114 External Memory Controller
  dt-bindings: memory: Document Tegra114 Memory Controller

Signed-off-by: Linus Walleij <linusw@kernel.org>

Merge tag 'tegra-for-7.2-dt-bindings' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/dt

dt-bindings: Changes for v7.2-rc1

This adds a compatible string for an upcoming new chip as well as
changes some maintainership information.

* tag 'tegra-for-7.2-dt-bindings' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
  dt-bindings: tegra: pmc: Add Tegra238 compatible
  dt-bindings: reserved-memory: Change maintainer for BPMP SHMEM

Signed-off-by: Linus Walleij <linusw@kernel.org>

Merge tag 'rtw-next-2026-06-03' of https://github.com/pkshih/rtw

Ping-Ke Shih says:
==================
rtw-next patches for -next

Pull-request includes many random fixes and new features.

Major changes are listed below:

rtl8xxxu:

* declare supported channel width by firmware report

rtw88:

* validate RX descriptor to avoid malformed data causing warnings

rtw89:

* support USB devices RTL8922AU

* add sysfs entry to show SN and UUID for specific USB devices

* support switching to USB 3.0 mode for higher performance

* add more fields (mainly SIG-A/SIG-B) to radiotap in monitor mode

* offload packed IO to firmware to reduce IO time (for USB devices)

* add debugfs to diagnose BB health

* more preparations for RTL8922DE
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211_hwsim: claim DBE capability

Claim DBE capability in UHR MAC capabilities, hostapd will
have to sort out the actual DBE capabilities based on the
EHT capabilities.

Link: https://patch.msgid.link/20260529102644.4db84674e8c2.I8731be8ea589c94ece5623e7e716cbbc03f50466@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: AP: handle DBE for clients

In AP mode, track the BSS non-DBE bandwidth and apply
that to all non-DBE clients, then track OMP updates
from the clients and enable/disable DBE accordingly.

For now don't send a response, clients need to have a
timer anyway (it's up to the driver to set the right
timeout in UHR capabilities.)

Link: https://patch.msgid.link/20260529102644.be84f2b055cc.I4d2c067dfe54c47621d5a872ca07a0e754d6c20f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: parse and apply UHR DBE channel

When a UHR AP has DBE enabled, parse the channel and apply it
to the chandef. Apply for TX only after the OMP response (or
timeout) so that the AP doesn't receive frames with DBE width
before the station completed transition to DBE.

Link: https://patch.msgid.link/20260529102644.cb810f212128.Ife37c2673251346e84e4250b242b31f0895520ab@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: refactor link STA bandwidth update

There's similar code in two places in HT and HE, and we need to add
the same again for UHR. Rename ieee80211_link_sta_rc_update_omi()
to ieee80211_link_sta_update_rc_bw() and move it to sta_info.c and
update existing code that can use it to do so.

Link: https://patch.msgid.link/20260529102644.577c2f304d33.I09df4fce83c4e3e6deddfecbea74ffdbeedb4927@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: Update UHR MAC capabilities to D1.4

There are now 8 more reserved bits in D1.4, update the code
accordingly.

Link: https://patch.msgid.link/20260529102644.6e27c54cfceb.Id395c07ffde286011494fc75190dc6060117436e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: Update UHR PHY capabilities to D1.4

There are new capabilities in D1.4, and some reserved
bits. Update the code accordingly.

Link: https://patch.msgid.link/20260529102644.f146932b21e2.I12bad84157bf809fbe285b79420143b3c456d9d2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: explain ieee80211_determine_chan_mode() parsing

Looking through element parsing behaviour for multi-BSSID
and multi-link, this one seemed odd. Add a comment that
explains why it's written this way.

Link: https://patch.msgid.link/20260529102644.25f75c4df338.I1f1f17cc0ae8e413659654d4bbaa34260ef68e2c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: mlme: allow UHR only with MLO

UHR requires MLO, not just formally but also in order
for the client to understand AP BSS parameter changes,
since the Critical Update Counter is inside the Multi-
Link Element. Require MLO for UHR connections to avoid
otherwise needed complexity such as not enabling any
feature that would require tracking critical updates.

Link: https://patch.msgid.link/20260529102644.43817ce87042.If4562ae9c5ca83339b397d9a344b68631cb17c4a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: always expose multi-link element

During beacon processing, the parser is always called with
a BSS to find the correct multi-BSSID profile (if any) and
therefore never attempts to parse a multi-link element.
This means the code to handle cross-link CSA can effectively
never do anything.

Fix this by parsing the multi-link element in the regular
parser as well.

Fixes: 7ef8f6821d16 ("wifi: mac80211: mlme: handle cross-link CSA")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://patch.msgid.link/20260529102644.2a74b2659f50.I8f9454bf5e05c419a9b1eb23ecad302a6bf63fbb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: harden cfg80211_defragment_element()

A previous commit changed mac80211 to no longer make wrong
calls to cfg80211_defragment_element() with the element
pointing outside of the buffer. Additionally, harden this
function itself against that and always return -EINVAL in
case the element isn't inside the source buffer.
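
A containment check of the kind described above might look like the
following sketch.  The function name and the exact conditions are
assumptions for illustration, not the code added to
cfg80211_defragment_element(); the only format detail relied on is that
an element starts with a one-byte ID and a one-byte length.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 0 if the element header (ID + length) and
 * its payload lie entirely inside [buf, buf + buf_len), else -EINVAL.
 */
static inline int sketch_elem_inside(const uint8_t *buf, size_t buf_len,
				     const uint8_t *elem)
{
	uintptr_t start = (uintptr_t)buf;
	uintptr_t pos = (uintptr_t)elem;
	size_t off;

	/* The element must not start before or beyond the buffer. */
	if (pos < start || pos - start > buf_len)
		return -EINVAL;

	off = pos - start;

	/* The two header bytes must fit before the length is read. */
	if (buf_len - off < 2)
		return -EINVAL;

	/* The payload announced by the length byte must fit as well. */
	if (buf_len - off - 2 < elem[1])
		return -EINVAL;

	return 0;
}
```

Comparing through uintptr_t avoids relational comparison of pointers
that may not point into the same object, which is exactly the situation
a hardening check has to tolerate.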

Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://patch.msgid.link/20260529102644.198945754054.I5ae8fdebf9008abc6e15d0b0f10c3a7b73d02eab@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: use local ml_basic_elem in parsing

There's no need to store this pointer on the heap, it's
only used in a single function. Move it there. Also
clarify the comment referencing it, ml_basic_elem is
not actually relevant (any more.)

Link: https://patch.msgid.link/20260529102644.50187b7a6ca2.Ifef23bda96651eed0f5cd2c3ecd4817d2fb08af4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: clarify beacon parsing with MBSSID/EMA

When connected to a non-transmitting BSS of multiple BSSID
set with EMA, the correct profile for the connection isn't
always present in the beacon. Indicate this in the parser
and use the information to not check everything in beacon
processing, since the information might not be correct if
taken only from the transmitted BSS.

Link: https://patch.msgid.link/20260529102644.97527a7dfd7b.Iecd0ef578b85a5a0057538cfff5fdff41d19b7ea@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: rename "multi_link_inner" variable

This variable name seems a bit misleading now (I added it
myself a year ago or so), it indicates that the parsing is
happening on the inner elements of a multi-link element.
Rename it to "inside_multilink" to clarify.

Link: https://patch.msgid.link/20260529102644.7ccd55a411cf.I4101e1cfd133a2ce2374340712da8bb1f0292a40@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: clean up return in ieee802_11_find_bssid_profile()

There's no need to define 'profile_len' at the outer scope
and initialize it, move it where needed and just return 0
if nothing can be found.

Link: https://patch.msgid.link/20260529102644.46f25609ddef.I9e651a0018e66953f4fb508f784188e00351c07f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: unify link STA removal in vif link removal

There are multiple cases where interface links are removed
and the station links need to be removed with them, e.g.
in mlme.c we have both received and transmitted multi-link
reconfiguration, doing the two things in different order,
the former deleting STA links when the vif link change may
still fail.

It's also not clear that userspace (hostapd) couldn't, at
least in theory, remove a link from an interface without
removing the station links first, or even leave stations
that aren't MLO-capable, using that link.

Unify this code into ieee80211_vif_update_links() so that
it always happens, always happens in the right order and
is transactional (i.e. failures are handled correctly.)

Link: https://patch.msgid.link/20260529102644.c352f73a4658.I7219a5d72dab2abcecea9b5c52e7eb7a50e68d9b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: ieee80211: define some UHR link reconfiguration frame types

Define some values needed for UHR link reconfiguration frames,
in particular to prepare for UHR mode change request/handling.

Link: https://patch.msgid.link/20260529102644.03029bae6447.If22b0c1e10d9db712dca408a420469b3d385b4ea@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: basic S1G rx rate reporting support

Introduce basic rate encoding/decoding for S1G stas such that the
usermode rx reporting is relevant as it currently uses VHT calculations
which are obviously wildly different from S1G. Sample iw output (with the
associated iw patches applied):

Connected to 0c:bf:74:00:21:c4 (on wlan0)
        SSID: wifi_halow
        freq: 923.500
        RX: 7325230 bytes (4756 packets)
        TX: 190044 bytes (2238 packets)
        signal: -38 dBm
        rx bitrate: 43.3 MBit/s S1G-MCS 9 8MHz short GI S1G-NSS 1
        tx bitrate: 43.3 MBit/s S1G-MCS 9 8MHz short GI S1G-NSS 1
        bss flags:
        dtim period: 1
        beacon int: 100

Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20260602062224.1792985-1-lachlan.hodges@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: qtnfmac: topaz: defer IRQ enabling until IPC init

qtnf_pcie_topaz_probe() currently calls devm_request_irq() and only then
disable_irq(). request_irq() installs the action in the irq core
immediately, so qtnf_pcie_topaz_interrupt() can run before the Topaz
private IRQ consumers are initialized, if the hardware misbehaves.

This window is reachable on a running system as soon as probe has
successfully registered pdev->irq but before qtnf_pcie_init_shm_ipc()
sets shm_ipc_ep_in/out.irq_handler. If an interrupt is delivered in
this interval, qtnf_pcie_topaz_interrupt() calls
qtnf_shm_ipc_irq_handler() for shm_ipc_ep_in/out while their irq_handler
callbacks are still unset, so the driver can observe an early IRQ
before its IPC consumer state is ready.

The issue was found on Linux v6.18.21 by our static analysis tool while
scanning request_irq()/disable_irq() registration-order bugs in
wireless PCIe drivers, and then manually reviewed.

Request the IRQ with IRQF_NO_AUTOEN instead and keep the existing
enable_irq() in qtnf_post_init_ep() as the point where interrupts
become visible. This closes the early-IRQ window while preserving the
intended bring-up order.

Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Link: https://patch.msgid.link/20260531145435.701703-1-runyu.xiao@seu.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Add KUnit test for ieee80211_mesh_perr_size_ok

Add a kunit test for ieee80211_mesh_perr_size_ok(),
checking various success and failure cases.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-9-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Add KUnit test for ieee80211_mesh_prep_size_ok

Add a kunit test for ieee80211_mesh_prep_size_ok(),
checking various success and failure cases.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-8-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Add KUnit test for ieee80211_mesh_preq_size_ok

Add a kunit test for ieee80211_mesh_preq_size_ok(),
checking various success and failure cases.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-7-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Fix PERR frame processing

There are no issues with the PERR processing itself; however, to maintain
consistency with the previous PREQ/PREP code modifications, I will create a
new mesh_path_parse_error_frame() function to separately implement the
frame format validation and the "not supported" check.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-6-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Fix overread in PREP frame processing

When the AF flag is enabled, hwmp_prep_frame_process() overreads orig_addr
by 2 bytes. Since this occurs within the socket buffer, it does not read
across memory boundaries and therefore poses no security risk; however, we
will fix it as a precaution.

In this fix, a new function mesh_path_parse_reply_frame() is established to
separate the implementation of frame format validation and the check for
unsupported features. This is intended to facilitate future work when
implementing the currently unsupported parts.

Assisted-by: Claude:Sonnet 4.6
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-5-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Fix overread in PREQ frame processing

When the AF flag is enabled, hwmp_preq_frame_process() overreads
target_addr by 2 bytes. Since this occurs within the socket buffer, it does
not read across memory boundaries and therefore poses no security risk;
however, we will fix it as a precaution.

In this fix, a new function mesh_path_parse_request_frame() is established
to separate the implementation of frame format validation and the check for
unsupported features. This is intended to facilitate future work when
implementing the currently unsupported parts.

Assisted-by: Claude:Sonnet 4.6
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-4-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Use struct instead of macro for PERR frame

The existing PERR_IE_* macros access HWMP PERR frame fields via hardcoded
byte offsets. Each PERR destination entry contains an optional 6-byte AE
(Address Extension) address followed by a reason code, making offset-based
access error-prone.

Introduce typed packed C structs to represent the PERR frame layout:
  - ieee80211_mesh_hwmp_perr: top-level frame containing TTL and
    destination count
  - ieee80211_mesh_hwmp_perr_dst: per-destination entry with optional AE
    address and variable-position reason code

Add ieee80211_mesh_hwmp_perr_get_rcode() to locate the reason code in
each destination entry depending on whether the AE flag is set.
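
The variable position of the reason code can be sketched as below.  The
struct, field, and helper names are hypothetical and are not the
definitions added by this patch; the sketch assumes each destination
entry is flags (1 byte), address (6), sequence number (4), an optional
6-byte AE address, and a 2-byte little-endian reason code, with the AE
flag at bit 6.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AE_FLAG	0x40	/* assumed AE bit in the per-entry flags */
#define SKETCH_ETH_ALEN	6

/* Hypothetical per-destination layout; only the fixed part is typed. */
struct sketch_perr_dst {
	uint8_t flags;
	uint8_t addr[SKETCH_ETH_ALEN];
	uint8_t sn[4];		/* little-endian sequence number */
	uint8_t tail[];		/* optional AE address, then reason code */
} __attribute__((packed));

/* Locate the reason code, skipping the AE address when it is present. */
static inline const uint8_t *
sketch_perr_dst_rcode(const struct sketch_perr_dst *dst)
{
	return dst->tail +
	       ((dst->flags & SKETCH_AE_FLAG) ? SKETCH_ETH_ALEN : 0);
}

/* Total entry size, needed to step to the next destination entry. */
static inline size_t sketch_perr_dst_size(const struct sketch_perr_dst *dst)
{
	return sizeof(*dst) +
	       ((dst->flags & SKETCH_AE_FLAG) ? SKETCH_ETH_ALEN : 0) + 2;
}

/* Read an unaligned little-endian 16-bit value. */
static inline uint16_t sketch_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

Because the entry size depends on each entry's own flags, a parser has
to validate the remaining length before reading every entry rather than
once for the whole frame.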

This refactoring makes the PERR processing code consistent with the
struct-based approach adopted for PREQ and PREP in preceding patches.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-3-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Use struct instead of macro for PREP frame

The existing PREP_IE_* macros access HWMP PREP frame fields via hardcoded
byte offsets. When the AE (Address Extension) flag is set, an additional
6 bytes appear mid-frame, making the offset arithmetic error-prone.

Introduce typed packed C structs to represent the PREP frame layout:
  - ieee80211_mesh_hwmp_prep_top: fixed fields before the optional AE
    address
  - ieee80211_mesh_hwmp_prep_bottom: fields after the optional AE address

Add ieee80211_mesh_hwmp_prep_get_bottom() to locate the bottom struct
correctly based on whether the AE flag is set.

This preparatory refactoring is needed to fix a 2-byte overread of
orig_addr in hwmp_prep_frame_process() when AE is enabled, which is
addressed in a subsequent patch.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-2-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: Use struct instead of macro for PREQ frame

The existing PREQ_IE_* macros access HWMP PREQ frame fields via hardcoded
byte offsets. When the AE (Address Extension) flag is set, an additional
6 bytes appear mid-frame, and the macros handle this with conditional
arithmetic (e.g., AE_F_SET(x) ? x + N+6 : x + N). This approach
obscures the frame layout and is prone to miscalculation.

Introduce typed packed C structs to represent the PREQ frame layout:
  - ieee80211_mesh_hwmp_preq_top: fixed fields before the optional AE
    address
  - ieee80211_mesh_hwmp_preq_bottom: fields after the optional AE address
  - ieee80211_mesh_hwmp_preq_target: per-target fields

Add ieee80211_mesh_hwmp_preq_get_bottom() to locate the bottom struct
correctly based on whether the AE flag is set.
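
A minimal sketch of the top/bottom split follows.  The struct and helper
names are hypothetical stand-ins rather than the definitions introduced
by this patch, and the field layout is an assumption derived from the
offsets the old macros imply (lifetime at byte 17 without AE, byte 23
with AE).

```c
#include <stdint.h>

#define SKETCH_AE_FLAG	0x40	/* assumed AE bit in the PREQ flags */
#define SKETCH_ETH_ALEN	6

/* Hypothetical fixed fields that precede the optional AE address. */
struct sketch_preq_top {
	uint8_t flags;
	uint8_t hop_count;
	uint8_t ttl;
	uint8_t preq_id[4];
	uint8_t orig_addr[SKETCH_ETH_ALEN];
	uint8_t orig_sn[4];
	uint8_t tail[];		/* optional AE address, then the bottom */
} __attribute__((packed));

/* Hypothetical fields that follow the optional AE address. */
struct sketch_preq_bottom {
	uint8_t lifetime[4];
	uint8_t metric[4];
	uint8_t target_count;
	uint8_t targets[];	/* per-target fields follow */
} __attribute__((packed));

_Static_assert(sizeof(struct sketch_preq_top) == 17, "top is 17 bytes");
_Static_assert(sizeof(struct sketch_preq_bottom) == 9, "bottom is 9 bytes");

/* Locate the bottom part, skipping the AE address when the flag is set. */
static inline const struct sketch_preq_bottom *
sketch_preq_get_bottom(const struct sketch_preq_top *top)
{
	const uint8_t *p = top->tail;

	if (top->flags & SKETCH_AE_FLAG)
		p += SKETCH_ETH_ALEN;

	return (const struct sketch_preq_bottom *)p;
}
```

With this shape the AE adjustment lives in one helper instead of being
repeated in every field accessor, which is what makes the later length
validation easier to reason about.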

This preparatory refactoring is needed to fix a 2-byte overread of
target_addr in hwmp_preq_frame_process() when AE is enabled, which is
addressed in a subsequent patch.
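
The top/bottom split described above can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the
code from the patch: the struct and field names, the AE flag bit value
and the helper name preq_get_bottom() are all hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6
#define AE_FLAG  0x40 /* assumed bit for the Address Extension flag */

/* Fixed fields before the optional AE address (hypothetical names). */
struct preq_top {
	uint8_t  flags;
	uint8_t  hop_count;
	uint8_t  ttl;
	uint32_t preq_id;
	uint8_t  orig_addr[ETH_ALEN];
	uint32_t orig_sn;
} __attribute__((packed));

/* Fields after the optional AE address (hypothetical names). */
struct preq_bottom {
	uint32_t lifetime;
	uint32_t metric;
	uint8_t  target_count;
} __attribute__((packed));

/*
 * Locate the bottom struct: it follows the top struct directly, or is
 * shifted by one extra 6-byte address when the AE flag is set.
 */
static const struct preq_bottom *preq_get_bottom(const uint8_t *ie)
{
	const struct preq_top *top = (const struct preq_top *)ie;
	size_t off = sizeof(*top);

	if (top->flags & AE_FLAG)
		off += ETH_ALEN;

	return (const struct preq_bottom *)(ie + off);
}
```

With this shape the AE-dependent offset is computed in one place, and
every later field access goes through a named member instead of
per-macro offset arithmetic.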

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Link: https://patch.msgid.link/20260529230952.124754-1-masashi.honma@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: remove 5/10 MHz channel code

Now that cfg80211 refuses all attempts to use 5/10 MHz
channels, all of this code is unreachable; remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20260529084502.4e5a9350206c.I2f6169a067ddd1b5e234668fcb6e07957fafacf2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: remove 5/10 MHz channel support

Remove WIPHY_FLAG_SUPPORTS_5_10_MHZ and 5/10 MHz channel
width support. We contemplated this back in early 2023
and didn't do it yet, but nobody stepped up to maintain
it.

It's already _mostly_ dead code since it can really only
be used for AP and maybe IBSS and monitor, but not on a
client since there's no way to scan (and hasn't been in
a very long time, if ever), so the only thing that ever
could really happen with it was run syzbot and trip over
assumptions in the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20260529084502.080c5885f0b7.I77cc94485b523c3c006005b9233db13cd4e077b3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: report assoc_link_id in station info for non-MLD STAs on MLD AP

When a non-MLD station associates with an MLD AP, it does so on a
specific link. However, sta_set_sinfo() never sets mlo_params_valid,
so nl80211 never emits NL80211_ATTR_MLO_LINK_ID in get_station /
dump_station responses. Userspace has no way to determine which link
a non-MLD STA is associated on.

Set mlo_params_valid to 1 and assoc_link_id to sta->deflink.link_id,
when valid_links is set.
Also set the mld_addr copy only for MLD STAs, so that non-MLD STAs
get a zeroed mld_addr as documented.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/20260528105042.835284-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'gemini-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator into soc/dt

Gemini device tree updates:

- Add two new devices: the Verbatim Gigabit NAS and the
  Raidsonic IB-4210-B, including ACKed binding updates.

- Fix up boot device for the SQ201.

- Use the right LED trigger for disk activity.

- Add the SSP/SPI block to the SoC.

- Fix up the RUT1xx device tree.

* tag 'gemini-for-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator:
  ARM: dts: gemini: Correct the RUT1xx
  ARM: dts: Add a Raidsonic IB-4210-B DTS
  ARM: dts: Add a Verbatim Gigabit NAS DTS
  dt-bindings: arm: Add two missing Gemini devices
  dt-bindings: vendor-prefixes: Add Verbatim Corporation
  ARM: dts: gemini: Add SSP/SPI block
  ARM: dts: gemini: Tag disk led for disk-activity
  ARM: dts: gemini: iTian SQ201 need to boot from mtdblock3

Signed-off-by: Linus Walleij <linusw@kernel.org>

ASoC: cs42xx8: Add SPI bus support for CS42448/CS42888 codec

Chancel Liu <chancel.liu@nxp.com> says:

The existing cs42xx8 driver only supported I2C control interface.
Add SPI bus support for the Cirrus Logic CS42448/CS42888 Audio CODEC.

Link: https://patch.msgid.link/20260603095041.3906558-1-chancel.liu@oss.nxp.com

ASoC: cs42xx8: Add SPI bus support for CS42448/CS42888 codec

The existing cs42xx8 driver only supported I2C control interface.
Add SPI bus support for the Cirrus Logic CS42448/CS42888 Audio CODEC.

Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260603095041.3906558-3-chancel.liu@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: cirrus,cs42xx8: Add SPI bus support

The CS42448/CS42888 codec supports multiple control interfaces. At
present, only the I2C interface is implemented. Add support for the SPI
control interface, operating at up to 6 MHz.

Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20260603095041.3906558-2-chancel.liu@oss.nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>

arm64: Document SVE constraints on new hwcaps

Two of the SVE hwcaps added for the SVE features in the 2025 dpISA did
not explicitly call out their dependency on SVE in the ABI documentation.
Do so.

While we're here, reorder the SVE and feature-specific ID registers for
HWCAP3_SVE_LUT6, which did have the SVE dependency but listed it second,
unlike the other SVE-specific ID registers.

Fixes: abca5e69ab62 ("arm64/cpufeature: Define hwcaps for 2025 dpISA features")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: kernel: Disable CNP on HiSilicon HIP09

HiSilicon HIP09 implements TLB entry matching behavior that deviates
from the ARM architecture specification when the CNP (Common not Private)
bit is set in TTBRx_ELx.

When TTBRx.CNP=1, TLB entries may be incorrectly shared between CPU
cores, leading to TLB conflicts and stale mappings. This affects
coherency and can result in incorrect translations.

Add the hardware erratum workaround (Hisilicon erratum 162100125) to
disable CNP on affected HIP09 cores.

Co-developed-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: cpufeature: Add WORKAROUND_DISABLE_CNP capability

The NVIDIA Carmel CNP erratum is not the only case requiring CNP to be
disabled. Abstract this into a common WORKAROUND_DISABLE_CNP capability
to facilitate adding errata for future chips and reduce duplicate
checks in has_useable_cnp().

This serves as a prerequisite for the subsequent Hisilicon erratum
162100125.

Suggested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

lib/vsprintf: replace min_t/max_t with min/max

Use the simpler min()/max() macros since the values are all compatible.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20260518123145.79411-3-thorsten.blum@linux.dev
Signed-off-by: Petr Mladek <pmladek@suse.com>

wifi: cfg80211: enforce HE/EHT cap/oper consistency

Xiang Mei reports that mac80211 could crash if eht_cap is set
but eht_oper isn't. Rather than fixing that for the individual
user(s), enforce that both HE/EHT have consistent elements.

Reported-by: Xiang Mei <xmei5@asu.edu>
Fixes: 22c64f37e1d4 ("wifi: mac80211: Update MCS15 support in link_conf")
Link: https://patch.msgid.link/20260603091812.101894-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

drm/v3d: Skip CSD when it has zeroed workgroups

A compute shader dispatch encodes its workgroup counts in the CFG0..CFG2
registers. Kicking off a dispatch with a zero count in any of the three
dimensions is invalid. First, the hardware will process 0 as 65536,
while the user-space driver exposes a maximum of 65535. Second, a
submission with a zeroed workgroup dimension should be a no-op.

These zeroed counts can reach the dispatch path through an indirect CSD
job, whose workgroup counts are only known once the indirect buffer is
read and may legitimately be zero, but such a scenario should only result
in a no-op.

Overwrite the indirect CSD job workgroup counts with the indirect BO
ones, even if they are zeroed, and don't submit the job to the hardware
when any of the workgroup counts is zero, so the job completes immediately
instead of running the shader.
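
The skip condition can be illustrated with a small standalone C sketch.
The helper name csd_has_zeroed_workgroups() and the plain array of three
counts are assumptions made for this example; the driver's actual data
structures and function names are not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when any of the three workgroup
 * counts is zero, i.e. when the job should complete immediately
 * without being submitted to the hardware.
 */
static bool csd_has_zeroed_workgroups(const uint32_t wg_counts[3])
{
	for (int i = 0; i < 3; i++) {
		if (wg_counts[i] == 0)
			return true;
	}

	return false;
}
```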

Cc: stable@vger.kernel.org
Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.")
Suggested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patch.msgid.link/20260602-v3d-fix-indirect-csd-v4-2-654309e32bc0@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>

drm/v3d: Fix vaddr leak when indirect CSD has zeroed workgroups

v3d_rewrite_csd_job_wg_counts_from_indirect() maps both the indirect
buffer and the workgroup buffer and is expected to release them before
returning. When any of the workgroup counts read from the buffer is zero,
the function bailed out early and skipped the cleanup, leaking the vaddr
mappings of both BOs.

Jump to the cleanup path instead of returning directly, so the mappings
are always dropped.
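
The shape of the fix can be sketched in standalone C. Everything here is
a simplified stand-in: map_bo(), unmap_bo(), the live_mappings counter
and rewrite_wg_counts() are hypothetical names used to model the
map/cleanup pairing, not the driver's real functions.

```c
#include <stdint.h>

static int live_mappings; /* mappings taken but not yet released */

/* Stand-ins for mapping and unmapping a buffer object. */
static void *map_bo(void *bo)
{
	live_mappings++;
	return bo;
}

static void unmap_bo(void *bo)
{
	(void)bo;
	live_mappings--;
}

static void rewrite_wg_counts(uint32_t *indirect_bo, uint32_t *wg_bo)
{
	const uint32_t *src = map_bo(indirect_bo);
	uint32_t *dst = map_bo(wg_bo);

	/* A bare return here would leak both mappings. */
	if (src[0] == 0 || src[1] == 0 || src[2] == 0)
		goto out;

	for (int i = 0; i < 3; i++)
		dst[i] = src[i];

out:
	/* Single exit path: both mappings are always released. */
	unmap_bo(wg_bo);
	unmap_bo(indirect_bo);
}
```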

Cc: stable@vger.kernel.org
Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Suggested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patch.msgid.link/20260602-v3d-fix-indirect-csd-v4-1-654309e32bc0@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>

rv: Use 0 to check preemption enabled in opid

Tracepoint handlers no longer run with preemption disabled by default
since a46023d5616 ("tracing: Guard __DECLARE_TRACE() use of
__DO_TRACE_CALL() with SRCU-fast"), so the opid monitor should now treat
a preemption count of 1 as preemption disabled.

Change the rule for preempt_off to preempt > 0.

Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-11-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Prevent task migration while handling per-CPU events

Tracepoint handlers are fully preemptible after a46023d5616 ("tracing:
Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast"). When
a per-CPU monitor handles an event, it retrieves the monitor state using
a per-CPU pointer. If the event itself doesn't disable preemption, the
task can migrate to a different CPU and we risk updating the wrong
monitor.

Mitigate this by explicitly disabling task migration before acquiring
the monitor pointer. This cannot guarantee the monitor runs on the
correct CPU but reduces the race condition window and prevents warnings.

Reviewed-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-10-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Ensure synchronous cleanup for HA monitors

HA monitors may start timers, and all cleanup functions currently stop
the timers asynchronously to avoid sleeping in the wrong context.
Nothing makes sure running callbacks terminate on cleanup.

Run the entire HA timer callback in an RCU read-side critical section;
this way we can simply synchronize_rcu() with any pending timer and are
sure any cleanup using kfree_rcu() runs after callbacks have terminated.
Additionally make sure any unlikely callback running late won't run any
code if the monitor is marked as disabled or if destruction started.
Use memory barriers to serialise with racing resets.

Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Fixes: 4a24127bd6cb ("rv: Add support for per-object monitors in DA/HA")
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-9-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Add automatic cleanup handlers for per-task HA monitors

Hybrid automata monitors may start timers. Depending on the model, these
may remain active on an exiting task and cause false positives or even
access freed memory.

Add an enable/disable hook in the HA code, currently only populated by
the per-task handler for registration and deregistration.
This hooks to the sched_process_exit event and ensures the timer is
stopped for every exiting task. The handler is enabled automatically but
may be disabled, for instance if the monitor uses the event for another
purpose (but should still manually ensure timers are stopped).

Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Do not rely on clean monitor when initialising HA

Hybrid Automata monitors hook into the DA implementation when doing
da_monitor_reset(). This function is called both on initialisation and
teardown. HA monitors try to cancel a timer only when it's initialised,
relying on the da_mon->monitoring flag. This flag could however be
corrupted during initialisation. This happens for instance on per-task
monitors that share the same storage with different types of monitors
like LTL, or in case of races during a previous teardown.

Stop relying on the monitoring flag during initialisation and assume it
can have any value, so use a separate da_reset_state() skipping timer
cancellation.
New monitors (e.g. new tasks) are always zero-initialised so it is safe
to rely on the monitoring flag for those.

Reported-by: Wen Yang <wen.yang@linux.dev>
Closes: https://lore.kernel.org/lkml/d02c656aada7d071f083460a5c9a454363669b61.1778522945.git.wen.yang@linux.dev
Suggested-by: Nam Cao <namcao@linutronix.de>
Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Fix monitor start ordering and memory ordering for monitoring flag

da_monitor_start() sets monitoring=1 before calling
da_monitor_init_hook(), which may race with the sched_switch handler:

  da_monitor_start()               sched_switch handler
  -------------------------        ---------------------------------
  da_mon->monitoring = 1;
                                   if (da_monitoring(da_mon))  /* true  */
                                       ha_start_timer_ns(...);
                                       /* hrtimer->base == NULL, crash */
  da_monitor_init_hook(da_mon);
  /* hrtimer_setup() sets base */

Fix the ordering and pair with release/acquire semantics:

  da_monitor_init_hook(da_mon);
  smp_store_release(&da_mon->monitoring, 1);    /* da_monitor_start()  */
  return smp_load_acquire(&da_mon->monitoring); /* da_monitoring()     */

On ARM64 a plain STR + LDR does not form a release-acquire pair, so
the load can observe monitoring=1 while hrtimer->base is still NULL.
The plain accesses are also data races under KCSAN.

Use WRITE_ONCE for the monitoring=0 store in da_monitor_reset() to
cover the reset path.
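
The release/acquire pairing can be modelled in userspace with C11
atomics. This is a sketch under assumptions, not the kernel code: struct
monitor, timer_base and the function names below are hypothetical, and
atomic_store_explicit()/atomic_load_explicit() stand in for
smp_store_release()/smp_load_acquire().

```c
#include <stdatomic.h>
#include <stddef.h>

struct monitor {
	void *timer_base;      /* stands in for hrtimer->base */
	atomic_int monitoring;
};

/* Initialise first, then publish the flag with release semantics. */
static void monitor_start(struct monitor *m, void *base)
{
	m->timer_base = base;
	atomic_store_explicit(&m->monitoring, 1, memory_order_release);
}

/* Acquire load: pairs with the release store in monitor_start(). */
static int monitor_is_monitoring(struct monitor *m)
{
	return atomic_load_explicit(&m->monitoring, memory_order_acquire);
}

/*
 * Handler side: timer_base is only read after the flag was observed
 * as set, so the initialisation done before the release store is
 * visible here.
 */
static void *handler_timer_base(struct monitor *m)
{
	if (!monitor_is_monitoring(m))
		return NULL;

	return m->timer_base;
}
```

A reader that sees monitoring == 1 through the acquire load is
guaranteed to also see the timer_base write that preceded the release
store, which is the property the plain store and load did not provide.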

Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Signed-off-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-6-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

rv: Ensure all pending probes terminate on per-obj monitor destroy

The monitor disable/destroy sequence detaches all probes and resets the
monitor's data, however it doesn't wait for pending probes. This is an
issue with per-object monitors, which free the monitor storage.

Call tracepoint_synchronize_unregister() to make sure to wait for all
pending probes before destroying the monitor storage.

Fixes: 4a24127bd6cb ("rv: Add support for per-object monitors in DA/HA")
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260601153840.124372-5-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>