git.ipfire.org Git - thirdparty/linux.git/log

Merge branch kvm-arm64/selftests-6.20 into kvmarm-master/next

* kvm-arm64/selftests-6.20:
  : .
  : Some selftest fixes addressing page alignment issues as well as
  : a bad MMU setup bug, courtesy of Fuad Tabba.
  : .
  KVM: selftests: Fix typos and stale comments in kvm_util
  KVM: selftests: Move page_align() to shared header
  KVM: riscv: selftests: Fix incorrect rounding in page_align()
  KVM: arm64: selftests: Fix incorrect rounding in page_align()
  KVM: arm64: selftests: Disable unused TTBR1_EL1 translations

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvm-arm64/vtcr into kvmarm-master/next

* kvm-arm64/vtcr:
  : .
  : VTCR_EL2 conversion to the configuration-driven RESx framework,
  : fixing a couple of UXN/PXN/XN bugs in the process.
  : .
  KVM: arm64: nv: Return correct RES0 bits for FGT registers
  KVM: arm64: Always populate FGT masks at boot time
  KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings
  KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation
  KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co
  arm64: Convert VTCR_EL2 to sysreg infratructure
  arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum
  KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
  KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit
  KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF
  KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()
  KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()
  KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp()
  KVM: arm64: Inject UNDEF for a register trap without accessor
  KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load
  KVM: arm64: Fix EL2 S1 XN handling for hVHE setups
  KVM: arm64: gic: Check for vGICv3 when clearing TWI

Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch arm64/for-next/cpufeature into kvmarm-master/next

Merge arm64/for-next/cpufeature in to resolve conflicts resulting from
the removal of CONFIG_PAN.

* arm64/for-next/cpufeature:
  arm64: Add support for FEAT_{LS64, LS64_V}
  KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
  arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
  KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
  KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
  KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
  arm64: Unconditionally enable PAN support
  arm64: Unconditionally enable LSE support
  arm64: Add support for TSV110 Spectre-BHB mitigation

Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64: Add support for FEAT_{LS64, LS64_V}

Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
These features are identified by ID_AA64ISAR1_EL1.LS64 and the
use of such instructions in userspace (EL0) can be trapped.

As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell
apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later
patch will be exported to userspace, FEAT_LS64_V will be enabled only
in kernel.

In order to support the use of corresponding instructions in userspace:
- Make ID_AA64ISAR1_EL1.LS64 visbile to userspace
- Add identifying and enabling in the cpufeature list
- Expose these support of these features to userspace through HWCAP3
and cpuinfo

ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for
special memory (device memory) so requires support by the CPU, system
and target memory location (device that support these instructions).
The HWCAP3_LS64, implies the support of CPU and system (since no
identification method from system, so SoC vendors should advertise
support in the CPU if system also support them).

Otherwise for ld64b/st64b the atomicity may not be guaranteed or a
DABT will be generated, so users (probably userspace driver developer)
should make sure the target memory (device) also have the support.
For st64bv 0xffffffffffffffff will be returned as status result for
unsupported memory so user should check it.

Document the restrictions along with HWCAP3_LS64.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest

Using FEAT_{LS64, LS64_V} instructions in a guest is also controlled
by HCRX_EL2.{EnALS, EnASR}. Enable it if guest has related feature.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1

Instructions introduced by FEAT_{LS64, LS64_V} is controlled by
HCRX_EL2.{EnALS, EnASR}. Configure all of these to allow usage
at EL0/1.

This doesn't mean these instructions are always available in
EL0/1 if provided. The hypervisor still have the control at
runtime.

Acked-by: Will Deacon <will@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory

If FEAT_LS64WB not supported, FEAT_LS64* instructions only support
to access Device/Uncacheable memory, otherwise a data abort for
unsupported Exclusive or atomic access (0x35, UAoEF) is generated
per spec. It's implementation defined whether the target exception
level is routed and is possible to implemented as route to EL2 on a
VHE VM according to DDI0487L.b Section C3.2.6 Single-copy atomic
64-byte load/store.

If it's implemented as generate the DABT to the final enabled stage
(stage-2), inject the UAoEF back to the guest after checking the
memslot is valid.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B

Add a bit of documentation for KVM_EXIT_ARM_LDST64B so that userspace
knows what to expect.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots

The main use of {LD,ST}64B* is to talk to a device, which is hopefully
directly assigned to the guest and requires no additional handling.

However, this does not preclude a VMM from exposing a virtual device
to the guest, and to allow 64 byte accesses as part of the programming
interface. A direct consequence of this is that we need to be able
to forward such access to userspace.

Given that such a contraption is very unlikely to ever exist, we choose
to offer a limited service: userspace gets (as part of a new exit reason)
the ESR, the IPA, and that's it. It is fully expected to handle the full
semantics of the instructions, deal with ACCDATA, the return values and
increment PC. Much fun.

A canonical implementation can also simply inject an abort and be done
with it. Frankly, don't try to do anything else unless you have time
to waste.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Unconditionally enable PAN support

FEAT_PAN has been around since ARMv8.1 (over 11 years ago), has no compiler
dependency (we have our own accessors), and is a great security benefit.

Drop CONFIG_ARM64_PAN, and make the support unconditionnal.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Unconditionally enable LSE support

LSE atomics have been in the architecture since ARMv8.1 (released in
2014), and are hopefully supported by all modern toolchains.

Drop the optional nature of LSE support in the kernel, and always
compile the support in, as this really is very little code. LL/SC
still is the default, and the switch to LSE is done dynamically.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

KVM: arm64: nv: Return correct RES0 bits for FGT registers

We had extended the sysreg masking infrastructure to more general
registers, instead of restricting it to VNCR-backed registers, since
commit a0162020095e ("KVM: arm64: Extend masking facility to arbitrary
registers"). Fix kvm_get_sysreg_res0() to reflect this fact.

Note that we're sure that we only deal with FGT registers in
kvm_get_sysreg_res0(), the

if (sr < __VNCR_START__)

is actually a never false, which should probably be removed later.

Fixes: 69c19e047dfe ("KVM: arm64: Add TCR2_EL2 to the sysreg arrays")
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260121101631.41037-1-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org

KVM: arm64: Always populate FGT masks at boot time

We currently only populate the FGT masks if the underlying HW does
support FEAT_FGT. However, with the addition of the RES1 support for
system registers, this results in a lot of noise at boot time, as
reported by Nathan.

That's because even if FGT isn't supported, we still check for the
attribution of the bits to particular features, and not keeping the
masks up-to-date leads to (fairly harmess) warnings.

Given that we want these checks to be enforced even if the HW doesn't
support FGT, enable the generation of FGT masks unconditionally (this
is rather cheap anyway). Only the storage of the FGT configuration is
avoided, which will save a tiny bit of memory on these machines.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Fixes: c259d763e6b09 ("KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co")
Link: https://lore.kernel.org/r/20260120211558.GA834868@ax162
Link: https://patch.msgid.link/20260122085153.535538-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: selftests: Fix typos and stale comments in kvm_util

Fix minor documentation errors in `kvm_util.h` and `kvm_util.c`.

- Correct the argument description for `vcpu_args_set` in `kvm_util.h`,
  which incorrectly listed `vm` instead of `vcpu`.
- Fix a typo in the comment for `kvm_selftest_arch_init` ("exeucting" ->
  "executing").
- Correct the return value description for `vm_vaddr_unused_gap` in
  `kvm_util.c` to match the implementation, which returns an address "at
  or above" `vaddr_min`, not "at or below".

No functional change intended.

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: selftests: Move page_align() to shared header

To avoid code duplication, move page_align() to the shared `kvm_util.h`
header file. Rename it to vm_page_align(), to make it clear that the
alignment is done with respect to the guest's base page size.

No functional change intended.

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: riscv: selftests: Fix incorrect rounding in page_align()

The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.

Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.

Fixes: 3e06cdf10520 ("KVM: selftests: Add initial support for RISC-V 64-bit")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Fix incorrect rounding in page_align()

The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.

Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.

Fixes: 7a6629ef746d ("kvm: selftests: add virt mem support for aarch64")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: selftests: Disable unused TTBR1_EL1 translations

KVM selftests map all guest code and data into the lower virtual address
range (0x0000...) managed by TTBR0_EL1. The upper range (0xFFFF...)
managed by TTBR1_EL1 is unused and uninitialized.

If a guest accesses the upper range, the MMU attempts a translation
table walk using uninitialized registers, leading to unpredictable
behavior.

Set `TCR_EL1.EPD1` to disable translation table walks for TTBR1_EL1,
ensuring that any access to the upper range generates an immediate
Translation Fault. Additionally, set `TCR_EL1.TBI1` (Top Byte Ignore) to
ensure that tagged pointers in the upper range also deterministically
trigger a Translation Fault via EPD1.

Define `TCR_EPD1_MASK`, `TCR_EPD1_SHIFT`, and `TCR_TBI1` in
`processor.h` to support this configuration. These are based on their
definitions in `arch/arm64/include/asm/pgtable-hwdef.h`.

Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings

Now that we potentially have two bits to deal with when setting
execution permissions, make sure we correctly handle them when both
when building the page tables and when reading back from them.

Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation

Describe all the VTCR_EL2 fields and their respective configurations,
making sure that we correctly ignore the bits that are not defined
for a given guest configuration.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co

None of the registers we manage in the feature dependency infrastructure
so far has any RES1 bit. This is about to change, as VTCR_EL2 has
its bit 31 being RES1.

In order to not fail the consistency checks by not describing a bit,
add RES1 bits to the set of immutable bits. This requires some extra
surgery for the FGT handling, as we now need to track RES1 bits there
as well.

There are no RES1 FGT bits *yet*. Watch this space.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64: Convert VTCR_EL2 to sysreg infratructure

Our definition of VTCR_EL2 is both partial (tons of fields are
missing) and totally inconsistent (some constants are shifted,
some are not). They are also expressed in terms of TCR, which is
rather inconvenient.

Replace the ad-hoc definitions with the the generated version.
This results in a bunch of additional changes to make the code
with the unshifted nature of generated enumerations.

The register data was extracted from the BSD licenced AARCHMRS
(AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0).

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum

ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 are currently represented as unordered
enumerations. However, the architecture treats them as Unsigned,
as hinted to by the MRS data:

(FEAT_S2TGran4K <=> (((UInt(ID_AA64MMFR0_EL1.TGran4_2) == 0) &&
FEAT_TGran4K) ||
(UInt(ID_AA64MMFR0_EL1.TGran4_2) >= 2))))

and similar descriptions exist for 16 and 64k.

This is also confirmed by D24.1.3.3 ("Alternative ID scheme used for
ID_AA64MMFR0_EL1 stage 2 granule sizes") in the L.b revision of
the ARM ARM.

Turn these fields into UnsignedEnum so that we can use the above
description more or less literally.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>

Merge branch kvmarm-fixes-6.19-1 into kvm-arm64/vtcr

KVM/arm64 fixes for 6.19

- Ensure early return semantics are preserved for pKVM fault handlers

- Fix case where the kernel runs with the guest's PAN value when
   CONFIG_ARM64_PAN is not set

- Make stage-1 walks to set the access flag respect the access
   permission of the underlying stage-2, when enabled

- Propagate computed FGT values to the pKVM view of the vCPU at
   vcpu_load()

- Correctly program PXN and UXN privilege bits for hVHE's stage-1 page
   tables

- Check that the VM is actually using VGICv3 before accessing the GICv3
   CPU interface

- Delete some unused code

# -----BEGIN PGP SIGNATURE-----
#
# iI0EABYKADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaWiyJBccb2xpdmVyLnVw
# dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFqVhAQDM4Lbrq0F80X+YzvO7oxWioOy4
# JiTATSii9Lit8KY6fgEAvLD4qaggLdF3+WY+V37YmTj3UDgI31ClBr+xSvSengA=
# =XaL0
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 15 Jan 2026 09:23:48 GMT
# gpg:                using EDDSA key 8D5C78D65EECCC66EB6B28D2A2BE7588247CDD16
# gpg:                issuer "oliver.upton@linux.dev"
# gpg: Can't check signature: No public key

* tag 'kvmarm-fixes-6.19-1':
  KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
  KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit
  KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF
  KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()
  KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()
  KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp()
  KVM: arm64: Inject UNDEF for a register trap without accessor
  KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load
  KVM: arm64: Fix EL2 S1 XN handling for hVHE setups
  KVM: arm64: gic: Check for vGICv3 when clearing TWI

Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers

Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a
fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT')
to KVM's page-table code. When set, the walk logic maintains its
previous behaviour of terminating a walk as soon as the visitor callback
returns an error. However, when the flag is clear, the walk will
continue if the visitor returns -EAGAIN and the error is then suppressed
and returned as zero to the caller.

Clearing the flag is beneficial when write-protecting a range of IPAs
with kvm_pgtable_stage2_wrprotect() but is not useful in any other
cases, either because we are operating on a single page (e.g.
kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the
early termination is desirable (e.g. when mapping pages from a fault in
user_mem_abort()).

Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM
MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but
failed to propagate any of the walker flags. As a result, page-table
walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the
early termination semantics are desirable on the fault handling path.

Rather than complicate the pKVM hypercall interface, invert the flag so
that the whole thing can be simplified and only pass the new flag
('KVM_PGTABLE_WALK_IGNORE_EAGAIN') from the wrprotect code.

Cc: Fuad Tabba <tabba@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://msgid.link/20260105154939.11041-2-will@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit

We set PSTATE.PAN to 1 on exiting from a guest if PAN support has
been compiled in and that it exists on the HW. However, this is not
necessarily correct.

In a nVHE configuration, there is no notion of PAN at EL2, so setting
PSTATE.PAN to anything is pointless.

Furthermore, not setting PAN to 0 when CONFIG_ARM64_PAN isn't set
means we run with the *guest's* PSTATE.PAN (which might be set to 1),
and we will explode on the next userspace access. Yes, the architecture
is delightful in that particular corner.

Fix the whole thing by always setting PAN to something when running
VHE (which implies PAN support), and only ignore it when running nVHE.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20260107124600.2736328-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF

Naturally, updating the Access Flag in a stage-1 descriptor requires
write permission at stage-2, although this isn't actually enforced in
KVM's software PTW.

Generate a stage-2 permission fault if the stage-1 walk attempts to
update the descriptor and its corresponding stage-2 translation lacks
write permission.

Fixes: bff8aa213dee ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20260108204230.677172-1-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()

Function vcpu_{clear,set}_wfx_traps() are unused since
commit 0b5afe05377d7 ("KVM: arm64: Add early_param to
control WFx trapping").
Remove it.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Dongxu Sun <sundongxu1024@163.com>
Link: https://msgid.link/20260109080226.761107-1-sundongxu1024@163.com
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()

synchronize_vcpu_pstate() doesn't make use of the reference to exit_code,
remove the parameter.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://msgid.link/20251216103053.47224-5-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp()

__pvkm_host_share_hyp() and __pkvm_host_unshare_hyp() both have one
parameter, the pfn, not two. Even though correctness isn't impacted because
the SMCCC handlers pass the first argument and ignore the second one, let's
call the functions with the proper number of arguments.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://msgid.link/20251216103053.47224-4-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Inject UNDEF for a register trap without accessor

Configuring a register trap without specifying an accessor function is
abviously a bug. Instead of calling die() when that happens, let's be a
bit more helpful and print the register encoding. Also inject an
undefined instruction exception in the guest, similar to other unhandled
register accesses.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://msgid.link/20251216103053.47224-3-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load

Commit fb10ddf35c1c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()")
introduced per-VCPU FGT traps. For an unprotected pKVM VCPU, the untrusted
host FGT configuration is copied in pkvm_vcpu_init_traps(), which is called
from __pkvm_init_vcpu(). __pkvm_init_vcpu() is called once per VCPU (when
the VCPU is first run) which means that the uninitialized, zero, values for
the FGT registers end up being used for the entire lifetime of the VCPU.
This causes both unwanted traps (for the inverse polarity trap bits) and
the guest being allowed to access registers it shouldn't.

Fix it by copying the FGT traps for unprotected pKVM VCPUs when the
untrusted host loads the VCPU.

Fixes: fb10ddf35c1c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()")
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251216103053.47224-2-alexandru.elisei@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: Fix EL2 S1 XN handling for hVHE setups

The current XN implementation is tied to the EL2 translation regime,
and fall flat on its face with the EL2&0 one that is used for hVHE,
as the permission bit for privileged execution is a different one.

Fixes: 6537565fd9b7f ("KVM: arm64: Adjust EL2 stage-1 leaf AP bits when ARM64_KVM_HVHE is set")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://msgid.link/20251210173024.561160-2-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>

KVM: arm64: gic: Check for vGICv3 when clearing TWI

Explicitly check for the vgic being v3 when disabling TWI. Failure to
check this can result in using the wrong view of the vgic CPU IF union
causing undesirable/unexpected behaviour.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20260106165154.3321753-1-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oupton@kernel.org>

arm64: Add support for TSV110 Spectre-BHB mitigation

The TSV110 processor is vulnerable to the Spectre-BHB (Branch History
Buffer) attack, which can be exploited to leak information through
branch prediction side channels. This commit adds the MIDR of TSV110
to the list for software mitigation.

Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev>
Signed-off-by: Will Deacon <will@kernel.org>

Linux 6.19-rc4

Merge tag 'core_urgent_for_v6.19_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core entry fix from Borislav Petkov:

- Make sure clang inlines trivial local_irq_* helpers

* tag 'core_urgent_for_v6.19_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Always inline local_irq_{enable,disable}_exit_to_user()

Merge tag 'pmdomain-v6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:

- mediatek: Fix spinlock recursion fix during probe

- imx: Fix reference count leak during probe

* tag 'pmdomain-v6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx: Fix reference count leak in imx_gpc_probe()
pmdomain: mtk-pm-domains: Fix spinlock recursion fix in probe

Merge tag 'perf-tools-fixes-for-v6.19-2026-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tool fixes and from Namhyung Kim:

- skip building BPF skeletons if libopenssl is missing

- a couple of test updates

- handle error cases of filename__read_build_id()

- support NVIDIA Olympus for ARM SPE profiling

- update tool headers to sync with the kernel

* tag 'perf-tools-fixes-for-v6.19-2026-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  tools build: Fix the common set of features test wrt libopenssl
  tools headers: Sync syscall table with kernel sources
  tools headers: Sync linux/socket.h with kernel sources
  tools headers: Sync linux/gfp_types.h with kernel sources
  tools headers: Sync arm64 headers with kernel sources
  tools headers: Sync x86 headers with kernel sources
  tools headers: Sync UAPI sound/asound.h with kernel sources
  tools headers: Sync UAPI linux/mount.h with kernel sources
  tools headers: Sync UAPI linux/fs.h with kernel sources
  tools headers: Sync UAPI linux/fcntl.h with kernel sources
  tools headers: Sync UAPI KVM headers with kernel sources
  tools headers: Sync UAPI drm/drm.h with kernel sources
  perf arm-spe: Add NVIDIA Olympus to neoverse list
  tools headers arm64: Add NVIDIA Olympus part
  perf tests top: Make the test exclusive
  perf tests kvm: Avoid leaving perf.data.guest file around
  perf symbol: Fix ENOENT case for filename__read_build_id
  perf tools: Disable BPF skeleton if no libopenssl found
  tools/build: Add a feature test for libopenssl

Merge tag 'pm-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a recent regression that affects system suspend testing
at the 'core' level (Rafael Wysocki)"

* tag 'pm-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Fix suspend_test() at the TEST_CORE level

Merge tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fix from Eric Biggers:
"Fix the kunit_run_irq_test() function (which I recently added for the
  CRC and crypto tests) to be less timing-dependent.

  This fixes flakiness in the polyval kunit test suite"

* tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  kunit: Enforce task execution in {soft,hard}irq contexts

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

- Fix several syzkaller found bugs:
    - Poor parsing of the RDMA_NL_LS_OP_IP_RESOLVE netlink
    - GID entry refcount leaking when CM destruction races with
      multicast establishment
    - Missing refcount put in ib_del_sub_device_and_put()

- Fixup recently introduced uABI padding for 32 bit consistency

- Avoid user triggered math overflow in MANA and AFA

- Reading invalid netdev data during an event

- kdoc fixes

- Fix never-working gid copying in ib_get_gids_from_rdma_hdr

- Typo in bnxt when validating the BAR

- bnxt mis-parsed IB_SEND_IP_CSUM so it didn't work always

- bnxt out of bounds access in bnxt related to the counters on new
   devices

- Allocate the bnxt PDE table with the right sizing

- Use dma_free_coherent() correctly in bnxt

- Allow rxe to be unloadable when CONFIG_PROVE_LOCKING by adjusting the
   tracking of the global sockets it uses

- Missing unlocking on error path in rxe

- Compute the right number of pages in a MR in rtrs

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/bnxt_re: fix dma_free_coherent() pointer
  RDMA/rtrs: Fix clt_path::max_pages_per_mr calculation
  IB/rxe: Fix missing umem_odp->umem_mutex unlock on error path
  RDMA/bnxt_re: Fix to use correct page size for PDE table
  RDMA/bnxt_re: Fix OOB write in bnxt_re_copy_err_stats()
  RDMA/bnxt_re: Fix IB_SEND_IP_CSUM handling in post_send
  RDMA/core: always drop device refcount in ib_del_sub_device_and_put()
  RDMA/rxe: let rxe_reclassify_recv_socket() call sk_owner_put()
  RDMA/bnxt_re: Fix incorrect BAR check in bnxt_qplib_map_creq_db()
  RDMA/core: Fix logic error in ib_get_gids_from_rdma_hdr()
  RDMA/efa: Remove possible negative shift
  RTRS/rtrs: clean up rtrs headers kernel-doc
  RDMA/irdma: avoid invalid read in irdma_net_event
  RDMA/mana_ib: check cqe length for kernel CQs
  RDMA/irdma: Fix irdma_alloc_ucontext_resp padding
  RDMA/ucma: Fix rdma_ucm_query_ib_service_resp struct padding
  RDMA/cm: Fix leaking the multicast GID table reference
  RDMA/core: Check for the presence of LS_NLA_TYPE_DGID correctly

Merge tag 'linux_kselftest-fixes-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

- Fix for build failures in tests that use an empty FIXTURE() seen in
   Android's build environment, which uses -D_FORTIFY_SOURCE=3, a build
   failure occurs in tests that use an empty FIXTURE()

- Fix func_traceonoff_triggers.tc sometimes failures on Kunpeng-920
   board resulting from including transient trace file name in checksum
   compare

- Fix to remove available_events requirement from toplevel-enable for
   instance as it isn't a valid requirement for this test

* tag 'linux_kselftest-fixes-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest/harness: Use helper to avoid zero-size memset warning
  selftests/ftrace: Test toplevel-enable for instance
  selftests/ftrace: traceonoff_triggers: strip off names

Merge tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Scan partition tables asynchronously for ublk, similarly to how nvme
   does it. This avoids potential deadlocks, which is why nvme does it
   that way too. Includes a set of selftests as well.

- MD pull request via Yu:
     - Fix null-pointer dereference in raid5 sysfs group_thread_cnt
       store (Tuo Li)
     - Fix possible mempool corruption during raid1 raid_disks update
       via sysfs (FengWei Shih)
     - Fix logical_block_size configuration being overwritten during
       super_1_validate() (Li Nan)
     - Fix forward incompatibility with configurable logical block size:
       arrays assembled on new kernels could not be assembled on older
       kernels (v6.18 and before) due to non-zero reserved pad rejection
       (Li Nan)
     - Fix static checker warning about iterator not incremented (Li Nan)

- Skip CPU offlining notifications on unmapped hardware queues

- bfq-iosched block stats fix

- Fix outdated comment in bfq-iosched

* tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs

Merge tag 'io_uring-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Removed dead argument length for io_uring_validate_mmap_request()

- Use GFP_NOWAIT for overflow CQEs on legacy ring setups rather than
   GFP_ATOMIC, which makes it play nicer with memcg limits

- Fix a potential circular locking issue with tctx node removal and
   exec based cancelations

* tag 'io_uring-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/memmap: drop unused sz param in io_uring_validate_mmap_request()
  io_uring/tctx: add separate lock for list of tctx's in ctx
  io_uring: use GFP_NOWAIT for overflow CQEs on legacy rings

Merge tag 'x86-urgent-2026-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
"Fix the AMD microcode Entrysign signature checking code to include
more models"

* tag 'x86-urgent-2026-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix Entrysign revision check for Zen5/Strix Halo

Merge tag 'loongarch-fixes-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Complete CPUCFG registers definition, set correct protection_map[] for
  VM_NONE/VM_SHARED, fix some bugs in the orc stack unwinder, ftrace and
  BPF JIT"

* tag 'loongarch-fixes-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  samples/ftrace: Adjust LoongArch register restore order in direct calls
  LoongArch: BPF: Enhance the bpf_arch_text_poke() function
  LoongArch: BPF: Enable trampoline-based tracing for module functions
  LoongArch: BPF: Adjust the jump offset of tail calls
  LoongArch: BPF: Save return address register ra to t0 before trampoline
  LoongArch: BPF: Zero-extend bpf_tail_call() index
  LoongArch: BPF: Sign extend kfunc call arguments
  LoongArch: Refactor register restoration in ftrace_common_return
  LoongArch: Enable exception fixup for specific ADE subcode
  LoongArch: Remove unnecessary checks for ORC unwinder
  LoongArch: Remove is_entry_func() and kernel_entry_end
  LoongArch: Use UNWIND_HINT_END_OF_STACK for entry points
  LoongArch: Set correct protection_map[] for VM_NONE/VM_SHARED
  LoongArch: Complete CPUCFG registers definition

Merge tag 'drm-fixes-2026-01-02' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Happy New Year, jetlagged fixes from me, still pretty quiet, xe is
  most of this, with i915/nouveau/imagination fixes and some shmem
  cleanups.

  shmem:
   - docs and MODULE_LICENSE fix

  xe:
   - Ensure svm device memory is idle before migration completes
   - Fix a SVM debug printout
   - Use READ_ONCE() / WRITE_ONCE() for g2h_fence

  i915:
   - Fix eb_lookup_vmas() failure path

  nouveau:
   - fix prepare_fb warnings

  imagination:
   - prevent export of protected objects"

* tag 'drm-fixes-2026-01-02' of https://gitlab.freedesktop.org/drm/kernel:
  drm/i915/gem: Zero-initialize the eb.vma array in i915_gem_do_execbuffer
  drm/xe/guc: READ/WRITE_ONCE g2h_fence->done
  drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use
  drm/xe/svm: Fix a debug printout
  drm/gem-shmem: Fix the MODULE_LICENSE() string
  drm/gem-shmem: Fix typos in documentation
  drm/nouveau/dispnv50: Don't call drm_atomic_get_crtc_state() in prepare_fb
  drm/imagination: Disallow exporting of PM/FW protected objects

Merge tag 'v6.19-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix memory leak

- Fix two refcount leaks

- Fix error path in create_smb2_pipe

* tag 'v6.19-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
  smb/server: fix refcount leak in smb2_open()
  smb/server: fix refcount leak in parse_durable_handle_context()
  smb/server: call ksmbd_session_rpc_close() on error path in create_smb2_pipe()
  ksmbd: Fix memory leak in get_file_all_info()

Merge tag 'v6.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix array out of bounds error in copy_file_range

- Add tracepoint to help debug ioctl failures

* tag 'v6.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix UBSAN array-index-out-of-bounds in smb2_copychunk_range
smb3 client: add missing tracepoint for unsupported ioctls

block, bfq: update outdated comment

The function bfq_bfqq_may_idle() was renamed as bfq_better_to_idle()
in commit 277a4a9b56cd ("block, bfq: give a better name to
bfq_bfqq_may_idle"). Update the comment accordingly.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/memmap: drop unused sz param in io_uring_validate_mmap_request()

io_uring_validate_mmap_request() doesn't use its size_t sz argument, so
remove it.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/tctx: add separate lock for list of tctx's in ctx

ctx->tcxt_list holds the tasks using this ring, and it's currently
protected by the normal ctx->uring_lock. However, this can cause a
circular locking issue, as reported by syzbot, where cancelations off
exec end up needing to remove an entry from this list:

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L
------------------------------------------------------
syz.0.9999/12287 is trying to acquire lock:
ffff88805851c0a8 (&ctx->uring_lock){+.+.}-{4:4}, at: io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179

but task is already holding lock:
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sig->cred_guard_mutex){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
       proc_pid_attr_write+0x547/0x630 fs/proc/base.c:2837
       vfs_write+0x27e/0xb30 fs/read_write.c:684
       ksys_write+0x145/0x250 fs/read_write.c:738
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (sb_writers#3){.+.+}-{0:0}:
       percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
       percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
       __sb_start_write include/linux/fs/super.h:19 [inline]
       sb_start_write+0x4d/0x1c0 include/linux/fs/super.h:125
       mnt_want_write+0x41/0x90 fs/namespace.c:499
       open_last_lookups fs/namei.c:4529 [inline]
       path_openat+0xadd/0x3dd0 fs/namei.c:4784
       do_filp_open+0x1fa/0x410 fs/namei.c:4814
       io_openat2+0x3e0/0x5c0 io_uring/openclose.c:143
       __io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1792
       io_issue_sqe+0x165/0x1060 io_uring/io_uring.c:1815
       io_queue_sqe io_uring/io_uring.c:2042 [inline]
       io_submit_sqe io_uring/io_uring.c:2320 [inline]
       io_submit_sqes+0xbf4/0x2140 io_uring/io_uring.c:2434
       __do_sys_io_uring_enter io_uring/io_uring.c:3280 [inline]
       __se_sys_io_uring_enter+0x2e0/0x2b60 io_uring/io_uring.c:3219
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&ctx->uring_lock){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
       lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
       io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
       io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
       io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
       io_uring_task_cancel include/linux/io_uring.h:24 [inline]
       begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
       load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
       search_binary_handler fs/exec.c:1669 [inline]
       exec_binprm fs/exec.c:1701 [inline]
       bprm_execve+0x92e/0x1400 fs/exec.c:1753
       do_execveat_common+0x510/0x6a0 fs/exec.c:1859
       do_execve fs/exec.c:1933 [inline]
       __do_sys_execve fs/exec.c:2009 [inline]
       __se_sys_execve fs/exec.c:2004 [inline]
       __x64_sys_execve+0x94/0xb0 fs/exec.c:2004
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  &ctx->uring_lock --> sb_writers#3 --> &sig->cred_guard_mutex

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sig->cred_guard_mutex);
                               lock(sb_writers#3);
                               lock(&sig->cred_guard_mutex);
  lock(&ctx->uring_lock);

*** DEADLOCK ***

1 lock held by syz.0.9999/12287:
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733

stack backtrace:
CPU: 0 UID: 0 PID: 12287 Comm: syz.0.9999 Tainted: G             L      syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_circular_bug+0x2e2/0x300 kernel/locking/lockdep.c:2043
check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff3a8b8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a9a97038 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00007ff3a8de5fa0 RCX: 00007ff3a8b8f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000400
RBP: 00007ff3a8c13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff3a8de6038 R14: 00007ff3a8de5fa0 R15: 00007ff3a8f0fa28
</TASK>

Add a separate lock just for the tctx_list, tctx_lock. This can nest
under ->uring_lock, where necessary, and be used separately for list
manipulation. For the cancelation off exec side, this removes the
need to grab ->uring_lock, hence fixing the circular locking
dependency.

Reported-by: syzbot+b0e3b77ffaa8a4067ce5@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'drm-intel-fixes-2025-12-31' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

drm/i915 fixes for v6.19-rc4:
- Fix eb_lookup_vmas() failure path

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/4e79f041395bb8bcc9b2a76bb98b5e3df1c1c3eb@intel.com

Merge tag 'drm-misc-fixes-2025-12-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

drm-misc-fixes for v6.19-rc4:
- Documentation fixes and MODULE_LICENSE fix for shmem helper.
- Fix warnings in nouveau prepare_fb().
- Prevent export of protected objects in imagination driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/5506492b-02ca-47bc-8712-51e67f0e4b8b@linux.intel.com

Merge tag 'drm-xe-fixes-2025-12-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Core Changes:
- Ensure a SVM device memory allocation is idle before migration complete (Thomas)

Driver Changes:
- Fix a SVM debug printout (Thomas)
- Use READ_ONCE() / WRITE_ONCE() for g2h_fence (Jonathan)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aVOTf6-whmkgrUuq@fedora

wifi: mt76: Remove blank line after mt792x firmware version dmesg

An extra blank line gets printed after printing firmware version
because the build date is null terminated. Remove the "\n" from
dev_info() calls to print firmware version and build date to fix
the problem.

Reported-by: Mario Limonciello <superm1@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert "wifi: mt76: Strip whitespace from build ddate"

This reverts commit f804a5895ebad2b2d4fb8a3688d2115926e993d5.

This change introduced the following panic, and mt792x_load_firmware()
fails. wifi is dead on systems with mt792x wireless.

kern  :crit  : kernel BUG at lib/string_helpers.c:1043!
kern  :warn  : Oops: invalid opcode: 0000 [#1] SMP NOPTI
kern  :warn  : CPU: 14 UID: 0 PID: 61 Comm: kworker/14:0 Tainted: G        W
        6.19.0-rc1 #1 PREEMPT(voluntary)
kern  :warn  : Tainted: [W]=WARN
kern  :warn  : Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.16 07/25/2025
kern  :warn  : Workqueue: events mt7921_init_work [mt7921_common]
kern  :warn  : RIP: 0010:__fortify_panic+0xd/0xf
kern  :warn  : Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 40 0f b6 ff e8 c3 55 71 00 <0f> 0b 48 8b 54 24 10 48 8b 74 24 08 4c 89 e9 48 c7 c7 00 a2 d5 a0
kern  :warn  : RSP: 0018:ffffa7a5c03a3d10 EFLAGS: 00010246
kern  :warn  : RAX: ffffffffa0d7aaf2 RBX: 0000000000000000 RCX: ffffffffa0d7aaf2
kern  :warn  : RDX: 0000000000000011 RSI: ffffffffa0d5a170 RDI: ffffffffa128db10
kern  :warn  : RBP: ffff91650ae52060 R08: 0000000000000010 R09: ffffa7a5c31b2000
kern  :warn  : R10: ffffa7a5c03a3bf0 R11: 00000000ffffffff R12: 0000000000000000
kern  :warn  : R13: ffffa7a5c31b2000 R14: 0000000000001000 R15: 0000000000000000
kern  :warn  : FS:  0000000000000000(0000) GS:ffff91743e664000(0000) knlGS:0000000000000000
kern  :warn  : CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern  :warn  : CR2: 00007f10786c241c CR3: 00000003eca24000 CR4: 0000000000f50ef0
kern  :warn  : PKRU: 55555554
kern  :warn  : Call Trace:
kern  :warn  :  <TASK>
kern  :warn  :  mt76_connac2_load_patch.cold+0x2b/0xa41 [mt76_connac_lib]
kern  :warn  :  ? srso_alias_return_thunk+0x5/0xfbef5
kern  :warn  :  mt792x_load_firmware+0x36/0x150 [mt792x_lib]
kern  :warn  :  mt7921_run_firmware+0x2c/0x4a0 [mt7921_common]
kern  :warn  :  ? srso_alias_return_thunk+0x5/0xfbef5
kern  :warn  :  ? mt7921_rr+0x12/0x30 [mt7921e]
kern  :warn  :  ? srso_alias_return_thunk+0x5/0xfbef5
kern  :warn  :  ? ____mt76_poll_msec+0x75/0xb0 [mt76]
kern  :warn  :  mt7921e_mcu_init+0x4c/0x7a [mt7921e]
kern  :warn  :  mt7921_init_work+0x51/0x190 [mt7921_common]
kern  :warn  :  process_one_work+0x18b/0x340
kern  :warn  :  worker_thread+0x256/0x3a0
kern  :warn  :  ? __pfx_worker_thread+0x10/0x10
kern  :warn  :  kthread+0xfc/0x240
kern  :warn  :  ? __pfx_kthread+0x10/0x10
kern  :warn  :  ? __pfx_kthread+0x10/0x10
kern  :warn  :  ret_from_fork+0x254/0x290
kern  :warn  :  ? __pfx_kthread+0x10/0x10
kern  :warn  :  ret_from_fork_asm+0x1a/0x30
kern  :warn  :  </TASK>

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kselftest/harness: Use helper to avoid zero-size memset warning

When building kselftests with a toolchain that enables source
fortification (e.g., Android's build environment, which uses
-D_FORTIFY_SOURCE=3), a build failure occurs in tests that use an
empty FIXTURE().

The root cause is that an empty fixture struct results in
`sizeof(self_private)` evaluating to 0. The compiler's fortification
checks then detect the `memset()` call with a compile-time constant size
of 0, issuing a `-Wuser-defined-warnings` which is promoted to an error
by `-Werror`.

An initial attempt to guard the call with `if (sizeof(self_private) > 0)`
was insufficient. The compiler's static analysis is aggressive enough
to flag the `memset(..., 0)` pattern before evaluating the conditional,
thus still triggering the error.

To resolve this robustly, this change introduces a `static inline`
helper function, `__kselftest_memset_safe()`. This function wraps the
size check and the `memset()` call. By replacing the direct `memset()`
in the `__TEST_F_IMPL` macro with a call to this helper, we create an
abstraction boundary. This prevents the compiler's static analyzer from
"seeing" the problematic pattern at the macro expansion site, resolving
the build failure.

Build Context:
Compiler: Android (14488419, +pgo, +bolt, +lto, +mlgo, based on r584948) clang version 22.0.0 (https://android.googlesource.com/toolchain/llvm-project 2d65e4108033380e6fe8e08b1f1826cd2bfb0c99)
Relevant Options: -O2 -Wall -Werror -D_FORTIFY_SOURCE=3 -target i686-linux-android10000

Test: m kselftest_futex_futex_requeue_pi

Removed Gerrit Change-Id
Shuah Khan <skhan@linuxfoundation.org>

Link: https://lore.kernel.org/r/20251224084120.249417-1-wakel@google.com
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

Merge tag 'platform-drivers-x86-v6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

- alienware-wmi-wmax: Area-51, x16, and 16X Aurora laptops support

- asus-armoury:
    - Fix FA507R PPT data
    - Add TDP data for more laptop models

- asus-nb-wmi: Asus Zenbook 14 display toggle key support

- dell-lis3lv02d: Dell Latitude 5400 support

- hp-bioscfg: Fix out-of-bounds array access in ACPI package parsing

- ibm_rtl: Fix EBDA signature search pointer arithmetic

- ideapad-laptop: Reassign KEY_CUT to KEY_SELECTIVE_SCREENSHOT

- intel/pmt:
    - Fix kobject memory leak on init failure
    - Use valid pointers on error handling path

- intel/vsec: Correct kernel doc comments

- mellanox: mlxbf-pmc: Fix event names

- msi-laptop: Add sysfs_remove_group()

- samsumg-galaxybook: Do not cast pointer to a shorter type

- think-lmi: WMI certificate thumbprint support for ThinkCenter

- uniwill: Tuxedo Book BA15 Gen10 support

* tag 'platform-drivers-x86-v6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (22 commits)
  platform/x86: asus-armoury: add support for G835LW
  platform/x86: asus-armoury: fix ppt data for FA507R
  platform/x86/intel/pmt/discovery: use valid device pointer in dev_err_probe
  platform/x86: hp-bioscfg: Fix out-of-bounds array access in ACPI package parsing
  platform/x86: asus-armoury: add support for G615LR
  platform/x86: asus-armoury: add support for FA608UM
  platform/x86: asus-armoury: add support for GA403WR
  platform/x86: asus-armoury: add support for GU605CR
  platform/x86: ideapad-laptop: Reassign KEY_CUT to KEY_SELECTIVE_SCREENSHOT
  platform/x86: samsung-galaxybook: Fix problematic pointer cast
  platform/x86/intel/pmt: Fix kobject memory leak on init failure
  platform/x86/intel/vsec: correct kernel-doc comments
  platform/x86: ibm_rtl: fix EBDA signature search pointer arithmetic
  platform/x86: msi-laptop: add missing sysfs_remove_group()
  platform/x86: think-lmi: Add WMI certificate thumbprint support for ThinkCenter
  platform/x86: dell-lis3lv02d: Add Latitude 5400
  platform/mellanox: mlxbf-pmc: Remove trailing whitespaces from event names
  platform/x86: asus-nb-wmi: Add keymap for display toggle
  platform/x86/uniwill: Add TUXEDO Book BA15 Gen10
  platform/x86: alienware-wmi-wmax: Add support for Alienware 16X Aurora
  ...

selftests/ftrace: Test toplevel-enable for instance

'available_events' is actually not required by
'test.d/event/toplevel-enable.tc' and its Existence has been tested in
'test.d/00basic/basic4.tc'.

So the require of 'available_events' can be dropped and then we can add
'instance' flag to test 'test.d/event/toplevel-enable.tc' for instance.

Test result show as below:
# ./ftracetest test.d/event/toplevel-enable.tc
=== Ftrace unit tests ===
[1] event tracing - enable/disable with top level files [PASS]
[2] (instance)  event tracing - enable/disable with top level files [PASS]

# of passed:  2
# of failed:  0
# of unresolved:  0
# of untested:  0
# of unsupported:  0
# of xfailed:  0
# of undefined(test bug):  0

Link: https://lore.kernel.org/r/20230509203659.1173917-1-zhengyejian1@huawei.com
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/ftrace: traceonoff_triggers: strip off names

The func_traceonoff_triggers.tc sometimes goes to fail
on my board, Kunpeng-920.

[root@localhost]# ./ftracetest ./test.d/ftrace/func_traceonoff_triggers.tc -l fail.log
=== Ftrace unit tests ===
[1] ftrace - test for function traceon/off triggers     [FAIL]
[2] (instance)  ftrace - test for function traceon/off triggers [UNSUPPORTED]

I look up the log, and it shows that the md5sum is different between csum1 and csum2.

++ cnt=611
++ sleep .1
+++ cnt_trace
+++ grep -v '^#' trace
+++ wc -l
++ cnt2=611
++ '[' 611 -ne 611 ']'
+++ cat tracing_on
++ on=0
++ '[' 0 '!=' 0 ']'
+++ md5sum trace
++ csum1='76896aa74362fff66a6a5f3cf8a8a500  trace'
++ sleep .1
+++ md5sum trace
++ csum2='ee8625a21c058818fc26e45c1ed3f6de  trace'
++ '[' '76896aa74362fff66a6a5f3cf8a8a500  trace' '!=' 'ee8625a21c058818fc26e45c1ed3f6de  trace' ']'
++ fail 'Tracing file is still changing'
++ echo Tracing file is still changing
Tracing file is still changing
++ exit_fail
++ exit 1

So I directly dump the trace file before md5sum, the diff shows that:

[root@localhost]# diff trace_1.log trace_2.log -y --suppress-common-lines
dockerd-12285   [036] d.... 18385.510290: sched_stat | <...>-12285   [036] d.... 18385.510290: sched_stat
dockerd-12285   [036] d.... 18385.510291: sched_swit | <...>-12285   [036] d.... 18385.510291: sched_swit
<...>-740       [044] d.... 18385.602859: sched_stat | kworker/44:1-740 [044] d.... 18385.602859: sched_stat
<...>-740       [044] d.... 18385.602860: sched_swit | kworker/44:1-740 [044] d.... 18385.602860: sched_swit

And we can see that <...> filed be filled with names.

We can strip off the names there to fix that.

After strip off the names:

kworker/u257:0-12 [019] d..2.  2528.758910: sched_stat | -12 [019] d..2.  2528.758910: sched_stat_runtime: comm=k
kworker/u257:0-12 [019] d..2.  2528.758912: sched_swit | -12 [019] d..2.  2528.758912: sched_switch: prev_comm=kw
<idle>-0          [000] d.s5.  2528.762318: sched_waki | -0  [000] d.s5.  2528.762318: sched_waking: comm=sshd pi
<idle>-0          [037] dNh2.  2528.762326: sched_wake | -0  [037] dNh2.  2528.762326: sched_wakeup: comm=sshd pi
<idle>-0          [037] d..2.  2528.762334: sched_swit | -0  [037] d..2.  2528.762334: sched_switch: prev_comm=sw

Link: https://lore.kernel.org/r/20230818013226.2182299-1-zouyipeng@huawei.com
Fixes: d87b29179aa0 ("selftests: ftrace: Use md5sum to take less time of checking logs")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

Merge tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

- Restrict ROM access to dword to resolve a regression introduced with
   qword access seen on some Intel NICs. Update VGA region access to the
   same given lack of precedent for 64-bit users (Kevin Tian)

- Fix missing .get_region_info_caps callback in the xe-vfio-pci variant
   driver due to integration through the DRM tree (Michal Wajdeczko)

- Add aligned 64-bit access macros to tools/include/linux/types.h,
   allowing removal of uapi/linux/type.h includes from various vfio
   selftest, resolving redefinition warnings for integration with KVM
   selftests (David Matlack)

- Fix error path memory leak in pds-vfio-pci variant driver (Zilin Guan)

- Fix error path use-after-free in xe-vfio-pci variant driver (Alper Ak)

* tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio:
  vfio/xe: Fix use-after-free in xe_vfio_pci_alloc_file()
  vfio/pds: Fix memory leak in pds_vfio_dirty_enable()
  vfio: selftests: Drop <uapi/linux/types.h> includes
  tools include: Add definitions for __aligned_{l,b}e64
  vfio/xe: Add default handler for .get_region_info_caps
  vfio/pci: Disable qword access to the VGA region
  vfio/pci: Disable qword access to the PCI ROM bar

Merge tag 'md-6.19-20251231' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux into block-6.19

Pull MD fixes from Yu Kuai:

"- Fix null-pointer dereference in raid5 sysfs group_thread_cnt store
   (Tuo Li)
- Fix possible mempool corruption during raid1 raid_disks update via
   sysfs (FengWei Shih)
- Fix logical_block_size configuration being overwritten during
   super_1_validate() (Li Nan)
- Fix forward incompatibility with configurable logical block size:
   arrays assembled on new kernels could not be assembled on kernels
   <=6.18 due to non-zero reserved pad rejection (Li Nan)
- Fix static checker warning about iterator not incremented (Li Nan)"

* tag 'md-6.19-20251231' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux:
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs

drm/i915/gem: Zero-initialize the eb.vma array in i915_gem_do_execbuffer

Initialize the eb.vma array with values of 0 when the eb structure is
first set up. In particular, this sets the eb->vma[i].vma pointers to
NULL, simplifying cleanup and getting rid of the bug described below.

During the execution of eb_lookup_vmas(), the eb->vma array is
successively filled up with struct eb_vma objects. This process includes
calling eb_add_vma(), which might fail; however, even in the event of
failure, eb->vma[i].vma is set for the currently processed buffer.

If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which
prompts a call to eb_release_vmas() to clean up the mess. Since
eb_lookup_vmas() might fail during processing any (possibly not first)
buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know
at what point did the lookup function fail.

In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper
function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is
set to NULL in case i915_gem_object_userptr_submit_init() fails; the
current one needs to be cleaned up by eb_release_vmas() at this point,
so the next one is set. If eb_add_vma() fails, neither the current nor
the next vma is set to NULL, which is a source of a NULL deref bug
described in the issue linked in the Closes tag.

When entering eb_lookup_vmas(), the vma pointers are set to the slab
poison value, instead of NULL. This doesn't matter for the actual
lookup, since it gets overwritten anyway, however the eb_release_vmas()
function only recognizes NULL as the stopping value, hence the pointers
are being set to NULL as they go in case of intermediate failure. This
patch changes the approach to filling them all with NULL at the start
instead, rather than handling that manually during failure.

Reported-by: Gangmin Kim <km.kim1503@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15062
Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Cc: stable@vger.kernel.org # 5.16.x
Signed-off-by: Krzysztof Niemiec <krzysztof.niemiec@intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20251216180900.54294-2-krzysztof.niemiec@intel.com
(cherry picked from commit 08889b706d4f0b8d2352b7ca29c2d8df4d0787cd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

samples/ftrace: Adjust LoongArch register restore order in direct calls

Ensure that in the ftrace direct call logic, the CPU register state
(with ra = parent return address) is restored to the correct state after
the execution of the custom trampoline function and before returning to
the traced function. Additionally, guarantee the correctness of the jump
logic for jr t0 (traced function address).

Cc: stable@vger.kernel.org
Fixes: 9cdc3b6a299c ("LoongArch: ftrace: Add direct call support")
Reported-by: Youling Tang <tangyouling@kylinos.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Enhance the bpf_arch_text_poke() function

Enhance the bpf_arch_text_poke() function to enable accurate location
of BPF program entry points.

When modifying the entry point of a BPF program, skip the "move t0, ra"
instruction to ensure the correct logic and copy of the jump address.

Cc: stable@vger.kernel.org
Fixes: 677e6123e3d2 ("LoongArch: BPF: Disable trampoline for kernel module function trace")
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Enable trampoline-based tracing for module functions

Remove the previous restrictions that blocked the tracing of kernel
module functions. Fix the issue that previously caused kernel lockups
when attempting to trace module functions.

Before entering the trampoline code, the return address register ra
shall store the address of the next assembly instruction after the
'bl trampoline' instruction, which is the traced function address, and
the register t0 shall store the parent function return address. Refine
the trampoline return logic to ensure that register data remains correct
when returning to both the traced function and the parent function.

Before this patch was applied, the module_attach test in selftests/bpf
encountered a deadlock issue. This was caused by an incorrect jump
address after the trampoline execution, which resulted in an infinite
loop within the module function.

Cc: stable@vger.kernel.org
Fixes: 677e6123e3d2 ("LoongArch: BPF: Disable trampoline for kernel module function trace")
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Adjust the jump offset of tail calls

Call the next bpf prog and skip the first instruction of TCC
initialization.

A total of 7 instructions are skipped:
'move t0, ra' 1 inst
'move_imm + jirl' 5 inst
'addid REG_TCC, zero, 0' 1 inst

Relevant test cases: the tailcalls test item in selftests/bpf.

Cc: stable@vger.kernel.org
Fixes: 677e6123e3d2 ("LoongArch: BPF: Disable trampoline for kernel module function trace")
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Save return address register ra to t0 before trampoline

Modify the build_prologue() function to ensure the return address
register ra is saved to t0 before entering trampoline operations.
This change ensures the accurate return address handling when a BPF
program calls another BPF program, preventing errors in the BPF-to-BPF
call chain.

Cc: stable@vger.kernel.org
Fixes: 677e6123e3d2 ("LoongArch: BPF: Disable trampoline for kernel module function trace")
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Zero-extend bpf_tail_call() index

The bpf_tail_call() index should be treated as a u32 value. Let's
zero-extend it to avoid calling wrong BPF progs. See similar fixes
for x86 [1]) and arm64 ([2]) for more details.

[1]: https://github.com/torvalds/linux/commit/90caccdd8cc0215705f18b92771b449b01e2474a
[2]: https://github.com/torvalds/linux/commit/16338a9b3ac30740d49f5dfed81bac0ffa53b9c7

Cc: stable@vger.kernel.org
Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: BPF: Sign extend kfunc call arguments

The kfunc calls are native calls so they should follow LoongArch calling
conventions. Sign extend its arguments properly to avoid kernel panic.
This is done by adding a new emit_abi_ext() helper. The emit_abi_ext()
helper performs extension in place meaning a value already store in the
target register (Note: this is different from the existing sign_extend()
helper and thus we can't reuse it).

Cc: stable@vger.kernel.org
Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Refactor register restoration in ftrace_common_return

Refactor the register restoration sequence in the ftrace_common_return
function to clearly distinguish between the logic of normal returns and
direct call returns in function tracing scenarios. The logic is as
follows:

1. In the case of a normal return, the execution flow returns to the
traced function, and ftrace must ensure that the register data is
consistent with the state when the function was entered.

ra = parent return address; t0 = traced function return address.

2. In the case of a direct call return, the execution flow jumps to the
custom trampoline function, and ftrace must ensure that the register
data is consistent with the state when ftrace was entered.

ra = traced function return address; t0 = parent return address.

Cc: stable@vger.kernel.org
Fixes: 9cdc3b6a299c ("LoongArch: ftrace: Add direct call support")
Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Enable exception fixup for specific ADE subcode

This patch allows the LoongArch BPF JIT to handle recoverable memory
access errors generated by BPF_PROBE_MEM* instructions.

When a BPF program performs memory access operations, the instructions
it executes may trigger ADEM exceptions. The kernel’s built-in BPF
exception table mechanism (EX_TYPE_BPF) will generate corresponding
exception fixup entries in the JIT compilation phase; however, the
architecture-specific trap handling function needs to proactively call
the common fixup routine to achieve exception recovery.

do_ade(): fix EX_TYPE_BPF memory access exceptions for BPF programs,
ensure safe execution.

Relevant test cases: illegal address access tests in module_attach and
subprogs_extable of selftests/bpf.

Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Remove unnecessary checks for ORC unwinder

According to the following function definitions, __kernel_text_address()
already checks __module_text_address(), so it should remove the check of
__module_text_address() in bt_address() at least.

int __kernel_text_address(unsigned long addr)
{
if (kernel_text_address(addr))
return 1;
...
return 0;
}

int kernel_text_address(unsigned long addr)
{
bool no_rcu;
int ret = 1;
...
if (is_module_text_address(addr))
goto out;
...
return ret;
}

bool is_module_text_address(unsigned long addr)
{
guard(rcu)();
return __module_text_address(addr) != NULL;
}

Furthermore, there are two checks of __kernel_text_address(), one is in
bt_address() and the other is after calling bt_address(), it looks like
redundant.

Handle the exception address first and then use __kernel_text_address()
to validate the calculated address for exception or the normal address
in bt_address(), then it can remove the check of __kernel_text_address()
after calling bt_address().

Just remove unnecessary checks, no functional changes intended.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Remove is_entry_func() and kernel_entry_end

For now, the related code of is_entry_func() is useless, so they can be
removed. Then the symbol kernel_entry_end is not used any more, so it can
be removed too.

Link: https://lore.kernel.org/lkml/kjiyla6qj3l7ezspitulrdoc5laj2e6hoecvd254hssnpddczm@g6nkaombh6va/
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>

LoongArch: Use UNWIND_HINT_END_OF_STACK for entry points

kernel_entry() and smpboot_entry() are the last frames for ORC unwinder,
so it is proper to use the annotation UNWIND_HINT_END_OF_STACK for them.

Link: https://lore.kernel.org/lkml/ots6w2ntyudj5ucs5eowncta2vmfssatpcqwzpar3ekk577hxi@j45dd4dmwx6x/
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Set correct protection_map[] for VM_NONE/VM_SHARED

For 32BIT platform _PAGE_PROTNONE is 0, so set a VMA to be VM_NONE or
VM_SHARED will make pages non-present, then cause Oops with kernel page
fault.

Fix it by set correct protection_map[] for VM_NONE/VM_SHARED, replacing
_PAGE_PROTNONE with _PAGE_PRESENT.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Complete CPUCFG registers definition

According to the "LoongArch Reference Manual Volume 1: Basic
Architecture", begin with LA664 CPU core there are more features
supported which are indicated in CPUCFG2 and CPUCFG3. This patch
completes the definitions of them so as to match the architecture
specification.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

Merge tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes that arrived just a bit late for the 6.19 merge
  window.

  Regression fix:
   - Avoid unnecessarily breaking a timestamp delegation

  Stable fixes:
   - Fix a crasher in nlm4svc_proc_test()
   - Fix nfsd_file reference leak during write delegation
   - Fix error flow in client_states_open()"

* tag 'nfsd-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: Drop the client reference in client_states_open()
  nfsd: use ATTR_DELEG in nfsd4_finalize_deleg_timestamps()
  nfsd: fix nfsd_file reference leak in nfsd4_add_rdaccess_to_wrdeleg()
  lockd: fix vfs_test_lock() calls

io_uring: use GFP_NOWAIT for overflow CQEs on legacy rings

Allocate the overflowing CQE with GFP_NOWAIT instead of GFP_ATOMIC. This
changes causes allocations to fail earlier in out-of-memory situations,
rather than being deferred. Using GFP_ATOMIC allows a process to exceed
memory limits.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220794
Signed-off-by: Alexandre Negrel <alexandre@negrel.dev>
Link: https://lore.kernel.org/io-uring/20251229201933.515797-1-alexandre@negrel.dev/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth and WiFi. Notably this includes the fix
  for the iwlwifi issue you reported.

  Current release - regressions:

   - core: avoid prefetching NULL pointers

   - wifi:
      - iwlwifi: implement settime64 as stub for MVM/MLD PTP
      - mac80211: fix list iteration in ieee80211_add_virtual_monitor()

   - handshake: fix null-ptr-deref in handshake_complete()

   - eth: mana: fix use-after-free in reset service rescan path

  Previous releases - regressions:

   - openvswitch: avoid needlessly taking the RTNL on vport destroy

   - dsa: properly keep track of conduit reference

   - ipv4:
      - fix error route reference count leak with nexthop objects
      - fib: restore ECMP balance from loopback

   - mptcp: ensure context reset on disconnect()

   - bluetooth: fix potential UaF in btusb

   - nfc: fix deadlock between nfc_unregister_device and
     rfkill_fop_write

   - eth:
      - gve: defer interrupt enabling until NAPI registration
      - i40e: fix scheduling in set_rx_mode
      - macb: relocate mog_init_rings() callback from macb_mac_link_up()
        to macb_open()
      - rtl8150: fix memory leak on usb_submit_urb() failure

   - wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()

  Previous releases - always broken:

   - ip6_gre: make ip6gre_header() robust

   - ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT

   - af_unix: don't post cmsg for SO_INQ unless explicitly asked for

   - phy: mediatek: fix nvmem cell reference leak in
     mt798x_phy_calibration

   - wifi: mac80211: discard beacon frames to non-broadcast address

   - eth:
      - iavf: fix off-by-one issues in iavf_config_rss_reg()
      - stmmac: fix the crash issue for zero copy XDP_TX action
      - team: fix check for port enabled when priority changes"

* tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
  net: rose: fix invalid array index in rose_kill_by_device()
  net: enetc: do not print error log if addr is 0
  net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()
  selftests: fib_test: Add test case for ipv4 multi nexthops
  net: fib: restore ECMP balance from loopback
  selftests: fib_nexthops: Add test cases for error routes deletion
  ipv4: Fix reference count leak when using error routes with nexthop objects
  net: usb: sr9700: fix incorrect command used to write single register
  ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr()
  usbnet: avoid a possible crash in dql_completed()
  gve: defer interrupt enabling until NAPI registration
  net: stmmac: fix the crash issue for zero copy XDP_TX action
  octeontx2-pf: fix "UBSAN: shift-out-of-bounds error"
  af_unix: don't post cmsg for SO_INQ unless explicitly asked for
  net: mana: Fix use-after-free in reset service rescan path
  net: avoid prefetching NULL pointers
  net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct
  net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write
  net: usb: asix: validate PHY address before use
  ...

blk-mq: skip CPU offline notify on unmapped hctx

If an hctx has no software ctx mapped, blk_mq_map_swqueue() never
allocates tags and leaves hctx->tags NULL. The CPU hotplug offline
notifier can still run for that hctx, return early since hctx cannot
hold any requests.

Signed-off-by: Cong Zhang <cong.zhang@oss.qualcomm.com>
Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

smb: client: fix UBSAN array-index-out-of-bounds in smb2_copychunk_range

struct copychunk_ioctl_req::ChunkCount is annotated with
__counted_by_le() as the number of elements in Chunks[].

smb2_copychunk_range reuses ChunkCount to store the number of chunks
sent in the current iteration. If a later iteration populates more
chunks than a previous one, the stale smaller value trips UBSAN.

Set ChunkCount to chunk_count (allocated capacity) before populating
Chunks[].

Fixes: cc26f593dc19 ("smb: move copychunk definitions to common/smb2pdu.h")
Link: https://lore.kernel.org/linux-cifs/CAH2r5ms9AWLy8WZ04Cpq5XOeVK64tcrUQ6__iMW+yk1VPzo1BA@mail.gmail.com
Tested-by: Youling Tang <tangyouling@kylinos.cn>
Acked-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb3 client: add missing tracepoint for unsupported ioctls

In debugging a recent problem with an xfstest, noticed that we weren't
tracing cases where the ioctl was not supported.  Add dynamic tracepoint:
    "trace-cmd record -e smb3_unsupported_ioctl"
and then after running an app which calls unsupported ioctl,
"trace-cmd show"would display e.g.
      xfs_io-7289    [012] .....  1205.137765: smb3_unsupported_ioctl: xid=19 fid=0x4535bb84 ioctl cmd=0x801c581f

Acked-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

RDMA/bnxt_re: fix dma_free_coherent() pointer

The dma_alloc_coherent() allocates a dma-mapped buffer, pbl->pg_arr[i].
The dma_free_coherent() should pass the same buffer to
dma_free_coherent() and not page-aligned.

Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://patch.msgid.link/20251230085121.8023-2-fourier.thomas@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT

On PREEMPT_RT kernels, after rt6_get_pcpu_route() returns NULL, the
current task can be preempted. Another task running on the same CPU
may then execute rt6_make_pcpu_route() and successfully install a
pcpu_rt entry. When the first task resumes execution, its cmpxchg()
in rt6_make_pcpu_route() will fail because rt6i_pcpu is no longer
NULL, triggering the BUG_ON(prev). It's easy to reproduce it by adding
mdelay() after rt6_get_pcpu_route().

Using preempt_disable/enable is not appropriate here because
ip6_rt_pcpu_alloc() may sleep.

Fix this by handling the cmpxchg() failure gracefully on PREEMPT_RT:
free our allocation and return the existing pcpu_rt installed by
another task. The BUG_ON is replaced by WARN_ON_ONCE for non-PREEMPT_RT
kernels where such races should not occur.

Link: https://syzkaller.appspot.com/bug?extid=9b35e9bc0951140d13e6
Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.")
Reported-by: syzbot+9b35e9bc0951140d13e6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6918cd88.050a0220.1c914e.0045.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20251223051413.124687-1-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

platform/x86: asus-armoury: add support for G835LW

Add TDP data for laptop model G835LW.

Signed-off-by: Denis Benato <denis.benato@linux.dev>
Link: https://patch.msgid.link/20251229204458.2658777-1-denis.benato@linux.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

RDMA/rtrs: Fix clt_path::max_pages_per_mr calculation

If device max_mr_size bits in the range [mr_page_shift+31:mr_page_shift]
are zero, the `min3` function will set clt_path::max_pages_per_mr to
zero.

`alloc_path_reqs` will pass zero, which is invalid, as the third parameter
to `ib_alloc_mr`.

Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality")
Signed-off-by: Honggang LI <honggangli@163.com>
Link: https://patch.msgid.link/20251229025617.13241-1-honggangli@163.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

net: rose: fix invalid array index in rose_kill_by_device()

rose_kill_by_device() collects sockets into a local array[] and then
iterates over them to disconnect sockets bound to a device being brought
down.

The loop mistakenly indexes array[cnt] instead of array[i]. For cnt <
ARRAY_SIZE(array), this reads an uninitialized entry; for cnt ==
ARRAY_SIZE(array), it is an out-of-bounds read. Either case can lead to
an invalid socket pointer dereference and also leaks references taken
via sock_hold().

Fix the index to use i.

Fixes: 64b8bc7d5f143 ("net/rose: fix races in rose_kill_by_device()")
Co-developed-by: Fatma Alwasmi <falwasmi@purdue.edu>
Signed-off-by: Fatma Alwasmi <falwasmi@purdue.edu>
Signed-off-by: Pwnverse <stanksal@purdue.edu>
Link: https://patch.msgid.link/20251222212227.4116041-1-ritviktanksalkar@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: enetc: do not print error log if addr is 0

A value of 0 for addr indicates that the IEB_LBCR register does not
need to be configured, as its default value is 0. However, the driver
will print an error log if addr is 0, so this issue needs to be fixed.

Fixes: 50bfd9c06f0f ("net: enetc: set external PHY address in IERB for i.MX94 ENETC")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251222022628.4016403-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()

In the non-RT kernel, local_bh_disable() merely disables preemption,
whereas it maps to an actual spin lock in the RT kernel. Consequently,
when attempting to refill RX buffers via netdev_alloc_skb() in
macb_mac_link_up(), a deadlock scenario arises as follows:

   WARNING: possible circular locking dependency detected
   6.18.0-08691-g2061f18ad76e #39 Not tainted
   ------------------------------------------------------
   kworker/0:0/8 is trying to acquire lock:
   ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c

   but task is already holding lock:
   ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit
   +0x148/0xb7c

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (&queue->tx_ptr_lock){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x148/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #2 (_xmit_ETHER#2){+...}-{3:3}:
          rt_spin_lock+0x50/0x1f0
          sch_direct_xmit+0x11c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}:
          lock_release+0x250/0x348
          __local_bh_enable_ip+0x7c/0x240
          __netdev_alloc_skb+0x1b4/0x1d8
          gem_rx_refill+0xdc/0x240
          gem_init_rings+0xb4/0x108
          macb_mac_link_up+0x9c/0x2b4
          phylink_resolve+0x170/0x614
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   -> #0 (&bp->lock){+.+.}-{3:3}:
          __lock_acquire+0x15a8/0x2084
          lock_acquire+0x1cc/0x350
          rt_spin_lock+0x50/0x1f0
          macb_start_xmit+0x808/0xb7c
          dev_hard_start_xmit+0x94/0x284
          sch_direct_xmit+0x8c/0x37c
          __dev_queue_xmit+0x708/0x1120
          neigh_resolve_output+0x148/0x28c
          ip6_finish_output2+0x2c0/0xb2c
          __ip6_finish_output+0x114/0x308
          ip6_output+0xc4/0x4a4
          mld_sendpack+0x220/0x68c
          mld_ifc_work+0x2a8/0x4f4
          process_one_work+0x20c/0x5f8
          worker_thread+0x1b0/0x35c
          kthread+0x144/0x200
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&queue->tx_ptr_lock);
                                  lock(_xmit_ETHER#2);
                                  lock(&queue->tx_ptr_lock);
     lock(&bp->lock);

    *** DEADLOCK ***

   Call trace:
    show_stack+0x18/0x24 (C)
    dump_stack_lvl+0xa0/0xf0
    dump_stack+0x18/0x24
    print_circular_bug+0x28c/0x370
    check_noncircular+0x198/0x1ac
    __lock_acquire+0x15a8/0x2084
    lock_acquire+0x1cc/0x350
    rt_spin_lock+0x50/0x1f0
    macb_start_xmit+0x808/0xb7c
    dev_hard_start_xmit+0x94/0x284
    sch_direct_xmit+0x8c/0x37c
    __dev_queue_xmit+0x708/0x1120
    neigh_resolve_output+0x148/0x28c
    ip6_finish_output2+0x2c0/0xb2c
    __ip6_finish_output+0x114/0x308
    ip6_output+0xc4/0x4a4
    mld_sendpack+0x220/0x68c
    mld_ifc_work+0x2a8/0x4f4
    process_one_work+0x20c/0x5f8
    worker_thread+0x1b0/0x35c
    kthread+0x144/0x200
    ret_from_fork+0x10/0x20

Notably, invoking the mog_init_rings() callback upon link establishment
is unnecessary. Instead, we can exclusively call mog_init_rings() within
the ndo_open() callback. This adjustment resolves the deadlock issue.
Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings()
when opening the network interface via at91ether_open(), moving
mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC
check.

Fixes: 633e98a711ac ("net: macb: use resolved link config in mac_link_up()")
Cc: stable@vger.kernel.org
Suggested-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://patch.msgid.link/20251222015624.1994551-1-xiaolei.wang@windriver.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: fib_test: Add test case for ipv4 multi nexthops

The test checks that with multi nexthops route the preferred route is the
one which matches source ip. In case when source ip is on dummy
interface, it checks that the routes are balanced.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20251221192639.3911901-2-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fib: restore ECMP balance from loopback

Preference of nexthop with source address broke ECMP for packets with
source addresses which are not in the broadcast domain, but rather added
to loopback/dummy interfaces. Original behaviour was to balance over
nexthops while now it uses the latest nexthop from the group. To fix the
issue introduce next hop scoring system where next hops with source
address equal to requested will always have higher priority.

For the case with 198.51.100.1/32 assigned to dummy0 and routed using
192.0.2.0/24 and 203.0.113.0/24 networks:

2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff
    inet 198.51.100.1/32 scope global dummy0
       valid_lft forever preferred_lft forever
7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
9: veth3@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.113.2/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

~ ip ro list:
default
nexthop via 192.0.2.1 dev veth1 weight 1
nexthop via 203.0.113.1 dev veth3 weight 1
192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2
203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2

before:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    255 veth3

after:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    122 veth1
    133 veth3

Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251221192639.3911901-1-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: fib_nexthops: Add test cases for error routes deletion

Add test cases that check that error routes (e.g., blackhole) are
deleted when their nexthop is deleted.

Output without "ipv4: Fix reference count leak when using error routes
with nexthop objects":

# ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal"

IPv4 functional
----------------------
[...]
       WARNING: Unexpected route entry
TEST: Error route removed on nexthop deletion                       [FAIL]

IPv6
----------------------
[...]
TEST: Error route removed on nexthop deletion                       [ OK ]

Tests passed:  20
Tests failed:   1
Tests skipped:  0

Output with "ipv4: Fix reference count leak when using error routes
with nexthop objects":

# ./fib_nexthops.sh -t "ipv4_fcnal ipv6_fcnal"

IPv4 functional
----------------------
[...]
TEST: Error route removed on nexthop deletion                       [ OK ]

IPv6
----------------------
[...]
TEST: Error route removed on nexthop deletion                       [ OK ]

Tests passed:  21
Tests failed:   0
Tests skipped:  0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251221144829.197694-2-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ipv4: Fix reference count leak when using error routes with nexthop objects

When a nexthop object is deleted, it is marked as dead and then
fib_table_flush() is called to flush all the routes that are using the
dead nexthop.

The current logic in fib_table_flush() is to only flush error routes
(e.g., blackhole) when it is called as part of network namespace
dismantle (i.e., with flush_all=true). Therefore, error routes are not
flushed when their nexthop object is deleted:

# ip link add name dummy1 up type dummy
# ip nexthop add id 1 dev dummy1
# ip route add 198.51.100.1/32 nhid 1
# ip route add blackhole 198.51.100.2/32 nhid 1
# ip nexthop del id 1
# ip route show
blackhole 198.51.100.2 nhid 1 dev dummy1

As such, they keep holding a reference on the nexthop object which in
turn holds a reference on the nexthop device, resulting in a reference
count leak:

# ip link del dev dummy1
[ 70.516258] unregister_netdevice: waiting for dummy1 to become free. Usage count = 2

Fix by flushing error routes when their nexthop is marked as dead.

IPv6 does not suffer from this problem.

Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Closes: https://lore.kernel.org/netdev/d943f806-4da6-4970-ac28-b9373b0e63ac@I-love.SAKURA.ne.jp/
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20251221144829.197694-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: usb: sr9700: fix incorrect command used to write single register

This fixes the device failing to initialize with "error reading MAC
address" for me, probably because the incorrect write of NCR_RST to
SR_NCR is not actually resetting the device.

Fixes: c9b37458e95629b1d1171457afdcc1bf1eb7881d ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Cc: stable@vger.kernel.org
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20251221082400.50688-1-enelsonmoore@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

IB/rxe: Fix missing umem_odp->umem_mutex unlock on error path

rxe_odp_map_range_and_lock() must release umem_odp->umem_mutex when an
error occurs, including cases where rxe_check_pagefault() fails.

Fixes: 2fae67ab63db ("RDMA/rxe: Add support for Send/Recv/Write/Read with ODP")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20251226094112.3042583-1-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

smb/server: fix refcount leak in smb2_open()

When ksmbd_vfs_getattr() fails, the reference count of ksmbd_file
must be released.

Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/server: fix refcount leak in parse_durable_handle_context()

When the command is a replay operation and -ENOEXEC is returned,
the refcount of ksmbd_file must be released.

Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>