git.ipfire.org Git - thirdparty/kernel/stable.git/log

Merge branch 'for-next/perf' into for-next/core

* for-next/perf: (29 commits)
  perf docs: arm_spe: Document new discard mode
  perf: arm_spe: Add format option for discard mode
  MAINTAINERS: Add perf list for drivers/perf/
  drivers/perf: apple_m1: Map generic branch events
  drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
  perf: imx9_perf: Introduce AXI filter version to refactor the driver and better extension
  perf/arm-cmn: Permit more exhaustive groups
  perf/dwc_pcie: Qualify RAS DES VSEC Capability by Vendor, Revision
  drivers/perf: hisi: Delete redundant blank line of DDRC PMU
  drivers/perf: hisi: Fix incorrect variable name "hha_pmu" in DDRC PMU driver
  drivers/perf: hisi: Export associated CPUs of each PMU through sysfs
  drivers/perf: hisi: Provide a generic implementation of cpumask/identifier
  drivers/perf: hisi: Add a common function to retrieve topology from firmware
  drivers/perf: hisi: Extract topology information to a separate structure
  drivers/perf: hisi: Refactor the detection of associated CPUs
  drivers/perf: hisi: Migrate to one online CPU if no associated one online
  drivers/perf: hisi: Don't update the associated_cpus on CPU offline
  drivers/perf: hisi: Define a symbol namespace for HiSilicon Uncore PMUs
  perf/marvell: Odyssey LLC-TAD performance monitor support
  perf/marvell: Refactor to extract platform data
  ...

Merge branch 'for-next/mm' into for-next/core

* for-next/mm:
  arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
  arm64/mm: Replace open encodings with PXD_TABLE_BIT
  arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
  arm64: Kconfig: force ARM64_PAN=y when enabling TTBR0 sw PAN
  arm64/kvm: Avoid invalid physical addresses to signal owner updates
  arm64/kvm: Configure HYP TCR.PS/DS based on host stage1
  arm64/mm: Override PARange for !LPA2 and use it consistently
  arm64/mm: Reduce PA space to 48 bits when LPA2 is not enabled

Merge branch 'for-next/misc' into for-next/core

* for-next/misc:
  arm64: Remove duplicate included header
  arm64/Kconfig: Drop EXECMEM dependency from ARCH_WANTS_EXECMEM_LATE
  arm64: asm: Fix typo in pgtable.h
  arm64/mm: Ensure adequate HUGE_MAX_HSTATE
  arm64/mm: Replace open encodings with PXD_TABLE_BIT
  arm64/mm: Drop INIT_MM_CONTEXT()

Merge branch 'for-next/docs' into for-next/core

* for-next/docs:
  Documentation: arm64: Remove stale and redundant virtual memory diagrams
  docs: arm64: Document EL3 requirements for FEAT_PMUv3
  docs: arm64: Document EL3 requirements for cpu debug architecture

Merge branch 'for-next/cpufeature' into for-next/core

* for-next/cpufeature:
  kselftest/arm64: Add 2024 dpISA extensions to hwcap test
  KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
  arm64/hwcap: Describe 2024 dpISA extensions to userspace
  arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
  arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
  arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
  arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64ISAR3_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64PFR2_EL1 to DDI0601 2024-09
  arm64/sysreg: Get rid of CPACR_ELx SysregFields
  arm64/sysreg: Convert *_EL12 accessors to Mapping
  arm64/sysreg: Get rid of the TCR2_EL1x SysregFields
  arm64/sysreg: Allow a 'Mapping' descriptor for system registers
  arm64/cpufeature: Refactor conditional logic in init_cpu_ftr_reg()
  arm64: cpufeature: Add HAFT to cpucap_is_possible()

Merge branch 'for-next/cca' into for-next/core

* for-next/cca:
arm64: rsi: Add automatic arm-cca-guest module loading

Documentation: arm64: Remove stale and redundant virtual memory diagrams

The arm64 'memory.rst' file tries to document the virtual memory map
and the translation procedure for a couple of kernel configurations.

Unfortunately, the virtual memory map changes relatively frequently and
we support considerably more configurations than we did when the docs
were introduced (e.g. we now have support for 16KiB pages and 52-bit
addressing). Furthermore, the Arm ARM is the definitive resource for the
translation procedure and so there's little point in duplicating part
of that information in the kernel documentation.

Rather than continue trying (and failing) to maintain these diagrams,
let's rip them out. The kernel page-table can be dumped using
CONFIG_PTDUMP_DEBUGFS if necesssary.

Link: https://lore.kernel.org/r/20250102065554.1533781-1-sangmoon.kim@samsung.com
Reported-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Signed-off-by: Will Deacon <will@kernel.org>

perf docs: arm_spe: Document new discard mode

Document the flag along with PMU events to hint what it's used for and
give an example with other useful options to get minimal output.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250108142904.401139-3-james.clark@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>

perf: arm_spe: Add format option for discard mode

FEAT_SPEv1p2 (optional from Armv8.6) adds a discard mode that allows all
SPE data to be discarded rather than written to memory. Add a format
bit for this mode.

If the mode isn't supported, the format bit isn't published and attempts
to use it will result in -EOPNOTSUPP. Allocating an aux buffer is still
allowed even though it won't be written to so that old tools continue to
work, but updated tools can choose to skip this step.

Signed-off-by: James Clark <james.clark@linaro.org>
Reviewd-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250108142904.401139-2-james.clark@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>

MAINTAINERS: Add perf list for drivers/perf/

drivers/perf/ contains drivers for the perf subsystem, so it makes sense
that the perf list, linux-perf-users@vger.kernel.org, should be
included for perf drivers.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Clark <james.clark@linaro.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250109152811.3402943-1-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Remove duplicate included header

The header asm/unistd_compat_32.h is included whether CONFIG_COMPAT is
defined or not.

Include it only once and remove the following make includecheck warning:

asm/unistd_compat_32.h is included more than once

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250109104636.124507-2-thorsten.blum@linux.dev
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: apple_m1: Map generic branch events

Map the generic perf events for branch prediction stats to the
corresponding hardware events.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Janne Grunau <j@jannau.net>
Link: https://lore.kernel.org/r/20241217212048.3709204-4-oliver.upton@linux.dev
Signed-off-by: Will Deacon <will@kernel.org>

arm64: rsi: Add automatic arm-cca-guest module loading

The TSM module provides guest identification and attestation when a
guest runs in CCA realm mode. By creating a dummy platform device,
let's ensure the module is automatically loaded. The udev daemon loads
the TSM module after it receives a device addition event. Once that
happens, it can be used earlier in the boot process to decrypt the
rootfs.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20241220181236.172060-2-jeremy.linton@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

kselftest/arm64: Add 2024 dpISA extensions to hwcap test

Add coverage of the hwcaps for the 2024 dpISA extensions to the hwcap
test.

We don't actually test SIGILL generation for CMPBR since the need to
branch makes it a pain to generate and the SIGILL detection would be
unreliable anyway. Since this should be very unusual we provide a stub
function rather than supporting a missing test.

The sigill functions aren't well sorted in the file so the ordering is a
bit random.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-5-7578da51fc3d@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1

ID_AA64ISAR3_EL1 is currently marked as unallocated in KVM but does have a
number of bitfields defined in it. Expose FPRCVT and FAMINMAX, two simple
instruction only extensions to guests.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-4-7578da51fc3d@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/hwcap: Describe 2024 dpISA extensions to userspace

The 2024 dpISA introduces a number of architecture features all of which
only add new instructions so only require the addition of hwcaps and ID
register visibility.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-3-7578da51fc3d@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12

DDI0601 2024-12 introduces SME 2.2 as well as a few new optional features,
update sysreg to reflect the changes in ID_AA64SMFR0_EL1 enumerating them.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-2-7578da51fc3d@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented

The hwcaps code that exposes SVE features to userspace only
considers ID_AA64ZFR0_EL1, while this is only valid when
ID_AA64PFR0_EL1.SVE advertises that SVE is actually supported.

The expectations are that when ID_AA64PFR0_EL1.SVE is 0, the
ID_AA64ZFR0_EL1 register is also 0. So far, so good.

Things become a bit more interesting if the HW implements SME.
In this case, a few ID_AA64ZFR0_EL1 fields indicate *SME*
features. And these fields overlap with their SVE interpretations.
But the architecture says that the SME and SVE feature sets must
match, so we're still hunky-dory.

This goes wrong if the HW implements SME, but not SVE. In this
case, we end-up advertising some SVE features to userspace, even
if the HW has none. That's because we never consider whether SVE
is actually implemented. Oh well.

Fix it by restricting all SVE capabilities to ID_AA64PFR0_EL1.SVE
being non-zero. The HWCAPS documentation is amended to reflect the
actually checks performed by the kernel.

Fixes: 06a916feca2b ("arm64: Expose SVE2 features for userspace")
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250107-arm64-2024-dpisa-v5-1-7578da51fc3d@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association

For PMUs with no association, the hisi_pmu->on_cpu is initialized
according to the NUMA locality but use a wrong CPU for the interrupt
affinity. The CPU selected from cpumask_local_spread() can be different
from the CPU by the cpuhp callback. Fix this by setting the IRQ affinity
to hisi_pmu->on_cpu.

Fixes: 6cd137088fdf ("drivers/perf: hisi: Refactor the detection of associated CPUs")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20241223125134.57885-1-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()

In commit 892f7237b3ff ("arm64: Delay initialisation of
cpuinfo_arm64::reg_{zcr,smcr}") we moved access to ZCR, SMCR and SMIDR
later in the boot process in order to ensure that we don't attempt to
interact with them if SVE or SME is disabled on the command line.
Unfortunately when initialising the boot CPU in init_cpu_features() we work
on a copy of the struct cpuinfo_arm64 for the boot CPU used only during
boot, not the percpu copy used by the sysfs code. The expectation of the
feature identification code was that the ID registers would be read in
__cpuinfo_store_cpu() and the values not modified by init_cpu_features().

The main reason for the original change was to avoid early accesses to
ZCR on practical systems that were seen shipping with SVE reported in ID
registers but traps enabled at EL3 and handled as fatal errors, SME was
rolled in due to the similarity with SVE. Since then we have removed the
early accesses to ZCR and SMCR in commits:

abef0695f9665c3d ("arm64/sve: Remove ZCR pseudo register from cpufeature code")
391208485c3ad50f ("arm64/sve: Remove SMCR pseudo register from cpufeature code")

so only the SMIDR_EL1 part of the change remains. Since SMIDR_EL1 is
only trapped via FEAT_IDST and not the SME trap it is less likely to be
affected by similar issues, and the factors that lead to issues with SVE
are less likely to apply to SME.

Since we have not yet seen practical SME systems that need to use a
command line override (and are only just beginning to see SME systems at
all) and the ID register read is much more likely to be safe let's just
store SMIDR_EL1 along with all the other ID register reads in
__cpuinfo_store_cpu().

This issue wasn't apparent when testing on emulated platforms that do not
report values in SMIDR_EL1.

Fixes: 892f7237b3ff ("arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241217-arm64-fix-boot-cpu-smidr-v3-1-7be278a85623@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()

Commit 2045a3b8911b ("mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()")
introduces the vmemmap_check_pmd() while does not verify if the entry is a
section mapping, as is already done for Loongarch & X86.
The update includes a check for pmd_sect(). Only if pmd_sect() returns true,
further vmemmap population for the addr is skipped.

Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250102074047.674156-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Replace open encodings with PXD_TABLE_BIT

[pgd|p4d]_bad() helpers have open encodings for their respective table bits
which can be replaced with corresponding macros. This makes things clearer,
thus improving their readability as well.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20250107015529.798319-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Rename pte_mkpresent() as pte_mkvalid()

pte_present() is no longer synonymous with pte_valid() as it also tests for
pte_present_invalid() as well. Hence pte_mkpresent() is misleading, because
all that does is make an entry mapped, via setting PTE_VALID. Hence rename
the helper as pte_mkvalid() which reflects its functionality appropriately.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250107023016.829416-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09

DDI0601 2024-09 introduces new features which are enumerated via
ID_AA64ISAR2_EL1, update the sysreg file to reflect these updates.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241211-arm64-2024-dpisa-v4-6-0fd403876df2@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09

DDI0601 2024-09 introduces SVE 2.2 as well as a few new optional features,
update sysreg to reflect the changes in ID_AA64ZFR0_EL1 enumerating them.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241211-arm64-2024-dpisa-v4-4-0fd403876df2@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09

DDI0601 2024-09 defines two new feature flags in ID_AA64FPFR0_EL1
describing new FP8 operations, describe them in sysreg.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241211-arm64-2024-dpisa-v4-3-0fd403876df2@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Update ID_AA64ISAR3_EL1 to DDI0601 2024-09

DDI0601 2024-09 defines several new feature flags in ID_AA64ISAR3_EL1,
update our description in sysreg to reflect these.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241211-arm64-2024-dpisa-v4-2-0fd403876df2@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Update ID_AA64PFR2_EL1 to DDI0601 2024-09

DDI0601 2024-09 defines a new feature flags in ID_AA64PFR2_EL1
describing support for injecting UNDEF exceptions, update sysreg to
include this.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241211-arm64-2024-dpisa-v4-1-0fd403876df2@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Get rid of CPACR_ELx SysregFields

There is no such thing as CPACR_ELx in the architecture.
What we have is CPACR_EL1, for which CPTR_EL12 is an accessor.

Rename CPACR_ELx_* to CPACR_EL1_*, and fix the bit of code using
these names.

Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241219173351.1123087-5-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Convert *_EL12 accessors to Mapping

Perform a bulk convert of the remaining EL12 accessors to use the
Mapping qualifier, which makes things a bit clearer.

Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241219173351.1123087-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Get rid of the TCR2_EL1x SysregFields

TCR2_EL1x is a pretty bizarre construct, as it is shared between
TCR2_EL1 and TCR2_EL12. But the latter is obviously only an
accessor to the former.

In order to make things more consistent, upgrade TCR2_EL1x to
a full-blown sysreg definition for TCR2_EL1, and describe TCR2_EL12
as a mapping to TCR2_EL1.

This results in a couple of minor changes to the actual code.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241219173351.1123087-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Allow a 'Mapping' descriptor for system registers

*EL02 and *_EL12 system registers are actually only accessors for
EL0 and EL1 registers accessed from EL2 when HCR_EL2.E2H==1. They
do not have fields of their own.

To that effect, introduce a 'Mapping' entry, describing which
system register an _EL12 register maps to.

Implementation wise, this is handled the same was as Fields,
which ls only a comment.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241219173351.1123087-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64: Kconfig: force ARM64_PAN=y when enabling TTBR0 sw PAN

There are a couple of instances of Kconfig constraints where PAN must be
enabled too if TTBR0 sw PAN is enabled, primarily to avoid dealing with
the modified TTBR0_EL1 sysreg format that is used when 52-bit physical
addressing and/or CnP are enabled (support for either implies support
for hardware PAN as well, which will supersede PAN emulation if both are
available)

Let's simplify this, and always enable ARM64_PAN when enabling TTBR0 sw
PAN. This decouples the PAN configuration from the VA size selection,
permitting us to simplify the latter in subsequent patches. (Note that
PAN and TTBR0 sw PAN can still be disabled after this patch, but not
independently)

To avoid a convoluted circular Kconfig dependency involving KCSAN, make
ARM64_MTE select ARM64_PAN too, instead of depending on it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241212081841.2168124-13-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/kvm: Avoid invalid physical addresses to signal owner updates

The pKVM stage2 mapping code relies on an invalid physical address to
signal to the internal API that only the annotations of descriptors
should be updated, and these are stored in the high bits of invalid
descriptors covering memory that has been donated to protected guests,
and is therefore unmapped from the host stage-2 page tables.

Given that these invalid PAs are never stored into the descriptors, it
is better to rely on an explicit flag, to clarify the API and to avoid
confusion regarding whether or not the output address of a descriptor
can ever be invalid to begin with (which is not the case with LPA2).

That removes a dependency on the logic that reasons about the maximum PA
range, which differs on LPA2 capable CPUs based on whether LPA2 is
enabled or not, and will be further clarified in subsequent patches.

Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241212081841.2168124-12-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/kvm: Configure HYP TCR.PS/DS based on host stage1

When the host stage1 is configured for LPA2, the value currently being
programmed into TCR_EL2.T0SZ may be invalid unless LPA2 is configured
at HYP as well. This means kvm_lpa2_is_enabled() is not the right
condition to test when setting TCR_EL2.DS, as it will return false if
LPA2 is only available for stage 1 but not for stage 2.

Similary, programming TCR_EL2.PS based on a limited IPA range due to
lack of stage2 LPA2 support could potentially result in problems.

So use lpa2_is_enabled() instead, and set the PS field according to the
host's IPS, which is capped at 48 bits if LPA2 support is absent or
disabled. Whether or not we can make meaningful use of such a
configuration is a different question.

Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241212081841.2168124-11-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Override PARange for !LPA2 and use it consistently

When FEAT_LPA{,2} are not implemented, the ID_AA64MMFR0_EL1.PARange and
TCR.IPS values corresponding with 52-bit physical addressing are
reserved.

Setting the TCR.IPS field to 0b110 (52-bit physical addressing) has side
effects, such as how the TTBRn_ELx.BADDR fields are interpreted, and so
it is important that disabling FEAT_LPA2 (by overriding the
ID_AA64MMFR0.TGran fields) also presents a PARange field consistent with
that.

So limit the field to 48 bits unless LPA2 is enabled, and update
existing references to use the override consistently.

Fixes: 352b0395b505 ("arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs")
Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241212081841.2168124-10-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Reduce PA space to 48 bits when LPA2 is not enabled

Currently, LPA2 kernel support implies support for up to 52 bits of
physical addressing, and this is reflected in global definitions such as
PHYS_MASK_SHIFT and MAX_PHYSMEM_BITS.

This is potentially problematic, given that LPA2 hardware support is
modeled as a CPU feature which can be overridden, and with LPA2 hardware
support turned off, attempting to map physical regions with address bits
[51:48] set (which may exist on LPA2 capable systems booting with
arm64.nolva) will result in corrupted mappings with a truncated output
address and bogus shareability attributes.

This means that the accepted physical address range in the mapping
routines should be at most 48 bits wide when LPA2 support is configured
but not enabled at runtime.

Fixes: 352b0395b505 ("arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs")
Cc: stable@vger.kernel.org
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241212081841.2168124-9-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>

docs: arm64: Document EL3 requirements for FEAT_PMUv3

This documents EL3 requirements for FEAT_PMUv3. The register field MDCR_EL3
.TPM needs to be cleared for accesses into PMU registers without any trap
being generated into EL3. PMUv3 registers like PMCCFILTR_EL0, PMCCNTR_EL0
PMCNTENCLR_EL0, PMCNTENSET_EL0, PMCR_EL0, PMEVCNTR<n>_EL0, PMEVTYPER<n>_EL0
etc are already being accessed for perf HW PMU implementation.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20241211065425.1106683-3-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

docs: arm64: Document EL3 requirements for cpu debug architecture

This documents EL3 requirements for debug architecture. The register field
MDCR_EL3.TDA needs to be cleared for accesses into debug registers without
any trap being generated into EL3. CPU debug registers like DBGBCR<n>_EL1,
DBGBVR<n>_EL1, DBGWCR<n>_EL1, DBGWVR<n>_EL1 and MDSCR_EL1 are already being
accessed for HW breakpoint, watchpoint and debug monitor implementations on
the platform.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20241211065425.1106683-2-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

perf: imx9_perf: Introduce AXI filter version to refactor the driver and better extension

The imx93 is the first supported DDR PMU that supports read transaction,
write transaction and read beats events which corresponding respecitively
to counter 2, 3 and 4.

However, transaction-based AXI match has low accuracy when get total bits
compared to beats-based. And imx93 doesn't assign AXI_ID to each master.
So axi filter is not used widely on imx93. This could be regards as AXI
filter version 1.

To improve the AXI filter capability, imx95 supports 1 read beats and 3
write beats event which corresponding respecitively to counter 2-5. imx95
also detailed AXI_ID allocation so that most of the master could be count
individually. This could be regards as AXI filter version 2.

This will introduce AXI filter version to refactor the driver and support
better extension, such as coming imx943. This is also a potential fix on
imx91 when configure axi filter.

Fixes: 44798fe136dc ("perf: imx_perf: add support for i.MX91 platform")
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20241212065708.1353513-1-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/arm-cmn: Permit more exhaustive groups

The group validation logic still somewhat assumes the original CMN-600
case of events counting globally, such that if one tries to group 9
events where the first 8 target a single DTC domain, the 9th will be
rejected because *a* DTC domain is full, even though it might only
target other non-overlapping domains and thus still be schedulable.
Improve matters by only counting the DTCs that the new event actually
needs (as arm_cmn_val_add_event() was already clever enough to do).

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/bdfd1e58dac449e407c5cacfd6bf8577dc0a5899.1733943898.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/dwc_pcie: Qualify RAS DES VSEC Capability by Vendor, Revision

PCI Vendor-Specific (VSEC) Capabilities are defined by each vendor.
Devices from different vendors may advertise a VSEC Capability with the DWC
RAS DES functionality, but the vendors may assign different VSEC IDs.

Search for the DWC RAS DES Capability using the VSEC ID and VSEC Rev
chosen by the vendor.

This does not fix a current problem because Alibaba, Ampere, and Qualcomm
all assigned the same VSEC ID and VSEC Rev for the DWC RAS DES Capability.

The potential issue is that we may add support for a device from another
vendor, where the vendor has already assigned DWC_PCIE_VSEC_RAS_DES_ID
(0x02) for an unrelated VSEC. In that event, dwc_pcie_des_cap() would find
the unrelated VSEC and mistakenly assume it was a DWC RAS DES Capability.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-and-tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241209222938.3219364-1-helgaas@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>

arm64/Kconfig: Drop EXECMEM dependency from ARCH_WANTS_EXECMEM_LATE

ARCH_WANTS_EXECMEM_LATE indicates subscribing platform's preference for
EXECMEM late initialisation without creating a new dependency. Hence this
just drops EXECMEM dependency while selecting ARCH_WANTS_EXECMEM_LATE.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20241210043257.715822-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Delete redundant blank line of DDRC PMU

Do not add blank line at the end of a code block defined by braces.

Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-11-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Fix incorrect variable name "hha_pmu" in DDRC PMU driver

In the callback function write_evtype(), the variable name of struct
hisi_pmu should be "ddrc_pmu" instead of "hha_pmu".

Signed-off-by: Junhao He <hejunhao3@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-10-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Export associated CPUs of each PMU through sysfs

Although the event of the uncore PMU can only be opened on a single
CPU, some PMU does have the affinity on a range of CPUs. For example
the L3C PMU is associated to the CPUs sharing the L3T it monitors.
Users may infer this affinity by the PMU name which may have SCCL ID
and CCL ID encoded (for L3C etc), but it's not that straightforward.
So export this information by adding an "associated_cpus" sysfs
attribute then user can get this directly.

Reviewed-by: Jonathan Cameron <Joanthan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-9-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Provide a generic implementation of cpumask/identifier

Each type of HiSilicon Uncore PMU has the following sysfs attributes:

- format: bitmask in perf_event_attr::config[012] of corresponding
attribute
- event: events name and corresponding event code
- cpumask: range of CPUs the events can be opened on
- identifier: the version of this PMU

Different types of PMU have different implementations of the "format"
and "event" but all share the same implementation of the "cpumask"
and "identifier". Thus we can move cpumask and identifier to the
hisi_uncore_pmu framework and drivers can use the generic
implementation.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-8-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Add a common function to retrieve topology from firmware

Currently each type of uncore PMU driver uses almost the same routine
and the same firmware interface (or properties) to retrieve the topology
information, then reset the unused IDs to -1. Extract the common parts
to the framework in hisi_uncore_pmu_init_topology().

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-7-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Extract topology information to a separate structure

HiSilicon Uncore PMUs are identified by the IDs of the topology element
on which the PMUs are located. Add a new separate struct hisi_pmu_toplogy
to encapsulate this information. Add additional documentation on the
meaning of each ID.

- make sccl_id and sicl_id into a union since they're exclusive. It can
  also be accessed by scl_id if the SICL/SCCL distinction is not
  relevant
- make index_id and sub_id signed so -1 may be used to indicate the PMU
  doesn't have this topology element or it could not be retrieved.

This patch should have no functional changes.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-6-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Refactor the detection of associated CPUs

There are two type of PMUs supported currently:
1) PMUs locate on SCCL (Super CPU Cluster [1]), associated with certain
   CCL (CPU cluster [1])(e.g. L3C PMU) or not (e.g. DDRC PMU)
2) PMUs locate on the SICL (Super IO Cluster [1]), which has no
   association with certain CPU topology (e.g. CPA PMU)

Currently the associated CPUs of the PMU is detected in the cpuhp online
callback as below:
- for type 1) the CPUs match PMU's sccl_id and ccl_id
- for type 2) PMU's sccl_id is -1 and all online CPUs will be associated

Since uncore PMUs are not bound to certain CPU context and event could be
counting started by any online CPU, the associated CPUs are just a
preference. Below disadvantages are observed in current implementation:
- the PMU cannot be used if its associated CPUs are offline
- SICL PMUs are associated to all the online CPUs implicitly without
  the consideration of locality

So refactor the way we detect the associated CPUs in below aspects:
- add a clear definition of hisi_pmu::associated_cpus
- initialize hisi_pmu::on_cpu based on locality if no associated CPU
  found, otherwise update it from associated CPUs
- drop the detection with a sccl_id of -1 for SICL PMUs

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/perf/hisi-pmu.rst?h=v6.12-rc1

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-5-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Migrate to one online CPU if no associated one online

If the selected CPU hisi_pmu::on_cpu goes offline, driver will select
a new online CPU from hisi_pmu::associated_cpus, or if no online CPU
found the PMU context won't be migrated. However for uncore PMUs the
associated CPUs are just a peference and it also works to schedule
the events on any online CPUs. So add a fallback to choose an online
CPU if no associated CPUs found.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Don't update the associated_cpus on CPU offline

Event will be scheduled on CPU of hisi_pmu::on_cpu which is selected
from the intersection of hisi_pmu::associated_cpus and online CPUs.
So the associated_cpus don't need to be maintained with online CPUs.
This will save one update operation if one associated CPU is offlined.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

drivers/perf: hisi: Define a symbol namespace for HiSilicon Uncore PMUs

The HiSilicon Uncore PMU framework implements some common functions
and exported them to the drivers. Use a specific HISI_PMU namespace
for the exported symbols to avoid pollute the generic ones.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20241210141525.37788-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/cpufeature: Refactor conditional logic in init_cpu_ftr_reg()

Unnecessarily checks ftr_ovr == tmp in an extra else if, which is not
needed because that condition would already be true by default if the
previous conditions are not satisfied.

if (ftr_ovr != tmp) {
} else if (ftr_new != tmp) {
} else if (ftr_ovr == tmp) {

Logic: The first and last conditions are inverses of each other, so
the last condition must be true if the first two conditions are false.

Additionally, all branches set the variable str, making the subsequent
"if (str)" check redundant

Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io>
Link: https://lore.kernel.org/r/20241115053740.20523-1-hardevsinh.palaniya@siliconsignals.io
Signed-off-by: Will Deacon <will@kernel.org>

arm64: cpufeature: Add HAFT to cpucap_is_possible()

For consistency with other cpucaps, handle the configuration check for
ARM64_HAFT in cpucap_is_possible() rather than this being explicit in
system_supports_haft(). The configuration check will now happen
implicitly as cpus_have_final_cap() uses cpucap_is_possible() via
alternative_has_cap_unlikely().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20241209155948.2124393-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64: asm: Fix typo in pgtable.h

The word 'trasferring' is wrong, so fix it.

Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20241203093323.7831-1-zhujun2@cmss.chinamobile.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Ensure adequate HUGE_MAX_HSTATE

This asserts that HUGE_MAX_HSTATE is sufficient enough preventing potential
hugetlb_max_hstate runtime overflow in hugetlb_add_hstate() thus triggering
a BUG_ON() there after.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20241202064407.53807-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Replace open encodings with PXD_TABLE_BIT

[pgd|p4d]_bad() helpers have open encodings for their respective table bits
which can be replaced with corresponding macros. This makes things clearer,
thus improving their readability as well.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20241202083850.73207-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

arm64/mm: Drop INIT_MM_CONTEXT()

Platform override for INIT_MM_CONTEXT() is redundant because swapper_pg_dir
always gets assigned as the pgd during init_mm initialization. So just drop
this override on arm64.

Originally this override was added via the 'commit 2b5548b68199 ("arm64/mm:
Separate boot-time page tables from swapper_pg_dir")' because non standard
init_pg_dir was assigned as the pgd. Subsequently it was changed as default
swapper_pg_dir by the 'commit ba5b0333a847 ("arm64: mm: omit redundant
remap of kernel image")', which might have also just dropped this override.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20241202043553.29592-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/marvell: Odyssey LLC-TAD performance monitor support

Each TAD provides eight 64-bit counters for monitoring
cache behavior.The driver always configures the same counter for
all the TADs. The user would end up effectively reserving one of
eight counters in every TAD to look across all TADs.
The occurrences of events are aggregated and presented to the user
at the end of running the workload. The driver does not provide a
way for the user to partition TADs so that different TADs are used for
different applications.

The performance events reflect various internal or interface activities.
By combining the values from multiple performance counters, cache
performance can be measured in terms such as: cache miss rate, cache
allocations, interface retry rate, internal resource occupancy, etc.

Each supported counter's event and formatting information is exposed
to sysfs at /sys/devices/tad/. Use perf tool stat command to measure
the pmu events. For instance:

perf stat -e tad_hit_ltg,tad_hit_dtg <workload>

Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-6-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/marvell: Refactor to extract platform data

Refactor the Marvell TAD PMU driver to add versioning to the
existing driver.

Make no functional changes, the behavior and performance
of the driver remain unchanged.

Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-5-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/marvell: Odyssey DDR Performance monitor support

Odyssey DRAM Subsystem supports eight counters for monitoring performance
and software can program those counters to monitor any of the defined
performance events. Supported performance events include those counted
at the interface between the DDR controller and the PHY, interface between
the DDR Controller and the CHI interconnect, or within the DDR Controller.

Additionally DSS also supports two fixed performance event counters, one
for ddr reads and the other for ddr writes.

Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-4-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/marvell: Refactor to extract PMU operations

Introduce a refactor to the Marvell DDR PMU driver to extract
PMU operations ("pmu ops") from the existing driver.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-3-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/marvell: Refactor to extract platform data

Introduce a refactor to the Marvell DDR pmu driver to extract platform
data ("pdata") from the existing driver. Prepare for the upcoming
support of the next version of the Performance Monitoring Unit (PMU) in
this driver.

Make no functional changes, this refactor solely improves code
organization and prepares for future enhancements.

While at it, fix a typo.

Signed-off-by: Gowthami Thiagarajan <gthiagarajan@marvell.com>
Link: https://lore.kernel.org/r/20241108040619.753343-2-gthiagarajan@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>

Documentation: dwc_pcie_pmu: Fix the mnemonics and eventid

Fix the event id and type in the example. In addition, the recent fix,
which addressed the mnemonics with mixed case, didn't fix the document.
Match the names with the driver.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241205061914.5568-3-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>

perf/dwc_pcie: Fix the event numbers

According to Databook, L1 aux is event number 0x08 and
TX L0s and RX L0S is 0x09. Fix the event numbers for the
two events.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241205061914.5568-2-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>

perf: arm_cspmu: nvidia: monitor all ports by default

Some NVIDIA PMUs like the NVLINK-C2C, CNVLINK, and PCIE PMU provide
port filtering. If the port filter is set to zero, the counter of
these PMUs will not capture any event. To avoid meaningless
experiment, the driver sets the port filter value to a default
non-zero value.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-5-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

perf: arm_cspmu: nvidia: enable NVLINK-C2C port filtering

Enable NVLINK-C2C port filtering to distinguish traffic from
different GPUs connected to NVLINK-C2C.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-4-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

perf: arm_cspmu: nvidia: fix sysfs path in the kernel doc

Fix typos to the sysfs path referenced by NVIDIA
uncore pmu kernel doc.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-3-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

perf: arm_cspmu: nvidia: remove unsupported SCF events

Remove unsupported events under SCF PMU.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Link: https://lore.kernel.org/r/20241031142118.1865965-2-bwicaksono@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

Linux 6.13-rc2

Merge tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix a section mismatch warning in modpost

- Fix Debian package build error with the O= option

* tag 'kbuild-fixes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: deb-pkg: fix build error with O=
modpost: Add .irqentry.text to OTHER_SECTIONS

Merge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

- Fix a /proc/interrupts formatting regression

- Have the BCM2836 interrupt controller enter power management states
   properly

- Other fixlets

* tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing
  genirq/proc: Add missing space separator back
  irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
  irqchip/gic-v3: Fix irq_complete_ack() comment

Merge tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

- Handle the case where clocksources with small counter width can,
   in conjunction with overly long idle sleeps, falsely trigger the
   negative motion detection of clocksources

* tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Make negative motion detection more robust

Merge tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Have the Automatic IBRS setting check on AMD does not falsely fire in
   the guest when it has been set already on the host

- Make sure cacheinfo structures memory is allocated to address a boot
   NULL ptr dereference on Intel Meteor Lake which has different numbers
   of subleafs in its CPUID(4) leaf

- Take care of the GDT restoring on the kexec path too, as expected by
   the kernel

- Make sure SMP is not disabled when IO-APIC is disabled on the kernel
   cmdline

- Add a PGD flag _PAGE_NOPTISHADOW to instruct machinery not to
   propagate changes to the kernelmode page tables, to the user portion,
   in PTI

- Mark Intel Lunar Lake as affected by an issue where MONITOR wakeups
   can get lost and thus user-visible delays happen

- Make sure PKRU is properly restored with XRSTOR on AMD after a PRKU
   write of 0 (WRPKRU) which will mark PKRU in its init state and thus
   lose the actual buffer

* tag 'x86_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
  x86/cacheinfo: Delete global num_cache_leaves
  cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
  x86/kexec: Restore GDT on return from ::preserve_context kexec
  x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
  x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables
  x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
  x86/pkeys: Ensure updated PKRU value is XRSTOR'd
  x86/pkeys: Change caller of update_pkru_in_sigframe()

Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"24 hotfixes.  17 are cc:stable.  15 are MM and 9 are non-MM.

  The usual bunch of singletons - please see the relevant changelogs for
  details"

* tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
  iio: magnetometer: yas530: use signed integer type for clamp limits
  sched/numa: fix memory leak due to the overwritten vma->numab_state
  mm/damon: fix order of arguments in damos_before_apply tracepoint
  lib: stackinit: hide never-taken branch from compiler
  mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio()
  scatterlist: fix incorrect func name in kernel-doc
  mm: correct typo in MMAP_STATE() macro
  mm: respect mmap hint address when aligning for THP
  mm: memcg: declare do_memsw_account inline
  mm/codetag: swap tags when migrate pages
  ocfs2: update seq_file index in ocfs2_dlm_seq_next
  stackdepot: fix stack_depot_save_flags() in NMI context
  mm: open-code page_folio() in dump_page()
  mm: open-code PageTail in folio_flags() and const_folio_flags()
  mm: fix vrealloc()'s KASAN poisoning logic
  Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
  selftests/damon: add _damon_sysfs.py to TEST_FILES
  selftest: hugetlb_dio: fix test naming
  ocfs2: free inode when ocfs2_get_init_inode() fails
  nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
  ...

kbuild: deb-pkg: fix build error with O=

Since commit 13b25489b6f8 ("kbuild: change working directory to external
module directory with M="), the Debian package build fails if a relative
path is specified with the O= option.

  $ make O=build bindeb-pkg
    [ snip ]
  dpkg-deb: building package 'linux-image-6.13.0-rc1' in '../linux-image-6.13.0-rc1_6.13.0-rc1-6_amd64.deb'.
  Rebuilding host programs with x86_64-linux-gnu-gcc...
  make[6]: Entering directory '/home/masahiro/linux/build'
  /home/masahiro/linux/Makefile:190: *** specified kernel directory "build" does not exist.  Stop.

This occurs because the sub_make_done flag is cleared, even though the
working directory is already in the output directory.

Passing KBUILD_OUTPUT=. resolves the issue.

Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=")
Reported-by: Charlie Jenkins <charlie@rivosinc.com>
Closes: https://lore.kernel.org/all/Z1DnP-GJcfseyrM3@ghost/
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

modpost: Add .irqentry.text to OTHER_SECTIONS

The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:

  WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...

  The relocation at __ex_table+0x447c references section ".irqentry.text"
  which is not in the list of authorized sections.

Add .irqentry.text to OTHER_SECTIONS to cure the issue.

Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Merge tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- DFS fix (for race with tree disconnect and dfs cache worker)

- Four fixes for SMB3.1.1 posix extensions:
      - improve special file support e.g. to Samba, retrieving the file
        type earlier
      - reduce roundtrips (e.g. on ls -l, in some cases)

* tag '6.13-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: fix potential race in cifs_put_tcon()
  smb3.1.1: fix posix mounts to older servers
  fs/smb/client: cifs_prime_dcache() for SMB3 POSIX reparse points
  fs/smb/client: Implement new SMB3 POSIX type
  fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Large number of small fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
  scsi: scsi_debug: Fix hrtimer support for ndelay
  scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
  scsi: ufs: core: Add missing post notify for power mode change
  scsi: sg: Fix slab-use-after-free read in sg_release()
  scsi: ufs: core: sysfs: Prevent div by zero
  scsi: qla2xxx: Update version to 10.02.09.400-k
  scsi: qla2xxx: Supported speed displayed incorrectly for VPorts
  scsi: qla2xxx: Fix NVMe and NPIV connect issue
  scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cnt
  scsi: qla2xxx: Fix use after free on unload
  scsi: qla2xxx: Fix abort in bsg timeout
  scsi: mpi3mr: Update driver version to 8.12.0.3.50
  scsi: mpi3mr: Handling of fault code for insufficient power
  scsi: mpi3mr: Start controller indexing from 0
  scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
  scsi: mpi3mr: Synchronize access to ioctl data buffer
  scsi: mpt3sas: Update driver version to 51.100.00.00
  scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
  scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()
  scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove()
  ...

Merge tag 'block-6.13-20241207' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Keith:
      - Target fix using incorrect zero buffer (Nilay)
      - Device specifc deallocate quirk fixes (Christoph, Keith)
      - Fabrics fix for handling max command target bugs (Maurizio)
      - Cocci fix usage for kzalloc (Yu-Chen)
      - DMA size fix for host memory buffer feature (Christoph)
      - Fabrics queue cleanup fixes (Chunguang)

- CPU hotplug ordering fixes

- Add missing MODULE_DESCRIPTION for rnull

- bcache error value fix

- virtio-blk queue freeze fix

* tag 'block-6.13-20241207' of git://git.kernel.dk/linux:
  blk-mq: move cpuhp callback registering out of q->sysfs_lock
  blk-mq: register cpuhp callback after hctx is added to xarray table
  virtio-blk: don't keep queue frozen during system suspend
  nvme-tcp: simplify nvme_tcp_teardown_io_queues()
  nvme-tcp: no need to quiesce admin_q in nvme_tcp_teardown_io_queues()
  nvme-rdma: unquiesce admin_q before destroy it
  nvme-tcp: fix the memleak while create new ctrl failed
  nvme-pci: don't use dma_alloc_noncontiguous with 0 merge boundary
  nvmet: replace kmalloc + memset with kzalloc for data allocation
  nvme-fabrics: handle zero MAXCMD without closing the connection
  bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again
  nvme-pci: remove two deallocate zeroes quirks
  block: rnull: add missing MODULE_DESCRIPTION
  nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
  nvmet: use kzalloc instead of ZERO_PAGE in nvme_execute_identify_ns_nvm()

Merge tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"A single fix for a parameter type which affects 32-bit"

* tag 'io_uring-6.13-20241207' of git://git.kernel.dk/linux:
io_uring: Change res2 parameter type in io_uring_cmd_done

Merge tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull jffs2 fix from Richard Weinberger:

- Fixup rtime compressor bounds checking

* tag 'ubifs-for-linus-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: Fix rtime decompressor

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann::

- Fix several issues for BPF LPM trie map which were found by syzbot
   and during addition of new test cases (Hou Tao)

- Fix a missing process_iter_arg register type check in the BPF
   verifier (Kumar Kartikeya Dwivedi, Tao Lyu)

- Fix several correctness gaps in the BPF verifier when interacting
   with the BPF stack without CAP_PERFMON (Kumar Kartikeya Dwivedi,
   Eduard Zingerman, Tao Lyu)

- Fix OOB BPF map writes when deleting elements for the case of xsk map
   as well as devmap (Maciej Fijalkowski)

- Fix xsk sockets to always clear DMA mapping information when
   unmapping the pool (Larysa Zaremba)

- Fix sk_mem_uncharge logic in tcp_bpf_sendmsg to only uncharge after
   sent bytes have been finalized (Zijian Zhang)

- Fix BPF sockmap with vsocks which was missing a queue check in poll
   and sockmap cleanup on close (Michal Luczaj)

- Fix tools infra to override makefile ARCH variable if defined but
   empty, which addresses cross-building tools. (Björn Töpel)

- Fix two resolve_btfids build warnings on unresolved bpf_lsm symbols
   (Thomas Weißschuh)

- Fix a NULL pointer dereference in bpftool (Amir Mohammadi)

- Fix BPF selftests to check for CONFIG_PREEMPTION instead of
   CONFIG_PREEMPT (Sebastian Andrzej Siewior)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (31 commits)
  selftests/bpf: Add more test cases for LPM trie
  selftests/bpf: Move test_lpm_map.c to map_tests
  bpf: Use raw_spinlock_t for LPM trie
  bpf: Switch to bpf mem allocator for LPM trie
  bpf: Fix exact match conditions in trie_get_next_key()
  bpf: Handle in-place update for full LPM trie correctly
  bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
  bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
  bpf: Remove unnecessary check when updating LPM trie
  selftests/bpf: Add test for narrow spill into 64-bit spilled scalar
  selftests/bpf: Add test for reading from STACK_INVALID slots
  selftests/bpf: Introduce __caps_unpriv annotation for tests
  bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots
  bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc
  samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS
  bpf: Zero index arg error string for dynptr and iter
  selftests/bpf: Add tests for iter arg check
  bpf: Ensure reg is PTR_TO_STACK in process_iter_arg
  tools: Override makefile ARCH variable if defined, but empty
  selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap
  ...

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
"Nothing major, some left-overs from the recent merging window (MTE,
  coco) and some newly found issues like the ptrace() ones.

   - MTE/hugetlbfs:

      - Set VM_MTE_ALLOWED in the arch code and remove it from the core
        code for hugetlbfs mappings

      - Fix copy_highpage() warning when the source is a huge page but
        not MTE tagged, taking the wrong small page path

   - drivers/virt/coco:

      - Add the pKVM and Arm CCA drivers under the arm64 maintainership

      - Fix the pkvm driver to fall back to ioremap() (and warn) if the
        MMIO_GUARD hypercall fails

      - Keep the Arm CCA driver default 'n' rather than 'm'

   - A series of fixes for the arm64 ptrace() implementation,
     potentially leading to the kernel consuming uninitialised stack
     variables when PTRACE_SETREGSET is invoked with a length of 0

   - Fix zone_dma_limit calculation when RAM starts below 4GB and
     ZONE_DMA is capped to this limit

   - Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a
     call to page_to_phys() (from patch_map()) which checks pfn_valid()
     before vmemmap has been set up

   - Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI
     ops when the kernel assumes 8-bit ASIDs but running under a
     hypervisor on a system that implements 16-bit ASIDs (found running
     Linux under Parallels on Apple M4)

   - ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it
     is using the same SMMU PMCG as HIP09 and suffers from the same
     errata

   - Add GCS to cpucap_is_possible(), missed in the recent merge"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
  arm64: ptrace: fix partial SETREGSET for NT_ARM_POE
  arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
  arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
  arm64: cpufeature: Add GCS to cpucap_is_possible()
  coco: virt: arm64: Do not enable cca guest driver by default
  arm64: mte: Fix copy_highpage() warning on hugetlb folios
  arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
  ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
  MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry
  drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
  arm64: patching: avoid early page_to_phys()
  arm64: mm: Fix zone_dma_limit calculation
  arm64: mte: set VM_MTE_ALLOWED for hugetlbfs at correct place

Merge tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:
"Restore check for node validity in arch_numa.

  The rework of NUMA initialization in arch_numa dropped a check that
  refused to accept configurations with invalid node IDs.

  Restore that check to ensure that when firmware passes invalid nodes,
  such configuration is rejected and kernel gracefully falls back to
  dummy NUMA"

* tag 'fixes-2024-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  arch_numa: Restore nid checks before registering a memblock with a node
  memblock: allow zero threshold in validate_numa_converage()

Merge tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel

Pull more drm fixes from Simona Vetter:
"Due to mailing list unreliability we missed the amdgpu pull, hence
  part two with that now included:

   - amdgu: mostly display fixes + jpeg vcn 1.0, sriov, dcn4.0 resume
     fixes

   - amdkfd fixes"

* tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu: rework resume handling for display (v2)
  drm/amd/pm: fix and simplify workload handling
  Revert "drm/amd/pm: correct the workload setting"
  drm/amdgpu: fix sriov reinit late orders
  drm/amdgpu: Fix ISP hw init issue
  drm/amd/display: Add hblank borrowing support
  drm/amd/display: Limit VTotal range to max hw cap minus fp
  drm/amd/display: Correct prefetch calculation
  drm/amd/display: Add option to retrieve detile buffer size
  drm/amd/display: Add a left edge pixel if in YCbCr422 or YCbCr420 and odm
  drm/amdkfd: hard-code cacheline for gc943,gc944
  drm/amdkfd: add MEC version that supports no PCIe atomics for GFX12
  drm/amd/display: Fix programming backlight on OLED panels
  drm/amd: Sanity check the ACPI EDID
  drm/amdgpu/hdp7.0: do a posting read when flushing HDP
  drm/amdgpu/hdp6.0: do a posting read when flushing HDP
  drm/amdgpu/hdp5.2: do a posting read when flushing HDP
  drm/amdgpu/hdp5.0: do a posting read when flushing HDP
  drm/amdgpu/hdp4.0: do a posting read when flushing HDP
  drm/amdgpu/jpeg1.0: fix idle work handler

Merge tag 'amd-drm-fixes-6.13-2024-12-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.13-2024-12-04:

amdgpu:
- Jpeg work handler fix for VCN 1.0
- HDP flush fixes
- ACPI EDID sanity check
- OLED panel backlight fix
- DC YCbCr fix
- DC Detile buffer size debugging
- DC prefetch calculation fix
- DC VTotal handling fix
- DC HBlank fix
- ISP fix
- SR-IOV fix
- Workload profile fixes
- DCN 4.0.1 resume fix

amdkfd:
- GC 12.x fix
- GC 9.4.x fix

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241206190452.2571042-1-alexander.deucher@amd.com

Merge tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Pretty quiet week which is probably expected after US holidays, the
  dma-fence and displayport MST message handling fixes make up the bulk
  of this, along with a couple of minor xe and other driver fixes.

  dma-fence:
   - Fix reference leak on fence-merge failure path
   - Simplify fence merging with kernel's sort()
   - Fix dma_fence_array_signaled() to ensure forward progress

  dp_mst:
   - Fix MST sideband message body length check
   - Fix a bunch of locking/state handling with DP MST msgs

  sti:
   - Add __iomem for mixer_dbg_mxn()'s parameter

  xe:
   - Missing init value and 64-bit write-order check
   - Fix a memory allocation issue causing lockdep violation

  v3d:
   - Performance counter fix"

* tag 'drm-fixes-2024-12-07' of https://gitlab.freedesktop.org/drm/kernel:
  drm/v3d: Enable Performance Counters before clearing them
  drm/dp_mst: Use reset_msg_rx_state() instead of open coding it
  drm/dp_mst: Reset message rx state after OOM in drm_dp_mst_handle_up_req()
  drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
  drm/dp_mst: Fix down request message timeout handling
  drm/dp_mst: Simplify error path in drm_dp_mst_handle_down_rep()
  drm/dp_mst: Verify request type in the corresponding down message reply
  drm/dp_mst: Fix resetting msg rx state after topology removal
  drm/xe: Move the coredump registration to the worker thread
  drm/xe/guc: Fix missing init value and add register order check
  drm/sti: Add __iomem for mixer_dbg_mxn's parameter
  drm/dp_mst: Fix MST sideband message body length check
  dma-buf: fix dma_fence_array_signaled v4
  dma-fence: Use kernel's sort for merging fences
  dma-fence: Fix reference leak on fence merge failure path

Merge tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes that have been gathered in the week.

   - Fix the missing XRUN handling in USB-audio low latency mode

   - Fix regression by the previous USB-audio hadening change

   - Clean up old SH sound driver to use the standard helpers

   - A few further fixes for MIDI 2.0 UMP handling

   - Various HD-audio and USB-audio quirks

   - Fix jack handling at PM on ASoC Intel AVS

   - Misc small fixes for ASoC SOF and Mediatek"

* tag 'sound-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"
  ASoC: mediatek: mt8188-mt6359: Remove hardcoded dmic codec
  ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops
  ALSA: usb-audio: Add extra PID for RME Digiface USB
  ALSA: usb-audio: Fix a DMA to stack memory bug
  ASoC: SOF: ipc3-topology: fix resource leaks in sof_ipc3_widget_setup_comp_dai()
  ALSA: hda/realtek: Add support for Samsung Galaxy Book3 360 (NP730QFG)
  ASoC: Intel: avs: da7219: Remove suspend_pre() and resume_post()
  ALSA: hda/tas2781: Fix error code tas2781_read_acpi()
  ALSA: hda/realtek: Enable mute and micmute LED on HP ProBook 430 G8
  ALSA: usb-audio: add mixer mapping for Corsair HS80
  ALSA: ump: Shut up truncated string warning
  ALSA: sh: Use standard helper for buffer accesses
  ALSA: usb-audio: Notify xrun for low-latency mode
  ALSA: hda/conexant: fix Z60MR100 startup pop issue
  ALSA: ump: Update legacy substream names upon FB info update
  ALSA: ump: Indicate the inactive group in legacy substream names
  ALSA: ump: Don't open legacy substream for an inactive group
  ALSA: seq: ump: Fix seq port updates per FB info notify

Merge tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"A couple of small fixes, fixing an incorrect format specifier in a log
  message and adding missing cleanup of the devres data used to support
  dev_get_regmap() when a device is unregistered"

* tag 'regmap-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: detach regmap from dev on regmap_exit
  regmap: Use correct format specifier for logging range errors

Merge tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few small driver specific fixes and device ID updates for SPI.

  The Apple change flags the driver as being compatible with the core's
  GPIO chip select support, fixing support for some systems"

* tag 'spi-fix-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
  spi: intel: Add Panther Lake SPI controller support
  spi: apple: Set use_gpio_descriptors to true
  spi: mpc52xx: Add cancel_work_sync before module remove

Merge tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"Core:
   - Further prevent card detect during shutdown

  Host drivers:
   - sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10
     tablet"

* tag 'mmc-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Further prevent card detect during shutdown
  mmc: sdhci-pci: Add DMI quirk for missing CD GPIO on Vexia Edu Atla 10 tablet

Merge tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain fixes from Ulf Hansson:
"Core:
   - Fix a couple of memory-leaks during genpd init/remove

  Providers:
   - imx: Adjust delay for gpcv2 to fix power up handshake
   - mediatek: Fix DT bindings by adding another nested power-domain
     layer"

* tag 'pmdomain-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: imx: gpcv2: Adjust delay after power up handshake
  pmdomain: core: Fix error path in pm_genpd_init() when ida alloc fails
  pmdomain: core: Add missing put_device()
  dt-bindings: power: mediatek: Add another nested power-domain layer

x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails

When ensuring EFER.AUTOIBRS is set, WARN only on a negative return code
from msr_set_bit(), as '1' is used to indicate the WRMSR was successful
('0' indicates the MSR bit was already set).

Fixes: 8cc68c9c9e92 ("x86/CPU/AMD: Make sure EFER[AIBRSE] is set")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Z1MkNofJjt7Oq0G6@google.com
Closes: https://lore.kernel.org/all/20241205220604.GA2054199@thelio-3990X

Merge branch 'fixes-for-lpm-trie'

Hou Tao says:

====================
This patch set fixes several issues for LPM trie. These issues were
found during adding new test cases or were reported by syzbot.

The patch set is structured as follows:

Patch #1~#2 are clean-ups for lpm_trie_update_elem().
Patch #3 handles BPF_EXIST and BPF_NOEXIST correctly for LPM trie.
Patch #4 fixes the accounting of n_entries when doing in-place update.
Patch #5 fixes the exact match condition in trie_get_next_key() and it
may skip keys when the passed key is not found in the map.
Patch #6~#7 switch from kmalloc() to bpf memory allocator for LPM trie
to fix several lock order warnings reported by syzbot. It also enables
raw_spinlock_t for LPM trie again. After these changes, the LPM trie will
be closer to being usable in any context (though the reentrance check of
trie->lock is still missing, but it is on my todo list).
Patch #8: move test_lpm_map to map_tests to make it run regularly.
Patch #9: add test cases for the issues fixed by patch #3~#5.

Please see individual patches for more details. Comments are always
welcome.

Change Log:
v3:
  * patch #2: remove the unnecessary NULL-init for im_node
  * patch #6: alloc the leaf node before disabling IRQ to low
    the possibility of -ENOMEM when leaf_size is large; Free
    these nodes outside the trie lock (Suggested by Alexei)
  * collect review and ack tags (Thanks for Toke & Daniel)

v2: https://lore.kernel.org/bpf/20241127004641.1118269-1-houtao@huaweicloud.com/
  * collect review tags (Thanks for Toke)
  * drop "Add bpf_mem_cache_is_mergeable() helper" patch
  * patch #3~#4: add fix tag
  * patch #4: rename the helper to trie_check_add_elem() and increase
    n_entries in it.
  * patch #6: use one bpf mem allocator and update commit message to
    clarify that using bpf mem allocator is more appropriate.
  * patch #7: update commit message to add the possible max running time
    for update operation.
  * patch #9: update commit message to specify the purpose of these test
    cases.

v1: https://lore.kernel.org/bpf/20241118010808.2243555-1-houtao@huaweicloud.com/
====================

Link: https://lore.kernel.org/all/20241206110622.1161752-1-houtao@huaweicloud.com/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add more test cases for LPM trie

Add more test cases for LPM trie in test_maps:

1) test_lpm_trie_update_flags
It constructs various use cases for BPF_EXIST and BPF_NOEXIST and check
whether the return value of update operation is expected.

2) test_lpm_trie_update_full_maps
It tests the update operations on a full LPM trie map. Adding new node
will fail and overwriting the value of existed node will succeed.

3) test_lpm_trie_iterate_strs and test_lpm_trie_iterate_ints
There two test cases test whether the iteration through get_next_key is
sorted and expected. These two test cases delete the minimal key after
each iteration and check whether next iteration returns the second
minimal key. The only difference between these two test cases is the
former one saves strings in the LPM trie and the latter saves integers.
Without the fix of get_next_key, these two cases will fail as shown
below:
test_lpm_trie_iterate_strs(1091):FAIL:iterate #2 got abc exp abS
test_lpm_trie_iterate_ints(1142):FAIL:iterate #1 got 0x2 exp 0x1

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Move test_lpm_map.c to map_tests

Move test_lpm_map.c to map_tests/ to include LPM trie test cases in
regular test_maps run. Most code remains unchanged, including the use of
assert(). Only reduce n_lookups from 64K to 512, which decreases
test_lpm_map runtime from 37s to 0.7s.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Use raw_spinlock_t for LPM trie

After switching from kmalloc() to the bpf memory allocator, there will be
no blocking operation during the update of LPM trie. Therefore, change
trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in
atomic context, even on RT kernels.

The max value of prefixlen is 2048. Therefore, update or deletion
operations will find the target after at most 2048 comparisons.
Constructing a test case which updates an element after 2048 comparisons
under a 8 CPU VM, and the average time and the maximal time for such
update operation is about 210us and 900us.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Switch to bpf mem allocator for LPM trie

Multiple syzbot warnings have been reported. These warnings are mainly
about the lock order between trie->lock and kmalloc()'s internal lock.
See report [1] as an example:

======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
------------------------------------------------------
syz.3.2069/15008 is trying to acquire lock:
ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...

but task is already holding lock:
ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&trie->lock){-.-.}-{2:2}:
       __raw_spin_lock_irqsave
       _raw_spin_lock_irqsave+0x3a/0x60
       trie_delete_elem+0xb0/0x820
       ___bpf_prog_run+0x3e51/0xabd0
       __bpf_prog_run32+0xc1/0x100
       bpf_dispatcher_nop_func
       ......
       bpf_trace_run2+0x231/0x590
       __bpf_trace_contention_end+0xca/0x110
       trace_contention_end.constprop.0+0xea/0x170
       __pv_queued_spin_lock_slowpath+0x28e/0xcc0
       pv_queued_spin_lock_slowpath
       queued_spin_lock_slowpath
       queued_spin_lock
       do_raw_spin_lock+0x210/0x2c0
       __raw_spin_lock_irqsave
       _raw_spin_lock_irqsave+0x42/0x60
       __put_partials+0xc3/0x170
       qlink_free
       qlist_free_all+0x4e/0x140
       kasan_quarantine_reduce+0x192/0x1e0
       __kasan_slab_alloc+0x69/0x90
       kasan_slab_alloc
       slab_post_alloc_hook
       slab_alloc_node
       kmem_cache_alloc_node_noprof+0x153/0x310
       __alloc_skb+0x2b1/0x380
       ......

-> #0 (&n->list_lock){-.-.}-{2:2}:
       check_prev_add
       check_prevs_add
       validate_chain
       __lock_acquire+0x2478/0x3b30
       lock_acquire
       lock_acquire+0x1b1/0x560
       __raw_spin_lock_irqsave
       _raw_spin_lock_irqsave+0x3a/0x60
       get_partial_node.part.0+0x20/0x350
       get_partial_node
       get_partial
       ___slab_alloc+0x65b/0x1870
       __slab_alloc.constprop.0+0x56/0xb0
       __slab_alloc_node
       slab_alloc_node
       __do_kmalloc_node
       __kmalloc_node_noprof+0x35c/0x440
       kmalloc_node_noprof
       bpf_map_kmalloc_node+0x98/0x4a0
       lpm_trie_node_alloc
       trie_update_elem+0x1ef/0xe00
       bpf_map_update_value+0x2c1/0x6c0
       map_update_elem+0x623/0x910
       __sys_bpf+0x90c/0x49a0
       ...

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&trie->lock);
                               lock(&n->list_lock);
                               lock(&trie->lock);
  lock(&n->list_lock);

*** DEADLOCK ***

[1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7

A bpf program attached to trace_contention_end() triggers after
acquiring &n->list_lock. The program invokes trie_delete_elem(), which
then acquires trie->lock. However, it is possible that another
process is invoking trie_update_elem(). trie_update_elem() will acquire
trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
get_partial_node() and try to acquire &n->list_lock (not necessarily the
same lock object). Therefore, lockdep warns about the circular locking
dependency.

Invoking kmalloc() before acquiring trie->lock could fix the warning.
However, since BPF programs call be invoked from any context (e.g.,
through kprobe/tracepoint/fentry), there may still be lock ordering
problems for internal locks in kmalloc() or trie->lock itself.

To eliminate these potential lock ordering problems with kmalloc()'s
internal locks, replacing kmalloc()/kfree()/kfree_rcu() with equivalent
BPF memory allocator APIs that can be invoked in any context. The lock
ordering problems with trie->lock (e.g., reentrance) will be handled
separately.

Three aspects of this change require explanation:

1. Intermediate and leaf nodes are allocated from the same allocator.
Since the value size of LPM trie is usually small, using a single
alocator reduces the memory overhead of the BPF memory allocator.

2. Leaf nodes are allocated before disabling IRQs. This handles cases
where leaf_size is large (e.g., > 4KB - 8) and updates require
intermediate node allocation. If leaf nodes were allocated in
IRQ-disabled region, the free objects in BPF memory allocator would not
be refilled timely and the intermediate node allocation may fail.

3. Paired migrate_{disable|enable}() calls for node alloc and free. The
BPF memory allocator uses per-CPU struct internally, these paired calls
are necessary to guarantee correctness.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>