author    Catalin Marinas <catalin.marinas@arm.com>  Thu, 6 Nov 2025 15:52:13 +0000 (15:52 +0000)
committer Will Deacon <will@kernel.org>  Fri, 7 Nov 2025 14:20:07 +0000 (14:20 +0000)
commit    535fdfc5a228524552ee8810c9175e877e127c27
tree      29aca393ba8d6fab709550a3053c874e09f0b770
parent    b98c94eed4a975e0c80b7e90a649a46967376f58
arm64: Use load LSE atomics for the non-return per-CPU atomic operations

The non-return per-CPU this_cpu_*() atomic operations are implemented as
STADD/STCLR/STSET when FEAT_LSE is available. On many microarchitecture
implementations, these instructions tend to be executed "far", in the
interconnect or memory subsystem (unless the data is already in the L1
cache). In general this is more efficient under contention, as it
avoids bouncing cache lines between CPUs. The load atomics (e.g. LDADD
with a destination register other than XZR), on the other hand, tend to
be executed "near", with the data loaded into the L1 cache.

STADDs executed back to back, as in srcu_read_{lock,unlock}*(), incur
additional overhead due to the default posting behaviour on several CPU
implementations. Since the per-CPU atomics are unlikely to be used
concurrently on the same memory location, encourage the hardware to
execute them "near" by issuing load atomics - LDADD/LDCLR/LDSET - with
the destination register unused (but not XZR).
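The change in arch/arm64/include/asm/percpu.h amounts to swapping the
store form for the load form with a scratch destination. A minimal
illustrative sketch of the idea in GCC-style arm64 inline asm - not the
kernel's actual PERCPU_OP macro, and the function name here is made up:

```c
/* Illustrative sketch only; arm64-specific, assumes FEAT_LSE. */
static inline void percpu_add_near(unsigned long *ptr, unsigned long val)
{
	unsigned long tmp;

	asm volatile(
	/* Old form: "stadd %[val], [%[ptr]]" is an alias of LDADD with
	 * XZR as destination and tends to execute "far", at the point
	 * of coherency.
	 *
	 * New form: LDADD with an unused scratch destination register
	 * encourages "near" execution, pulling the line into L1:
	 */
	"	ldadd	%[val], %[tmp], [%[ptr]]\n"
	: [tmp] "=&r" (tmp), "+Q" (*ptr)
	: [val] "r" (val), [ptr] "r" (ptr)
	: "memory");
}
```

The scratch register's value is simply discarded; only the side effect
of the read-modify-write on *ptr matters.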

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
[will: Add comment and link to the discussion thread]
Signed-off-by: Will Deacon <will@kernel.org>
arch/arm64/include/asm/percpu.h