git.ipfire.org Git - thirdparty/linux.git/log

klp-build: Fix patch cleanup on interrupt

If a build error occurs and the user hits Ctrl-C while a large patch is
being reverted during cleanup, the cleanup EXIT trap gets re-triggered
and tries to re-revert the already partially-reverted patch.  That
causes 'patch -R' to repeatedly prompt

  "Unreversed patch detected!  Ignore -R? [n]"

for each already-reverted hunk, with no way to break out.

Fix it by adding '--force' to the patch revert command in
revert_patch(), which causes it to silently ignore already-reverted
hunks.  And ignore errors, as the cleanup is always best-effort.

For similar reasons, add to APPLIED_PATCHES before (rather than after)
applying the patch in apply_patch() so an interrupted apply will also
get cleaned up.

Fixes: d36a7343f4ba ("livepatch/klp-build: switch to GNU patch and recountdiff")
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

klp-build: Suppress excessive fuzz output by default

When a patch applies with fuzz, the detailed output from the patch tool
can be very noisy, especially for big patches.

Suppress the fuzz details by default, while keeping the "applied with
fuzz" warning. The noise can be restored with '--verbose'.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

klp-build: Validate patch file existence

Make sure all patch files actually exist. Otherwise there can be
confusing errors later.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

klp-build: Don't use errexit

The errtrace option (combined with the ERR trap) already serves the same
function (and more) as errexit, so errexit is redundant. And it has
more pitfalls. Remove it.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

klp-build: Fix checksum comparison for changed offsets

The klp-build -f/--show-first-changed feature uses diff to compare
checksum log lines between original and patched objects.  However, diff
compares entire lines, including the offset field.  When a function is
at a different section offset, the offset field differs even though the
instruction checksum is identical, causing the wrong instruction to be
printed.

Only compare the checksum field when looking for the first changed
instruction.  Also print both the original and patched offsets when they
differ.

Fixes: 78be9facfb5e ("livepatch/klp-build: Add --show-first-changed option to show function divergence")
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

klp-build: Fix hang on out-of-date .config

If .config is out of date with the kernel source, 'make syncconfig'
hangs while waiting for user input on new config options. Detect the
mismatch and return an error.

Fixes: 6f93f7b06810 ("livepatch/klp-build: Fix inconsistent kernel version")
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool: Fix reloc hash collision in find_reloc_by_dest_range()

In find_reloc_by_dest_range(), hash collisions can cause a high-offset
relocation to appear when probing a low-offset hash bucket.

Only return early when the best match found so far genuinely belongs to
the current bucket (its offset is within the bucket's stride range).
Otherwise, continue scanning later buckets which may contain
lower-offset matches.

This ensures the first reloc in the range gets returned.

Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix reloc corruption in convert_reloc_sym_to_secsym()

Use the section symbol's index instead of the old symbol's index when
updating the ELF relocation entry in convert_reloc_sym_to_secsym().

Found by Sashiko review.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Don't correlate .rodata.cst* constant pool objects

Clang aggregates UBSAN type descriptors into shared anonymous
.data..L__unnamed_* sections.  This data is used by UBSAN trap handlers.

When a changed function has an UBSAN bounds check, klp-diff clones the
entire UBSAN data section associated with the TU.  Relocations within
the cloned section that reference named rodata objects in .rodata.cst*
(like 'exponent', 'pirq_ali_set.irqmap') become KLP relocations because
those objects now get correlated.

That results in a .klp.rela.vmlinux..data section which can easily have
thousands of KLP relocs, most of which are completely superfluous, used
by functions which aren't cloned to the patch module.

The .rodata.cst* sections are SHF_MERGE constant pool sections
containing small fixed-size data (lookup tables, bitmasks) that is only
read by value.  Pointer identity is never relevant for these objects, so
correlating them is unnecessary.

Exclude .rodata.cst* objects from correlation so they get cloned as
local data instead of generating KLP relocations.

It might be possible to someday treat UBSAN data sections as special
sections, and only extract the few needed entries.  But this works for
now.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix pointer comparisons for rodata objects

klp-diff treats all rodata as uncorrelated, so any reference to it uses
a duplicated copy rather than using a KLP reloc.

For the contents of the data itself, a duplicated copy is fine.
However, pointer comparisons (e.g., f->f_op == &foo_ops) are broken.

Fix it by correlating non-anonymous rodata objects.

Also, use a new find_symbol_containing_inclusive() helper for matching
the end of a symbol so bounds calculations don't get broken, for the
case where an array or other symbol's ending address is used as part of
a bounds calculation.

While these are really two distinct changes, they need to be done in the
same patch so as to avoid introducing bisection regressions.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Simplify reloc symbol conversion

Inline section_reference_needed() and is_reloc_allowed() into
convert_reloc_sym() and remove the redundant is_reloc_allowed() check in
clone_reloc().

Move the is_sec_sym() checks into the convert callees so they become
no-ops when the reloc is already in the right format. This allows
convert_reloc_sym() to unconditionally dispatch to the right converter
based on section type.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool: Move mark_rodata() to elf.c

Move the sec->rodata marking from check.c to elf.c so it's set during
ELF reading rather than during the check pipeline. This makes the
rodata flag available to all objtool users, including klp-diff which
reads ELF files directly without running check().

Add an is_rodata_sec() helper to elf.h for consistency with
is_text_sec() and is_string_sec().

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix relocation conversion failures for R_X86_64_NONE

Objtool has some hacks which NOP out certain calls/jumps and replace
their relocations with R_X86_64_NONE. The klp-diff relocation
extraction code will error out when trying to copy these relocations due
to their negative addend, which would only makes sense for a PC-relative
branch instruction. Just ignore them.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix kCFI trap handling

.kcfi_traps contains references to kCFI trap instruction locations.
When a KCFI type check fails at an indirect call, the trap handler looks
up the faulting address in this section.

Add it to the special sections list so the entries get extracted for the
changed functions they reference.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix extraction of text annotations for alternatives

Objtool is failing to extract text annotations which reference
.altinstr_replacement instructions:

  1) Alternative replacement fake symbols are NOTYPE rather than FUNC,
     and they don't have sym->included set, thus they aren't recognized
     by should_keep_special_sym().

  2) .discard.annotate_insn gets processed before .altinstr_replacement,
     so the referenced (fake) symbols don't have clones yet.

Fix the first issue by checking for a valid clone instead of
sym->included and by accepting NOTYPE symbols when processing
.discard.annotate_insn.

Fix the second issue by deferring text annotation processing until after
the other special sections have been cloned.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix XXH3 state memory leak

The XXH3 state allocated in checksum_init() is never freed. Free it in
checksum_finish().

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix cloning of zero-length section symbols

Fix NULL dereference when cloning a symbol from an empty section.
sec->data is only populated for sections with non-zero size.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix handling of zero-length .altinstr_replacement sections

When a section is empty (e.g. only zero-length alternative
replacements), there are no symbols to convert a section symbol
reference to. Skip the reloc instead of erroring out.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix --debug-checksum for duplicate symbol names

find_symbol_by_name() only returns the first match, so
--debug-checksum=<func> silently ignores any subsequent duplicately
named functions after the first.

Fix that, along with a new for_each_sym_by_name() helper.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool: Replace iterator callback with for_each_sym_by_mangled_name()

Convert the callback-based iterate_sym_by_demangled_name() with a new
for_each_sym_by_demangled_name() macro. This eliminates the callback
struct/function and makes the code more compact and readable.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix create_fake_symbols() skipping entsize-based sections

create_fake_symbols() has two phases: creating symbols from
ANNOTATE_DATA_SPECIAL entries, and a fallback that uses sh_entsize for
special sections like .static_call_sites.

When .discard.annotate_data is absent, the function returns early,
skipping the entsize fallback and silently allowing unsupported
module-local static call keys through.

Fix it by jumping to the entsize phase instead of returning early.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Assisted-by: Claude:claude-4-opus
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Improve local label check

Clang emits various .L-prefixed local symbols beyond .Ltmp*, such as
.L__const.* for local constant data. These are assembler-local labels
not present in kallsyms, so they can never be resolved at module load
time.

Broaden the check from .Ltmp* to all .L* symbols so they get cloned into
the patch module instead.

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Don't report uncorrelated functions as new

Clang LTO uses __UNIQUE_ID() to generate some uniquely named wrapper
functions, like initstubs. If they're uncorrelated, prevent them from
being reported as new functions and included unnecessarily.

Note that dont_correlate() already includes prefix functions, so prefix
functions are still being ignored here.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Don't correlate __initstub__ symbols

With LTO, the initcall infrastructure generates __initstub__kmod_*
wrapper functions in .init.text. These are the LTO equivalent of
__initcall__kmod_* data pointers, which are already excluded from
correlation.

These are __init functions whose memory is freed after boot, so there's
no reason to include or reference them in a livepatch module.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Don't correlate absolute symbols

Some arch/x86/crypto/*.S files define local .set/.equ constants that get
duplicated in vmlinux.o. This causes klp-diff to fail with "Multiple
correlation candidates" errors since it can't uniquely match these
between orig and patched builds.

Skip ABS symbols in dont_correlate(). They're purely compile-time
assembly constants that are never referenced by relocations, so they
don't need correlation.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Don't correlate __ADDRESSABLE() symbols

Symbols created by __ADDRESSABLE() are only used to convince the
toolchain not to optimize out the referenced symbol.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix .data..once static local non-correlation

While there was once a section named .data.once, it has since been
renamed to .data..once with commit dbefa1f31a91 ("Rename .data.once to
.data..once to fix resetting WARN*_ONCE"). Fix it.

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

objtool/klp: Fix is_uncorrelated_static_local() for Clang

For naming function-local static locals, GCC uses <var>.<id>, e.g.
__already_done.15, while Clang uses <func>.<var> with optional .<id>,
e.g. create_worker.__already_done.111

The existing is_uncorrelated_static_local() check only matches the GCC
convention where the variable name is a prefix. Handle both cases by
checking for a prefix match (GCC) and by checking after the first dot
separator (Clang).

Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

cpufreq: apple-soc: Use FIELD_MODIFY()

Use FIELD_MODIFY() to remove open-coded bit manipulation.
No functional change intended.

Signed-off-by: Hans Zhang <18255117159@163.com>
Reviewed-by: Joshua Peisach <jpeisach@ubuntu.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq/amd-pstate: Use FIELD_MODIFY()

Use FIELD_MODIFY() to remove open-coded bit manipulation.
No functional change intended.

Signed-off-by: Hans Zhang <18255117159@163.com>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

drm/i915/hdcp: Drop mgr->base.lock acquisition in intel_conn_to_vcpi()

Now that intel_conn_to_vcpi() reads the MST topology state via
drm_atomic_get_new_mst_topology_state(), the topology state belongs
to this atomic commit and is already serialized through the atomic
state's private object machinery. There is no need to additionally
take mgr->base.lock here.
Taking it from the HDCP enable path in commit_tail with
state->base.acquire_ctx is also unsafe: by this point the
acquire_ctx is no longer in a state where new modeset locks may be
acquired through it, which produces a modeset-lock splat on MST +
HDCP. Drop the drm_modeset_lock() call.

Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Tested-by: Santhosh Reddy Guddati <santhosh.reddy.guddati@intel.com>
Reviewed-by: Santhosh Reddy Guddati <santhosh.reddy.guddati@intel.com>
Link: https://patch.msgid.link/20260504101117.3619984-3-suraj.kandpal@intel.com

drm/i915/hdcp: Use new MST topology state in intel_conn_to_vcpi()

intel_conn_to_vcpi() runs from the HDCP enable path in commit_tail
and looks up the VCPI via mgr->base.state. When an ALLOCATE_PAYLOAD
is being driven on the mgr just before HDCP enable, that payload
list is being mutated in place, so the lookup can miss the port and
trip drm_WARN_ON(!payload), causing HDCP to be programmed with
VCPI 0.
Use drm_atomic_get_new_mst_topology_state() to read the topology
state attached to this atomic commit (stable, decided in
atomic_check), and bail out cleanly when no topology state or
payload is present for this port instead of WARNing.

Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Tested-by: Santhosh Reddy Guddati <santhosh.reddy.guddati@intel.com>
Reviewed-by: Santhosh Reddy Guddati <santhosh.reddy.guddati@intel.com>
Link: https://patch.msgid.link/20260504101117.3619984-2-suraj.kandpal@intel.com

cpufreq: qcom: Unify user-visible "Qualcomm" name

Various names for Qualcomm as a company are used in user-visible config
options. Switch to unified "Qualcomm" so it will be easier for users to
identify the options when for example running menuconfig.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

net/sched: speedup tc_dump_qdisc() when tcm_handle is provided

"tc qdisc show ... handle xxx" filtering can be done by the kernel.

A followup patch can do the same for tcm_parent.

iproute2/tc needs a small companion patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260503114515.2460477-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: airoha: Introduce airoha_fe_get()/airoha_qdma_get() register read helpers

Add airoha_fe_get() and airoha_qdma_get() as utility routines for reading
a masked field from a specified register.
This is a non-functional refactor, no logical changes are introduced to
the existing codebase.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260501-airoha_fe_get-airoha_qdma_get-v3-1-126c6f647ccb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: fix .get_stats64 sleeping in atomic context

The .get_stats64 callback runs in atomic context, but on
MDIO-connected switches every register read acquires the MDIO bus
mutex, which can sleep:
[   12.645973] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:609
[   12.654442] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 759, name: grep
[   12.663377] preempt_count: 0, expected: 0
[   12.667410] RCU nest depth: 1, expected: 0
[   12.671511] INFO: lockdep is turned off.
[   12.675441] CPU: 0 UID: 0 PID: 759 Comm: grep Tainted: G S      W           7.0.0+ #0 PREEMPT
[   12.675453] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[   12.675456] Hardware name: Bananapi BPI-R64 (DT)
[   12.675459] Call trace:
[   12.675462]  show_stack+0x14/0x1c (C)
[   12.675477]  dump_stack_lvl+0x68/0x8c
[   12.675487]  dump_stack+0x14/0x1c
[   12.675495]  __might_resched+0x14c/0x220
[   12.675504]  __might_sleep+0x44/0x80
[   12.675511]  __mutex_lock+0x50/0xb10
[   12.675523]  mutex_lock_nested+0x20/0x30
[   12.675532]  mt7530_get_stats64+0x40/0x2ac
[   12.675542]  dsa_user_get_stats64+0x2c/0x40
[   12.675553]  dev_get_stats+0x44/0x1e0
[   12.675564]  dev_seq_printf_stats+0x24/0xe0
[   12.675575]  dev_seq_show+0x14/0x3c
[   12.675583]  seq_read_iter+0x37c/0x480
[   12.675595]  seq_read+0xd0/0xec
[   12.675605]  proc_reg_read+0x94/0xe4
[   12.675615]  vfs_read+0x98/0x29c
[   12.675625]  ksys_read+0x54/0xdc
[   12.675633]  __arm64_sys_read+0x18/0x20
[   12.675642]  invoke_syscall.constprop.0+0x54/0xec
[   12.675653]  do_el0_svc+0x3c/0xb4
[   12.675662]  el0_svc+0x38/0x200
[   12.675670]  el0t_64_sync_handler+0x98/0xdc
[   12.675679]  el0t_64_sync+0x158/0x15c

For MDIO-connected switches, poll MIB counters asynchronously using a
delayed workqueue every second and let .get_stats64 return the cached
values under a spinlock. A mod_delayed_work() call on each read
triggers an immediate refresh so counters stay responsive when queried
more frequently.

MMIO-connected switches (MT7988, EN7581, AN7583) are not affected
because their regmap does not sleep, so they continue to read MIB
counters directly in .get_stats64.

Fixes: 88c810f35ed5 ("net: dsa: mt7530: implement .get_stats64")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Chester A. Unal <chester.a.unal@arinc9.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6940b913da2c29156f0feff74b678d3c526ee84c.1777719253.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipmr: Add __rcu to netns_ipv4.mrt.

kernel test robot reported this Sparse warning:

  $ make C=1 net/ipv4/ipmr.o
  net/ipv4/ipmr.c:312:24: error: incompatible types in comparison expression (different address spaces):
  net/ipv4/ipmr.c:312:24:    struct mr_table [noderef] __rcu *
  net/ipv4/ipmr.c:312:24:    struct mr_table *

Let's add __rcu annotation to netns_ipv4.mrt.

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605030032.glNApko7-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260502180755.359554-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

psp: strip variable-length PSP header in psp_dev_rcv()

psp_dev_rcv() unconditionally removes a fixed PSP_ENCAP_HLEN, even
when psph->hdrlen indicates that the PSP header carries optional
fields. A frame whose PSP header advertises a non-zero VC or any
extension would therefore be silently mis-decapsulated: option bytes
would spill into the inner packet head and downstream parsing would
fail on a corrupted skb.

Compute the full PSP header length from psph->hdrlen, pull the
optional bytes into the linear region, and strip the whole header
when decapsulating. Optional fields (VC, ...) are still ignored,
just discarded with the rest of the header instead of leaking.
crypt_offset and the VIRT flag are intentionally not validated here
- callers know their device's PSP implementation and can decide.

Both in-tree callers gate on hardware-validated PSP, so this is a
correctness fix rather than a reachable corruption path under
current configurations.

Fixes: 0eddb8023cee ("psp: provide decapsulation and receive helper for drivers")
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260502141945.14484-1-devnexen@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: prevent possible UAF in rtnl_prop_list_size()

I was mistaken by synchronize_rcu() [1] call in netdev_name_node_alt_destroy(),
giving a false sense of RCU safety at delete times.

We have to use list_del_rcu() to not confuse potential readers
in rtnl_prop_list_size().

[1] This synchronize_rcu() call was later removed in commit 723de3ebef03
("net: free altname using an RCU callback").

Fixes: 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260502124102.499204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: realtek: replace magic number with register bit macros

Replace magic number with register bit macros. The description of
the RTL8211B interrupt register is obtained from publicly available
datasheet (RTL8211B(L) Rev. 1.5 Datasheet)

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20260502092857.156831-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-misc-fixes-for-v7-1-rc3'

Matthieu Baerts says:

====================
mptcp: misc fixes for v7.1-rc3

Here are various unrelated fixes:

- Patch 1: increment the right MIB counter. A fix for v5.7.

- Patch 2: set the right MPTCP reset reason. A fix for v5.9.

- Patch 3: fix rx timestamp corruption when on MPTCP passive fastopen. A
fix for v6.2.

- Patch 4: increase sockopt seq after having set TCP_MAXSEG to propagate
it to newer subflows later. A fix for 6.17.
====================

Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-0-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: sockopt: increase seq in mptcp_setsockopt_all_sf

mptcp_setsockopt_all_sf() was missing a call to sockopt_seq_inc(). This
is required not to cause missing synchronization for newer subflows
created later on.

This helper is called each time a socket option is set on subflows, and
future ones will need to inherit this option after their creation.

Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-4-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: fix rx timestamp corruption on fastopen

The skb cb offset containing the timestamp presence flag is cleared
before loading such information. Cache such value before MPTCP CB
initialization.

Fixes: 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-3-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failure

When HMAC validation fails on a received ACK + MP_JOIN in
subflow_syn_recv_sock(), the subflow is reset with reason
MPTCP_RST_EPROHIBIT ("Administratively prohibited"). This is
incorrect: HMAC validation failure is an MPTCP protocol-level
error, not an administrative policy denial.

The mirror site on the client, in subflow_finish_connect(), already
uses MPTCP_RST_EMPTCP ("MPTCP-specific error") for the same kind of
HMAC failure on the SYN/ACK + MP_JOIN. Use the same reason on the
server side for symmetry and accuracy.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Fixes: 443041deb5ef ("mptcp: fix NULL pointer in can_accept_new_subflow")
Cc: stable@vger.kernel.org
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-2-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failure

In subflow_finish_connect(), HMAC validation of the server's HMAC
in SYN/ACK + MP_JOIN increments MPTCP_MIB_JOINACKMAC ("HMAC was
wrong on ACK + MP_JOIN") on failure. The function processes the
SYN/ACK, not the ACK; the matching MPTCP_MIB_JOINSYNACKMAC counter
("HMAC was wrong on SYN/ACK + MP_JOIN") exists but is not
incremented anywhere in the tree.

The mirror site on the server, subflow_syn_recv_sock(), already
uses JOINACKMAC correctly for ACK HMAC failure. Use JOINSYNACKMAC
at the SYN/ACK validation site so each counter reflects the packet
whose HMAC actually failed.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable@vger.kernel.org
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-1-b70118df778e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: hardening: Reject zero max_num_queues from GDMA_QUERY_MAX_RESOURCES

In a CVM environment, hardware responses cannot be trusted. The
GDMA_QUERY_MAX_RESOURCES command returns resource limits used to
determine the maximum number of queues.

In mana_gd_query_max_resources(), gc->max_num_queues is initialized
from num_online_cpus() and successively clamped by the hardware-reported
max_eq, max_cq, max_sq, max_rq, and num_msix_usable values. If any of
these hardware values is zero, gc->max_num_queues becomes zero and the
function returns success. This leads to a confusing failure later when
alloc_etherdev_mq() is called with zero queues, returning NULL and
producing a misleading -ENOMEM error.

Add an explicit zero check for gc->max_num_queues after all clamping
steps and return -ENOSPC for a clear early failure, consistent with the
existing gc->num_msix_usable <= 1 guard.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://patch.msgid.link/20260430083627.1873757-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: hardening: Reject zero max_num_queues from MANA_QUERY_VPORT_CONFIG

As a part of MANA hardening for CVM, validate that max_num_sq and
max_num_rq returned by MANA_QUERY_VPORT_CONFIG are not zero. These
values flow into apc->num_queues, which is used as an allocation count
and loop bound. A zero value would result in zero-size allocations and
incorrect driver behavior.

Return -EPROTO if either value is zero.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260430085638.1875400-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock/virtio: fix potential unbounded skb queue

virtio_transport_inc_rx_pkt() checks vvs->rx_bytes + len > vvs->buf_alloc.

virtio_transport_recv_enqueue() skips coalescing for packets
with VIRTIO_VSOCK_SEQ_EOM.

If fed with packets with len == 0 and VIRTIO_VSOCK_SEQ_EOM,
a very large number of packets can be queued
because vvs->rx_bytes stays at 0.

Fix this by estimating the skb metadata size:

(Number of skbs in the queue) * SKB_TRUESIZE(0)

Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: virtualization@lists.linux.dev
Link: https://patch.msgid.link/20260430122653.554058-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-bridge-mcast-support-exponential-field-encoding'

Ujjal Roy says:

====================
net: bridge: mcast: support exponential field encoding

Description:
This series addresses a mismatch in how multicast query
intervals and response codes are handled across IPv4 (IGMPv3)
and IPv6 (MLDv2). While decoding logic currently exists,
the corresponding encoding logic is missing during query
packet generation. This leads to incorrect intervals being
transmitted when values exceed their linear thresholds.

The patches introduce a unified floating-point encoding
approach based on RFC3376 and RFC3810, ensuring that large
intervals are correctly represented in QQIC and MRC fields
using the exponent-mantissa format.

Key Changes:
* ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation
  Removes legacy macros in favor of a cleaner, unified
  calculation for retrieving intervals from encoded fields,
  improving code maintainability.

* ipv6: mld: rename mldv2_mrc() and add mldv2_qqi()
  Standardizes MLDv2 terminology by renaming mldv2_mrc()
  to mldv2_mrd() (Maximum Response Delay) and introducing
  a new API mldv2_qqi for QQI calculation, improving code
  readability.

* ipv4: igmp: encode multicast exponential fields
  Introduces the logic to dynamically calculate the exponent
  and mantissa using bit-scan (fls). This ensures QQIC and
  MRC fields (8-bit) are properly encoded when transmitting
  query packets with intervals that exceed their respective
  linear threshold value of 128 (for QQI/MRT).

* ipv6: mld: encode multicast exponential fields
  Applies similar encoding logic for MLDv2. This ensures
  QQIC (8-bit) and MRC (16-bit) fields are properly encoded
  when transmitting query packets with intervals that exceed
  their respective linear thresholds (128 for QQI; 32768
  for MRD).

* selftests: net: bridge: add MRC and QQIC field encoding tests
  Updates bridge selftests to validate both linear and non-linear
  (exponential) encoding for MRC and QQIC fields, ensuring
  protocol compliance across IGMPv3 and MLDv2.

Impact:
These changes ensure that multicast queriers and listeners
stay synchronized on timing intervals, preventing protocol
timeouts or premature group membership expiration caused
by incorrectly formatted packet headers.
====================

Link: https://patch.msgid.link/20260502131907.987-1-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add MRC and QQIC field encoding tests

Enhance vlmc_query_intvl_test and vlmc_query_response_intvl_test in
bridge_vlan_mcast.sh to validate IGMPv3/MLDv2 protocol compliance for
MRC and QQIC field encoding across both linear and exponential ranges.

TEST: Vlan multicast snooping enable                                [ OK ]
TEST: Vlan mcast_query_interval global option default value         [ OK ]
TEST: Number of tagged IGMPv2 general query                         [ OK ]
TEST: IGMPv3 QQIC linear value 60(s)                                [ OK ]
TEST: MLDv2 QQIC linear value 60(s)                                 [ OK ]
TEST: IGMPv3 QQIC non linear value 160(s)                           [ OK ]
TEST: MLDv2 QQIC non linear value 160(s)                            [ OK ]
TEST: Vlan mcast_query_response_interval global option default value   [ OK ]
TEST: IGMPv3 MRC linear value of 60(x0.1s)                          [ OK ]
TEST: MLDv2 MRC linear value of 24000(ms)                           [ OK ]
TEST: IGMPv3 MRC non linear value of 240(x0.1s)                     [ OK ]
TEST: MLDv2 MRC non linear value of 48000(ms)                       [ OK ]

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-6-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: mld: encode multicast exponential fields

In MLD, MRC and QQIC fields are not correctly encoded when
generating query packets. Since the receiver of the query
interprets these fields using the MLDv2 floating-point
decoding logic, any value that exceeds the linear threshold
is incorrectly parsed as an exponential value, leading to
an incorrect interval calculation.

Encode and assign the corresponding protocol fields during
query generation. Introduce the logic to dynamically
calculate the exponent and mantissa using bit-scan (fls).
This ensures MRC (16-bit) and QQIC (8-bit) fields are
properly encoded when transmitting query packets with
intervals that exceed their respective linear thresholds
(32768 for MRD; 128 for QQI).

RFC3810: If Maximum Response Code >= 32768, the Maximum
Response Code field represents a floating-point value as
follows:
     0 1 2 3 4 5 6 7 8 9 A B C D E F
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| exp |          mant         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RFC3810: If QQIC >= 128, the QQIC field represents a
floating-point value as follows:
     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |1| exp | mant  |
    +-+-+-+-+-+-+-+-+

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-5-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: igmp: encode multicast exponential fields

In IGMP, MRC and QQIC fields are not correctly encoded
when generating query packets. Since the receiver of the
query interprets these fields using the IGMPv3 floating-
point decoding logic, any value that exceeds the linear
threshold is incorrectly parsed as an exponential value,
leading to an incorrect interval calculation.

Encode and assign the corresponding protocol fields during
query generation. Introduce the logic to dynamically
calculate the exponent and mantissa using bit-scan (fls).
This ensures MRC and QQIC fields (8-bit) are properly
encoded when transmitting query packets with intervals
that exceed their respective linear threshold value of
128 (for MRT/QQI).

RFC3376: for both MRC and QQIC, values >= 128 represent
the same floating-point encoding as follows:
     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |1| exp | mant  |
    +-+-+-+-+-+-+-+-+

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-4-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: mld: rename mldv2_mrc() and add mldv2_qqi()

Rename mldv2_mrc() to mldv2_mrd() as it is used to calculate
the Maximum Response Delay from the Maximum Response Code.

Introduce a new API mldv2_qqi() to define the existing
calculation logic of QQI from QQIC. This also organizes
the existing mld_update_qi() API.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-3-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation

Get rid of the IGMPV3_MRC macro and use the igmpv3_mrt() API to
calculate the Max Resp Time from the Maximum Response Code.

Similarly, for IGMPV3_QQIC, use the igmpv3_qqi() API to calculate
the Querier's Query Interval from the QQIC field.

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-2-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: asix: ax88772: re-add usbnet_link_change() in phylink callbacks

Commit e0bffe3e6894 ("net: asix: ax88772: migrate to phylink") replaced
the asix_adjust_link() PHY callback with phylink's mac_link_up() and
mac_link_down() handlers, but did not carry over the usbnet_link_change()
notification that commit 805206e66fab ("net: asix: fix "can't send until
first packet is send" issue") had added.

As a result, the original symptom returns: when the link comes up,
usbnet is never notified, so the RX URB submission stays dormant until
some other event (e.g. a transmitted packet triggering the status
endpoint interrupt) wakes it up.

This is reproducible with the Apple A1277 USB Ethernet Adapter
(05ac:1402, AX88772A based) on a Banana Pro using a static IPv4
configuration. After bringing the interface up, no incoming packets are
received until the first outgoing frame triggers usbnet's RX path.

Restore the link change notification, gated on a carrier transition so
the call remains idempotent if the status endpoint also reports the
change later.

Fixes: e0bffe3e6894 ("net: asix: ax88772: migrate to phylink")
Signed-off-by: Markus Baier <Markus.Baier@soslab.tu-darmstadt.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260501163941.107668-1-Markus.Baier@soslab.tu-darmstadt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-convert-af_netlink-and-af_vsock-to-getsockopt_iter-api'

Breno Leitao says:

====================
net: Convert AF_NETLINK and AF_VSOCK to getsockopt_iter API

Continue the work to convert protocols to the new getsockopt_iter API.

Convert AF_NETLINK and AF_VSOCK getsockopt implementations to the new
sockopt_t/getsockopt_iter API, and add kselftests that verify the size
and errno semantics are preserved across the conversion.

I chose these two socket families because they are probably one of the
most used protocols,, ensuring that any potential bugs will be
discovered and reported quickly.
====================

Link: https://patch.msgid.link/20260501-getsock_one-v1-0-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: selftests: add getsockopt_iter regression tests

Add a single kselftest covering the proto_ops getsockopt_iter
conversions for AF_NETLINK and AF_VSOCK, using one fixture per protocol:

netlink:

NETLINK_PKTINFO covers the flag-style int path (exact size, oversize
clamp, undersize -EINVAL); NETLINK_LIST_MEMBERSHIPS covers the
size-discovery path that always reports the required buffer length back
via optlen, even when the user buffer is too small to receive any group
bits.

vsock:
SO_VM_SOCKETS_BUFFER_SIZE covers the u64 path (exact size, oversize
clamp, undersize -EINVAL).

Each fixture also exercises an unknown optname and a bogus level so
the returned-length / errno semantics preserved by the sockopt_t
conversion are pinned down.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260501-getsock_one-v1-3-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock: convert to getsockopt_iter

Convert AF_VSOCK's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t. The single
vsock_connectible_getsockopt() callback is shared by both
vsock_stream_ops and vsock_seqpacket_ops, so both proto_ops are
updated to use .getsockopt_iter.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260501-getsock_one-v1-2-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: convert to getsockopt_iter

Convert AF_NETLINK's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- For NETLINK_LIST_MEMBERSHIPS: walk the groups bitmap and emit each
  u32 sequentially via copy_to_iter(), then set opt->optlen to the
  total size required (ALIGN(BITS_TO_BYTES(ngroups), sizeof(u32))).
  The wrapper writes opt->optlen back to userspace even on partial
  failure, preserving the existing API that lets userspace discover
  the needed allocation size.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260501-getsock_one-v1-1-810ce23ea70e@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

spi: spacemit: add u64 cast to NSEC_PER_SEC to avoid 32-bit overflow

NSEC_PER_SEC expands to the long constant 1000000000L, so NSEC_PER_SEC *
BITS_PER_BYTE (8 * 10^9) overflows on 32-bit-long architectures
before the result reaches the u64 nsec_per_word.

Promote the multiplication to u64 by casting the first operand, which is
NSEC_PER_SEC.

Fixes: efcd8b9d1111 ("spi: spacemit: introduce SpacemiT K1 SPI controller driver")
Suggested-by: Alex Elder <elder@riscstar.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605050437.RS6mmV2b-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605050317.Tf9j487w-lkp@intel.com/
Signed-off-by: Guodong Xu <guodong@riscstar.com>
Link: https://patch.msgid.link/20260505-spi-spacemit-k1-fix-overflow-v1-1-77564c2e4e86@riscstar.com
Signed-off-by: Mark Brown <broonie@kernel.org>

net/sched: add qstats_cpu_drop_inc() helper

1) Using this_cpu_inc() is better than going through this_cpu_ptr():

- Single instruction on x86.
- Store tearing prevention.

2) Change tcf_action_update_stats() to use this_cpu_add().

3) Add WRITE_ONCE() to __qdisc_qstats_drop() and qstats_drop_inc()
   in preparation for lockless "tc qdisc show".

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 3/17 up/down: 72/-216 (-144)
Function                                     old     new   delta
dualpi2_enqueue_skb                          462     511     +49
tcf_ife_act                                 1061    1077     +16
taprio_enqueue                               613     620      +7
codel_qdisc_enqueue                          149     143      -6
tcf_vlan_act                                 684     676      -8
tcf_skbedit_act                              626     618      -8
tcf_police_act                               725     717      -8
tcf_mpls_act                                1297    1289      -8
tcf_gate_act                                 310     302      -8
tcf_gact_act                                 222     214      -8
tcf_csum_act                                2438    2430      -8
tcf_bpf_act                                  709     701      -8
tcf_action_update_stats                      124     115      -9
pie_qdisc_enqueue                            865     856      -9
pfifo_enqueue                                116     107      -9
choke_enqueue                               2069    2059     -10
plug_enqueue                                 139     128     -11
bfifo_enqueue                                121     110     -11
tcf_nat_act                                 1501    1489     -12
gred_enqueue                                1743    1668     -75
Total: Before=24388609, After=24388465, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260501135916.2566766-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ASoC: mediatek: Add support for MT8196 SoC

Cyril Chao <Cyril.Chao@mediatek.com> says:

This series of patches adds support for Mediatek AFE of MT8196 SoC.

ASoC: mediatek: mt8196: add machine driver with nau8825

Add support for mt8196 board with nau8825.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-11-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mediatek,mt8196-nau8825: Add audio sound card

Add soundcard bindings for the MT8196 SoC with the NAU8825 audio codec.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-10-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: add platform driver

Add mt8196 platform driver.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-9-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mediatek,mt8196-afe: add audio AFE

Add mt8196 audio AFE.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-8-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support TDM in platform driver

Add mt8196 TDM DAI driver support.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-7-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support I2S in platform driver

Add mt8196 I2S DAI driver support.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-6-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support ADDA in platform driver

Add mt8196 ADDA DAI driver support.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-5-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: support audio clock control

Add audio clock wrapper and audio tuner control.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-4-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt8196: add common header

Add header files for register definitions and structures.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-3-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: common: modify mtk afe platform driver for mt8196

Mofify the pcm pointer interface to support 64-bit address access.

Signed-off-by: Darren Ye <darren.ye@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com>
Signed-off-by: Cyril Chao <Cyril.Chao@mediatek.com>
Link: https://patch.msgid.link/20260430022417.32282-2-Cyril.Chao@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add machine driver for on-chip HDMI codec

Add a simple ASoC machine driver that wires the MT2701/MT7623N
AFE HDMI playback path to the on-chip HDMI transmitter exposed
as a generic hdmi-audio-codec "i2s-hifi" DAI.

The driver binds to "mediatek,mt2701-hdmi-audio". MT7623N device
trees carry "mediatek,mt7623n-hdmi-audio" as a board-specific
fallback, matching the dt-binding.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/1dafc8147ee09d44de53b69bca792bdbfe13e8b0.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add HDMI audio memif, FE and BE DAIs

Extend the MT2701/MT7623N AFE driver with the HDMI playback path:

  - a new HDMI DMA memif (MT2701_MEMIF_HDMI) mapped to the
    AFE_HDMI_OUT_{CON0,BASE,CUR,END} registers;
  - a PCM_HDMI front-end DAI (S16_LE only, 44.1k/48k) which feeds
    the memif via DPCM;
  - an HDMI BE DAI wrapping the AFE_8CH_I2S_OUT_CON engine that
    serialises L/R samples towards the on-chip HDMI transmitter.

Sample-rate programming uses the empirically determined
HDMI_BCK_DIV = 45 * 48000 / rate - 1 formula in AUDIO_TOP_CON3,
which covers 44.1 kHz and 48 kHz within the 6-bit divider range.
The AFE_HDMI_CONN0 interconnect is programmed to route memif
output pairs to the serializer inputs with L/R in the right order
for hdmi-audio-codec.

The existing I2S engine helpers (mt2701_mclk_configuration,
mt2701_i2s_path_enable, mt2701_afe_i2s_path_disable) are reused
for the HDMI BE so that MCLK at 128*fs and the ASYS I2S3 FS field
are programmed and cleanly released across open/close cycles.

Only S16_LE and 44.1k/48k are exposed to userspace. Other rates
fall outside the 6-bit BCK divider range, and wider sample
formats require DMA BIT_WIDTH programming that the current memif
setup does not handle. These limits match what the MT8173 AFE
driver exposes for its HDMI path.

The HDMI-related AFE registers (AUDIO_TOP_CON3, AFE_HDMI_OUT_CON0,
AFE_HDMI_CONN0, AFE_8CH_I2S_OUT_CON) are added to the suspend
backup list so that the existing mtk_afe_suspend/resume framework
saves and restores them across system sleep cycles.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/a580d6d676bcdba3d8bad94bb3bcb91a336bf1ba.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add optional HDMI audio path clocks

The HDMI audio output path on MT2701/MT7623N is rooted in HADDS2PLL
and gated by the audio_hdmi, audio_spdf and audio_apll power gates.
Acquire these four clocks from device tree using devm_clk_get_optional
so that existing platforms which do not wire up HDMI audio keep
probing unchanged. Actual clock enable/prepare is deferred to the
upcoming HDMI DAI startup path.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/5e24890acf597b04485145b5056ad8b161b4cbda.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: mediatek: mt2701: add AFE HDMI register definitions

Add register offsets and bit defines for the MT2701/MT7623N AFE
HDMI audio output path: the HDMI BCK divider in AUDIO_TOP_CON3,
the HDMI output memif control and descriptor registers, the 8-bit
AFE_HDMI_CONN0 interconnect, and the AFE_8CH_I2S_OUT_CON engine
that drives the HDMI TX serial link. These are a prerequisite for
adding an HDMI playback path to the mt2701 AFE driver and have no
behavioural effect on their own.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/2c2a2e3e5d01da4a130160f5d5ffbd2a3808fe12.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mediatek,mt2701-hdmi-audio: add MT2701 HDMI audio

Describe the sound card node that routes the MT2701/MT7623N AFE
HDMI playback path to the on-chip HDMI transmitter. This is
separate from the AFE platform binding (mediatek,mt2701-audio)
because it represents board-level audio routing between the AFE
and the HDMI codec, not an additional IP block. MT7623N boards
carry the same IP and use the mt7623n- compatible as a fallback
to mt2701-.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/b18cd550bcfab8c5a8a765d68a9ab5bae3f1c4e7.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: dt-bindings: mt2701-afe-pcm: add HDMI audio path clocks

Document four additional optional clocks feeding the HDMI audio
output path on MT2701: the HADDS2 PLL (root of the HDMI audio
clock tree), the HDMI audio and S/PDIF interface power gates,
and the audio APLL root gate. Older device trees that do not
wire these up remain valid via minItems. MT7622 does not have
HDMI audio hardware, so its compatible is restricted to the
base set of 34 clocks.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/1a871e4494d514a270c7c27c080e338b2dbf6559.1776998727.git.daniel@makrotopia.org
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: tas2770: Deal with bogus initial temperature value

TAS2770 initialises the temperature readout registers to 0.
This value persists until the chip is fully powered up and
the ADC starts sampling. The ADC then persists the last sampled
temperature during software shutdown.

The ADC should therefore never return 0 in normal operating
conditions, so return -ENODATA and mark it as a fault condition
using HWMON_T_FAULT.

Fixes: ff73e2780169 ("ASoC: tas2770: expose die temp to hwmon")
Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: tas2764: Deal with bogus initial temperature register value

The TAS2764 datasheet specifies that the chip initialises the
temperature register such that the temperature reading is 2.6 *C,
ostensibly to prevent tripping the chip's protection circuitry.
The chip is not capable of representing 2.6 *C however, and the
register is actually initialised to 0. The ADC does not start
sampling until the chip is powered up, and the last sampled
temperature persists in the register during software shutdown.
Therefore, any reading returning 0 is almost certain to be
from before the ADC has actually started sampling, meaning that
it is invalid.

Return -ENODATA early if the temperature has not yet been sampled
by the chip, and indicate a fault condition using HWMON_T_FAULT.

Fixes: 186dfc85f9a8 ("ASoC: tas2764: expose die temp to hwmon")
Signed-off-by: James Calligeros <jcalligeros99@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>

net: phy: realtek: Add support for PHY LEDs on RTL8221B

Realtek RTL8221B Ethernet PHY supports three LED pins which are used to
indicate link status and activity. Add netdev trigger support for them.

Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260501100002.755672-1-amadeus@jmu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netpoll: pass buffer size to egress_dev() to avoid MAC truncation

egress_dev() formats np->dev_mac via snprintf() but receives buf as
a bare char *, so it cannot derive the buffer size from the pointer. The
size argument was hardcoded to MAC_ADDR_STR_LEN (3 * ETH_ALEN - 1 = 17),
which is silly wrong in two ways:

1) misleading kernel log output on the MAC-selected target path
    (np->dev_name[0] == '\0'); for example "aa:bb:cc:dd:ee:ff doesn't
    exist, aborting" was logged as "aa:bb:cc:dd:ee:f doesn't exist,
    aborting".

2) the second argument of snprintf is the size of the buffer, not the
    size of what you want to write.

Add a bufsz parameter to egress_dev() and pass sizeof(buf) from each
caller, matching the standard snprintf() idiom and removing the
hardcoded size from the helper.

Every caller already declares "char buf[MAC_ADDR_STR_LEN + 1]" so the
formatted MAC continues to fit.

Tested by booting with
  netconsole=6665@/aa:bb:cc:dd:ee:ff,6666@10.0.0.1/00:11:22:33:44:55
on a kernel without a matching device. Pre-fix dmesg shows
"aa:bb:cc:dd:ee:f doesn't exist, aborting"; post-fix shows the full
"aa:bb:cc:dd:ee:ff doesn't exist, aborting".

Fixes: f8a10bed32f5 ("netconsole: allow selection of egress interface via MAC address")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260501-netpoll_snprintf_fix-v1-1-84b0566e6597@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Set gc_in_progress to true in unix_gc().

Igor Ushakov reported that unix_gc() could run with gc_in_progress
being false if the work is scheduled while running:

  Thread 1         Thread 2                     Thread 3
  --------         --------                     --------
                   unix_schedule_gc()           unix_schedule_gc()
                   `- if (!gc_in_progress)      `- if (!gc_in_progress)
                      |- gc_in_progress = true     |
                      `- queue_work()              |
  unix_gc() <----------------/                     |
  |                                                |- gc_in_progress = true
  ...                                              `- queue_work()
  |                                                       |
  `- gc_in_progress = false                               |
                                                          |
  unix_gc() <---------------------------------------------'
  |
  ... /* gc_in_progress == false */
  |
  `- gc_in_progress = false

unix_peek_fpl() relies on gc_in_progress not to confuse GC
by MSG_PEEK.

Let's set gc_in_progress to true in unix_gc().

Fixes: 8b90a9f819dc ("af_unix: Run GC on only one CPU.")
Reported-by: Igor Ushakov <sysroot314@gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260501073945.1884564-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/sched: taprio: prepare taprio_dump() for RTNL removal

We soon will no longer hold RTNL in qdisc dumps.

Add READ_ONCE()/WRITE_ONCE() annotations.

Note: taprio already uses RCU to protect most of its fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260501064247.2027688-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

spi: Consistently define pci_device_ids using named initializers

The .driver_data member of the various struct pci_device_id arrays were
initialized by list expressions. This isn't easily readable if you're
not into PCI. Using named initializers is more explicit and thus easier
to parse. Also skip explicit assignments of 0 (which the compiler then
takes care of).

This change doesn't introduce changes to the compiled pci_device_id
arrays. Tested on x86 and arm64.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20260504142117.2116978-2-u.kleine-koenig@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>

selftests/resctrl: Reduce L2 impact on CAT test

The L3 CAT test loads a buffer into cache that is proportional to the L3
size allocated for the workload and measures cache misses when accessing
the buffer as a test of L3 occupancy. When loading the buffer it can be
assumed that a portion of the buffer will be loaded into the L2 cache and
depending on cache design may not be present in L3. It is thus possible
for data to not be in L3 but also not trigger an L3 cache miss when
accessed.

Reduce impact of L2 on the L3 CAT test by, if L2 allocation is supported,
minimizing the portion of L2 that the workload can allocate into. This
encourages most of buffer to be loaded into L3 and support better
comparison between buffer size, cache portion, and cache misses when
accessing the buffer.

Link: https://lore.kernel.org/r/1f5aad318889cd6d4f9a8d8b0fbe83e3848d41a9.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Simplify perf usage in CAT test

The CAT test relies on the PERF_COUNT_HW_CACHE_MISSES event to determine if
modifying a cache portion size is successful. This event is configured to
report the data as part of an event group, but no other events are added to
the group.

Remove the unnecessary PERF_FORMAT_GROUP format setting. This eliminates
the need for struct perf_event_read and results in read() of the associated
file descriptor to return just one value associated with the
PERF_COUNT_HW_CACHE_MISSES event of interest.

Link: https://lore.kernel.org/r/fb69325eba5031b735fa79effaaacd797c9c6040.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Remove requirement on cache miss rate

As the CAT test reads the same buffer into different sized cache portions
it compares the number of cache misses against an expected percentage
based on the size of the cache portion.

Systems and test conditions vary. The CAT test is a test of resctrl
subsystem health and not a test of the hardware architecture so it is not
required to place requirements on the size of the difference in cache
misses, just that the number of cache misses when reading a buffer
increase as the cache portion used for the buffer decreases.

Remove additional constraint on how big the difference between cache
misses should be as the cache portion size changes. Only test that the
cache misses increase as the cache portion size decreases. This remains
a good sanity check of resctrl subsystem health while reducing impact
of hardware architectural differences and the various conditions under
which the test may run.

Increase the size difference between cache portions to additionally avoid
any consequences resulting from smaller increments.

Link: https://lore.kernel.org/r/6de4da5486354c0f25fef0d194956470cb744041.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Raise threshold at which MBM and PMU values are compared

Commit 501cfdba0a40 ("selftests/resctrl: Do not compare performance
counters and resctrl at low bandwidth") introduced a threshold under which
memory bandwidth values from MBM and performance counters are not compared.
This is needed because MBM and the PMUs do not have an identical view of
memory bandwidth since PMUs can count all memory traffic while MBM does not
count "overhead" (for example RAS) traffic that cannot be attributed to an
RMID. As a ratio this difference in view of memory bandwidth is pronounced
at low memory bandwidths.

The 750MiB threshold was chosen arbitrarily after comparisons on different
platforms. Exposed to more platforms after introduction this threshold has
proven to be inadequate.

Having accurate comparison between performance counters and MBM requires
careful management of system load as well as control of features that
introduce extra memory traffic, for example, patrol scrub. This is not
appropriate for the resctrl selftests that are intended to run on a
variety of systems with various configurations.

Increase the memory bandwidth threshold under which no comparison is made
between performance counters and MBM. Add additional leniency by increasing
the percentage of difference that will be tolerated between these counts.

There is no impact to the validity of the resctrl selftests results as a
measure of resctrl subsystem health.

Link: https://lore.kernel.org/r/b374c33ddd324130d6255cbb91c3dd500e8277e7.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Increase size of buffer used in MBM and MBA tests

Errata for Sierra Forest [1] (SRF42) and Granite Rapids [2] (GNR12)
describe the problem that MBM on Intel RDT may overcount memory bandwidth
measurements. The resctrl tests compare memory bandwidth reported by iMC
PMU to that reported by MBM causing the tests to fail on these systems
depending on the settings of the platform related to the errata.

Since the resctrl tests need to run under various conditions it is not
possible to ensure system settings are such that MBM will not overcount.
It has been observed that the overcounting can be controlled via the
buffer size used in the MBM and MBA tests that rely on comparisons
between iMC PMU and MBM measurements.

Running the MBM test on affected platforms with different buffer sizes it
can be observed that the difference between iMC PMU and MBM counts reduce
as the buffer size increases. After increasing the buffer size to more
than 4X the differences between iMC PMU and MBM become insignificant.

Increase the buffer size used in MBM and MBA tests to 4X L3 size to reduce
possibility of tests failing due to difference in counts reported by iMC
PMU and MBM.

Link: https://lore.kernel.org/r/1bd4d8c5fc791234b0a9da94f29a3e278ba2f7ee.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/
Link: https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/birch-stream/xeon-6900-6700-6500-series-processors-with-p-cores-specification-update/011US/errata-details/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Support multiple events associated with iMC

The resctrl selftests discover needed parameters to perf_event_open() via
sysfs. The PMU associated with every memory controller (iMC) is discovered
via the /sys/bus/event_source/devices/uncore_imc_N/type file while
the read memory bandwidth event type and umask is discovered via
/sys/bus/event_source/devices/uncore_imc_N/events/cas_count_read.

Newer systems may have multiple events that expose read memory bandwidth.
Running a recent kernel that includes
commit 6a8a48644c4b ("perf/x86/intel/uncore: Add per-scheduler IMC CAS count events")
on these systems expose the multiple events. For example,
/sys/bus/event_source/devices/uncore_imc_N/events/cas_count_read_sch0
/sys/bus/event_source/devices/uncore_imc_N/events/cas_count_read_sch1

Support parsing of iMC PMU properties when the PMU may have multiple events
to measure read memory bandwidth. The PMU only needs to be discovered once.
Split the parsing of event details from actual PMU discovery in order to
loop over all events associated with the PMU. Match all events with the
cas_count_read prefix instead of requiring there to be one file with that
name.

Make the parsing code more robust. With strings passed around to create
needed paths, use snprintf() instead of sprintf() to ensure there is
always enough space to create the path while using the standard PATH_MAX
for path lengths. Ensure there is enough room in imc_counters_config[]
before attempting to add an entry.

Link: https://lore.kernel.org/r/b03ca0fa21a09500c56ee589e32516c2c5effeaf.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Prepare for parsing multiple events per iMC

The events needed to read memory bandwidth are discovered by iterating
over every memory controller (iMC) within /sys/bus/event_source/devices.
Each iMC's PMU is assumed to have one event to measure read memory
bandwidth that is represented by the sysfs cas_count_read file. The event's
configuration is read from "cas_count_read" and stored as an element of
imc_counters_config[] by read_from_imc_dir() that receives the
index of the array where to store the configuration as argument.

It is possible that an iMC's PMU may have more than one event that should
be used to measure memory bandwidth.

Change semantics to not provide the index of the array to
read_from_imc_dir() but instead a pointer to the index. This enables
read_from_imc_dir() to store configurations for more than one event by
incrementing the index to imc_counters_config[] itself.

Ensure that the same type is consistently used for the index as it is
passed around during counter configuration.

Link: https://lore.kernel.org/r/549e026d20af0381349e645c912e6470fce8bd7e.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Do not store iMC counter value in counter config structure

The MBM and MBA tests compare MBM memory bandwidth measurements against
the memory bandwidth event values obtained from each memory controller's
PMU. The memory bandwidth event settings are discovered from the memory
controller details found in /sys/bus/event_source/devices/uncore_imc_N and
stored in struct imc_counter_config.

In addition to event settings struct imc_counter_config contains
imc_counter_config::return_value in which the associated event value is
stored on every read.

The event value is consumed and immediately recorded at regular intervals.
The stored value is never consumed afterwards, making its storage as part
of event configuration unnecessary.

Remove the return_value member from struct imc_counter_config. Instead
just use a more aptly named "measurement" local variable for use during
event reading.

Link: https://lore.kernel.org/r/e0b6ad2755e2fd802f54b0bc07eeb90247baca19.1775266384.git.reinette.chatre@intel.com
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Reduce interference from L2 occupancy during cache occupancy test

The CMT test creates a new control group that is also capable of monitoring
and assigns the workload to it. The workload allocates a buffer that by
default fills a portion of the L3 and keeps reading from the buffer,
measuring the L3 occupancy at intervals. The test passes if the workload's
L3 occupancy is within 15% of the buffer size.

The CMT test does not take into account that some of the workload's data
may land in L2/L1. Matching L3 occupancy to the size of the buffer while
a portion of the buffer can be allocated into L2 is not accurate.

Take the L2 cache into account to improve test accuracy:
- Reduce the workload's L2 cache allocation to the minimum on systems that
   support L2 cache allocation. Do so with a new utility in preparation for
   all L3 cache allocation tests needing the same capability.
- Increase the buffer size to accommodate data that may be allocated into
   the L2 cache. Use a buffer size double the L3 portion to keep using the
   L3 portion size as goal for L3 occupancy while taking into account that
   some of the data may be in L2.

Running the CMT test on a sample system while introducing significant
cache misses using "stress-ng --matrix-3d 0 --matrix-3d-zyx" shows
significant improvement in L3 cache occupancy:

Before:

    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Write schema "L3:0=fffe0" to resctrl FS
    # Write schema "L3:0=1f" to resctrl FS
    # Benchmark PID: 7089
    # Checking for pass/fail
    # Pass: Check cache miss rate within 15%
    # Percent diff=12
    # Number of bits: 5
    # Average LLC val: 73269248
    # Cache span (bytes): 83886080
    ok 1 CMT: test

After:
    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Write schema "L3:0=fffe0" to resctrl FS
    # Write schema "L3:0=1f" to resctrl FS
    # Write schema "L2:1=0x1" to resctrl FS
    # Benchmark PID: 7171
    # Checking for pass/fail
    # Pass: Check cache miss rate within 15%
    # Percent diff=0
    # Number of bits: 5
    # Average LLC val: 83755008
    # Cache span (bytes): 83886080
    ok 1 CMT: test

Link: https://lore.kernel.org/r/00445fa64c251b86b86023f87220ee1ad8561460.1775266384.git.reinette.chatre@intel.com
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/aO+7MeSMV29VdbQs@e133380.arm.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

selftests/resctrl: Improve accuracy of cache occupancy test

Dave Martin reported inconsistent CMT test failures. In one experiment
the first run of the CMT test failed because of too large (24%) difference
between measured and achievable cache occupancy while the second run passed
with an acceptable 4% difference.

The CMT test is susceptible to interference from the rest of the system.
This can be demonstrated with a utility like stress-ng by running the CMT
test while introducing cache misses using:

   stress-ng --matrix-3d 0 --matrix-3d-zyx

Below shows an example of the CMT test failing because of a significant
difference between measured and achievable cache occupancy when run with
interference:
    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Benchmark PID: 7011
    # Checking for pass/fail
    # Fail: Check cache miss rate within 15%
    # Percent diff=99
    # Number of bits: 5
    # Average LLC val: 235929
    # Cache span (bytes): 83886080
    not ok 1 CMT: test

The CMT test creates a new control group that is also capable of monitoring
and assigns the workload to it. The workload allocates a buffer that by
default fills a portion of the L3 and keeps reading from the buffer,
measuring the L3 occupancy at intervals. The test passes if the workload's
L3 occupancy is within 15% of the buffer size.

By not adjusting any capacity bitmasks the workload shares the cache with
the rest of the system. Any other task that may be running could evict
the workload's data from the cache causing it to have low cache occupancy.

Reduce interference from the rest of the system by ensuring that the
workload's control group uses the capacity bitmask found in the user
parameters for L3 and that the rest of the system can only allocate into
the inverse of the workload's L3 cache portion. Other tasks can thus no
longer evict the workload's data from L3.

With the above adjustments the CMT test is more consistent. Repeating the
CMT test while generating interference with stress-ng on a sample
system after applying the fixes show significant improvement in test
accuracy:

    # Starting CMT test ...
    # Mounting resctrl to "/sys/fs/resctrl"
    # Cache size :335544320
    # Writing benchmark parameters to resctrl FS
    # Write schema "L3:0=fffe0" to resctrl FS
    # Write schema "L3:0=1f" to resctrl FS
    # Benchmark PID: 7089
    # Checking for pass/fail
    # Pass: Check cache miss rate within 15%
    # Percent diff=12
    # Number of bits: 5
    # Average LLC val: 73269248
    # Cache span (bytes): 83886080
    ok 1 CMT: test

Link: https://lore.kernel.org/r/b160592179f88069cdc679563e152007998a0d76.1775266384.git.reinette.chatre@intel.com
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/aO+7MeSMV29VdbQs@e133380.arm.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

Input: tsc2007 - reduce I2C transactions for Z2 read

The current implementation sends a separate power-down command
after reading the Z2 value, resulting in an extra I2C
transaction per measurement cycle.

The TSC2007 command byte contains a 2-bit power-down mode
selection field. By selecting the power-down state in the Z2
measurement command, the device powers down after the Z2 A/D
conversion completes, eliminating the subsequent power-down
transaction.

This reduces the number of I2C transactions by one per touch
measurement cycle, decreasing I2C bus overhead and improving
touch sampling performance.

Signed-off-by: Yuki Horii <yuuki198708@gmail.com>
Tested-by: Andreas Kemnade <andreas@kemnade.info> # GTA04
Link: https://patch.msgid.link/20260410074100.1660-1-horiiyuk@ishida.co.jp
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

regulator: Add PM8150 PMIC support

Rakesh Kota <rakesh.kota@oss.qualcomm.com> says:

PM8150 is a power management IC. It is used in shikra boards.

Link: https://patch.msgid.link/20260429-add_pm8150_regulators-v1-0-9879c0967cf0@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: qcom_smd: Add PM8150 regulators

The PM8150 is found on boards with shikra SoCs and It
provides 10 SMPS and 18 LDO regulators.

Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260429-add_pm8150_regulators-v1-2-9879c0967cf0@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: dt-bindings: qcom,smd-rpm-regulator: Document PM8150 IC

Document the pm8150 compatible string and available regulators in
the QCOM SMD RPM regulator documentation.

Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com>
Link: https://patch.msgid.link/20260429-add_pm8150_regulators-v1-1-9879c0967cf0@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>

sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN

Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
affinity management"), kthreads default to use the HK_TYPE_DOMAIN
cpumask. IOW, it is no longer affected by the setting of the nohz_full
boot kernel parameter.

That means HK_TYPE_KTHREAD should now be an alias of HK_TYPE_DOMAIN
instead of HK_TYPE_KERNEL_NOISE to correctly reflect the current kthread
behavior. Make the change as HK_TYPE_KTHREAD is still being used in
some networking code.

Fixes: 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>