Maulik Shah [Tue, 28 Apr 2026 12:14:58 +0000 (17:44 +0530)]
pinctrl: qcom: Fix wakeirq map by removing disconnected irqs for sm8150
PDC interrupts 122-125 were meant for ibi_i3c wakeup, but sm8150 does not
support i3c. GPIOs 39, 51, 88 and 144 are also connected to different PDC
pins, which are already reflected in the wake irq map.
Remove the unsupported wakeup interrupts from the map.
Felix Gu [Mon, 4 May 2026 14:53:26 +0000 (22:53 +0800)]
pinctrl: sunxi: fix regulator leak in sunxi_pmx_request() error path
In the error path of sunxi_pmx_request(), the code calls
regulator_put(s_reg->regulator) to release the regulator. However,
s_reg->regulator is only assigned after a successful regulator_enable().
This causes a memory leak: the regulator obtained via regulator_get()
is never properly released when regulator_enable() fails.
Fixes: dc1445584177 ("pinctrl: sunxi: Fix and simplify pin bank regulator handling") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
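A minimal sketch of the error path described above. The names
fake_regulator_get(), fake_regulator_enable(), fake_regulator_put() and
struct bank_reg are hypothetical stand-ins, not the regulator API or the
sunxi driver code. The point is only that the failure path has to release
the local handle, because the cached field has not been assigned yet.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel regulator API. */
struct fake_regulator {
	int refs;		/* outstanding get() references */
	int fail_enable;	/* force enable() to fail when non-zero */
};

struct bank_reg {
	struct fake_regulator *regulator;	/* cached only after enable succeeds */
};

static struct fake_regulator *fake_regulator_get(struct fake_regulator *r)
{
	r->refs++;
	return r;
}

static void fake_regulator_put(struct fake_regulator *r)
{
	if (r)
		r->refs--;
}

static int fake_regulator_enable(struct fake_regulator *r)
{
	return r->fail_enable ? -1 : 0;
}

static int request_bank(struct bank_reg *s_reg, struct fake_regulator *hw)
{
	struct fake_regulator *reg = fake_regulator_get(hw);
	int ret = fake_regulator_enable(reg);

	if (ret) {
		/* s_reg->regulator is still NULL here, so put the local handle. */
		fake_regulator_put(reg);
		return ret;
	}

	s_reg->regulator = reg;
	return 0;
}
```

Putting s_reg->regulator instead of reg on the failure path would leave
refs at 1 in this sketch, which is the leak the commit describes.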
Myeonghun Pak [Fri, 24 Apr 2026 12:40:39 +0000 (21:40 +0900)]
drm/tve200: Fix probe cleanup after register failure
tve200_modeset_init() creates a panel bridge and initializes the DRM
mode config before tve200_probe() registers the DRM device. If
drm_dev_register() fails, probe returns an error and the driver's remove
callback is not called, so those modeset resources are left behind.
Unwind the panel bridge and mode config on that failure path before
disabling the clock and dropping the DRM device reference.
Because the default console's baud rate is not set, defconfig kernels do
not have any serial output on this platform. Set the baud rate to
115200, matching what is used by U-Boot etc on this platform.
Suggested-by: Vivian Wang <wangruikang@iscas.ac.cn> Fixes: d60d57ab6b2a8 ("riscv: dts: spacemit: add Banana Pi BPI-F3 board device tree") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Yixun Lan <dlan@kernel.org> Link: https://lore.kernel.org/r/20260430-reword-overstep-3be08b7eab25@spud Signed-off-by: Yixun Lan <dlan@kernel.org>
The return value of vsnprintf() and bstr_printf() can overflow INT_MAX
and become negative. The @size argument is checked for input overflow,
but the output, which is the expected required size, is not checked.
This should never happen, but it should be checked and limited.
lib/vsprintf: Fix to check field_width and precision
Check the field_width and precision correctly. Previously the check
depended on the bitfield conversion from int to detect out-of-range values.
However, commit 938df695e98d ("vsprintf: associate the format state
with the format pointer") changed those fields to int.
We need to check the out-of-range correctly without bitfield
conversion.
Fixes: 938df695e98d ("vsprintf: associate the format state with the format pointer") Reported-by: David Laight <david.laight.linux@gmail.com> Closes: https://lore.kernel.org/all/20260318151250.40fef0ab@pumpkin/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/177452712047.197965.16376597502504928495.stgit@devnote2 Signed-off-by: Petr Mladek <pmladek@suse.com>
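A minimal sketch of an explicit range check that does not rely on bitfield
truncation. The limit values and the helper names clamp_field_width() and
clamp_precision() are assumptions made for illustration; the actual limits
and helpers in lib/vsprintf.c may differ.

```c
/* Assumed limits for illustration only. */
#define FIELD_WIDTH_MAX	((1 << 23) - 1)
#define PRECISION_MAX	((1 << 15) - 1)

/* Clamp a parsed field width to +/-FIELD_WIDTH_MAX with plain comparisons. */
static int clamp_field_width(int width)
{
	if (width > FIELD_WIDTH_MAX)
		return FIELD_WIDTH_MAX;
	if (width < -FIELD_WIDTH_MAX)
		return -FIELD_WIDTH_MAX;
	return width;
}

/* Clamp a parsed precision to [0, PRECISION_MAX]. */
static int clamp_precision(int prec)
{
	if (prec > PRECISION_MAX)
		return PRECISION_MAX;
	if (prec < 0)
		return 0;
	return prec;
}
```

Because the fields are plain ints, an out-of-range value is no longer
changed by the assignment itself, so the comparison has to be spelled out.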
Juergen Gross [Tue, 5 May 2026 08:06:53 +0000 (10:06 +0200)]
x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()
When fixing a conflict in xen_e820_resolve_conflicts(), the loop over
the E820 map entries needs to be restarted, as the E820 map will have
been modified by the fix. Otherwise entries might be skipped by
accident.
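A minimal sketch of why the scan has to restart, using a plain int array
instead of the E820 map. fix_conflict() and resolve_conflicts() are
hypothetical helpers; negative values stand in for conflicting entries, and
the fix removes the entry so later entries shift down by one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fix: drop entry i, shifting the following entries down. */
static void fix_conflict(int *map, size_t *n, size_t i)
{
	memmove(&map[i], &map[i + 1], (*n - i - 1) * sizeof(map[0]));
	(*n)--;
}

/* Negative values mark conflicts. Returns the number of fixes applied. */
static size_t resolve_conflicts(int *map, size_t *n)
{
	size_t fixes = 0;
	size_t i = 0;

	while (i < *n) {
		if (map[i] < 0) {
			fix_conflict(map, n, i);
			fixes++;
			i = 0;	/* the map changed: restart the scan */
			continue;
		}
		i++;
	}
	return fixes;
}
```

With the input { 1, -1, -2, 3 }, simply advancing the index after the
first fix would skip -2, because it has moved into the slot that was just
examined. Restarting from the first entry avoids that.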
The current implementation of the heartbeat error injection uses
adf_disable_arb_thd() to stop a specific accelerator engine thread
from processing requests. This does not reliably prevent the device
from generating responses.
Fix the error injection by disabling the device arbiter through
exit_arb() instead. This properly simulates a device failure by
stopping all arbitration, which results in missing responses for
sent requests.
Remove the now unused adf_disable_arb_thd() function and its
declaration.
Julian Braha [Tue, 31 Mar 2026 12:22:14 +0000 (13:22 +0100)]
keys: cleanup dead code in Kconfig for FIPS_SIGNATURE_SELFTEST
There is already an 'if ASYMMETRIC_KEY_TYPE' condition wrapping
FIPS_SIGNATURE_SELFTEST, making the 'depends on' statement a
duplicate dependency (dead code).
I propose leaving the outer 'if ASYMMETRIC_KEY_TYPE...endif' and removing
the individual 'depends on' statement.
This dead code was found by kconfirm, a static analysis tool for Kconfig.
Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ensure that all devices currently offline are purged correctly.
Previously, purging logic relied on the internal FSM state to
determine whether a device was offline. However, devices with a
target state of offline could be skipped if CIO internal
processing was still ongoing during the purge operation.
Update the purge decision logic to rely on the online variable
in the cdev structure instead of the internal FSM state,
providing a more reliable indication of actual device
availability.
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
rhashtable_insert_rehash() allocates a new bucket table
with GFP_ATOMIC, as it is called from an RCU read-side
critical section.
If rhashtable_rehash_attach() then fails, the new table
is freed via kvfree(). This is unsafe, since kvfree() may
fall back to vfree() for vmalloc-backed allocations, which
can sleep and trigger:
BUG: sleeping function called from invalid context
Add bucket_table_free_atomic(), which uses kvfree_atomic()
so the table can be freed safely from non-sleeping context.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kvmalloc() now supports non-sleeping GFP flags, including
the vmalloc fallback path. This means it may return vmalloc
memory even for GFP_ATOMIC and GFP_NOWAIT allocations.
Freeing such memory with kvfree() may then end up calling
vfree(), which is not safe for non-sleeping contexts.
Introduce kvfree_atomic() helper for such cases. It mirrors
kvfree(), but uses vfree_atomic() for vmalloced memory.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
rhashtable: drop ht->mutex in rhashtable_free_and_destroy()
rhashtable_free_and_destroy() is a single-shot teardown routine:
cancel_work_sync() has already quiesced the deferred rehash worker, and
the function's documented contract requires the caller to guarantee no
other concurrent access to the rhashtable. Under those conditions
ht->mutex is not protecting anything -- taking it is a leftover from
the original teardown path.
That leftover is actively harmful: it closes a circular lock-class
dependency with fs_reclaim. The deferred rehash worker takes ht->mutex
and then allocates GFP_KERNEL memory in bucket_table_alloc(),
establishing
&ht->mutex -> fs_reclaim
After commit b32c4a213698 ("xattr: add rhashtable-based simple_xattr
infrastructure") introduced simple_xattr_ht_free(), which calls
rhashtable_free_and_destroy(), the simple_xattrs teardown became
reachable from evict() under the dcache shrinker. The subsequent
per-subsystem adaptations made the reverse edge concrete in three
independent code paths:
* commit 52b364fed6e1 ("shmem: adapt to rhashtable-based simple_xattrs with lazy allocation")
* commit 5bd97f5c5f24 ("kernfs: adapt to rhashtable-based simple_xattrs with lazy allocation")
* commit 50704c391fbf ("pidfs: adapt to rhashtable-based simple_xattrs")
Any of the three closes the cycle
fs_reclaim -> &ht->mutex
which lockdep reports as follows. This particular splat was observed
organically on a workstation kernel built from vfs-7.1-rc1.xattr at
~35h uptime under normal mixed workload, with CONFIG_PROVE_LOCKING=y.
The path happens to go through kernfs:
WARNING: possible circular locking dependency detected
7.0.0-faeab166167f-with-fixes-v1+ #191 Tainted: G U
kswapd0/243 is trying to acquire lock: ffff8882e475c0f8 (&ht->mutex){+.+.}-{4:4},
at: rhashtable_free_and_destroy+0x36/0x740
but task is already holding lock: ffffffffa8ad1d00 (fs_reclaim){+.+.}-{0:0},
at: balance_pgdat+0x995/0x1600
the existing dependency chain (in reverse order) is:
Note that lockdep tracks lock classes, not instances: the two
&ht->mutex sites are on different rhashtable objects (the deferred
worker was triggered by some unrelated rhashtable growth), but because
rhashtable_init() uses a single static lockdep key for all rhashtables,
this is a real class-level cycle. Once reported, lockdep disables
itself for the remainder of the boot, masking any subsequent locking
bugs.
Drop the mutex. After cancel_work_sync() the rehash worker is quiesced
and, per this function's contract, no other concurrent access is
possible; the tables are therefore owned exclusively by this function
and can be walked without any lock held.
Switch the table walks from rht_dereference() (which requires
ht->mutex to be held under CONFIG_PROVE_RCU) to rcu_dereference_raw(),
which has no lockdep annotation. rht_ptr_exclusive() already uses
rcu_dereference_protected(p, 1) and needs no change.
This is the only place in lib/rhashtable.c where &ht->mutex is
acquired from a path reachable under fs_reclaim; the deferred worker
is the only other site and it is the forward edge. Removing the
acquisition here therefore eliminates the class cycle for all three
subsystems that use simple_xattrs, not just the one in the splat
above. No locking-semantics change is introduced for correct users;
incorrect users would already be racing with rehash worker completion
regardless of the mutex.
Synthetic reproduction of the splat within a few-minute window was
unsuccessful across several attempts (tmpfs and kernfs zombies via
cgroupfs with open-fd-through-rmdir, with and without swap, up to
~60k reclaim-path executions of simple_xattr_ht_free() in a single
run), consistent with the rare coincidence-of-edges profile of the
bug: the forward edge is already registered in /proc/lockdep on any
idle system via rht_deferred_worker, but the reverse edge requires
evict() to complete kernfs_put()'s final release inside the fs_reclaim
critical section, which in my attempts was ordered against rather than
interleaved with the worker.
Jens Axboe [Mon, 4 May 2026 14:34:32 +0000 (08:34 -0600)]
block: only read from sqe on initial invocation of blkdev_uring_cmd()
This passthrough helper currently only supports discards. Part of that
command is the start and length, which is read from the SQE. It does
so on every invocation, where it really should just make it stable
on the first invocation. This avoids needing to copy the SQE upfront,
as we only really need those two 8b values stored in our per-req
payload.
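A minimal sketch of the "read once, keep in the per-request payload" idea.
struct fake_sqe, struct cmd_payload and cmd_prep() are hypothetical names,
not the io_uring or block layer definitions; the real code also has to
consider how the SQE memory is read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the submission entry; it may change later. */
struct fake_sqe {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical per-request payload holding the two stable 8-byte values. */
struct cmd_payload {
	bool initialized;
	uint64_t start;
	uint64_t len;
};

/* Copy start and length on the first invocation only. */
static void cmd_prep(struct cmd_payload *p, const struct fake_sqe *sqe)
{
	if (p->initialized)
		return;

	p->start = sqe->start;
	p->len = sqe->len;
	p->initialized = true;
}
```

A retry that calls cmd_prep() again sees the values captured the first
time, even if the entry has since been overwritten.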
Ard Biesheuvel [Fri, 1 May 2026 07:16:38 +0000 (09:16 +0200)]
x86/efi: Restore IRQ state in EFI page fault handler
The kernel's softirq API does not permit re-enabling softirqs while IRQs
are disabled. The reason for this is that local_bh_enable() will not
only re-enable delivery of softirqs over the back of IRQs, it will also
handle any pending softirqs immediately, regardless of whether IRQs are
enabled at that point.
For this reason, commit
d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
disables softirqs only when IRQs are enabled, as it is not permitted
otherwise, but also unnecessary, given that asynchronous softirq
delivery never happens to begin with while IRQs are disabled.
However, this does mean that entering a kernel mode FPU section with
IRQs enabled and leaving it with IRQs disabled leads to problems, as
identified by Sashiko [0]: the EFI page fault handler is called from
page_fault_oops() with IRQs disabled, and thus ends the kernel mode FPU
section with IRQs disabled as well, regardless of whether IRQs were
enabled when it was started. This may result in schedule() being called
with a non-zero preempt_count, causing a BUG().
So take care to re-enable IRQs when handling any EFI page faults if they
were taken with IRQs enabled.
Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ivan Hu <ivan.hu@canonical.com> Cc: x86@kernel.org Cc: <stable@vger.kernel.org> Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Sakari Ailus [Sat, 21 Mar 2026 21:41:50 +0000 (23:41 +0200)]
media: v4l2-subdev: Fail {enable,disable}_streams and s_streaming nicely
If a sub-device does not set enable_streams() and disable_streams() pad
ops while it sets the s_stream() video op to
v4l2_subdev_s_stream_helper(), enabling or disabling streaming either way
on the sub-device will result in calling v4l2_subdev_s_stream_helper() and
v4l2_subdev_{enable,disable}_streams() recursively, exhausting the stack.
Return -ENOIOCTLCMD in this case to handle the situation gracefully.
Fixes: b62949ddaa52 ("media: subdev: Support single-stream case in v4l2_subdev_enable/disable_streams()") Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Guangshuo Li [Fri, 1 May 2026 19:00:05 +0000 (03:00 +0800)]
cpufreq: qcom-cpufreq-hw: Fix possible double free
qcom_cpufreq.data is allocated with devm_kzalloc() in probe() as an
array of per-domain data. qcom_cpufreq_hw_cpu_init() stores a pointer to
one element of this array in policy->driver_data.
qcom_cpufreq_hw_cpu_exit() currently calls kfree() on policy->driver_data.
This is not valid because the memory is devm-managed. For the first
domain, this can free the devm-managed allocation while the devres entry
is still active, leading to a possible double free when the platform
device is later detached. For other domains, the pointer may refer to an
element inside the array rather than the allocation base.
Remove the kfree(data) call and let devres release qcom_cpufreq.data.
This issue was found by a static analysis tool I am developing.
Fixes: 054a3ef683a1 ("cpufreq: qcom-hw: Allocate qcom_cpufreq_data during probe") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Kuan-Ting Chen [Mon, 4 May 2026 15:27:12 +0000 (23:27 +0800)]
xfrm: esp: avoid in-place decrypt on shared skb frags
MSG_SPLICE_PAGES can attach pages from a pipe directly to an skb. TCP
marks such skbs with SKBFL_SHARED_FRAG after skb_splice_from_iter(),
so later paths that may modify packet data can first make a private
copy. The IPv4/IPv6 datagram append paths did not set this flag when
splicing pages into UDP skbs.
That leaves an ESP-in-UDP packet made from shared pipe pages looking
like an ordinary uncloned nonlinear skb. ESP input then takes the no-COW
fast path for uncloned skbs without a frag_list and decrypts in place
over data that is not owned privately by the skb.
Mark IPv4/IPv6 datagram splice frags with SKBFL_SHARED_FRAG, matching
TCP. Also make ESP input fall back to skb_cow_data() when the flag is
present, so ESP does not decrypt externally backed frags in place.
Private nonlinear skb frags still use the existing fast path.
This intentionally does not change ESP output. In esp_output_head(),
the path that appends the ESP trailer to existing skb tailroom without
calling skb_cow_data() is not reachable for nonlinear skbs:
skb_tailroom() returns zero when skb->data_len is nonzero, while ESP
tailen is positive. Thus ESP output will either use the separate
destination-frag path or fall back to skb_cow_data().
Cache the dont_correlate() result once per symbol at the start of
correlate_symbols(). This reduces klp diff time on an arm64 LTO
vmlinux.o from 2m51s to 35s.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
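A minimal sketch of caching a per-symbol predicate up front. The struct
layout, expensive_dont_correlate() and its ".L" prefix test are
hypothetical; they only stand in for a check that is costly to repeat in
every pass over the symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sym {
	const char *name;
	bool dont_correlate;	/* cached predicate result */
};

/* Counts how often the expensive check actually runs. */
static unsigned long predicate_calls;

/* Hypothetical stand-in for an expensive per-symbol check. */
static bool expensive_dont_correlate(const struct sym *s)
{
	predicate_calls++;
	return strncmp(s->name, ".L", 2) == 0;
}

/* Evaluate the predicate once per symbol before any matching pass. */
static void cache_dont_correlate(struct sym *syms, size_t n)
{
	for (size_t i = 0; i < n; i++)
		syms[i].dont_correlate = expensive_dont_correlate(&syms[i]);
}

/* Later passes read the cached flag instead of re-running the check. */
static size_t count_correlatable(const struct sym *syms, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!syms[i].dont_correlate)
			count++;
	return count;
}
```

However many passes call count_correlatable(), the expensive check runs
exactly once per symbol.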
objtool: Improve and simplify prefix symbol detection
Only create prefix symbols for functions that have
__patchable_function_entries entries, since those are the only C
functions where prefix NOPs are intentional.
This both simplifies the detection and makes it more accurate.
Note that assembly functions using SYM_TYPED_FUNC_START() can also have
prefixed NOPs, but that macro already creates their __cfi_ symbols.
With CFI+CALL_PADDING, Clang places .Ltmp labels at the start of the NOP
padding (offset 5) between the __cfi_ prefix and the function entry
point. get_func_prefix() only checks the immediately previous symbol,
so the intervening .Ltmp label causes it to miss the __cfi_ prefix
symbol.
This results in klp-diff not cloning the kCFI type hash into the
livepatch module, causing a CFI failure at module load when calling
callback functions through indirect calls:
CFI failure at __klp_enable_patch+0xab/0x140
(target: pre_patch_callback+0x0/0x80 [livepatch_combined];
expected type: 0xde073954)
Instead of walking backward through the section's symbol list, just use
find_func_containing() for the byte before the function. This works now
that __cfi_ symbols are being grown by objtool to fill the padding.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool: Grow __cfi_* prefix symbols for all CFI+CALL_PADDING
For all CONFIG_CFI+CONFIG_CALL_PADDING configs, for C functions, the
__cfi_ symbols only cover the 5-byte kCFI type hash. After that there
are also N bytes of NOP padding between the hash and the function entry
which aren't associated with any symbol.
The NOPs can be replaced with actual code at runtime. Without a symbol,
unwinders and tooling have no way of knowing where those bytes belong.
Grow the existing __cfi_* symbols to fill that gap.
Note that assembly functions with SYM_TYPED_FUNC_START() aren't affected
by this issue, their __cfi_ symbols also cover the padding.
Also, CONFIG_PREFIX_SYMBOLS has no reason to exist: CONFIG_CALL_PADDING
is what causes the compiler to emit NOP padding before function entry
(via -fpatchable-function-entry), so it's the right condition for
creating prefix symbols.
Remove CONFIG_PREFIX_SYMBOLS, as it's no longer needed. Simplify the
LONGEST_SYM_KUNIT_TEST dependency accordingly. Rework objtool's
arguments a bit to handle the variety of prefix/cfi-related cases.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Fix position-dependent checksums for non-relocated jumps/calls
When computing klp checksums, instructions with non-relocated jump/call
destination offsets are problematic because the offset values can change
when surrounding code has moved, causing the function to be incorrectly
marked as changed.
Specifically, that includes jumps from alternatives to the end of the
alternative, which from objtool's perspective are jumps to the end of
the alternative instruction block in the original function.
Note that 'jump_dest' jumps don't include sibling calls (those use
call_dest), nor do they include jumps to/from .cold sub functions (those
are cross-section and need a reloc).
Fix it by hashing the opcode bytes (excluding the immediate operand)
along with a position-independent representation of the destination.
For calls, use the function name, and for jumps, use the destination's
offset within its function.
[Note the "9 bit hole" comment was wrong: it has been 8 bits since
commit 70589843b36f ("objtool: Add option to trace function validation")
added the 'trace' field. Adding the 4-bit 'immediate_len' field now
leaves a 4-bit hole.]
Fixes: 0d83da43b1e1 ("objtool/klp: Add --checksum option to generate per-function checksums") Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
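A minimal sketch of a position-independent checksum for a jump. FNV-1a is
used here only as a simple stand-in hash, and struct jump_insn and
checksum_jump() are hypothetical; the sketch also assumes the immediate
operand occupies the trailing bytes of the encoding, as in an x86 rel32
jump.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET_BASIS	0xcbf29ce484222325ULL
#define FNV_PRIME		0x100000001b3ULL

/* FNV-1a over a byte range, continuing from state h. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/* Hypothetical view of a decoded jump instruction. */
struct jump_insn {
	const unsigned char *bytes;	/* encoded instruction */
	size_t len;			/* total length in bytes */
	size_t immediate_len;		/* length of the trailing immediate */
	uint64_t dest_func_offset;	/* destination offset within its function */
};

/* Hash the opcode bytes without the immediate, then the relative destination. */
static uint64_t checksum_jump(uint64_t h, const struct jump_insn *insn)
{
	h = fnv1a(h, insn->bytes, insn->len - insn->immediate_len);
	return fnv1a(h, &insn->dest_func_offset, sizeof(insn->dest_func_offset));
}
```

Two encodings of the same jump whose displacement bytes differ because the
surrounding code moved hash identically, while a jump to a different place
in the function does not.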
Alternative replacement instructions awkwardly have insn->sym set to the
function they get patched to rather than the symbol (or rather lack
thereof) they belong to in the file.
This makes it difficult to know where a given instruction actually
lives.
Add a new insn_sym() helper which preserves the existing semantic of
insn->sym. Rename insn->sym to insn->_sym, which contains the actual
ELF binary symbol (or NULL, for alternative replacements) an instruction
lives in.
The private insn->_sym value will be needed for a subsequent patch.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Rewrite the symbol correlation code, using a tiered list of
deterministic strategies in a loop. For duplicately named symbols, each
tier applies a filter with the goal of finding a 1:1 deterministic
correlation between the original and patched version of the symbol.
The three matching strategies are:
find_twin(): A funnel of progressively tighter filters. Candidates
with the same demangled name are counted at four levels: name, scope
(local-vs-global), file (strict file association), and checksum
(unchanged functions). The widest level that yields a 1:1 match wins;
narrower levels are only tried when the wider level is ambiguous.
find_twin_suffixed(): Uses already-correlated LLVM symbol pairs to map
.llvm.<hash> suffixes from orig to patched. Because all promoted
symbols from the same TU share the same hash, one correlated pair
seeds the mapping for the entire TU.
find_twin_positional(): Last resort, matches symbols by position among
same-named candidates, similar to livepatch sympos. Used for data
objects like __quirk variables where no deterministic filter can
distinguish the candidates.
Overall this works much better than the existing algorithm, particularly
with LTO kernels.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
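The funnel idea behind find_twin() can be sketched roughly as follows. This
is a simplified illustration, not the objtool implementation: struct sym,
matches(), count_matches() and find_twin_sketch() are hypothetical, names
are assumed to be already demangled, and giving up when either side has no
candidate is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

enum level { LVL_NAME, LVL_SCOPE, LVL_FILE, LVL_CSUM, LVL_COUNT };

struct sym {
	const char *name;	/* demangled name */
	int is_local;		/* local vs. global scope */
	const char *file;	/* owning source file */
	unsigned long csum;	/* function checksum */
};

/* Each level applies its own filter on top of all wider ones. */
static int matches(const struct sym *a, const struct sym *b, int lvl)
{
	if (strcmp(a->name, b->name))
		return 0;
	if (lvl >= LVL_SCOPE && a->is_local != b->is_local)
		return 0;
	if (lvl >= LVL_FILE && strcmp(a->file, b->file))
		return 0;
	if (lvl >= LVL_CSUM && a->csum != b->csum)
		return 0;
	return 1;
}

/* Count candidates matching ref at this level; remember the last one. */
static size_t count_matches(const struct sym *list, size_t n,
			    const struct sym *ref, int lvl,
			    const struct sym **last)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (matches(&list[i], ref, lvl)) {
			count++;
			*last = &list[i];
		}
	}
	return count;
}

/* Try the widest level first; narrow only while the match is ambiguous. */
static const struct sym *find_twin_sketch(const struct sym *orig, size_t n_orig,
					  const struct sym *patched, size_t n_patched,
					  const struct sym *sym)
{
	for (int lvl = LVL_NAME; lvl < LVL_COUNT; lvl++) {
		const struct sym *self = NULL, *twin = NULL;
		size_t no = count_matches(orig, n_orig, sym, lvl, &self);
		size_t np = count_matches(patched, n_patched, sym, lvl, &twin);

		if (no == 1 && np == 1)
			return twin;	/* 1:1 at this level */
		if (!no || !np)
			return NULL;	/* nothing left to narrow */
	}
	return NULL;
}
```

In this sketch a uniquely named symbol is resolved at the name level,
while two local functions that share a name are only told apart once the
file level is reached.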
The checksum functionality has been moved to "objtool klp checksum"
which is now used by klp-build. Remove the now-dead --checksum and
--debug-checksum options from the default objtool command.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Use the new "objtool klp checksum" subcommand instead of injecting
--checksum into every objtool invocation via OBJTOOL_ARGS during the
kernel build.
This decouples checksum generation from the build, running it in
separate post-build passes, making the code (and the patch generation
pipeline itself) more modular.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Move the checksum functionality out of the main objtool command into a
new "objtool klp checksum" subcommand.
This has the benefit of making the code (and the patch generation
process itself) more modular.
For bisectability, both "objtool --checksum" and "objtool klp checksum"
work for now. The former will be removed after klp-build has been
converted to use the new subcommand.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool: Consolidate file decoding into decode_file()
decode_sections() relies on CFI and cfi_hash initialization done
separately in check(), making it unusable outside of check().
Consolidate the initialization into decode_sections() and rename it to
decode_file(), and make it global along with free_insns() and
insn_reloc() for use by other objtool components -- namely, the checksum
code which will be moving to another file.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Extricate checksum calculation from validate_branch()
In preparation for porting the checksum code to other arches, make its
functionality independent from the CFG reverse engineering code.
Move it into a standalone calculate_checksums() function which iterates
all functions and instructions directly, rather than being called inline
from do_validate_branch().
Since checksum_update_insn() is no longer called during CFG traversal,
it needs to manually iterate the alternatives.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Handle Clang .data..Lanon anonymous data sections
Clang generates anonymous data sections named .data..Lanon.<hash>.
These need section-symbol references in the same way as .data..Lubsan
(GCC) and .data..L__unnamed_ (Clang UBSAN) sections. Without this,
convert_reloc_sym() fails when processing relocations that reference
these sections.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool: Include libsubcmd headers directly from source tree
Instead of installing libsubcmd headers to a build output directory and
including from there, include directly from tools/lib/ where they
already exist. This fixes clangd indexing which otherwise can't find
libsubcmd headers.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Don't set sym->file for section symbols
Section symbols aren't grouped after their corresponding FILE symbols.
Their sym->file should really be NULL rather than whatever random FILE
happened to be last.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
SRC and OBJ are both set to $(pwd) and are always identical. The script
already enforces that klp-build runs from the kernel root directory, and
builds are done in-place, making these variables unnecessary.
Suggested-by: Song Liu <song@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
vDSO code runs in userspace and can't be livepatched. Such patches also
cause spurious "new function" errors due to generated files like
vdso*-image.c having unstable line numbers across builds.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
If a build error occurs and the user hits Ctrl-C while a large patch is
being reverted during cleanup, the cleanup EXIT trap gets re-triggered
and tries to re-revert the already partially-reverted patch. That
causes 'patch -R' to repeatedly prompt
"Unreversed patch detected! Ignore -R? [n]"
for each already-reverted hunk, with no way to break out.
Fix it by adding '--force' to the patch revert command in
revert_patch(), which causes it to silently ignore already-reverted
hunks. And ignore errors, as the cleanup is always best-effort.
For similar reasons, add to APPLIED_PATCHES before (rather than after)
applying the patch in apply_patch() so an interrupted apply will also
get cleaned up.
Fixes: d36a7343f4ba ("livepatch/klp-build: switch to GNU patch and recountdiff") Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
The errtrace option (combined with the ERR trap) already serves the same
function (and more) as errexit, so errexit is redundant. And it has
more pitfalls. Remove it.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
klp-build: Fix checksum comparison for changed offsets
The klp-build -f/--show-first-changed feature uses diff to compare
checksum log lines between original and patched objects. However, diff
compares entire lines, including the offset field. When a function is
at a different section offset, the offset field differs even though the
instruction checksum is identical, causing the wrong instruction to be
printed.
Only compare the checksum field when looking for the first changed
instruction. Also print both the original and patched offsets when they
differ.
Fixes: 78be9facfb5e ("livepatch/klp-build: Add --show-first-changed option to show function divergence") Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
If .config is out of date with the kernel source, 'make syncconfig'
hangs while waiting for user input on new config options. Detect the
mismatch and return an error.
Fixes: 6f93f7b06810 ("livepatch/klp-build: Fix inconsistent kernel version") Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool: Fix reloc hash collision in find_reloc_by_dest_range()
In find_reloc_by_dest_range(), hash collisions can cause a high-offset
relocation to appear when probing a low-offset hash bucket.
Only return early when the best match found so far genuinely belongs to
the current bucket (its offset is within the bucket's stride range).
Otherwise, continue scanning later buckets which may contain
lower-offset matches.
This ensures the first reloc in the range gets returned.
Fixes: 74b873e49d92 ("objtool: Optimize find_rela_by_dest_range()") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
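A minimal sketch of the early-return condition, using a tiny fixed-size
hash table in place of objtool's reloc hash. STRIDE, NBUCKETS,
struct bucket, add_reloc() and find_first_in_range() are all hypothetical;
only the rule "return early only if the best match lies in the stride
window currently being probed" is taken from the description above.

```c
#include <stddef.h>

#define STRIDE		16UL
#define NBUCKETS	4UL
#define MAX_PER_BUCKET	8
#define NO_RELOC	(~0UL)

/* Hypothetical bucket: offsets from different windows may collide here. */
struct bucket {
	unsigned long offs[MAX_PER_BUCKET];
	size_t n;
};

static size_t bucket_of(unsigned long off)
{
	return (off / STRIDE) % NBUCKETS;
}

static void add_reloc(struct bucket *tbl, unsigned long off)
{
	struct bucket *b = &tbl[bucket_of(off)];

	if (b->n < MAX_PER_BUCKET)
		b->offs[b->n++] = off;
}

/* Return the lowest reloc offset in [start, start + len), or NO_RELOC. */
static unsigned long find_first_in_range(const struct bucket *tbl,
					 unsigned long start, unsigned long len)
{
	unsigned long end = start + len;
	unsigned long best = NO_RELOC;

	for (unsigned long o = start & ~(STRIDE - 1); o < end; o += STRIDE) {
		const struct bucket *b = &tbl[bucket_of(o)];

		for (size_t i = 0; i < b->n; i++) {
			unsigned long r = b->offs[i];

			if (r >= start && r < end && r < best)
				best = r;
		}

		/* Stop only if the best match belongs to this stride window. */
		if (best != NO_RELOC && best < o + STRIDE)
			return best;
	}
	return best;
}
```

With relocs at 0x48 and 0x18, probing the window at 0x00 finds 0x48
through a collision. Returning at that point would miss 0x18, which is
only found when the next window is probed.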
objtool/klp: Don't correlate .rodata.cst* constant pool objects
Clang aggregates UBSAN type descriptors into shared anonymous
.data..L__unnamed_* sections. This data is used by UBSAN trap handlers.
When a changed function has an UBSAN bounds check, klp-diff clones the
entire UBSAN data section associated with the TU. Relocations within
the cloned section that reference named rodata objects in .rodata.cst*
(like 'exponent', 'pirq_ali_set.irqmap') become KLP relocations because
those objects now get correlated.
That results in a .klp.rela.vmlinux..data section which can easily have
thousands of KLP relocs, most of which are completely superfluous, used
by functions which aren't cloned to the patch module.
The .rodata.cst* sections are SHF_MERGE constant pool sections
containing small fixed-size data (lookup tables, bitmasks) that is only
read by value. Pointer identity is never relevant for these objects, so
correlating them is unnecessary.
Exclude .rodata.cst* objects from correlation so they get cloned as
local data instead of generating KLP relocations.
It might be possible to someday treat UBSAN data sections as special
sections, and only extract the few needed entries. But this works for
now.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Fix pointer comparisons for rodata objects
klp-diff treats all rodata as uncorrelated, so any reference to it uses
a duplicated copy rather than using a KLP reloc.
For the contents of the data itself, a duplicated copy is fine.
However, pointer comparisons (e.g., f->f_op == &foo_ops) are broken.
Fix it by correlating non-anonymous rodata objects.
Also, use a new find_symbol_containing_inclusive() helper for matching
the end of a symbol so bounds calculations don't get broken, for the
case where an array or other symbol's ending address is used as part of
a bounds calculation.
While these are really two distinct changes, they need to be done in the
same patch so as to avoid introducing bisection regressions.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
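A minimal sketch of why a duplicated rodata object breaks pointer
comparisons even though its contents are identical. struct file_ops,
struct file, foo_ops_copy and is_foo_file() are simplified, hypothetical
stand-ins; foo_ops_copy plays the role of the copy a patch module would
carry if the object were not correlated.

```c
#include <stdbool.h>

struct file_ops {
	int (*open)(void);
};

struct file {
	const struct file_ops *f_op;
};

static int foo_open(void)
{
	return 0;
}

/* The original object, as referenced by already-open files. */
static const struct file_ops foo_ops = { .open = foo_open };

/* A duplicate with the same contents but a different address. */
static const struct file_ops foo_ops_copy = { .open = foo_open };

/* Identity check against whichever address the caller resolves foo_ops to. */
static bool is_foo_file(const struct file *f, const struct file_ops *ops)
{
	return f->f_op == ops;
}
```

Reading through either object gives the same result, but a check like
f->f_op == &foo_ops only succeeds when the patched code refers to the
original address, which is what correlating the object provides.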
Inline section_reference_needed() and is_reloc_allowed() into
convert_reloc_sym() and remove the redundant is_reloc_allowed() check in
clone_reloc().
Move the is_sec_sym() checks into the convert callees so they become
no-ops when the reloc is already in the right format. This allows
convert_reloc_sym() to unconditionally dispatch to the right converter
based on section type.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Move the sec->rodata marking from check.c to elf.c so it's set during
ELF reading rather than during the check pipeline. This makes the
rodata flag available to all objtool users, including klp-diff which
reads ELF files directly without running check().
Add an is_rodata_sec() helper to elf.h for consistency with
is_text_sec() and is_string_sec().
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Fix relocation conversion failures for R_X86_64_NONE
Objtool has some hacks which NOP out certain calls/jumps and replace
their relocations with R_X86_64_NONE. The klp-diff relocation
extraction code will error out when trying to copy these relocations due
to their negative addend, which would only make sense for a PC-relative
branch instruction. Just ignore them.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
.kcfi_traps contains references to kCFI trap instruction locations.
When a KCFI type check fails at an indirect call, the trap handler looks
up the faulting address in this section.
Add it to the special sections list so the entries get extracted for the
changed functions they reference.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Fix extraction of text annotations for alternatives
Objtool is failing to extract text annotations which reference
.altinstr_replacement instructions:
1) Alternative replacement fake symbols are NOTYPE rather than FUNC,
and they don't have sym->included set, thus they aren't recognized
by should_keep_special_sym().
2) .discard.annotate_insn gets processed before .altinstr_replacement,
so the referenced (fake) symbols don't have clones yet.
Fix the first issue by checking for a valid clone instead of
sym->included and by accepting NOTYPE symbols when processing
.discard.annotate_insn.
Fix the second issue by deferring text annotation processing until after
the other special sections have been cloned.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Fix handling of zero-length .altinstr_replacement sections
When a section is empty (e.g. only zero-length alternative
replacements), there are no symbols to convert a section symbol
reference to. Skip the reloc instead of erroring out.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Josh Poimboeuf [Tue, 31 Mar 2026 04:31:47 +0000 (21:31 -0700)]
objtool/klp: Fix --debug-checksum for duplicate symbol names
find_symbol_by_name() only returns the first match, so
--debug-checksum=<func> silently ignores any subsequent duplicately
named functions after the first.
Fix that by iterating over all matches with a new for_each_sym_by_name() helper.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool: Replace iterator callback with for_each_sym_by_demangled_name()
Replace the callback-based iterate_sym_by_demangled_name() with a new
for_each_sym_by_demangled_name() macro. This eliminates the callback
struct/function and makes the code more compact and readable.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
create_fake_symbols() has two phases: creating symbols from
ANNOTATE_DATA_SPECIAL entries, and a fallback that uses sh_entsize for
special sections like .static_call_sites.
When .discard.annotate_data is absent, the function returns early,
skipping the entsize fallback and silently allowing unsupported
module-local static call keys through.
Fix it by jumping to the entsize phase instead of returning early.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Assisted-by: Claude:claude-4-opus Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Clang emits various .L-prefixed local symbols beyond .Ltmp*, such as
.L__const.* for local constant data. These are assembler-local labels
not present in kallsyms, so they can never be resolved at module load
time.
Broaden the check from .Ltmp* to all .L* symbols so they get cloned into
the patch module instead.
Acked-by: Song Liu <song@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
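The broadened prefix check for Clang's assembler-local labels can be sketched as follows. This is only an illustration under assumed names: is_ltmp_label() and is_local_label() are hypothetical helpers, not the functions used in klp-diff.

```c
#include <stdbool.h>
#include <string.h>

/* Narrow check (hypothetical): only temporary labels such as .Ltmp42. */
static bool is_ltmp_label(const char *name)
{
	return strncmp(name, ".Ltmp", 5) == 0;
}

/* Broadened check (hypothetical): any .L* label, e.g. .L__const.foo.bar. */
static bool is_local_label(const char *name)
{
	return strncmp(name, ".L", 2) == 0;
}
```

With the narrow check, a name like .L__const.foo.bar is not recognized as assembler-local; with the broadened check it is, while ordinary symbol names are still rejected.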
objtool/klp: Don't report uncorrelated functions as new
Clang LTO uses __UNIQUE_ID() to generate some uniquely named wrapper
functions, like initstubs. If they're uncorrelated, prevent them from
being reported as new functions and included unnecessarily.
Note that dont_correlate() already includes prefix functions, so prefix
functions are still being ignored here.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
With LTO, the initcall infrastructure generates __initstub__kmod_*
wrapper functions in .init.text. These are the LTO equivalent of
__initcall__kmod_* data pointers, which are already excluded from
correlation.
These are __init functions whose memory is freed after boot, so there's
no reason to include or reference them in a livepatch module.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Some arch/x86/crypto/*.S files define local .set/.equ constants that get
duplicated in vmlinux.o. This causes klp-diff to fail with "Multiple
correlation candidates" errors since it can't uniquely match these
between orig and patched builds.
Skip ABS symbols in dont_correlate(). They're purely compile-time
assembly constants that are never referenced by relocations, so they
don't need correlation.
Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
objtool/klp: Fix .data..once static local non-correlation
While there was once a section named .data.once, it has since been
renamed to .data..once with commit dbefa1f31a91 ("Rename .data.once to
.data..once to fix resetting WARN*_ONCE"). Fix it.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Joe Lawrence [Wed, 8 Apr 2026 14:49:19 +0000 (10:49 -0400)]
objtool/klp: Fix is_uncorrelated_static_local() for Clang
For naming function-local static locals, GCC uses <var>.<id>, e.g.
__already_done.15, while Clang uses <func>.<var> with optional .<id>,
e.g. create_worker.__already_done.111.
The existing is_uncorrelated_static_local() check only matches the GCC
convention where the variable name is a prefix. Handle both cases by
checking for a prefix match (GCC) and by checking after the first dot
separator (Clang).
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
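The two naming conventions handled by "objtool/klp: Fix is_uncorrelated_static_local() for Clang" can be matched roughly as sketched below. This is not the objtool implementation: starts_with_var() and matches_static_local() are hypothetical helpers, and requiring a '.' or end of string after the variable name is an assumption made here to avoid partial-name matches.

```c
#include <stdbool.h>
#include <string.h>

/* True if s starts with var, followed by '.' or the end of the string. */
static bool starts_with_var(const char *s, const char *var)
{
	size_t len = strlen(var);

	return strncmp(s, var, len) == 0 && (s[len] == '.' || s[len] == '\0');
}

/* Hypothetical matcher covering both compiler conventions. */
static bool matches_static_local(const char *sym, const char *var)
{
	const char *dot;

	/* GCC: <var>.<id>, e.g. __already_done.15 */
	if (starts_with_var(sym, var))
		return true;

	/* Clang: <func>.<var>[.<id>], e.g. create_worker.__already_done.111 */
	dot = strchr(sym, '.');
	return dot && starts_with_var(dot + 1, var);
}
```

For the variable name __already_done, this accepts both __already_done.15 and create_worker.__already_done.111, and rejects a name such as create_worker.counter.3.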
Suraj Kandpal [Mon, 4 May 2026 10:11:18 +0000 (15:41 +0530)]
drm/i915/hdcp: Drop mgr->base.lock acquisition in intel_conn_to_vcpi()
Now that intel_conn_to_vcpi() reads the MST topology state via
drm_atomic_get_new_mst_topology_state(), the topology state belongs
to this atomic commit and is already serialized through the atomic
state's private object machinery. There is no need to additionally
take mgr->base.lock here.
Taking it from the HDCP enable path in commit_tail with
state->base.acquire_ctx is also unsafe: by this point the
acquire_ctx is no longer in a state where new modeset locks may be
acquired through it, which produces a modeset-lock splat on MST +
HDCP. Drop the drm_modeset_lock() call.
Suraj Kandpal [Mon, 4 May 2026 10:11:17 +0000 (15:41 +0530)]
drm/i915/hdcp: Use new MST topology state in intel_conn_to_vcpi()
intel_conn_to_vcpi() runs from the HDCP enable path in commit_tail
and looks up the VCPI via mgr->base.state. When an ALLOCATE_PAYLOAD
is being driven on the mgr just before HDCP enable, that payload
list is being mutated in place, so the lookup can miss the port and
trip drm_WARN_ON(!payload), causing HDCP to be programmed with
VCPI 0.
Use drm_atomic_get_new_mst_topology_state() to read the topology
state attached to this atomic commit (stable, decided in
atomic_check), and bail out cleanly when no topology state or
payload is present for this port instead of WARNing.
Various names for Qualcomm as a company are used in user-visible config
options. Switch to a unified "Qualcomm" so it will be easier for users to
identify the options when, for example, running menuconfig.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add airoha_fe_get() and airoha_qdma_get() as utility routines for reading
a masked field from a specified register.
This is a non-functional refactor; no logical changes are introduced to
the existing codebase.
Daniel Golle [Sat, 2 May 2026 10:55:02 +0000 (11:55 +0100)]
net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
The .get_stats64 callback runs in atomic context, but on
MDIO-connected switches every register read acquires the MDIO bus
mutex, which can sleep:
[ 12.645973] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:609
[ 12.654442] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 759, name: grep
[ 12.663377] preempt_count: 0, expected: 0
[ 12.667410] RCU nest depth: 1, expected: 0
[ 12.671511] INFO: lockdep is turned off.
[ 12.675441] CPU: 0 UID: 0 PID: 759 Comm: grep Tainted: G S W 7.0.0+ #0 PREEMPT
[ 12.675453] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[ 12.675456] Hardware name: Bananapi BPI-R64 (DT)
[ 12.675459] Call trace:
[ 12.675462] show_stack+0x14/0x1c (C)
[ 12.675477] dump_stack_lvl+0x68/0x8c
[ 12.675487] dump_stack+0x14/0x1c
[ 12.675495] __might_resched+0x14c/0x220
[ 12.675504] __might_sleep+0x44/0x80
[ 12.675511] __mutex_lock+0x50/0xb10
[ 12.675523] mutex_lock_nested+0x20/0x30
[ 12.675532] mt7530_get_stats64+0x40/0x2ac
[ 12.675542] dsa_user_get_stats64+0x2c/0x40
[ 12.675553] dev_get_stats+0x44/0x1e0
[ 12.675564] dev_seq_printf_stats+0x24/0xe0
[ 12.675575] dev_seq_show+0x14/0x3c
[ 12.675583] seq_read_iter+0x37c/0x480
[ 12.675595] seq_read+0xd0/0xec
[ 12.675605] proc_reg_read+0x94/0xe4
[ 12.675615] vfs_read+0x98/0x29c
[ 12.675625] ksys_read+0x54/0xdc
[ 12.675633] __arm64_sys_read+0x18/0x20
[ 12.675642] invoke_syscall.constprop.0+0x54/0xec
[ 12.675653] do_el0_svc+0x3c/0xb4
[ 12.675662] el0_svc+0x38/0x200
[ 12.675670] el0t_64_sync_handler+0x98/0xdc
[ 12.675679] el0t_64_sync+0x158/0x15c
For MDIO-connected switches, poll MIB counters asynchronously using a
delayed workqueue every second and let .get_stats64 return the cached
values under a spinlock. A mod_delayed_work() call on each read
triggers an immediate refresh so counters stay responsive when queried
more frequently.
MMIO-connected switches (MT7988, EN7581, AN7583) are not affected
because their regmap does not sleep, so they continue to read MIB
counters directly in .get_stats64.
David Carlier [Sat, 2 May 2026 14:19:45 +0000 (15:19 +0100)]
psp: strip variable-length PSP header in psp_dev_rcv()
psp_dev_rcv() unconditionally removes a fixed PSP_ENCAP_HLEN, even
when psph->hdrlen indicates that the PSP header carries optional
fields. A frame whose PSP header advertises a non-zero VC or any
extension would therefore be silently mis-decapsulated: option bytes
would spill into the inner packet head and downstream parsing would
fail on a corrupted skb.
Compute the full PSP header length from psph->hdrlen, pull the
optional bytes into the linear region, and strip the whole header
when decapsulating. Optional fields (VC, ...) are still ignored,
just discarded with the rest of the header instead of leaking.
crypt_offset and the VIRT flag are intentionally not validated here
- callers know their device's PSP implementation and can decide.
Both in-tree callers gate on hardware-validated PSP, so this is a
correctness fix rather than a reachable corruption path under
current configurations.
Fixes: 0eddb8023cee ("psp: provide decapsulation and receive helper for drivers") Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260502141945.14484-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
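The length computation described in "psp: strip variable-length PSP header in psp_dev_rcv()" can be sketched in userspace C. This is not the kernel code: psp_full_hdr_len() and PSP_FIXED_HDR_LEN are hypothetical names, and the sketch assumes that hdrlen counts 8-byte units beyond the first 8 bytes of the header, so a 16-byte header with no optional fields has hdrlen == 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the fixed PSP header without optional fields. */
#define PSP_FIXED_HDR_LEN 16u

/*
 * Hypothetical helper. Assumption: hdrlen is in 8-byte units and does
 * not count the first 8 bytes of the header.
 * Returns the full header length in bytes, or 0 if hdrlen is smaller
 * than the fixed header or the header does not fit in avail bytes.
 */
static size_t psp_full_hdr_len(uint8_t hdrlen, size_t avail)
{
	size_t len = ((size_t)hdrlen + 1) * 8;

	if (len < PSP_FIXED_HDR_LEN || len > avail)
		return 0;
	return len;
}
```

Under that assumption, a header carrying one 8-byte optional field (hdrlen == 2) is 24 bytes long, and stripping only the fixed 16 bytes would leave 8 option bytes at the head of the inner packet.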
net: phy: realtek: replace magic number with register bit macros
Replace the magic number with register bit macros. The description of
the RTL8211B interrupt register is taken from the publicly available
datasheet (RTL8211B(L) Rev. 1.5 Datasheet).
mptcp: sockopt: increase seq in mptcp_setsockopt_all_sf
mptcp_setsockopt_all_sf() was missing a call to sockopt_seq_inc(). This
call is required so that subflows created later on are properly
synchronized.
This helper is called each time a socket option is set on subflows, and
future ones will need to inherit this option after their creation.
Fixes: 51c5fd09e1b4 ("mptcp: add TCP_MAXSEG sockopt support") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260501-net-mptcp-misc-fixes-7-1-rc3-v1-4-b70118df778e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shardul Bankar [Fri, 1 May 2026 19:35:35 +0000 (21:35 +0200)]
mptcp: use MPTCP_RST_EMPTCP for ACK HMAC validation failure
When HMAC validation fails on a received ACK + MP_JOIN in
subflow_syn_recv_sock(), the subflow is reset with reason
MPTCP_RST_EPROHIBIT ("Administratively prohibited"). This is
incorrect: HMAC validation failure is an MPTCP protocol-level
error, not an administrative policy denial.
The mirror site on the client, in subflow_finish_connect(), already
uses MPTCP_RST_EMPTCP ("MPTCP-specific error") for the same kind of
HMAC failure on the SYN/ACK + MP_JOIN. Use the same reason on the
server side for symmetry and accuracy.
Shardul Bankar [Fri, 1 May 2026 19:35:34 +0000 (21:35 +0200)]
mptcp: use MPJoinSynAckHMacFailure for SynAck HMAC failure
In subflow_finish_connect(), HMAC validation of the server's HMAC
in SYN/ACK + MP_JOIN increments MPTCP_MIB_JOINACKMAC ("HMAC was
wrong on ACK + MP_JOIN") on failure. The function processes the
SYN/ACK, not the ACK; the matching MPTCP_MIB_JOINSYNACKMAC counter
("HMAC was wrong on SYN/ACK + MP_JOIN") exists but is not
incremented anywhere in the tree.
The mirror site on the server, subflow_syn_recv_sock(), already
uses JOINACKMAC correctly for ACK HMAC failure. Use JOINSYNACKMAC
at the SYN/ACK validation site so each counter reflects the packet
whose HMAC actually failed.
net: mana: hardening: Reject zero max_num_queues from GDMA_QUERY_MAX_RESOURCES
In a CVM environment, hardware responses cannot be trusted. The
GDMA_QUERY_MAX_RESOURCES command returns resource limits used to
determine the maximum number of queues.
In mana_gd_query_max_resources(), gc->max_num_queues is initialized
from num_online_cpus() and successively clamped by the hardware-reported
max_eq, max_cq, max_sq, max_rq, and num_msix_usable values. If any of
these hardware values is zero, gc->max_num_queues becomes zero and the
function returns success. This leads to a confusing failure later when
alloc_etherdev_mq() is called with zero queues, returning NULL and
producing a misleading -ENOMEM error.
Add an explicit zero check for gc->max_num_queues after all clamping
steps and return -ENOSPC for a clear early failure, consistent with the
existing gc->num_msix_usable <= 1 guard.
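The clamp-then-validate pattern described above can be sketched as follows. This is a userspace illustration, not the MANA driver code: struct hw_limits, min_u32() and query_max_queues() are hypothetical, and the exact values the driver clamps against may differ.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the limits reported by the device. */
struct hw_limits {
	uint32_t max_eq;
	uint32_t max_cq;
	uint32_t max_sq;
	uint32_t max_rq;
	uint32_t num_msix_usable;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Returns the usable queue count (> 0), or -ENOSPC if it clamps to zero. */
static int query_max_queues(uint32_t online_cpus, const struct hw_limits *hw)
{
	uint32_t n = online_cpus;

	n = min_u32(n, hw->max_eq);
	n = min_u32(n, hw->max_cq);
	n = min_u32(n, hw->max_sq);
	n = min_u32(n, hw->max_rq);
	n = min_u32(n, hw->num_msix_usable);

	/* Fail early instead of handing a zero count to later allocations. */
	if (n == 0)
		return -ENOSPC;

	return (int)n;
}
```

Checking once after all clamping steps covers every reported limit, so a single zero value from the device produces a clear error at query time rather than a misleading allocation failure later.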
net: mana: hardening: Reject zero max_num_queues from MANA_QUERY_VPORT_CONFIG
As a part of MANA hardening for CVM, validate that max_num_sq and
max_num_rq returned by MANA_QUERY_VPORT_CONFIG are not zero. These
values flow into apc->num_queues, which is used as an allocation count
and loop bound. A zero value would result in zero-size allocations and
incorrect driver behavior.
====================
net: bridge: mcast: support exponential field encoding
Description:
This series addresses a mismatch in how multicast query
intervals and response codes are handled across IPv4 (IGMPv3)
and IPv6 (MLDv2). While decoding logic currently exists,
the corresponding encoding logic is missing during query
packet generation. This leads to incorrect intervals being
transmitted when values exceed their linear thresholds.
The patches introduce a unified floating-point encoding
approach based on RFC3376 and RFC3810, ensuring that large
intervals are correctly represented in QQIC and MRC fields
using the exponent-mantissa format.
Key Changes:
* ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation
Removes legacy macros in favor of a cleaner, unified
calculation for retrieving intervals from encoded fields,
improving code maintainability.
* ipv6: mld: rename mldv2_mrc() and add mldv2_qqi()
Standardizes MLDv2 terminology by renaming mldv2_mrc()
to mldv2_mrd() (Maximum Response Delay) and introducing
a new helper, mldv2_qqi(), for QQI calculation, improving code
readability.
* ipv4: igmp: encode multicast exponential fields
Introduces the logic to dynamically calculate the exponent
and mantissa using bit-scan (fls). This ensures QQIC and
MRC fields (8-bit) are properly encoded when transmitting
query packets with intervals that exceed their respective
linear threshold value of 128 (for QQI/MRT).
* ipv6: mld: encode multicast exponential fields
Applies similar encoding logic for MLDv2. This ensures
QQIC (8-bit) and MRC (16-bit) fields are properly encoded
when transmitting query packets with intervals that exceed
their respective linear thresholds (128 for QQI; 32768
for MRD).
* selftests: net: bridge: add MRC and QQIC field encoding tests
Updates bridge selftests to validate both linear and non-linear
(exponential) encoding for MRC and QQIC fields, ensuring
protocol compliance across IGMPv3 and MLDv2.
Impact:
These changes ensure that multicast queriers and listeners
stay synchronized on timing intervals, preventing protocol
timeouts or premature group membership expiration caused
by incorrectly formatted packet headers.
====================
Ujjal Roy [Sat, 2 May 2026 13:19:06 +0000 (13:19 +0000)]
selftests: net: bridge: add MRC and QQIC field encoding tests
Enhance vlmc_query_intvl_test and vlmc_query_response_intvl_test in
bridge_vlan_mcast.sh to validate IGMPv3/MLDv2 protocol compliance for
MRC and QQIC field encoding across both linear and exponential ranges.
TEST: Vlan multicast snooping enable [ OK ]
TEST: Vlan mcast_query_interval global option default value [ OK ]
TEST: Number of tagged IGMPv2 general query [ OK ]
TEST: IGMPv3 QQIC linear value 60(s) [ OK ]
TEST: MLDv2 QQIC linear value 60(s) [ OK ]
TEST: IGMPv3 QQIC non linear value 160(s) [ OK ]
TEST: MLDv2 QQIC non linear value 160(s) [ OK ]
TEST: Vlan mcast_query_response_interval global option default value [ OK ]
TEST: IGMPv3 MRC linear value of 60(x0.1s) [ OK ]
TEST: MLDv2 MRC linear value of 24000(ms) [ OK ]
TEST: IGMPv3 MRC non linear value of 240(x0.1s) [ OK ]
TEST: MLDv2 MRC non linear value of 48000(ms) [ OK ]
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Link: https://patch.msgid.link/20260502131907.987-6-royujjal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ujjal Roy [Sat, 2 May 2026 13:19:05 +0000 (13:19 +0000)]
ipv6: mld: encode multicast exponential fields
In MLD, MRC and QQIC fields are not correctly encoded when
generating query packets. Since the receiver of the query
interprets these fields using the MLDv2 floating-point
decoding logic, any value that exceeds the linear threshold
is incorrectly parsed as an exponential value, leading to
an incorrect interval calculation.
Encode and assign the corresponding protocol fields during
query generation. Introduce the logic to dynamically
calculate the exponent and mantissa using bit-scan (fls).
This ensures MRC (16-bit) and QQIC (8-bit) fields are
properly encoded when transmitting query packets with
intervals that exceed their respective linear thresholds
(32768 for MRD; 128 for QQI).
RFC3810: If Maximum Response Code >= 32768, the Maximum
Response Code field represents a floating-point value as
follows:
     0 1 2 3 4 5 6 7 8 9 A B C D E F
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1| exp |          mant         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RFC3810: If QQIC >= 128, the QQIC field represents a
floating-point value as follows:
     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |1| exp | mant  |
    +-+-+-+-+-+-+-+-+
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Link: https://patch.msgid.link/20260502131907.987-5-royujjal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
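A minimal sketch of the 16-bit MRC encoding described in "ipv6: mld: encode multicast exponential fields", assuming the RFC 3810 decoding rule MRD = (mant | 0x1000) << (exp + 3). It is not the kernel implementation: fls32(), mrd_to_mrc() and mrc_to_mrd() are hypothetical userspace helpers, and saturating at the largest representable value is an assumption of this sketch.

```c
#include <stdint.h>

/* 1-based position of the most significant set bit; 0 for v == 0. */
static unsigned int fls32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Largest value the format can represent: mant = 0xfff, exp = 7. */
#define MRD_MAX (0x1fffu << 10)

/* Encode a Maximum Response Delay (ms) into a 16-bit MRC field. */
static uint16_t mrd_to_mrc(uint32_t mrd)
{
	unsigned int exp;
	uint32_t mant;

	if (mrd < 32768)
		return (uint16_t)mrd;	/* linear range */
	if (mrd > MRD_MAX)
		mrd = MRD_MAX;		/* assumption: saturate */

	exp = fls32(mrd) - 16;		/* top bit ends up at bit 12 of mant */
	mant = (mrd >> (exp + 3)) & 0xfff;
	return (uint16_t)(0x8000 | (exp << 12) | mant);
}

/* Decode per RFC 3810. */
static uint32_t mrc_to_mrd(uint16_t mrc)
{
	if (mrc < 32768)
		return mrc;
	return ((mrc & 0xfffu) | 0x1000u) << (((mrc >> 12) & 0x7) + 3);
}
```

With these helpers, 24000 ms stays in the linear range and is sent unchanged, while 48000 ms encodes to 0x8770 and decodes back to 48000. Values that are not a multiple of the step size for their exponent lose their low bits on encoding.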
Ujjal Roy [Sat, 2 May 2026 13:19:04 +0000 (13:19 +0000)]
ipv4: igmp: encode multicast exponential fields
In IGMP, MRC and QQIC fields are not correctly encoded
when generating query packets. Since the receiver of the
query interprets these fields using the IGMPv3 floating-
point decoding logic, any value that exceeds the linear
threshold is incorrectly parsed as an exponential value,
leading to an incorrect interval calculation.
Encode and assign the corresponding protocol fields during
query generation. Introduce the logic to dynamically
calculate the exponent and mantissa using bit-scan (fls).
This ensures MRC and QQIC fields (8-bit) are properly
encoded when transmitting query packets with intervals
that exceed their respective linear threshold value of
128 (for MRT/QQI).
RFC3376: for both MRC and QQIC, values >= 128 represent
the same floating-point encoding as follows:
     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |1| exp | mant  |
    +-+-+-+-+-+-+-+-+
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Link: https://patch.msgid.link/20260502131907.987-4-royujjal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
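A minimal sketch of the 8-bit encoding described in "ipv4: igmp: encode multicast exponential fields", assuming the RFC 3376 decoding rule value = (mant | 0x10) << (exp + 3). It is not the kernel implementation: fls_u32(), igmpv3_encode8() and igmpv3_decode8() are hypothetical userspace helpers, and saturating at the largest representable value is an assumption of this sketch.

```c
#include <stdint.h>

/* 1-based position of the most significant set bit; 0 for v == 0. */
static unsigned int fls_u32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Largest value the format can represent: mant = 0xf, exp = 7. */
#define IGMPV3_FP_MAX (0x1fu << 10)

/* Encode an interval into an 8-bit MRC or QQIC field. */
static uint8_t igmpv3_encode8(uint32_t val)
{
	unsigned int exp;
	uint32_t mant;

	if (val < 128)
		return (uint8_t)val;	/* linear range */
	if (val > IGMPV3_FP_MAX)
		val = IGMPV3_FP_MAX;	/* assumption: saturate */

	exp = fls_u32(val) - 8;		/* top bit ends up at bit 4 of mant */
	mant = (val >> (exp + 3)) & 0xf;
	return (uint8_t)(0x80 | (exp << 4) | mant);
}

/* Decode per RFC 3376. */
static uint32_t igmpv3_decode8(uint8_t code)
{
	if (code < 128)
		return code;
	return ((code & 0xfu) | 0x10u) << (((code >> 4) & 0x7) + 3);
}
```

With these helpers, 60 is sent unchanged, 160 encodes to 0x84 and 240 to 0x8e, and both decode back exactly. A value such as 1000 encodes to 0xaf and decodes to 992, because the low bits that do not fit in the 4-bit mantissa are dropped.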