power: sequencing: pcie-m2: Allow creating serdev for multiple PCI devices
Current code makes it possible to create serdev for only one PCI device.
But for scaling this driver, it is necessary to allow creating serdev for
multiple PCI devices.
Hence, add provision for it by creating 'struct pwrseq_pci_dev' for each
PCI device that requires serdev and add them to
'pwrseq_pcie_m2_ctx::pci_devices' list.
Juergen Gross [Tue, 26 May 2026 15:05:13 +0000 (17:05 +0200)]
x86/xen: Get rid of last XEN_LAZY_MMU uses
There are only very few use cases of XEN_LAZY_MMU left. Get rid of
them in order to avoid having to call enter_lazy(XEN_LAZY_MMU) and
leave_lazy(XEN_LAZY_MMU).
The query in xen_batched_set_pte() can be replaced by using
is_lazy_mmu_mode_active() instead.
As xen_flush_lazy_mmu() will be called only with lazy MMU mode being
active, the test for the lazy mode can just be dropped.
In xen_start_context_switch() and xen_end_context_switch() use
__task_lazy_mmu_mode_pause() and __task_lazy_mmu_mode_resume(),
allowing to drop xen_enter_lazy_mmu() and xen_leave_lazy_mmu()
completely.
Call arch_flush_lazy_mmu_mode() from arch_leave_lazy_mmu_mode(), as
this is the only required action now.
Drop the lazy mmu enter and leave paravirt hooks, leaving the flush
hook as the only needed one.
Juergen Gross [Fri, 22 May 2026 15:21:14 +0000 (17:21 +0200)]
x86/xen: Remove Xen debugfs support
The only Xen file in debugfs is for dumping the p2m table when running
as a Xen PV guest. This might have been useful when the PV code was
young, but there haven't been any p2m related bugs requiring the p2m
dump since ages.
Juergen Gross [Fri, 22 May 2026 15:21:12 +0000 (17:21 +0200)]
x86/xen: Guard PV-only stuff in xen-ops.h with CONFIG_XEN_PV
A lot of arch/x86/xen/xen-ops.h is meant to be for PV only. Guard all
of it with CONFIG_XEN_PV in order to avoid someone misusing it in
non-PV builds. Additionally any 64-bit tests for now guarded items can
be dropped.
Move the enum pt_level definition to mmu_pv.c, as it is used only there.
Len Bao [Sat, 23 May 2026 13:28:01 +0000 (13:28 +0000)]
xen/mcelog: mark g_physinfo, ncpus and xen_mce_chrdev_device as __ro_after_init
The 'g_physinfo' and 'ncpus' variables are initialized only during the
init phase in the 'bind_virq_for_mce' function and never changed. So,
mark them as __ro_after_init.
The 'xen_mce_chrdev_device' variable is initialized only in the
declaration and never changed. So, this variable could be 'const', but
using the 'misc_register' and 'misc_deregister' functions discards the
'const' qualifier. Therefore, as an alternative, mark it as
__ro_after_init.
Signed-off-by: Len Bao <len.bao@gmx.us> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20260523132802.25391-1-len.bao@gmx.us>
Bryam Vargas [Sat, 6 Jun 2026 07:43:45 +0000 (07:43 +0000)]
wifi: mac80211: bound S1G TIM PVB walk to the TIM element
ieee80211_s1g_check_tim() parses the S1G Partial Virtual Bitmap (PVB) of a
received TIM element. The TIM is handed in as the element payload:
ieee802_11_parse_elems_full() stores elems->tim = elem->data and
elems->tim_len = elem->datalen (net/mac80211/parse.c), so the valid bytes
are [tim, tim + tim_len).
When walking the encoded blocks the function passes the walker an end
sentinel of (const u8 *)tim + tim_len + 2, i.e. two bytes past the end of
the element. ieee80211_s1g_find_target_block() loops while (ptr + 1 <= end)
and dereferences ptr (and the per-mode ieee80211_s1g_len_*() helpers read
*ptr), so it can read up to two bytes beyond the TIM element -- an
out-of-bounds read of adjacent skb/heap data when the TIM is the last
element in the frame. The +2 appears to account for the element id/len
header, but tim already points past that header at the element payload, so
the addend is wrong.
Pass the correct element end, (const u8 *)tim + tim_len.
xen/platform-pci: Simplify initialization of pci_device_id array
Instead of using a list initializer---that is hard to read unless you know
the structure of struct pci_device_id by heart---use the PCI_VDEVICE
macro to assign the needed values and drop all explicit but unneeded
zeros.
This doesn't introduce any changes to the compiled result of the array.
Akashdeep Kaur [Wed, 3 Jun 2026 07:24:38 +0000 (12:54 +0530)]
cpufreq: ti: Add EPROBE_DEFER for K3 SoCs
On K3 SoCs, ti-cpufreq relies on k3-socinfo to register the SoC
device before soc_device_match() can return valid revision
information. If ti-cpufreq probes before k3-socinfo,
soc_device_match() returns NULL, leading to incorrect CPU frequency
scaling behavior.
Add a needs_k3_socinfo flag to ti_cpufreq_soc_data (similar to
the existing multi_regulator pattern) to defer probe when k3-socinfo
hasn't registered the SoC device yet.
Taniya Das [Fri, 22 May 2026 15:16:23 +0000 (20:46 +0530)]
cpufreq: qcom: Add cpufreq scaling support for Qualcomm Shikra SoC
The Qualcomm Shikra cpufreq hardware is functionally identical to EPSS,
but supports only up to 12 frequency lookup table (LUT) entries. When all
12 entries are populated, the existing repetitive LUT entry check may read
beyond valid entries and expose incorrect frequencies. Hence, introduce
shikra_epss_soc_data that reuses EPSS configuration with appropriate LUT
entries limit.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Qualcomm Shikra cpufreq hardware is functionally identical to EPSS,
but supports only up to 12 frequency lookup table (LUT) entries. Introduce
Shikra specific bindings to represent this constrained EPSS variant.
m68k: coldfire: use ColdFire specifc IO access in SoC code
Convert all ColdFire specific SoC/board setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.
m68k: coldfire: use ColdFire specifc IO access in system code
Convert all ColdFire specific system setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.
With the basic ColdFire IO register functions now consistently named
with a "mcf_read"/"mcf_write" prefix it makes sense to name the timers
internal access defines the same way. Convert the local __raw_readtrr
and __raw_writetrr defines to use the consistent prefixes too. Thus
the change is:
m68k: coldfire: use ColdFire specifc IO access in timer code
Convert all ColdFire specific timer setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.
m68k: coldfire: use ColdFire specifc IO access in interrupt code
Convert all ColdFire specific interrupt setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.
m68k: coldfire: use ColdFire specific IO access in headers
Convert all m68k/ColdFire specific header file code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.
m68k: coldfire: create IO access functions for internal registers
The internal peripheral registers contained in all varieties of ColdFire
SoCs require simple big endian access ranging in sizes from 8, 16 and 32
bit. Currently there is a mixture of IO access methods used across the
various CPU support code, some using readx/writex and some using the
simpler __raw_readx/__raw_writew.
The readx/writex use cases are particularly kludgy in that they contain
code to differentiate internal register access and other general attached
peripheral register access - say on a PCI bus. In effect this means that
the readx/writex family for ColdFire is non-standard. This ultimately
ends up causing problems with definitions of other IO access support
functions like ioreadx/ioreadxbe/iowritex/iowritexbe which in the
generic case are defined in terms of readx/writex.
Create a set of internal only register access methods to ultimately
replace all internal register access code. The new access functions
mirror the existing readx/writex family but using the preferred 8/16/32
suffixes.
m68k: defconfig: add config for SnapGear/NETtel board
Add a default configuration for a basic M5307 based NETtel board.
This is primarily to improve defconfig build coverage. This platforms
uses the SMSC ethernet drivers for network ports, and has a few other
minor quirks that make it different from other ColdFire platforms.
Add a default configuration for a basic M54418 based EVB board.
The SoC has been supported for a long time but there is no default
configuration. Create one to improve build and test coverage.
Add a default configuration file for the Freescale M5329EVB board.
Although the SoC type has been supported for a long time there has been
no defconfig for the base platform. Create one to give better build
and test coverage.
m68k: coldfire: select legacy gpiolib interface for mcfqspi
The common coldfire code uses the old GPIO number based interfaces for
at least the QSPI chipselect lines. Select the required Kconfig symbol
to keep it building when that becomes optional.
Apparently there are no devices attached to a QSPI controller in any of
the coldfire boards, so this is not actually used in upstream kernels.
David Gow [Sat, 6 Jun 2026 02:03:15 +0000 (10:03 +0800)]
kunit:tool: Don't write to stdout when it should be disabled
The kunit_parser module accepts a 'printer' object which is used as a
destination for all output. This is typically set to stdout, so that the
parsed results are visible, but can be set to a special 'null_printer' to
implement options where not all results are always printed.
However, there are a few places where use of stdout is hardcoded, notably
in handling crashed tests and in outputting the colour escape sequences.
Properly use the specified printer for all output. This is okay for the
colour handling (as this is already gated behind isatty() anyway), and also
for the crash handling, as cases where printer != stdout are separately
printed afterwards.
David Gow [Sat, 6 Jun 2026 01:38:18 +0000 (09:38 +0800)]
kunit: tool: Add (primitive) support for outputting JUnit XML
This is used by things like Jenkins and other CI systems, which can
pretty-print the test output and potentially provide test-level comparisons
between runs.
The implementation here is pretty basic: it only provides the raw results,
split into tests and test suites, and doesn't provide any overall metadata.
However, CI systems like Jenkins can ingest it and it is already useful.
David Gow [Sat, 6 Jun 2026 01:38:17 +0000 (09:38 +0800)]
kunit: tool: Parse and print the reason tests are skipped
When a KUnit test (or other KTAP test) is skipped, a "skip reason" can be
provided. kunit.py has never done anything with this, ignoring anything
included in the KTAP output after the 'SKIP' directive.
Since we have it, and it's used, print it in a nice friendly yellow in
parentheses after a skipped test's name.
(And, by parsing it, it can be included in the JUnit results as well.)
This series fixes AA-deadlocks where NMI and tracepoint BPF programs
re-enter the per-CPU or global LRU lock already held on the same CPU
(syzbot c69a0a2c816716f1e0d5, 18b26edb69b2e19f3b33).
Patch 1 converts every LRU lock site to rqspinlock_t
and adds explicit recovery for some failures so no node leaks.
Patch 2 refreshes Documentation/bpf/map_lru_hash_update.dot to show
the new rqspinlock failure exits and recovery routes.
Patch 3 introduces a stress test.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v3:
- Removed RFC tag
- Link to v2: https://patch.msgid.link/20260603-lru_map_spin-v2-0-7060cfb6cdac@meta.com
Changes in v2:
- Patch 1: __bpf_lru_node_move_in() now clears pending_free only when
moving to the FREE list.
- Patch 3: address sashiko's feedback.
- Link to v1: https://patch.msgid.link/20260528-lru_map_spin-v1-0-4f52223170cf@meta.com
====================
Introduces stress test for bpf_lru_list that exercises
lock-failures and orphan-recovery, added by the LRU rqspinlock
conversion.
Runs three subtests: common LRU, per-CPU LRU lists (BPF_F_NO_COMMON_LRU),
and per-CPU LRU map. Each pins one userspace hammer per CPU and attaches
the perf_event NMI BPF prog (update+delete mix) on every online CPU.
Pre-fix, lockdep fires the "INITIAL USE -> IN-NMI" splat during stress.
After stress test, drain_then_verify_capacity() drains every key
and refills the lru map.
A stranded node on any CPU's pool would have forced eviction of
a just-inserted key on that CPU, surfacing here as a missing lookup.
Marked serial_ because per-CPU pinning and high-rate HW perf events
would perturb parallel tests.
Mykyta Yatsenko [Sun, 7 Jun 2026 20:30:42 +0000 (13:30 -0700)]
Documentation/bpf: Refresh map_lru_hash_update.dot for rqspinlock
Reflect the rqspinlock conversion and orphan-recovery paths added in
the previous commit:
- All LRU locks are rqspinlock_t; any acquire can fail (AA or
timeout). A shared "rqspinlock acquire failed" terminal collapses
to the existing -ENOMEM exit. Dashed arrows from each acquire site
mark the failure paths.
- The per-CPU local freelist is now lockless (free_llist).
- Post-steal: re-acquiring loc_l->lock to insert the stolen node
into the local pending list can fail; on failure the node is
published to free_llist instead of being orphaned, and the call
returns -ENOMEM.
- Steal-loop victim lock failure is silent: skip the victim and try
the next CPU.
Mykyta Yatsenko [Sun, 7 Jun 2026 20:30:41 +0000 (13:30 -0700)]
bpf: Fix NMI/tracepoint re-entry deadlock on lru locks
NMI and tracepoint BPF programs can re-enter the per-CPU or global
LRU lock that bpf_lru_pop_free()/push_free() already hold on the
same CPU, AA-deadlocking. Lockdep reports "inconsistent
{INITIAL USE} -> {IN-NMI}" on &l->lock (syzbot c69a0a2c816716f1e0d5)
and "possible recursive locking detected" on &loc_l->lock (syzbot 18b26edb69b2e19f3b33).
Prior trylock and rqspinlock based fixes (see links) were nacked
because compromised on reliability.
This patch converts every LRU lock site to rqspinlock_t and adds a
recovery path for some failure windows to avoid node leaks.
Failure recovery:
- *_pop_free top-level: return NULL; prealloc_lru_pop() already
treats that as no-free-element (-ENOMEM).
- Cross-CPU steal: skip the victim's locked loc_l, try next CPU.
- Post-steal local lock fail: publish stolen node to lockless
per-CPU free_llist; next pop on this CPU picks it up.
- push_free fail: mark node pending_free=1. __local_list_flush(),
__local_list_pop_pending() reclaim the node from pending_list.
__bpf_lru_list_shrink_inactive() reclaims the node from inactive
list. Nodes from active list are reclaimed by __bpf_lru_list_shrink()
or after __bpf_lru_list_rotate_active() demotes it to the inactive.
Now that the Rust KUnit tests are protected with Kconfig, update the
documentation to mention it.
Signed-off-by: Yury Norov <ynorov@nvidia.com> Reviewed-by: David Gow <david@davidgow.net> Acked-by: Gary Guo <gary@garyguo.net> Link: https://patch.msgid.link/20260417031531.315281-4-ynorov@nvidia.com
[ Fixed the paragraph by moving the new sentence above. Added gate
in the other example as well. Applied proper formatting. Reworded
slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
There are 6 individual Rust KUnit test suites (plus the doctests one). All
the tests are compiled unconditionally now, which adds ~200 kB to the
kernel image for me on x86_64. As Rust matures, this bloating will
inevitably grow.
Add Kconfig.test which includes a RUST_KUNIT_TESTS menu, and all
individual tests under it.
As usual, new tests are all enabled if KUNIT_ALL_TESTS=y.
Suggested-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Yury Norov <ynorov@nvidia.com> Reviewed-by: David Gow <david@davidgow.net> Acked-by: Gary Guo <gary@garyguo.net> Link: https://patch.msgid.link/20260417031531.315281-3-ynorov@nvidia.com
[ Fixed capitalization. Used singular for "API" for consistency.
Reworded to clarify these are suites and that there exists
the doctests one (which is the biggest at the moment by
far). - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
rust: tests: drop 'use crate' in bitmap and atomic KUnit tests
The following patch makes usage of macros::kunit_tests crate conditional
on the corresponding configs. When the configs are disabled, compiler
warns on unused crate. So, embed it in unit test declaration.
The wakeup condition if a min timeout is present and has expired is that
at least _one_ CQE was posted. Thus set the cq_tail target to
->cq_min_tail + 1. Without this commit a spurious wakeup can result in a
premature wakeup because io_should_wake() will return true even if _no_
CQE was posted at all.
Cc: Tip ten Brink <tip@tenbrinkmeijs.com> Fixes: e15cb2200b93 ("io_uring: fix min_wait wakeups for SQPOLL") Cc: stable@vger.kernel.org Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Link: https://patch.msgid.link/20260606201120.1441447-1-lk@c--e.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sun, 7 Jun 2026 22:05:47 +0000 (16:05 -0600)]
io_uring/kbuf: don't truncate end buffer for bundles
If buffers have been peeked for a bundle receive, the kernel will
truncate the end buffer, if the available length is shorter than the
buffer itself. This is unnecessary, as applications iterating bundle
receives must always use the minimum size of the buffer length and the
remaining number of bytes in the bundle. The examples in liburing do
that as well, eg examples/proxy.c.
If the kernel does truncate this buffer AND the current transfer fails,
then the buffer will be left with a smaller size than what is otherwise
available.
Just remove the buffer truncation, as it's not necessary in the first
place.
Linus Torvalds [Sun, 7 Jun 2026 20:12:29 +0000 (13:12 -0700)]
Merge tag 'x86-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Add more AMD Zen6 models (Pratik Vishwakarma)
- Avoid confusing bootup message by the Intel resctl enumeration
code when running on certain AMD systems (Tony Luck)
* tag 'x86-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Only check Intel systems for SNC
x86/CPU/AMD: Add more Zen6 models
Linus Torvalds [Sun, 7 Jun 2026 20:02:02 +0000 (13:02 -0700)]
Merge tag 'timers-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
- Fix the arch_inlined_clockevent_set_next_coupled() prototype in the
!CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST case (Naveen Kumar Chaudhary)
- Fix an off-by-1 bug in the sys_settimeofday() usecs validation code
(Naveen Kumar Chaudhary)
- Mark vdso_k_*_data pointers as __ro_after_init (Thomas Weißschuh)
- Fix livelock race in tmigr_handle_remote_up() (Amit Matityahu)
* tag 'timers-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Fix livelock in tmigr_handle_remote_up()
vdso/datastore: Mark vdso_k_*_data pointers as __ro_after_init
time: Fix off-by-one in settimeofday() usec validation
clockevents: Fix duplicate type specifier in stub function parameter
Linus Torvalds [Sun, 7 Jun 2026 19:54:37 +0000 (12:54 -0700)]
Merge tag 'sched-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fix from Ingo Molnar:
- Fix uninitialized stack variable in rseq_exit_user_update() (Qing
Wang)
* tag 'sched-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Fix using an uninitialized stack variable in rseq_exit_user_update()
Linus Torvalds [Sun, 7 Jun 2026 19:43:21 +0000 (12:43 -0700)]
Merge tag 'locking-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
- Fix a NULL pointer dereference bug in the FUTEX_CMP_REQUEUE_PI
code (Ji'an Zhou)
- Fix a NULL pointer dereference bug in the rtmutex code (Davidlohr
Bueso)
* tag 'locking-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Skip remove_waiter() when waiter is not enqueued
futex/requeue: Prevent NULL pointer dereference in remove_waiter() on self-deadlock
v8 changes:
- add back the btf_is_union check to btf_get_type_size [sashiko]
v7 changes:
- added ftrace_hash_count stub for !CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS cade [sashiko]
- selftests fixes [sashiko]
- use hash_ptr in select_trampoline_lock [sashiko]
- changed the check duplicate logic in check_dup_ids [sashiko]
- use sort_r_nonatomic in check_dup_ids [sashiko]
- added BPF_TRACE_FSESSION_MULTI to can_be_sleepable,
plus added testcase for sleepable fsession
- make bpf_tracing_multi_opts pointer fields as const
- add ___migrate_enable to trace_blacklist
v6 changes:
- move ftrace_hash_count declaration under CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS [sashiko]
- fix ftrace_hash_remove check/deref [sashiko]
- disable context access for multi programs by using stub function with no arguments
for verification [sashiko]
- add __used for bpf_multi_func, and removed arguments, we do not allow direct access [sashiko]
- rebased on latest loongarch changes, fix ppc build
- guard update_ftrace_direct_del with ftrace_hash_count on rollback [sashiko]
- fix noreturn attachment condition in bpf_check_attach_btf_id_multi [sashiko]
- fail early on multiple same IDs provided by user [sashiko]
- fix selftests error paths [sashiko]
- add MAX_RESOLVE_DEPTH check to btf_get_type_size [sashiko]
- use btf__pointer_size [sashiko]
- fixed compilation on powerpc [sashiko]
- added verifier fails selftest
- after discussing with Song, it was determined that cleaning up FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER
is not strictly necessary — keeping the trampoline in the ipmodify_enabled state is acceptable.
The race condition this introduces remains unlikely, so the concern raised in [1] will not be
addressed at this time.
[1] https://lore.kernel.org/bpf/aec7bAbGlnEo3R1g@krava/
v5 changes:
- add dedicated hashes used for detach, so there's no need to allocate
them on detach [sashiko]
- safely release old trampoline images [sashiko]
- add cond_resched() to couple of loops [sashiko]
- validate attr->link_create.target_fd [sashiko]
- allow only bpf_get_func_ret() for return value retrieval [sashiko]
- do not allow attachment of fexit/fsession_multi for noreturn functions [sashiko]
- fixed double free/close in libbpf btf cleanup, in separate patch [sashiko]
- make btf_type_is_traceable_func closer to btf_distill_func_proto [sashiko]
- add prog->attach_btf_obj_fd check to collect_func_ids_by_glob,
to check we don't load module programs for kernel [sashiko]
- make sure program is loaded in bpf_program__attach_tracing_multi [sashiko]
- several selftests fixes [sashiko]
- add attach_type to fdinfo output [Leon Hwang]
- selftests cleanup fixes [Leon Hwang]
v4 changes:
- unlink rollback fix (added ftrace_hash_count) [bot]
- use const for some bpf_link_create_opts tracing_multi members [bot]
- adding missing comment for lockdep keys [bot]
- selftest error path fixes (leaks) and other assorted test fixes [Leon Hwang]
- several compile fixes wrt CONFIG_BPF_SYSCALL and CONFIG_BPF_JIT [kernel test robot]
- make ftrace_hash_clear global, because it's needed in rollback
v3 changes:
- fix module parsing [Leon Hwang]
- use function traceable check from libbpf [Leon Hwang]
- use ptr_to_u64 and fix/updated few comments [ci]
- display cookies as decimal numbers [ci]
- added link_create.flags check [ci]
- fix error path in bpf_trampoline_multi_detach [ci]
- make fentry/fexit.multi not extendable [ci]
- add missing OPTS_VALID to bpf_program__attach_tracing_multi [ci]
v2 changes:
- allocate data.unreg in bpf_trampoline_multi_attach for rollback path [ci]
and fixed link count setup in rollback path [ci]
- several small assorted fixes [ci]
- added loongarch and powerpc changes for struct bpf_tramp_node change
- added support to attach functions from modules
- added tests for sleepable programs
- added rollback tests
v1 changes:
- added ftrace_hash_count as wrapper for hash_count [Steven]
- added trampoline mutex pool [Andrii]
- reworked 'struct bpf_tramp_node' separatoin [Andrii]
- the 'struct bpf_tramp_node' now holds pointer to bpf_link,
which is similar to what we do for uprobe_multi;
I understand it's not a fundamental change compared to previous
version which used bpf_prog pointer instead, but I don't see better
way of doing this.. I'm happy to discuss this further if there's
better idea
- reworked 'struct bpf_fsession_link' based on bpf_tramp_node
- made btf__find_by_glob_kind function internal helper [Andrii]
- many small assorted fixes [Andrii,CI]
- added session support [Leon Hwang]
- added cookies support
- added more tests
Note I plan to send linkinfo support separately, the patchset is big enough.
Jiri Olsa [Sat, 6 Jun 2026 12:39:54 +0000 (14:39 +0200)]
selftests/bpf: Add tracing multi attach rollback tests
Adding tests for the rollback code when the tracing_multi
link won't get attached, covering 2 reasons:
- wrong btf id passed by user, where all previously allocated
trampolines will be released
- trampoline for requested function is fully attached (has already
maximum programs attached) and the link fails, the rollback code
needs to release all previously link-ed trampolines and release
them
We need the bpf_fentry_test* unattached for the tests to pass,
so the rollback tests are serial.
Jiri Olsa [Sat, 6 Jun 2026 12:39:47 +0000 (14:39 +0200)]
selftests/bpf: Add tracing multi skel/pattern/ids module attach tests
Adding tests for tracing_multi link attachment via all possible
libbpf apis - skeleton, function pattern and btf ids on top of
bpf_testmod kernel module.
User can specify functions to attach with 'pattern' argument that
allows wildcards (*?' supported) or provide BTF ids of functions
in array directly via opts argument. These options are mutually
exclusive.
When using BTF ids, user can also provide cookie value for each
provided id/function, that can be retrieved later in bpf program
with bpf_get_attach_cookie helper. Each cookie value is paired with
provided BTF id with the same array index.
Adding support to auto attach programs with following sections:
The provided <pattern> is used as 'pattern' argument in
bpf_program__attach_kprobe_multi_opts function.
The <pattern> allows to specify optional kernel module name with
following syntax:
<module>:<function_pattern>
In order to attach tracing_multi link to a module functions:
- program must be loaded with 'module' btf fd
(in attr::attach_btf_obj_fd)
- bpf_program__attach_tracing_multi must either have
pattern with module spec or BTF ids from the module
Jiri Olsa [Sat, 6 Jun 2026 12:39:44 +0000 (14:39 +0200)]
libbpf: Add btf_type_is_traceable_func function
Adding btf_type_is_traceable_func function to perform same checks
as the kernel's btf_distill_func_proto function to prevent attachment
on some of the functions.
Exporting the function via libbpf_internal.h because it will be used
by benchmark test in following changes.
Adding bpf_trampoline_multi_attach/detach functions that allows to
attach/detach tracing program to multiple functions/trampolines.
The attachment is defined with bpf_program and array of BTF ids of
functions to attach the bpf program to.
Adding bpf_tracing_multi_link object that holds all the attached
trampolines and is initialized in attach and used in detach.
The attachment allocates or uses currently existing trampoline
for each function to attach and links it with the bpf program.
The attach works as follows:
- we get all the needed trampolines
- lock them and add the bpf program to each (__bpf_trampoline_link_prog)
- the trampoline_multi_ops passed in __bpf_trampoline_link_prog gathers
ftrace_hash (ip -> trampoline) objects
- we call update_ftrace_direct_add/mod to update needed locations
- we unlock all the trampolines
The detach works as follows:
- we lock all the needed trampolines
- remove the program from each (__bpf_trampoline_unlink_prog)
- the trampoline_multi_ops passed in __bpf_trampoline_unlink_prog gathers
ftrace_hash (ip -> trampoline) objects
- we call update_ftrace_direct_del/mod to update needed locations
- we unlock and put all the trampolines
We store the old image/flags in the trampoline before the update
and use it in case we need to rollback the attachment.
We keep the ftrace_hash objects allocated during attach in the link
so they can be used for detach as well.
Adding trampoline_(un)lock_all functions to (un)lock all trampolines
to gate the tracing_multi attachment.
Note this is supported only for archs (x86_64) with ftrace direct and
have single ops support.
Jiri Olsa [Sat, 6 Jun 2026 12:39:36 +0000 (14:39 +0200)]
bpf: Move sleepable verification code to btf_id_allow_sleepable
Move sleepable verification code to btf_id_allow_sleepable function.
It will be used in following changes.
Adding code to retrieve type's name instead of passing it from
bpf_check_attach_target function, because this function will be
called from another place in following changes and it's easier
to retrieve the name directly in here.
Jiri Olsa [Sat, 6 Jun 2026 12:39:35 +0000 (14:39 +0200)]
bpf: Add multi tracing attach types
Adding new program attach types multi tracing attachment:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
and their base support in verifier code.
Programs with such attach type will use specific link attachment
interface coming in following changes.
This was suggested by Andrii some (long) time ago and turned out
to be easier than having special program flag for that.
Bpf programs with such types have 'bpf_multi_func' function set as
their attach_btf_id and keep module reference when it's specified
by attach_prog_fd.
They are also accepted as sleepable programs during verification,
and the real validation for specific BTF_IDs/functions will happen
during the multi link attachment in following changes.
Jiri Olsa [Sat, 6 Jun 2026 12:39:34 +0000 (14:39 +0200)]
bpf: Factor fsession link to use struct bpf_tramp_node
Now that we split trampoline attachment object (bpf_tramp_node) from
the link object (bpf_tramp_link) we can use bpf_tramp_node as fsession's
fexit attachment object and get rid of the bpf_fsession_link object.
The link holds the bpf_prog pointer and forces one link - one program
binding logic. In following changes we want to attach program to multiple
trampolines but we want to keep just one bpf_link object.
The 'struct bpf_tramp_link' defines standard single trampoline link
and 'struct bpf_tramp_node' is the attachment trampoline object with
pointer to the bpf_link object.
This will allow us to define link for multiple trampolines, like:
Jiri Olsa [Sat, 6 Jun 2026 12:39:32 +0000 (14:39 +0200)]
bpf: Add bpf_trampoline_add/remove_prog functions
Separate bpf_trampoline_add/remove_prog functions from
__bpf_trampoline_link/unlink functions to be able to add/remove
trampoline programs without the image being updated in following
changes. No functional change is intended.
Jiri Olsa [Sat, 6 Jun 2026 12:39:31 +0000 (14:39 +0200)]
bpf: Move trampoline image setup into bpf_trampoline_ops callbacks
Moving trampoline image setup into bpf_trampoline_ops callbacks,
so we can have different image handling for multi attachment which
is coming in following changes.
Jiri Olsa [Sat, 6 Jun 2026 12:39:30 +0000 (14:39 +0200)]
bpf: Add struct bpf_trampoline_ops object
In following changes we will need to override ftrace direct attachment
behaviour. In order to do that we are adding struct bpf_trampoline_ops
object that defines callbacks for ftrace direct attachment:
register_fentry
unregister_fentry
modify_fentry
The new struct bpf_trampoline_ops object is passed as an argument to
__bpf_trampoline_link/unlink_prog functions.
At the moment the default trampoline_ops is set to the current ftrace
direct attachment functions, so there's no functional change for the
current code.
Jiri Olsa [Sat, 6 Jun 2026 12:39:29 +0000 (14:39 +0200)]
bpf: Use mutex lock pool for bpf trampolines
Adding mutex lock pool that replaces bpf trampolines mutex.
For tracing_multi link coming in following changes we need to lock all
the involved trampolines during the attachment. This could mean thousands
of mutex locks, which is not convenient.
As suggested by Andrii we can replace bpf trampolines mutex with mutex
pool, where each trampoline is hash-ed to one of the locks from the pool.
It's better to lock all the pool mutexes (32 at the moment) than
thousands of them.
There is 48 (MAX_LOCK_DEPTH) lock limit allowed to be simultaneously
held by task, so we need to keep 32 mutexes (5 bits) in the pool, so
when we lock them all in following changes the lockdep won't scream.
Removing the mutex_is_locked in bpf_trampoline_put, because we removed
the mutex from bpf_trampoline.
Abdun Nihaal [Thu, 14 May 2026 08:24:42 +0000 (13:54 +0530)]
fbdev: vesafb: fix memory leak in vesafb_probe()
Since commit 73ce73c30ba9 ("fbdev: Transfer video= option strings to
caller; clarify ownership") the string returned from fb_get_options()
is expected to be freed by the caller. But the string is not freed in
vesafb_probe(). Fix that by freeing the option string after setup.
Fixes: 73ce73c30ba9 ("fbdev: Transfer video= option strings to caller; clarify ownership") Cc: stable@vger.kernel.org Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
Abdun Nihaal [Thu, 14 May 2026 08:24:41 +0000 (13:54 +0530)]
fbdev: efifb: fix memory leak in efifb_probe()
Since commit 73ce73c30ba9 ("fbdev: Transfer video= option strings to
caller; clarify ownership") the string returned from fb_get_options()
is expected to be freed by the caller, but the string is not freed in
efifb_probe(). Fix that by freeing the option string after setup.
Fixes: 73ce73c30ba9 ("fbdev: Transfer video= option strings to caller; clarify ownership") Cc: stable@vger.kernel.org Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
Abdun Nihaal [Thu, 14 May 2026 08:24:40 +0000 (13:54 +0530)]
fbdev: uvesafb: fix potential memory leak in uvesafb_probe()
Due to an incorrect goto label, memory allocated for modedb and modelist
in uvesafb_vbe_init() is not freed in some error paths. Fix this by
updating the goto label.
Abdun Nihaal [Thu, 14 May 2026 08:24:39 +0000 (13:54 +0530)]
fbdev: tridentfb: fix potential memory leak in trident_pci_probe()
In trident_pci_probe(), the memory allocated for modelist using
fb_videomode_to_modelist() is not freed in subsequent error paths.
Fix that by calling fb_destroy_modelist().
Abdun Nihaal [Thu, 14 May 2026 08:24:38 +0000 (13:54 +0530)]
fbdev: tdfxfb: fix potential memory leak in tdfxfb_probe()
In tdfxfb_probe(), the memory allocated for modelist using
fb_videomode_to_modelist() when CONFIG_FB_3DFX_I2C is defined, is not
freed in the subsequent error paths.
Fix that by calling fb_destroy_modelist().
Fixes: 215059d2421f ("tdfxfb: make use of DDC information about connected monitor") Cc: stable@vger.kernel.org Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Signed-off-by: Helge Deller <deller@gmx.de>
Abdun Nihaal [Thu, 14 May 2026 08:24:37 +0000 (13:54 +0530)]
fbdev: s3fb: fix potential memory leak in s3_pci_probe()
In s3_pci_probe(), the memory allocated for modelist using
fb_videomode_to_modelist() is not freed in subsequent error paths.
Fix that by calling fb_destroy_modelist()
Abdun Nihaal [Thu, 14 May 2026 08:24:36 +0000 (13:54 +0530)]
fbdev: nvidia: fix potential memory leak in nvidiafb_probe()
In nvidiafb_probe(), the memory allocated for modelist in
nvidia_set_fbinfo() is not freed in the subsequent error paths.
Fix that by calling fb_destroy_modelist().
Abdun Nihaal [Thu, 14 May 2026 08:24:35 +0000 (13:54 +0530)]
fbdev: i740fb: fix potential memory leak in i740fb_probe()
In i740fb_probe(), the memory allocated in fb_videomode_to_modelist()
for modelist is not freed in the error paths. Fix that by calling
fb_destroy_modelist().
Abdun Nihaal [Thu, 14 May 2026 08:24:33 +0000 (13:54 +0530)]
fbdev: radeon: fix potential memory leak in radeonfb_pci_register()
The function radeonfb_pci_register() allocates memory for modelist
(by calling radeon_check_modes() which calls fb_add_videomode()).
The memory is appended to info->modelist, but is not freed in subsequent
error paths. Fix this by calling fb_destroy_modelist().
Rosen Penev [Mon, 18 May 2026 21:13:03 +0000 (14:13 -0700)]
fbdev: imxfb: Use of_device_get_match_data()
Use of_device_get_match_data() to fetch the platform ID entry directly
instead of open-coding an of_match_device() lookup. No NULL check is
needed as every compatible string has a corresponding data section.
This also lets the driver drop the of_device.h include.
Jiacheng Yu [Thu, 14 May 2026 09:19:18 +0000 (17:19 +0800)]
fbcon: Use correct type for vc_resize() return value
The return value of vc_resize() is int, but fbcon_set_disp() stores it
in an unsigned long variable. While the !ret check happens to work
correctly by coincidence (negative values become large positive values),
the types should match. Use int instead.
Eliminates the following W=3 warning:
drivers/video/fbdev/core/fbcon.c: In function 'fbcon_set_disp':
drivers/video/fbdev/core/fbcon.c:1494:14: warning: implicit conversion from 'int' to 'unsigned long' [-Wconversion]