git.ipfire.org Git - thirdparty/kernel/linux.git/log

cpufreq: ti: Add EPROBE_DEFER for K3 SoCs

On K3 SoCs, ti-cpufreq relies on k3-socinfo to register the SoC
device before soc_device_match() can return valid revision
information. If ti-cpufreq probes before k3-socinfo,
soc_device_match() returns NULL, leading to incorrect CPU frequency
scaling behavior.

Add a needs_k3_socinfo flag to ti_cpufreq_soc_data (similar to
the existing multi_regulator pattern) to defer probe when k3-socinfo
hasn't registered the SoC device yet.

Signed-off-by: Akashdeep Kaur <a-kaur@ti.com>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: qcom: Add cpufreq scaling support for Qualcomm Shikra SoC

The Qualcomm Shikra cpufreq hardware is functionally identical to EPSS,
but supports only up to 12 frequency lookup table (LUT) entries. When all
12 entries are populated, the existing repetitive LUT entry check may read
beyond valid entries and expose incorrect frequencies. Hence, introduce
shikra_epss_soc_data that reuses EPSS configuration with appropriate LUT
entries limit.

Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

dt-bindings: cpufreq: Document Qualcomm Shikra SoC EPSS

The Qualcomm Shikra cpufreq hardware is functionally identical to EPSS,
but supports only up to 12 frequency lookup table (LUT) entries. Introduce
Shikra specific bindings to represent this constrained EPSS variant.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Imran Shaik <imran.shaik@oss.qualcomm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

m68k: coldfire: use ColdFire specifc IO access in SoC code

Convert all ColdFire specific SoC/board setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: use ColdFire specifc IO access in system code

Convert all ColdFire specific system setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: rename timer register access defines

With the basic ColdFire IO register functions now consistently named
with a "mcf_read"/"mcf_write" prefix it makes sense to name the timers
internal access defines the same way. Convert the local __raw_readtrr
and __raw_writetrr defines to use the consistent prefixes too. Thus
the change is:

__raw_readtrr --> mcf_readtrr
__raw_writetrr --> mcf_writetrr

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: use ColdFire specifc IO access in timer code

Convert all ColdFire specific timer setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: use ColdFire specifc IO access in interrupt code

Convert all ColdFire specific interrupt setup code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: use ColdFire specific IO access in headers

Convert all m68k/ColdFire specific header file code to only use the
newly created internal register access methods. This is replacing the
mixed and inconsistent use of readx/writex and __raw_readx/__raw_writex
for internal SoC registers.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: create IO access functions for internal registers

The internal peripheral registers contained in all varieties of ColdFire
SoCs require simple big endian access ranging in sizes from 8, 16 and 32
bit. Currently there is a mixture of IO access methods used across the
various CPU support code, some using readx/writex and some using the
simpler __raw_readx/__raw_writew.

The readx/writex use cases are particularly kludgy in that they contain
code to differentiate internal register access and other general attached
peripheral register access - say on a PCI bus. In effect this means that
the readx/writex family for ColdFire is non-standard. This ultimately
ends up causing problems with definitions of other IO access support
functions like ioreadx/ioreadxbe/iowritex/iowritexbe which in the
generic case are defined in terms of readx/writex.

Create a set of internal only register access methods to ultimately
replace all internal register access code. The new access functions
mirror the existing readx/writex family but using the preferred 8/16/32
suffixes.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: defconfig: update all ColdFire defconfigs

Refresh all ColdFire base defconfig files, making sure that modules
is turned on for all targets.

Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: defconfig: add config for SnapGear/NETtel board

Add a default configuration for a basic M5307 based NETtel board.
This is primarily to improve defconfig build coverage. This platforms
uses the SMSC ethernet drivers for network ports, and has a few other
minor quirks that make it different from other ColdFire platforms.

Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: defconfig: add config for M54418EVB board

Add a default configuration for a basic M54418 based EVB board.
The SoC has been supported for a long time but there is no default
configuration. Create one to improve build and test coverage.

Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: defconfig: add config for M5329EVB board

Add a default configuration file for the Freescale M5329EVB board.
Although the SoC type has been supported for a long time there has been
no defconfig for the base platform. Create one to give better build
and test coverage.

Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: coldfire: select legacy gpiolib interface for mcfqspi

The common coldfire code uses the old GPIO number based interfaces for
at least the QSPI chipselect lines. Select the required Kconfig symbol
to keep it building when that becomes optional.

Apparently there are no devices attached to a QSPI controller in any of
the coldfire boards, so this is not actually used in upstream kernels.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

m68k: update boards company name

Sysam name has changed to Kernelspace, updating each reference.

Signed-off-by: Angelo Dureghello <angelo@kernel-space.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

config: stmark2: defconfig updates

Various updates, mainly to reduce kernel size.

Signed-off-by: Angelo Dureghello <angelo@kernel-space.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>

kunit:tool: Don't write to stdout when it should be disabled

The kunit_parser module accepts a 'printer' object which is used as a
destination for all output. This is typically set to stdout, so that the
parsed results are visible, but can be set to a special 'null_printer' to
implement options where not all results are always printed.

However, there are a few places where use of stdout is hardcoded, notably
in handling crashed tests and in outputting the colour escape sequences.

Properly use the specified printer for all output. This is okay for the
colour handling (as this is already gated behind isatty() anyway), and also
for the crash handling, as cases where printer != stdout are separately
printed afterwards.

Link: https://lore.kernel.org/r/20260606020317.264178-1-david@davidgow.net
Fixes: 062a9dd9bad7 ("kunit: tool: Only print the summary")
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

kunit: tool: Add (primitive) support for outputting JUnit XML

This is used by things like Jenkins and other CI systems, which can
pretty-print the test output and potentially provide test-level comparisons
between runs.

The implementation here is pretty basic: it only provides the raw results,
split into tests and test suites, and doesn't provide any overall metadata.
However, CI systems like Jenkins can ingest it and it is already useful.

Link: https://lore.kernel.org/r/20260606013827.240790-2-david@davidgow.net
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

kunit: tool: Parse and print the reason tests are skipped

When a KUnit test (or other KTAP test) is skipped, a "skip reason" can be
provided. kunit.py has never done anything with this, ignoring anything
included in the KTAP output after the 'SKIP' directive.

Since we have it, and it's used, print it in a nice friendly yellow in
parentheses after a skipped test's name.

(And, by parsing it, it can be included in the JUnit results as well.)

Link: https://lore.kernel.org/r/20260606013827.240790-1-david@davidgow.net
Signed-off-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

Merge branch 'bpf-fix-lru-nmi-tracepoint-re-entry-deadlock'

Mykyta Yatsenko says:

====================
bpf: Fix LRU NMI/tracepoint re-entry deadlock

This series fixes AA-deadlocks where NMI and tracepoint BPF programs
re-enter the per-CPU or global LRU lock already held on the same CPU
(syzbot c69a0a2c816716f1e0d5, 18b26edb69b2e19f3b33).

Patch 1 converts every LRU lock site to rqspinlock_t
and adds explicit recovery for some failures so no node leaks.

Patch 2 refreshes Documentation/bpf/map_lru_hash_update.dot to show
the new rqspinlock failure exits and recovery routes.

Patch 3 introduces a stress test.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v3:
- Removed RFC tag
- Link to v2: https://patch.msgid.link/20260603-lru_map_spin-v2-0-7060cfb6cdac@meta.com

Changes in v2:
- Patch 1: __bpf_lru_node_move_in() now clears pending_free only when
moving to the FREE list.
- Patch 3: address sashiko's feedback.
- Link to v1: https://patch.msgid.link/20260528-lru_map_spin-v1-0-4f52223170cf@meta.com
====================

Link: https://patch.msgid.link/20260607-lru_map_spin-v3-0-bcd9332e911b@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Stress LRU rqspinlock recovery paths

Introduces stress test for bpf_lru_list that exercises
lock-failures and orphan-recovery, added by the LRU rqspinlock
conversion.

Runs three subtests: common LRU, per-CPU LRU lists (BPF_F_NO_COMMON_LRU),
and per-CPU LRU map. Each pins one userspace hammer per CPU and attaches
the perf_event NMI BPF prog (update+delete mix) on every online CPU.
Pre-fix, lockdep fires the "INITIAL USE -> IN-NMI" splat during stress.

After stress test, drain_then_verify_capacity() drains every key
and refills the lru map.
A stranded node on any CPU's pool would have forced eviction of
a just-inserted key on that CPU, surfacing here as a missing lookup.

Marked serial_ because per-CPU pinning and high-rate HW perf events
would perturb parallel tests.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260607-lru_map_spin-v3-3-bcd9332e911b@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Documentation/bpf: Refresh map_lru_hash_update.dot for rqspinlock

Reflect the rqspinlock conversion and orphan-recovery paths added in
the previous commit:

- All LRU locks are rqspinlock_t; any acquire can fail (AA or
   timeout). A shared "rqspinlock acquire failed" terminal collapses
   to the existing -ENOMEM exit. Dashed arrows from each acquire site
   mark the failure paths.

- The per-CPU local freelist is now lockless (free_llist).

- Post-steal: re-acquiring loc_l->lock to insert the stolen node
   into the local pending list can fail; on failure the node is
   published to free_llist instead of being orphaned, and the call
   returns -ENOMEM.

- Steal-loop victim lock failure is silent: skip the victim and try
   the next CPU.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260607-lru_map_spin-v3-2-bcd9332e911b@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix NMI/tracepoint re-entry deadlock on lru locks

NMI and tracepoint BPF programs can re-enter the per-CPU or global
LRU lock that bpf_lru_pop_free()/push_free() already hold on the
same CPU, AA-deadlocking. Lockdep reports "inconsistent
{INITIAL USE} -> {IN-NMI}" on &l->lock (syzbot c69a0a2c816716f1e0d5)
and "possible recursive locking detected" on &loc_l->lock (syzbot
18b26edb69b2e19f3b33).

Prior trylock and rqspinlock based fixes (see links) were nacked
because compromised on reliability.

This patch converts every LRU lock site to rqspinlock_t and adds a
recovery path for some failure windows to avoid node leaks.

Failure recovery:

- *_pop_free top-level: return NULL; prealloc_lru_pop() already
   treats that as no-free-element (-ENOMEM).

- Cross-CPU steal: skip the victim's locked loc_l, try next CPU.

- Post-steal local lock fail: publish stolen node to lockless
   per-CPU free_llist; next pop on this CPU picks it up.

- push_free fail: mark node pending_free=1. __local_list_flush(),
   __local_list_pop_pending() reclaim the node from pending_list.
   __bpf_lru_list_shrink_inactive() reclaims the node from inactive
   list. Nodes from active list are reclaimed by __bpf_lru_list_shrink()
   or after __bpf_lru_list_rotate_active() demotes it to the inactive.

Fixes: 3a08c2fd7634 ("bpf: LRU List")
Reported-by: syzbot+c69a0a2c816716f1e0d5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c69a0a2c816716f1e0d5
Reported-by: syzbot+18b26edb69b2e19f3b33@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=18b26edb69b2e19f3b33
Link: https://lore.kernel.org/bpf/CAPPBnEYO4R+m+SpVc2gNj_x31R6fo1uJvj2bK2YS1P09GWT6kQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/CAPPBnEZmFA3ab8Uc=PEm0bdojZy=7T_F5_+eyZSHyZR3MBG4Vw@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20251030030010.95352-1-dongml2@chinatelecom.cn/
Link: https://lore.kernel.org/bpf/20260119142120.28170-1-leon.hwang@linux.dev/
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260607-lru_map_spin-v3-1-bcd9332e911b@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Documentation: rust: testing: add Kconfig guidance

Now that the Rust KUnit tests are protected with Kconfig, update the
documentation to mention it.

Signed-off-by: Yury Norov <ynorov@nvidia.com>
Reviewed-by: David Gow <david@davidgow.net>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260417031531.315281-4-ynorov@nvidia.com
[ Fixed the paragraph by moving the new sentence above. Added gate
in the other example as well. Applied proper formatting. Reworded
slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: tests: add Kconfig for KUnit test

There are 6 individual Rust KUnit test suites (plus the doctests one). All
the tests are compiled unconditionally now, which adds ~200 kB to the
kernel image for me on x86_64. As Rust matures, this bloating will
inevitably grow.

Add Kconfig.test which includes a RUST_KUNIT_TESTS menu, and all
individual tests under it.

As usual, new tests are all enabled if KUNIT_ALL_TESTS=y.

Suggested-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
Reviewed-by: David Gow <david@davidgow.net>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260417031531.315281-3-ynorov@nvidia.com
[ Fixed capitalization. Used singular for "API" for consistency.
  Reworded to clarify these are suites and that there exists
  the doctests one (which is the biggest at the moment by
  far). - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: tests: drop 'use crate' in bitmap and atomic KUnit tests

The following patch makes usage of macros::kunit_tests crate conditional
on the corresponding configs. When the configs are disabled, compiler
warns on unused crate. So, embed it in unit test declaration.

Signed-off-by: Yury Norov <ynorov@nvidia.com>
Reviewed-by: David Gow <david@davidgow.net>
Acked-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260417031531.315281-2-ynorov@nvidia.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

Linux 7.1-rc7

io_uring/wait: fix min_timeout behavior

The wakeup condition if a min timeout is present and has expired is that
at least _one_ CQE was posted. Thus set the cq_tail target to
->cq_min_tail + 1. Without this commit a spurious wakeup can result in a
premature wakeup because io_should_wake() will return true even if _no_
CQE was posted at all.

Cc: Tip ten Brink <tip@tenbrinkmeijs.com>
Fixes: e15cb2200b93 ("io_uring: fix min_wait wakeups for SQPOLL")
Cc: stable@vger.kernel.org
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Link: https://patch.msgid.link/20260606201120.1441447-1-lk@c--e.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: don't truncate end buffer for bundles

If buffers have been peeked for a bundle receive, the kernel will
truncate the end buffer, if the available length is shorter than the
buffer itself. This is unnecessary, as applications iterating bundle
receives must always use the minimum size of the buffer length and the
remaining number of bytes in the bundle. The examples in liburing do
that as well, eg examples/proxy.c.

If the kernel does truncate this buffer AND the current transfer fails,
then the buffer will be left with a smaller size than what is otherwise
available.

Just remove the buffer truncation, as it's not necessary in the first
place.

Link: https://lore.kernel.org/io-uring/CAAEr8jbY60noGj1fw_k91UJRBkyiRVoS6=nLhZ7Svwidjn4CAA@mail.gmail.com/
Reported-by: Federico Brasili <federico.brasili@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'x86-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- Add more AMD Zen6 models (Pratik Vishwakarma)

- Avoid confusing bootup message by the Intel resctl enumeration
   code when running on certain AMD systems (Tony Luck)

* tag 'x86-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Only check Intel systems for SNC
  x86/CPU/AMD: Add more Zen6 models

Merge tag 'timers-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:

- Fix the arch_inlined_clockevent_set_next_coupled() prototype in the
   !CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST case (Naveen Kumar Chaudhary)

- Fix an off-by-1 bug in the sys_settimeofday() usecs validation code
   (Naveen Kumar Chaudhary)

- Mark vdso_k_*_data pointers as __ro_after_init (Thomas Weißschuh)

- Fix livelock race in tmigr_handle_remote_up() (Amit Matityahu)

* tag 'timers-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Fix livelock in tmigr_handle_remote_up()
  vdso/datastore: Mark vdso_k_*_data pointers as __ro_after_init
  time: Fix off-by-one in settimeofday() usec validation
  clockevents: Fix duplicate type specifier in stub function parameter

Merge tag 'sched-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull rseq fix from Ingo Molnar:

- Fix uninitialized stack variable in rseq_exit_user_update() (Qing
Wang)

* tag 'sched-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Fix using an uninitialized stack variable in rseq_exit_user_update()

Merge tag 'locking-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:

- Fix a NULL pointer dereference bug in the FUTEX_CMP_REQUEUE_PI
   code (Ji'an Zhou)

- Fix a NULL pointer dereference bug in the rtmutex code (Davidlohr
   Bueso)

* tag 'locking-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Skip remove_waiter() when waiter is not enqueued
  futex/requeue: Prevent NULL pointer dereference in remove_waiter() on self-deadlock

Merge tag 'regulator-fix-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"Arnd's randconfig testing turned up a missing selection of
CONFIG_IRQ_DOMAIN which was causing build breaks"

* tag 'regulator-fix-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: mt6363: select CONFIG_IRQ_DOMAIN

rhashtable: Fix rhashtable_next_key() build warnings

rhashtable.o builds with warnings as rhashtable_next_key() kdoc
from lib/rhashtable.c does not have the arguments descriptions.

Move rhashtable_next_key() kdoc from header to c file, matching
other functions.

Move rhashtable_next_key() next to the other forward declarations
in the header file.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606061925.WI4bYI8k-lkp@intel.com/
Fixes: 8f4fa9f89b72 ("rhashtable: Add rhashtable_next_key() API")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260606-rhash_fixes_1-v1-1-932ab036e6bc@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-tracing_multi-link'

Jiri Olsa says:

====================
bpf: tracing_multi link

Add tracing_multi link support that allows fast attachment
of tracing program to many functions.

RFC: https://lore.kernel.org/bpf/20260203093819.2105105-1-jolsa@kernel.org/
v1: https://lore.kernel.org/bpf/20260220100649.628307-1-jolsa@kernel.org/
v2: https://lore.kernel.org/bpf/20260304222141.497203-1-jolsa@kernel.org/
v3: https://lore.kernel.org/bpf/20260316075138.465430-1-jolsa@kernel.org/
v4: https://lore.kernel.org/bpf/20260324081846.2334094-1-jolsa@kernel.org/
v5: https://lore.kernel.org/bpf/20260417192502.194548-1-jolsa@kernel.org/
v6: https://lore.kernel.org/bpf/20260527113951.46265-1-jolsa@kernel.org/
v7: https://lore.kernel.org/bpf/20260603110554.29590-1-jolsa@kernel.org/

v8 changes:
- add back the btf_is_union check to btf_get_type_size [sashiko]

v7 changes:
- added ftrace_hash_count stub for !CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS cade [sashiko]
- selftests fixes [sashiko]
- use hash_ptr in select_trampoline_lock [sashiko]
- changed the check duplicate logic in check_dup_ids [sashiko]
- use sort_r_nonatomic in check_dup_ids [sashiko]
- added BPF_TRACE_FSESSION_MULTI to can_be_sleepable,
  plus added testcase for sleepable fsession
- make bpf_tracing_multi_opts pointer fields as const
- add ___migrate_enable to trace_blacklist

v6 changes:
- move ftrace_hash_count declaration under CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS [sashiko]
- fix ftrace_hash_remove check/deref [sashiko]
- disable context access for multi programs by using stub function with no arguments
  for verification [sashiko]
- add __used for bpf_multi_func, and removed arguments, we do not allow direct access [sashiko]
- rebased on latest loongarch changes, fix ppc build
- guard update_ftrace_direct_del with ftrace_hash_count on rollback [sashiko]
- fix noreturn attachment condition in bpf_check_attach_btf_id_multi [sashiko]
- fail early on multiple same IDs provided by user [sashiko]
- fix selftests error paths [sashiko]
- add MAX_RESOLVE_DEPTH check to btf_get_type_size [sashiko]
- use btf__pointer_size [sashiko]
- fixed compilation on powerpc [sashiko]
- added verifier fails selftest
- after discussing with Song, it was determined that cleaning up FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER
  is not strictly necessary — keeping the trampoline in the ipmodify_enabled state is acceptable.
  The race condition this introduces remains unlikely, so the concern raised in [1] will not be
  addressed at this time.
  [1] https://lore.kernel.org/bpf/aec7bAbGlnEo3R1g@krava/

v5 changes:
- add dedicated hashes used for detach, so there's no need to allocate
  them on detach [sashiko]
- safely release old trampoline images [sashiko]
- add cond_resched() to couple of loops [sashiko]
- validate attr->link_create.target_fd [sashiko]
- allow only bpf_get_func_ret() for return value retrieval [sashiko]
- do not allow attachment of fexit/fsession_multi for noreturn functions [sashiko]
- fixed double free/close in libbpf btf cleanup, in separate patch [sashiko]
- make btf_type_is_traceable_func closer to btf_distill_func_proto [sashiko]
- add prog->attach_btf_obj_fd check to collect_func_ids_by_glob,
  to check we don't load module programs for kernel [sashiko]
- make sure program is loaded in bpf_program__attach_tracing_multi [sashiko]
- several selftests fixes [sashiko]
- add attach_type to fdinfo output [Leon Hwang]
- selftests cleanup fixes [Leon Hwang]

v4 changes:
- unlink rollback fix (added ftrace_hash_count) [bot]
- use const for some bpf_link_create_opts tracing_multi members [bot]
- adding missing comment for lockdep keys [bot]
- selftest error path fixes (leaks) and other assorted test fixes [Leon Hwang]
- several compile fixes wrt CONFIG_BPF_SYSCALL and CONFIG_BPF_JIT [kernel test robot]
- make ftrace_hash_clear global, because it's needed in rollback

v3 changes:
- fix module parsing [Leon Hwang]
- use function traceable check from libbpf [Leon Hwang]
- use ptr_to_u64 and fix/updated few comments [ci]
- display cookies as decimal numbers [ci]
- added link_create.flags check [ci]
- fix error path in bpf_trampoline_multi_detach [ci]
- make fentry/fexit.multi not extendable [ci]
- add missing OPTS_VALID to bpf_program__attach_tracing_multi [ci]

v2 changes:
- allocate data.unreg in bpf_trampoline_multi_attach for rollback path [ci]
  and fixed link count setup in rollback path [ci]
- several small assorted fixes [ci]
- added loongarch and powerpc changes for struct bpf_tramp_node change
- added support to attach functions from modules
- added tests for sleepable programs
- added rollback tests

v1 changes:
- added ftrace_hash_count as wrapper for hash_count [Steven]
- added trampoline mutex pool [Andrii]
- reworked 'struct bpf_tramp_node' separatoin [Andrii]
  - the 'struct bpf_tramp_node' now holds pointer to bpf_link,
    which is similar to what we do for uprobe_multi;
    I understand it's not a fundamental change compared to previous
    version which used bpf_prog pointer instead, but I don't see better
    way of doing this.. I'm happy to discuss this further if there's
    better idea
- reworked 'struct bpf_fsession_link' based on bpf_tramp_node
- made btf__find_by_glob_kind function internal helper [Andrii]
- many small assorted fixes [Andrii,CI]
- added session support [Leon Hwang]
- added cookies support
- added more tests

Note I plan to send linkinfo support separately, the patchset is big enough.

thanks,
jirka
====================

Link: https://patch.msgid.link/20260606123955.345967-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi attach rollback tests

Adding tests for the rollback code when the tracing_multi
link won't get attached, covering 2 reasons:

  - wrong btf id passed by user, where all previously allocated
    trampolines will be released
  - trampoline for requested function is fully attached (has already
    maximum programs attached) and the link fails, the rollback code
    needs to release all previously link-ed trampolines and release
    them

We need the bpf_fentry_test* unattached for the tests to pass,
so the rollback tests are serial.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-30-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi attach benchmark test

Adding benchmark test that attaches to (almost) all allowed tracing
functions and display attach/detach times.

  # ./test_progs -t tracing_multi_bench_attach -v
  bpf_testmod.ko is already unloaded.
  Loading bpf_testmod.ko...
  Successfully loaded bpf_testmod.ko.
  serial_test_tracing_multi_bench_attach:PASS:btf__load_vmlinux_btf 0 nsec
  serial_test_tracing_multi_bench_attach:PASS:tracing_multi_bench__open_and_load 0 nsec
  serial_test_tracing_multi_bench_attach:PASS:get_syms 0 nsec
  serial_test_tracing_multi_bench_attach:PASS:bpf_program__attach_tracing_multi 0 nsec
  serial_test_tracing_multi_bench_attach: found 51186 functions
  serial_test_tracing_multi_bench_attach: attached in   1.295s
  serial_test_tracing_multi_bench_attach: detached in   0.243s
  #507     tracing_multi_bench_attach:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
  Successfully unloaded bpf_testmod.ko.

Exporting skip_entry as is_unsafe_function and using it in the test.

Also updating trace_blacklist with ___migrate_enable to be in sync
with kernel functions deny list.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-29-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi verifier fails test

Adding tests for verifier fails on tracing multi programs.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-28-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi attach fails test

Adding tests for attach fails on tracing multi link.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-27-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi session test

Adding tests for tracing multi link session.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-26-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi cookies test

Adding tests for using cookies on tracing multi link.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-25-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi intersect tests

Adding tracing multi tests for intersecting attached functions.

Using bits from (from 1 to 16 values) to specify (up to 4) attached
programs, and randomly choosing bpf_fentry_test* functions they are
attached to.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-24-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi skel/pattern/ids module attach tests

Adding tests for tracing_multi link attachment via all possible
libbpf apis - skeleton, function pattern and btf ids on top of
bpf_testmod kernel module.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-23-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tracing multi skel/pattern/ids attach tests

Adding tests for tracing_multi link attachment via all possible
libbpf apis - skeleton, function pattern and btf ids.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-22-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add support to create tracing multi link

Adding bpf_program__attach_tracing_multi function for attaching
tracing program to multiple functions.

  struct bpf_link *
  bpf_program__attach_tracing_multi(const struct bpf_program *prog,
                                    const char *pattern,
                                    const struct bpf_tracing_multi_opts *opts);

User can specify functions to attach with 'pattern' argument that
allows wildcards (*?' supported) or provide BTF ids of functions
in array directly via opts argument. These options are mutually
exclusive.

When using BTF ids, user can also provide cookie value for each
provided id/function, that can be retrieved later in bpf program
with bpf_get_attach_cookie helper. Each cookie value is paired with
provided BTF id with the same array index.

Adding support to auto attach programs with following sections:

  fsession.multi/<pattern>
  fsession.multi.s/<pattern>
  fentry.multi/<pattern>
  fexit.multi/<pattern>
  fentry.multi.s/<pattern>
  fexit.multi.s/<pattern>

The provided <pattern> is used as 'pattern' argument in
bpf_program__attach_kprobe_multi_opts function.

The <pattern> allows to specify optional kernel module name with
following syntax:

  <module>:<function_pattern>

In order to attach tracing_multi link to a module functions:
- program must be loaded with 'module' btf fd
  (in attr::attach_btf_obj_fd)
- bpf_program__attach_tracing_multi must either have
  pattern with module spec or BTF ids from the module

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-21-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add btf_type_is_traceable_func function

Adding btf_type_is_traceable_func function to perform same checks
as the kernel's btf_distill_func_proto function to prevent attachment
on some of the functions.

Exporting the function via libbpf_internal.h because it will be used
by benchmark test in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-20-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add bpf_link_create support for tracing_multi link

Adding bpf_link_create support for tracing_multi link with
new tracing_multi record in struct bpf_link_create_opts.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-19-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add bpf_object_cleanup_btf function

Adding bpf_object_cleanup_btf function to cleanup btf objects.
It will be used in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-18-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add support for tracing_multi link fdinfo

Adding tracing_multi link fdinfo support with following output:

pos:    0
flags:  02000000
mnt_id: 19
ino:    3087
link_type:      tracing_multi
link_id:        9
prog_tag:       599ba0e317244f86
prog_id:        94
attach_type:    59
cnt:    10
obj-id   btf-id  cookie  func
1        91593   8       bpf_fentry_test1+0x4/0x10
1        91595   9       bpf_fentry_test2+0x4/0x10
1        91596   7       bpf_fentry_test3+0x4/0x20
1        91597   5       bpf_fentry_test4+0x4/0x20
1        91598   4       bpf_fentry_test5+0x4/0x20
1        91599   2       bpf_fentry_test6+0x4/0x20
1        91600   3       bpf_fentry_test7+0x4/0x10
1        91601   1       bpf_fentry_test8+0x4/0x10
1        91602   10      bpf_fentry_test9+0x4/0x10
1        91594   6       bpf_fentry_test10+0x4/0x10

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-17-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add support for tracing_multi link session

Adding support to use session attachment with tracing_multi link.

Adding new BPF_TRACE_FSESSION_MULTI program attach type, that follows
the BPF_TRACE_FSESSION behaviour but on the tracing_multi link.

Such program is called on entry and exit of the attached function
and allows to pass cookie value from entry to exit execution.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add support for tracing_multi link cookies

Add support to specify cookies for tracing_multi link.

Cookies are provided in array where each value is paired with provided
BTF ID value with the same array index.

Such cookie can be retrieved by bpf program with bpf_get_attach_cookie
helper call.

We need to sort cookies array together with ids array in check_dup_ids,
to keep the id->cookie relation.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-15-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add support for tracing multi link

Adding new link to allow to attach program to multiple function
BTF IDs. The link is represented by struct bpf_tracing_multi_link.

To configure the link, new fields are added to bpf_attr::link_create
to pass array of BTF IDs;

  struct {
    __aligned_u64 ids;
    __u32         cnt;
  } tracing_multi;

Each BTF ID represents function (BTF_KIND_FUNC) that the link will
attach bpf program to.

We use previously added bpf_trampoline_multi_attach/detach functions
to attach/detach the link.

The linkinfo/fdinfo callbacks will be implemented in following changes.

Note this is supported only for archs (x86_64) with ftrace direct and
have single ops support.

  CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS &&
  CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS

Note using sort_r (instead of plain sort) in check_dup_ids, because we
will use the swap callback in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-14-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add bpf_trampoline_multi_attach/detach functions

Adding bpf_trampoline_multi_attach/detach functions that allows to
attach/detach tracing program to multiple functions/trampolines.

The attachment is defined with bpf_program and array of BTF ids of
functions to attach the bpf program to.

Adding bpf_tracing_multi_link object that holds all the attached
trampolines and is initialized in attach and used in detach.

The attachment allocates or uses currently existing trampoline
for each function to attach and links it with the bpf program.

The attach works as follows:
- we get all the needed trampolines
- lock them and add the bpf program to each (__bpf_trampoline_link_prog)
- the trampoline_multi_ops passed in __bpf_trampoline_link_prog gathers
  ftrace_hash (ip -> trampoline) objects
- we call update_ftrace_direct_add/mod to update needed locations
- we unlock all the trampolines

The detach works as follows:
- we lock all the needed trampolines
- remove the program from each (__bpf_trampoline_unlink_prog)
- the trampoline_multi_ops passed in __bpf_trampoline_unlink_prog gathers
  ftrace_hash (ip -> trampoline) objects
- we call update_ftrace_direct_del/mod to update needed locations
- we unlock and put all the trampolines

We store the old image/flags in the trampoline before the update
and use it in case we need to rollback the attachment.

We keep the ftrace_hash objects allocated during attach in the link
so they can be used for detach as well.

Adding trampoline_(un)lock_all functions to (un)lock all trampolines
to gate the tracing_multi attachment.

Note this is supported only for archs (x86_64) with ftrace direct and
have single ops support.

  CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS &&
  CONFIG_HAVE_SINGLE_FTRACE_DIRECT_OPS

It also needs CONFIG_BPF_SYSCALL enabled.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-13-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move sleepable verification code to btf_id_allow_sleepable

Move sleepable verification code to btf_id_allow_sleepable function.
It will be used in following changes.

Adding code to retrieve type's name instead of passing it from
bpf_check_attach_target function, because this function will be
called from another place in following changes and it's easier
to retrieve the name directly in here.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-12-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add multi tracing attach types

Adding new program attach types multi tracing attachment:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI

and their base support in verifier code.

Programs with such attach type will use specific link attachment
interface coming in following changes.

This was suggested by Andrii some (long) time ago and turned out
to be easier than having special program flag for that.

Bpf programs with such types have 'bpf_multi_func' function set as
their attach_btf_id and keep module reference when it's specified
by attach_prog_fd.

They are also accepted as sleepable programs during verification,
and the real validation for specific BTF_IDs/functions will happen
during the multi link attachment in following changes.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Factor fsession link to use struct bpf_tramp_node

Now that we split trampoline attachment object (bpf_tramp_node) from
the link object (bpf_tramp_link) we can use bpf_tramp_node as fsession's
fexit attachment object and get rid of the bpf_fsession_link object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add struct bpf_tramp_node object

Adding struct bpf_tramp_node to decouple the link out of the trampoline
attachment info.

At the moment the object for attaching bpf program to the trampoline is
'struct bpf_tramp_link':

  struct bpf_tramp_link {
       struct bpf_link link;
       struct hlist_node tramp_hlist;
       u64 cookie;
  }

The link holds the bpf_prog pointer and forces one link - one program
binding logic. In following changes we want to attach program to multiple
trampolines but we want to keep just one bpf_link object.

Splitting struct bpf_tramp_link into:

  struct bpf_tramp_link {
       struct bpf_link link;
       struct bpf_tramp_node node;
  };

  struct bpf_tramp_node {
       struct bpf_link *link;
       struct hlist_node tramp_hlist;
       u64 cookie;
  };

The 'struct bpf_tramp_link' defines standard single trampoline link
and 'struct bpf_tramp_node' is the attachment trampoline object with
pointer to the bpf_link object.

This will allow us to define link for multiple trampolines, like:

  struct bpf_tracing_multi_link {
       struct bpf_link link;
       ...
       int nodes_cnt;
       struct bpf_tracing_multi_node nodes[] __counted_by(nodes_cnt);
  };

Cc: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add bpf_trampoline_add/remove_prog functions

Separate bpf_trampoline_add/remove_prog functions from
__bpf_trampoline_link/unlink functions to be able to add/remove
trampoline programs without the image being updated in following
changes. No functional change is intended.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move trampoline image setup into bpf_trampoline_ops callbacks

Moving trampoline image setup into bpf_trampoline_ops callbacks,
so we can have different image handling for multi attachment which
is coming in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add struct bpf_trampoline_ops object

In following changes we will need to override ftrace direct attachment
behaviour. In order to do that we are adding struct bpf_trampoline_ops
object that defines callbacks for ftrace direct attachment:

   register_fentry
   unregister_fentry
   modify_fentry

The new struct bpf_trampoline_ops object is passed as an argument to
__bpf_trampoline_link/unlink_prog functions.

At the moment the default trampoline_ops is set to the current ftrace
direct attachment functions, so there's no functional change for the
current code.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Use mutex lock pool for bpf trampolines

Adding mutex lock pool that replaces bpf trampolines mutex.

For tracing_multi link coming in following changes we need to lock all
the involved trampolines during the attachment. This could mean thousands
of mutex locks, which is not convenient.

As suggested by Andrii we can replace bpf trampolines mutex with mutex
pool, where each trampoline is hash-ed to one of the locks from the pool.

It's better to lock all the pool mutexes (32 at the moment) than
thousands of them.

There is 48 (MAX_LOCK_DEPTH) lock limit allowed to be simultaneously
held by task, so we need to keep 32 mutexes (5 bits) in the pool, so
when we lock them all in following changes the lockdep won't scream.

Removing the mutex_is_locked in bpf_trampoline_put, because we removed
the mutex from bpf_trampoline.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ftrace: Add add_ftrace_hash_entry function

Renaming __add_hash_entry to add_ftrace_hash_entry and making it global,
it will be used in following changes outside ftrace.c object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ftrace: Add ftrace_hash_remove function

Adding ftrace_hash_remove function that removes all entries
from struct ftrace_hash object without freeing them.

It will be used in following changes where entries are allocated
as part of another structure and are free-ed separately.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ftrace: Add ftrace_hash_count function

Adding external ftrace_hash_count function so we could get hash
count outside of ftrace object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260606123955.345967-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fbdev: vesafb: fix memory leak in vesafb_probe()

Since commit 73ce73c30ba9 ("fbdev: Transfer video= option strings to
caller; clarify ownership") the string returned from fb_get_options()
is expected to be freed by the caller. But the string is not freed in
vesafb_probe(). Fix that by freeing the option string after setup.

Fixes: 73ce73c30ba9 ("fbdev: Transfer video= option strings to caller; clarify ownership")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: efifb: fix memory leak in efifb_probe()

Since commit 73ce73c30ba9 ("fbdev: Transfer video= option strings to
caller; clarify ownership") the string returned from fb_get_options()
is expected to be freed by the caller, but the string is not freed in
efifb_probe(). Fix that by freeing the option string after setup.

Fixes: 73ce73c30ba9 ("fbdev: Transfer video= option strings to caller; clarify ownership")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: uvesafb: fix potential memory leak in uvesafb_probe()

Due to an incorrect goto label, memory allocated for modedb and modelist
in uvesafb_vbe_init() is not freed in some error paths. Fix this by
updating the goto label.

Fixes: 8bdb3a2d7df4 ("uvesafb: the driver core")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: tridentfb: fix potential memory leak in trident_pci_probe()

In trident_pci_probe(), the memory allocated for modelist using
fb_videomode_to_modelist() is not freed in subsequent error paths.
Fix that by calling fb_destroy_modelist().

Fixes: 6a5e3bd0c8bc ("tridentfb: Add DDC support")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: tdfxfb: fix potential memory leak in tdfxfb_probe()

In tdfxfb_probe(), the memory allocated for modelist using
fb_videomode_to_modelist() when CONFIG_FB_3DFX_I2C is defined, is not
freed in the subsequent error paths.
Fix that by calling fb_destroy_modelist().

Fixes: 215059d2421f ("tdfxfb: make use of DDC information about connected monitor")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: s3fb: fix potential memory leak in s3_pci_probe()

In s3_pci_probe(), the memory allocated for modelist using
fb_videomode_to_modelist() is not freed in subsequent error paths.
Fix that by calling fb_destroy_modelist()

Fixes: 86c0f043a737 ("s3fb: add DDC support")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: nvidia: fix potential memory leak in nvidiafb_probe()

In nvidiafb_probe(), the memory allocated for modelist in
nvidia_set_fbinfo() is not freed in the subsequent error paths.
Fix that by calling fb_destroy_modelist().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: i740fb: fix potential memory leak in i740fb_probe()

In i740fb_probe(), the memory allocated in fb_videomode_to_modelist()
for modelist is not freed in the error paths. Fix that by calling
fb_destroy_modelist().

Fixes: 5350c65f4f15 ("Resurrect Intel740 driver: i740fb")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: carminefb: fix potential memory leak in alloc_carmine_fb()

The memory allocated for modelist in fb_videomode_to_modelist() is not
freed in the subsequent error path.
Fix that by calling fb_destroy_modelist()

Fixes: 2ece5f43b041 ("fbdev: add the carmine FB driver")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: radeon: fix potential memory leak in radeonfb_pci_register()

The function radeonfb_pci_register() allocates memory for modelist
(by calling radeon_check_modes() which calls fb_add_videomode()).
The memory is appended to info->modelist, but is not freed in subsequent
error paths. Fix this by calling fb_destroy_modelist().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: metronomefb: fix potential memory leak in metronomefb_probe()

The memory allocated for pagerefs in fb_deferred_io_init() is not freed
on the error path. Fix it by calling fb_deferred_io_cleanup().

Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: broadsheetfb: fix potential memory leak in broadsheetfb_probe()

The memory allocated for pagerefs in fb_deferred_io_init() is not freed
on the error path. Fix it by calling fb_deferred_io_cleanup().

Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: hecubafb: fix potential memory leak in hecubafb_probe()

The memory allocated for pagerefs in fb_deferred_io_init() is not freed
on the error path. Fix it by calling fb_deferred_io_cleanup().

Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct")
Cc: stable@vger.kernel.org
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: atmel_lcdfb: Use of_device_get_match_data()

Use of_device_get_match_data() to retrieve the driver match data instead
of open-coding the OF match lookup and dereferencing match->data.

This also removes the deprecated of_device.h include from the driver.

No need for NULL check as every compatible has a corresponding data
component.

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: imxfb: Use of_device_get_match_data()

Use of_device_get_match_data() to fetch the platform ID entry directly
instead of open-coding an of_match_device() lookup. No NULL check is
needed as every compatible string has a corresponding data section.

This also lets the driver drop the of_device.h include.

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbcon: Use correct type for vc_resize() return value

The return value of vc_resize() is int, but fbcon_set_disp() stores it
in an unsigned long variable. While the !ret check happens to work
correctly by coincidence (negative values become large positive values),
the types should match. Use int instead.

Eliminates the following W=3 warning:

drivers/video/fbdev/core/fbcon.c: In function 'fbcon_set_disp':
drivers/video/fbdev/core/fbcon.c:1494:14: warning: implicit conversion from 'int' to 'unsigned long' [-Wconversion]

Fixes: af0db3c1f898 ("fbdev: Fix vmalloc out-of-bounds write in fast_imageblit")
Cc: stable@vger.kernel.org # v6.17+
Signed-off-by: Jiacheng Yu <yujiacheng3@huawei.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: Consistently define pci_device_ids using named initializers

... and PCI device helpers.

The various struct pci_device_id arrays were initialized mostly by list
expressions. This isn't easily readable if you're not into PCI. Using
named initializers is more explicit and thus easier to parse. Also use
PCI_DEVICE* helper macros to assign .vendor, .device, .subvendor and
.subdevice where appropriate and skip explicit assignments of 0 (which
the compiler takes care of).

The secret plan is to make struct pci_device_id::driver_data an
anonymous union (similar to
https://lore.kernel.org/all/cover.1776579304.git.u.kleine-koenig@baylibre.com/)
and that requires named initializers. But it's also a nice cleanup on
its own.

While touching all these arrays, unify usage of whitespace and comma
in a few drivers.

This change doesn't introduce changes to the compiled pci_device_id
array. Tested on x86 and arm64.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

fbdev: matroxfb/ssd1307fb: Use named initializers for struct i2c_device_id

While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.

While touching all these arrays, unify usage of whitespace in the list
terminator.

This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Signed-off-by: Helge Deller <deller@gmx.de>

Merge tag 'input-for-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- two quirks for atkbd to deal with laptops that can not handle
   "deactivate" command on the keyboard PS/2 port

* tag 'input-for-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: atkbd - skip deactivate for HONOR BCC-N's internal keyboard
  Input: atkbd - add DMI quirk for Lenovo Yoga Air 14 (83QK)

KVM: arm64: Fix block mapping validity check in stage-1 walker

For the 64K granule size, FEAT_LPA determines whether a level 1 mapping
is allowed. Using the result of has_52bit_pa() is too restrictive, as it
also checks the selected output addressi size in TCR.(I)PS. Fix it by
only checking FEAT_LPA.

Fixes: 5da3a3b27a01 ("KVM: arm64: Expand valid block mappings to FEAT_LPA/LPA2 support")
Signed-off-by: Wei-Lin Chang <weilin.chang@arm.com>
Link: https://patch.msgid.link/20260605185255.2431996-1-weilin.chang@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Set a Linux errno on SMCCC error in kvm_call_hyp_nvhe()

If kvm_call_hyp_nvhe() fails with an SMCCC error code, we WARN().
However, the returned value isn't initialized and the caller might get
garbage or 0 which is likely to be interpreted as success.

Set a default -EOPNOTSUPP error value, ensuring all callers get the
message when hypercalls fail.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260603110312.2909844-1-vdonnefort@google.com
[maz: changed error value to -EOPNOTSUPP as suggested by Will,
tidied up change log]
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Roll back partial shares on kvm_share_hyp() failure

kvm_share_hyp() shares a range one page at a time. If share_pfn_hyp()
fails partway through, the pages already shared by this call are left
shared, while the caller treats the whole range as failed and never
unshares them.

Unshare those pages before returning the error. If an unshare itself
fails the page is leaked: it stays shared with the hypervisor and is
no longer reusable for pKVM, but no isolation guarantee is broken, so
WARN and continue. Not expected in practice.

Fixes: a83e2191b7f1 ("KVM: arm64: pkvm: Refcount the pages shared with EL2")
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260529121755.2923500-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Avoid host/hyp share desync on unshare hypercall failure

unshare_pfn_hyp() erases the tracking node from hyp_shared_pfns
and frees it before invoking __pkvm_host_unshare_hyp. If the
hypercall fails (e.g. EL2 refcount still held, or page-state
mismatch), the host loses its record while EL2 still holds the
share, breaking later share/unshare attempts on the same pfn.

Invoke the hypercall first; erase and free only on success.

Document at the kvm_unshare_hyp() call site that the WARN_ON() is
left non-fatal: a failed unshare leaks the page (it stays shared
with the hypervisor) but breaks no isolation guarantee.

Fixes: 52b28657ebd7 ("KVM: arm64: pkvm: Unshare guest structs during teardown")
Reported-by: Sashiko (local):gemini-3.1-pro
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260529121755.2923500-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Free hyp-share tracking node when share hypercall fails

share_pfn_hyp() inserts a tracking node into hyp_shared_pfns and
then invokes __pkvm_host_share_hyp. If the hypercall rejects the
share (page-state mismatch at EL2), the node stays in the tree
with refcount 1: a phantom share that leaks the allocation and
that a later unshare will trust.

Erase the node and free it on hypercall failure.

Fixes: a83e2191b7f1 ("KVM: arm64: pkvm: Refcount the pages shared with EL2")
Reported-by: Sashiko (local):gemini-3.1-pro
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260529121755.2923500-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Flush HCR_EL2.VSE to deliver SErrors to pKVM guests

With pKVM enabled, the host injects a virtual SError by setting
HCR_EL2.VSE on its vCPU copy, but flush_hyp_vcpu() only flows TWI/TWE
into the hyp vCPU that runs, so VSE never reaches it and a deferred
(masked) SError is never delivered. VSE is a host-owned injection
control, not a trap-configuration bit, so restricting the host's
trap-register values should not have dropped it.

Flow it on entry; sync_hyp_vcpu() already copies hcr_el2 back, so
delivery is reflected to the host. THis makes it consistent with
the existing forwarding of VSESR_EL2, which qualifies the Serror.

Fixes: b56680de9c648 ("KVM: arm64: Initialize trap register values in hyp in pKVM")
Reported-by: Sashiko (local):gemini-3.1-pro
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20260531154548.1505799-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Bound used_lrs when flushing the pKVM hyp vCPU

flush_hyp_vcpu() copies the host vGIC state into the hyp's private vCPU
on every run. The vGIC list register save and restore use used_lrs as
their loop bound and expect it to stay within the number of implemented
list registers. While this is generally the case, flush_hyp_vcpu()
copies vgic_v3 verbatim and does not enforce this, so a value provided
by the host is used at EL2 to index vgic_lr[] and access ICH_LR<n>_EL2
(host -> EL2).

Fix by clamping used_lrs to the number of implemented list registers
after the copy, as the trusted path already does in
vgic_flush_lr_state(). The number of implemented list registers is
constant after init, so it is replicated once from
kvm_vgic_global_state.nr_lr into hyp_gicv3_nr_lr rather than read on
every entry.

Cc: stable@vger.kernel.org
Fixes: be66e67f1750 ("KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260606175614.83273-3-imv4bel@gmail.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

KVM: arm64: Clear __hyp_running_vcpu when flushing the pKVM hyp vCPU

flush_hyp_vcpu() copies the host vCPU context into the hyp's private
vCPU on every run. ctxt_to_vcpu() expects a guest context to have a
NULL __hyp_running_vcpu, which is only ever set on the host context, so
that it resolves the vCPU via container_of(). While this is generally
the case, flush_hyp_vcpu() copies the context verbatim and does not
enforce this, so a value provided by the host is dereferenced at EL2
(host -> EL2).

Fix by clearing __hyp_running_vcpu after the copy.

Cc: stable@vger.kernel.org
Fixes: be66e67f1750 ("KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260606175614.83273-2-imv4bel@gmail.com
Signed-off-by: Marc Zyngier <maz@kernel.org>

riscv: also select ARCH_KEEP_MEMBLOCK if kexec is selected

On RISC-V, also select ARCH_KEEP_MEMBLOCK if kexec is selected, not
only if ACPI is selected. This is because kexec requires the memblock
areas to be kept after boot to initialize the secondary kernel. This
is needed for both Device Tree and ACPI platforms.

Signed-off-by: Han Gao <gaohan@iscas.ac.cn>
Link: https://patch.msgid.link/20260519165546.123105-1-gaohan@iscas.ac.cn
[pjw@kernel.org: change to add the dependency on kexec, rather than making it unconditional;
rewrite the patch description accordingly]
Signed-off-by: Paul Walmsley <pjw@kernel.org>

net/mlx5: Add sd_group_size bits for SD management

Currently, mlx5 is querying the MPIR register to get the number of PFs
that should comprise the SD group.
However, this register does not reflect the correct number in complex
deployments. Hence, add an sd_group_size field to nic_vport_context to
determine the correct number of PFs, and add an sd_group_size capability
bit to indicate whether FW supports it.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260529052359.389413-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

net/mlx5: Update IFC allowed_list_size field bits

The vport context allowed_list_size was increased from 12 to 16 bits.

Writing to this field is protected by the log_max_current_uc/mc_list
capabilities. On older FW versions these capabilities are limited
to < 2K and only the high bits of the field are extended. This means
that the change is backward compatible with older FW versions.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260529052359.389413-2-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

ALSA: es18xx: check control allocation before private data setup

snd_es18xx_mixer() creates controls with snd_ctl_new1() and then stores
bookkeeping pointers or sets private_free before calling snd_ctl_add().
snd_ctl_new1() can return NULL on allocation failure, so those writes
can dereference a NULL control pointer.

Check the returned control pointers before using them and return -ENOMEM
on allocation failure.

Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Link: https://patch.msgid.link/20260607074219.3-1-ruoyuw560@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n

The CONFIG_FILE_LOCKING=n stub for break_lease() takes a 'bool wait'
argument, whereas the CONFIG_FILE_LOCKING=y version and every caller pass
an openmode as an 'unsigned int mode'. The mismatch was introduced when
__break_lease() was reworked to use flags: only the stub was switched to
'bool wait', a stray leftover from the neighbouring break_layout()
helper. The real prototype kept 'unsigned int mode'.

This was harmless until O_WRONLY changed from the octal literal 00000001
to (1 << 0). clang's -Wtautological-constant-compare then fires on the
implicit shift-to-bool conversion at the first FILE_LOCKING=n caller:

  fs/open.c:112:29: warning: converting the result of '<<' to a boolean
                    always evaluates to true [-Wtautological-constant-compare]
    112 | error = break_lease(inode, O_WRONLY);

Restore the stub's parameter to 'unsigned int mode' so it matches the
real prototype and every caller. The stub still just returns 0, so there
is no functional change; it removes the type inconsistency and silences
the warning.

Root cause diagnosed by Nathan Chancellor.

Fixes: 4be9f3cc582a ("filelock: rework the __break_lease API to use flags")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606071029.DKCs8WOs-lkp@intel.com/
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfilter: nf_conntrack: use get_unaligned_be32() in tcp_sack()

The timestamp-only fast path dereferences the option stream as
*(__be32 *)ptr, which assumes 4-byte alignment that the TCP option
stream does not guarantee. Use get_unaligned_be32() instead, which
reads the value safely and already returns host byte order, so the
htonl() on the comparison constant can be dropped.

This matches the existing get_unaligned_be32() use later in the same
function.

Assisted-by: Claude:Opus-4.7
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: avoid num_encaps underflow on bridge VLAN untag

The DEV_PATH_BR_VLAN_UNTAG case post-decrements info->num_encaps
inside WARN_ON_ONCE(). num_encaps is u8, so if it's already 0 the
decrement still happens and wraps it to 255. The break only leaves
the inner switch -- a later path entry can set info->indev back to
a real device, and we end up returning with num_encaps == 255.

nft_dev_forward_path() then walks info.encap[] (size 2) up to
num_encaps, which means an OOB stack read and a bogus count copied
into the route descriptor.

Should only happen on a malformed bridge path stack, hence the WARN,
but worth handling sanely. Move the decrement out of the WARN.

[ While at this, remove the WARN_ON_ONCE since this can only happen
with a buggy bridge path stack --pablo ].

Fixes: e990cef6516d ("netfilter: flowtable: add bridge vlan filtering support")
Signed-off-by: David Carlier <devnexen@gmail.com>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>