git.ipfire.org Git - thirdparty/kernel/linux.git/log

perf header: Sanity check HEADER_HYBRID_TOPOLOGY

Add upper bound check on nr_nodes in process_hybrid_topology() to
harden against malformed perf.data files (reuses MAX_PMU_MAPPINGS,
4096).

Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_CACHE

Add upper bound check on cache entry count in process_cache() to harden
against malformed perf.data files (max 32768).

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_GROUP_DESC

Add upper bound check on nr_groups in process_group_desc() to harden
against malformed perf.data files (max 32768), and move the env
assignment after validation.

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_PMU_MAPPINGS

Add upper bound check on pmu_num in process_pmu_mappings() to harden
against malformed perf.data files (max 4096).

Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_MEM_TOPOLOGY

Add validation to process_mem_topology() to harden against malformed
perf.data files:

- Upper bound check on nr_nodes (reuses MAX_NUMA_NODES, 4096)
- Minimum section size check before allocating

This is particularly important here since nr is u64, making unbounded
values especially dangerous.

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_NUMA_TOPOLOGY

Add validation to process_numa_topology() to harden against malformed
perf.data files:

- Upper bound check on nr_nodes (max 4096)
- Minimum section size check before allocating

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_CPU_TOPOLOGY

Add validation to process_cpu_topology() to harden against malformed
perf.data files:

- Verify nr_cpus_avail was initialized (HEADER_NRCPUS processed first)
- Bounds check sibling counts (cores, threads, dies) against nr_cpus_avail
- Fix two bare 'return -1' that leaked env->cpu by using 'goto free_cpu'

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO

While working on some cleanups sashiko questioned about pre-existing
issues, namely lacking sanity checks for perf.data headers, add some
with the help of Claude.

Cc: Ian Rogers <irogers@google.com>
Cc: Swapnil Sapkal <swapnil.sapkal@amd.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Bump up the max number of command line args allowed

We need to do some upper limit validation, bump up the arbitrary limit
as per suggestion of Sashiko about command line wildcard expansion
ending up with more than 32768 args.

Link: https://sashiko.dev/#/patchset/20260408172846.96360-1-acme%40kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO

Further validate the HEADER_CPU_DOMAIN_INFO fields, this time checking
the nr_domains field.

Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sample: Fix documentation typo

s/PEF/PERF/

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf arm_spe: Improve SIMD flags setting

Fill in ASE and SME operations for the SIMD arch field.

Also set the predicate flags for SVE and SME, but differences between
them: SME does not have a predicate flag, so the setting is based on
events. SVE provides a predicate flag to indicate whether the predicate
is disabled, which allows it to be distinguished into four cases: full
predicates, empty predicates, fully predicated, and disabled predicates.

After:

    perf report -s +simd
    ...
    0.06%     0.06%  sve-test  sve-test               [.] setz                                        [p] SVE
    0.06%     0.06%  sve-test  [kernel.kallsyms]      [k] do_raw_spin_lock
    0.06%     0.06%  sve-test  sve-test               [.] getz                                        [p] SVE
    0.06%     0.06%  sve-test  [kernel.kallsyms]      [k] timekeeping_advance
    0.06%     0.06%  sve-test  sve-test               [.] getz                                        [d] SVE
    0.06%     0.06%  sve-test  [kernel.kallsyms]      [k] update_load_avg
    0.06%     0.06%  sve-test  sve-test               [.] getz                                        [e] SVE
    0.05%     0.05%  sve-test  sve-test               [.] setz                                        [e] SVE
    0.05%     0.05%  sve-test  [kernel.kallsyms]      [k] update_curr
    0.05%     0.05%  sve-test  sve-test               [.] setz                                        [d] SVE
    0.05%     0.05%  sve-test  [kernel.kallsyms]      [k] do_raw_spin_unlock
    0.05%     0.05%  sve-test  [kernel.kallsyms]      [k] timekeeping_update_from_shadow.constprop.0
    0.05%     0.05%  sve-test  sve-test               [.] getz                                        [f] SVE
    0.05%     0.05%  sve-test  sve-test               [.] setz                                        [f] SVE

Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf report: Update document for SIMD flags

Update SIMD architecture and predicate flags.

Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sort: Sort disabled and full predicated flags

According to the Arm ARM (ARM DDI 0487, L.a), section D18.2.6
"Events packet", apart from the empty predicate and partial
predicates, an SVE or SME operation can be predicate-disabled
or full predicated.

To provide complete results, introduce two predicate types for
these cases.

Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sort: Support sort ASE and SME

Support sort Advance SIMD extension (ASE) and SME.

Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf test: Make perf trace BTF general tests exclusive

Running both tests cases 126 128 together causes the first test case
126 to fail:
# for i in $(seq 3); do ./perf test 'perf trace BTF general tests' \
'perf trace record and replay'; done
126: perf trace BTF general tests    : FAILED!
128: perf trace record and replay    : Ok
126: perf trace BTF general tests    : FAILED!
128: perf trace record and replay    : Ok
126: perf trace BTF general tests    : FAILED!
128: perf trace record and replay    : Ok
#

Test case 126 fails because test case 128 runs concurrently as can
be observed using a ps -ef | grep perf output list on a different
window. Both do a perf trace command concurrently.
Make test case 'perf trace BTF general tests' exclusive.

Output after:
# for i in $(seq 3); do ./perf test 'perf trace BTF general tests' \
'perf trace record and replay'; done
127: perf trace BTF general tests                   : Ok
155: perf trace record and replay                   : Ok
127: perf trace BTF general tests                   : Ok
155: perf trace record and replay                   : Ok
127: perf trace BTF general tests                   : Ok
155: perf trace record and replay                   : Ok
#

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf data: Clean up use_stdio and structures

use_stdio was associated with struct perf_data and not perf_data_file
meaning there was implicit use of fd rather than fptr that may not be
safe. For example, in perf_data_file__write. Reorganize perf_data_file
to better abstract use_stdio, add kernel-doc and more consistently use
perf_data__ accessors so that use_stdio is better respected.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Replace basename() calls with perf_basename()

As noticed in a sashiko review for a patch adding a missing libgen.h
in a file using basename():

https://sashiko.dev/#/patchset/20260402001740.2220481-1-acme%40kernel.org

So avoid these subtleties and instead reuse the gnu_basename() function
we had in srcline.c, renaming it to perf_basename() and replace
basename() calls with it, simplifying several cases by removing now
needless strdups.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Use calloc() where applicable

Instead of using zalloc(nr_entries * sizeof_entry) that is what calloc()
does.

In some places where linux/zalloc.h isn't needed, remove it, add when
needed and was getting it indirectly.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Do validation of perf.data HEADER_CPU_DOMAIN_INFO

As suggested in an unrelated sashiko review:

https://sashiko.dev/#/patchset/20260407195145.2372104-1-acme%40kernel.org

"
Could a malformed perf.data file provide out-of-bounds values for cpu and
domain?
These variables are read directly from the file and used as indices for
cd_map and cd_map[cpu]->domains without any validation against
env->nr_cpus_avail or max_sched_domains.
Similar to the issue above, this is an existing lack of validation that
becomes apparent when looking at the allocation boundaries.
"

Validate it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Use a max number of command line args

Sashiko suggests we use some reasonable max number of args to avoid
overflows when reading perf.data files, do it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf bench: Constify tables

Those tables and variables don't change, better capture this by
explicitely using 'const'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Make more global variables static

`make check` will run sparse on the perf code base. A frequent warning
is "warning: symbol '...' was not declared. Should it be static?" Go
through and make global definitions without declarations static.

In some cases it is deliberate due to dlsym accessing the symbol, this
change doesn't clean up the missing declarations for perf test suites.

Sometimes things can opportunistically be made const.

Making somethings static exposed unused functions warnings, so
restructuring of ifdefs was necessary for that.

These changes reduce the size of the perf binary by 568 bytes.

Committer notes:

Refreshed the patch, the original one fell thru the cracks, updated the
size reduction.

Remove the trace-event-scripting.c changes, break the build, noticed
with container builds and with sashiko:

https://sashiko.dev/#/patchset/20260401215306.2152898-1-acme%40kernel.org

Also make two variables static to address another sashiko review
comment:

https://sashiko.dev/#/patchset/20260402001740.2220481-1-acme%40kernel.org

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf util: Kill die() prototype, dead for a long time

In fef2a735167a827a ("perf tools: Kill die()") the die() function was
removed, but not the prototype in util.h, now when building with
LIBPERL=1, during a 'make -C tools/perf build-test' routine test, it is
failing as perl likes die() calls and then this clashes with this
remnant, remove it.

Fixes: fef2a735167a827a ("perf tools: Kill die()")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf symbols: Make variable receiving result strrchr() const

Fixing:

  util/symbol.c: In function ‘symbol__config_symfs’:
  util/symbol.c:2499:20: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
   2499 |         layout_str = strrchr(dir, ',');
        |

With recent gcc/glibc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf maps: Fix copy_from that can break sorted by name order

When an parent is copied into a child the name array is populated in
address not name order. Make sure the name array isn't flagged as sorted.

Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf maps: Fix fixup_overlap_and_insert that can break sorted by name order

When an entry in the address array is replaced, the corresponding name
entry is replaced. The entries names may sort differently and so it is
important that the sorted by name property be cleared on the maps.

Fixes: 0d11fab32714 ("perf maps: Fixup maps_by_name when modifying maps_by_address")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf maps: Move getting debug_file to verbose path

Getting debug_file can trigger warnings if not set. Avoid getting
these warnings by pushing the use under the controlling if.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf addr2line: Remove global variable addr2line_timeout_ms

Remove global variable addr2line_timeout_ms and add it as a member
to symbol_conf structure.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
[namhyung: move the initialization to util/symbol.c]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf config: Make symbol_conf::addr2line_disable_warn configurable

Make symbol_conf::addr2line_disable_warn configurable by reading
the perfconfig file.
Use section core and addr2line-disable-warn = value.
Update documentation.

Example:
# perf config -l
core.addr2line-timeout=5000
core.addr2line-disable-warn=1
#

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf config: Rename symbol_conf::disable_add2line_warn

Rename member symbol_conf::disable_add2line_warn to
symbol_conf::addr2line_disable_warn to make it consistent with other
addr2line_xxx constants.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf test: Skip sched stats test for !root

Running perf sched stats requires root and it fails to open the
schedstat file for regular users.  Let's skip the test.

  $ perf sched stats true
  Failed to open /proc/sys/kernel/sched_schedstats

Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf cgroup: Update metric leader in evlist__expand_cgroup

When the evlist is expanded the metric leader wasn't being updated. As
the original evsel is deleted this creates a use-after-free in
stat-shadow's prepare_metric. This was detected running the "perf stat
--bpf-counters --for-each-cgroup test" with sanitizers.

The change itself puts the copied evsel into the priv field (known
unused because of evsel__clone use) and then in a second pass over the
list updates the copied values using the priv pointer.

Fixes: d1c5a0e86a4e ("perf stat: Add --for-each-cgroup option")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sample: Add evsel to struct perf_sample

Add the evsel from evsel__parse_sample into the struct
perf_sample. Sometimes we want to alter the evsel associated with a
sample, such as with off-cpu bpf-output events. In general the evsel
and perf_sample are passed as a pair, but this makes an altered evsel
something of a chore to keep checking for and setting up. Later
patches will remove passing an evsel with the perf_sample and switch
to just using the perf_sample's value.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sample: Make sure perf_sample__init/exit are used

The deferred stack trace code wasn't using perf_sample__init/exit. Add
the deferred stack trace clean up to perf_sample__exit which requires
proper NULL initialization in perf_sample__init. Make the
perf_sample__exit robust to being called more than once by using
zfree. Make the error paths in evsel__parse_sample exit the
sample. Add a merged_callchain boolean to capture that callchain is
allocated, deferred_callchain doen't suffice for this. Pack the struct
variables to avoid padding bytes for this.

Similiarly powerpc_vpadtl_sample wasn't using perf_sample__init/exit,
use it for consistency and potential issues with uninitialized
variables.

Similarly guest_session__inject_events in builtin-inject wasn't using
perf_sample_init/exit. The lifetime management for fetched events is
somewhat complex there, but when an event is fetched the sample should
be initialized and needs exiting on error. The sample may be left in
place so that future injects have access to it.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sample: Document struct perf_sample

Add kernel-doc for struct perf_sample capturing the somewhat unusual
population of fields and lifetime relationships.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Save cln_size header

Store cacheline size during perf record in header, so that cacheline
size can be used for other features, like sort keys for perf report.

Testing example with feat enabled:

  $ perf record ./Example

  $ perf report --header-only | grep -C 3 cacheline
  CPU_DOMAIN_INFO info available, use -I to display
  e_machine : 62
  e_flags : 0
  cacheline size: 64
  missing features: TRACING_DATA BUILD_ID BRANCH_STACK GROUP_DESC AUXTRACE \
  STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA
  ========

[namhyung: Update the commit message and remove blank lines]
Signed-off-by: Ricky Ringler <ricky.ringler@proton.me>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tests sched stats: Write output to temp file

Writing to the perf.data file can fail in various contexts such as
continual test. Other tests write to a mktemp-ed file, make the "perf
sched stats tests" follow this convention.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf sched: Avoid crash for unexpected perf sched stats report

Doing a `perf sched record` then `perf sched stats report` crashes as
the tp_handler isn't set. Add a dummy tp_handler for it rather than
adding an extra check.

Reported-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf metrics: Make common stalled metrics conditional on having the event

The metric code uses the event parsing code but it generally assumes
all events are supported. Arnaldo reported AMD supporting
stalled-cycles-frontend but not stalled-cycles-backend [1]. An issue
with this is that before parsing happens the metric code tries to
share events within groups to reduce the number of events and
multiplexing. If the group has some supported and not supported
events, the whole group will become broken. To avoid this situation
add has_event tests to the metrics for stalled-cycles-frontend and
stalled-cycles-backend. has_events is evaluated when parsing the
metric and its result constant propagated (with if-elses) to reduce
the number of events. This means when the metric code considers
sharing the events, only supported events will be shared.

Note for backporting. This change updates
tools/perf/pmu-events/empty-pmu-events.c a convenience file for builds
on systems without python present. While the metrics.json code should
backport easily there can be conflicts on empty-pmu-events.c. In this
case the build will have left a file test-empty-pmu-events.c that can
be copied over empty-pmu-events.c to resolve issues and make an
appropriate empty-pmu-events.c for the json in the source tree at the
time of the build.

[1] https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Closes: https://lore.kernel.org/lkml/abm1nR-2xjOUBroD@x1/
Fixes: c7adeb0974f1 ("perf jevents: Add set of common metrics based on default ones")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tests kwork: Add basic kwork coverage tests

Add basic kwork coverage tests for record, report, latency, timehist
and top.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf data convert ctf: Pipe mode improvements

Handle the finished_round event. Set up the CTF events when the
feature event desc is read. In pipe mode the attr events will create
the evsels and the feature event desc events will name the evsels. The
CTF events need the evsel name, so wait until feature event descs are
read (in pipe mode) before setting up the events except for tracepoint
events. Handle the tracing_data event so that tracepoint information
is available when setting up tracepoint events.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf evsel: Make unknown event names more unique

In situations like the perf data converter the evsel__name will be
used to create babeltrace events. If the events have the same name
then creation can fail. Avoid these failures by including more
information into the unknown event names.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf ordered-events: Event processing consistency with the regular reader

Some event processing functions like perf_event__process_tracing_data
return a zero or positive value on success. Ordered event processing
handles any non-zero value as an error, which is inconsistent with
reader__process_events and reader__read_event that only treat negative
values as errors. Make the ordered events error handling consistent
with that of the events reader.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Refactor pipe mode end marker handling

In non-pipe/data mode the header has a 256-bit bitmap representing
whether a feature is enabled or not. In pipe mode features are written
out in perf_event__synthesize_features as PERF_RECORD_HEADER_FEATURE
events with a special zero sized marker for the last feature. If a new
feature is added the last feature marker event appears as that feature
from old pipe mode perf data. As the event is zero sized it will fail
to be processed and generally terminate perf.

Add a last_feat variable to the header that in non-pipe/data mode is
just HEADER_LAST_FEATURE. In pipe mode compute the last_feat by
handling zero sized feature events, assuming they are the marker and
updating last_feat accordingly. Potentially a feature event could be
zero sized and so still process the feature event, just ignore the
error if it fails.

As perf_event__process_feature can properly handle pipe mode data,
migrate users to it except for report that still wants to group events
and stop header printing with the last feature marker. Make
perf_event__process_feature non-fatal in the case of a newer feature
than this version of perf's HEADER_LAST_FEATURE, which was the
behavior all users wanted.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf session: Extra logging for failed to process events

Print log information in ordered event processing so that the cause of
finished round failing is clearer. Print the event name along with its
number when an event isn't processed. Add extra detail about where the
failure happened.

The following log lines come from running `perf data convert`. Before:
0xa250 [0x10]: failed to process type: 80

After:
0xa250 [0x10]: piped event processing failed for event of type: FEATURE (80)

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Properly warn/print when libtraceevent/libbpf support is missing

By removing the features from feat_ops with ifdefs the previous logic
would print "# (null)" when perf processed a feature that lacked
builtin support. Remove the ifdefs from feat_ops and in the relevant
functions print errors/messages about the lack of support.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Add utility to convert feature number to a string

For logging and debug messages it can be convenient to convert a
feature number to a name. Add header_feat__name for this and reuse the
data already within the feat_ops struct.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf clockid: Add missing include

clockid_t is declared in time.h but the include is missing. Reordering
header files may result in build breakages. Add the include to avoid
this.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf trace: Fix potential u64 underflow in duration calculation

Although it happens very rarely, in case of out-of-order events (i.e.
due to CPU migration when a syscall is executed), the calculation of
event duration might underflow and thus a bogus value is printed:

    2.804 ( 0.001 ms): :49553/49553 rt_sigaction(sig: QUIT, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0
    2.807 ( 0.001 ms): :49553/49553 rt_sigaction(sig: CHLD, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0
    2.815 (18446744073709.438 ms): :49553/49553 execve(filename: 0xbb173a30, argv: 0x55aabb171930, envp: 0x55aabb171120) = 0
    2.815 ( 0.534 ms): pwd/49553  ... [continued]: execve())                                           = 0

Check for possible underflow first and in case of a bogus value, do
not print it.

    2.804 ( 0.001 ms): :49553/49553 rt_sigaction(sig: QUIT, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0
    2.807 ( 0.001 ms): :49553/49553 rt_sigaction(sig: CHLD, act: 0x7fff403ed6e0, oact: 0x7fff403ed780, sigsetsize: 8) = 0
    2.815 (         ): :49553/49553 execve(filename: 0xbb173a30, argv: 0x55aabb171930, envp: 0x55aabb171120) = 0
    2.815 ( 0.534 ms): pwd/49553  ... [continued]: execve())                                           = 0

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf header: Validate build_id filename length to prevent buffer overflow

The build_id parsing functions calculate a filename length from the
event header size and read directly into a stack buffer of PATH_MAX
bytes without bounds checking. A malformed perf.data file with a
crafted header.size can cause the length to be negative or exceed
PATH_MAX, resulting in a stack buffer overflow.

Add bounds checking for the filename length in both
perf_header__read_build_ids() and the ABI quirk variant. Print a
warning message when invalid length is detected.

Signed-off-by: SeungJu Cheon <suunj1331@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf metricgroup: Refine error logs

Return -ENOENT when no metric/group matches, and directly use the return
value from expr__find_ids(), so -EINVAL is reserved for parse failures.

Print separate logs to make it clear.

Before:

  perf stat -C 5 -vvv
  Using CPUID 0x00000000410fd490
  metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound
  parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES)
  Failure to read '#slots'
  literal: #slots = nan
  syntax error
  Cannot find metric or group `Default'

After:

  perf stat -C 5 -vvv
  Using CPUID 0x00000000410fd490
  metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound
  parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES)
  Failure to read '#slots'
  literal: #slots = nan
  syntax error
  Fail to parse metric or group `Default'

Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf expr: Add '\n' in literal parse errors

Add a trailing newline for logs.

Before:

  perf stat -C 5
  Failure to read '#slots'Cannot find metric or group `Default'

After:

  perf stat -C 5
  Failure to read '#slots'
  Cannot find metric or group `Default'

Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf expr: Return -EINVAL for syntax error in expr__find_ids()

expr__find_ids() propagates the parser return value directly.  For syntax
errors, the parser can return a positive value, but callers treat it as
success, e.g., for below case on Arm64 platform:

  metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound
  parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES)
  Failure to read '#slots' literal: #slots = nan
  syntax error

Convert positive parser returns in expr__find_ids() to -EINVAL, as a
result, the error value will be respected by callers.

Before:

  perf stat -C 5
  Failure to read '#slots'Failure to read '#slots'Failure to read '#slots'Failure to read '#slots'Segmentation fault

After:

  perf stat -C 5
  Failure to read '#slots'Cannot find metric or group `Default'

Fixes: ded80bda8bc9 ("perf expr: Migrate expr ids table to a hashmap")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf test: Skip perf data type profiling tests for s390

Test case 'perf data type profiling tests' fails on s390 with this
error:

  # ./perf mem record -- ./perf test -w code_with_type
  failed: no PMU supports the memory events
  # echo $?
  255
  #

because s390 does not support memory events at all. According to the
man page, perf annotate --code-with-type only works with memory
instructions only.  As command 'perf mem record ...' is not supported
on s390, skip this test for s390.

Output before:
# ./perf test 'perf data type profiling tests'
77: perf data type profiling tests                        : FAILED!

Output after:
# ./perf test 'perf data type profiling tests'
77: perf data type profiling tests                        : Skip

Fixes: f60a5c22967b8 ("perf tests: Test annotate with data type profiling and rust")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: prevent null dsos from being added

When sorting the dso array we sometimes get a crash due to null
comparisons in comparator functions. So prevent __dsos__add from
adding null to the dso array to avoid out-of-memory related errors.

Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf test: Fix ratio_to_prev event parsing test

test__ratio_to_prev() assumed the first event in a group is the leader,
which is not the case when the event is expanded into two event groups
on hybrid PMU's with auto counter reload support. Instead, iterate over the
event group generated for each core PMU. Also update "wrong leader" test to
check that the subordinate event has the correct leader instead of checking
that it is not the group leader. Finally, do not exit immediately if a PMU
without auto counter reload support is found.

Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 56be0fe5f62c ("perf record: Add auto counter reload parse and regression tests")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Fix module symbol resolution for non-zero .text sh_addr

When perf resolves symbols from kernel module ELF files (ET_REL),
it converts symbol addresses to file offsets so that sample IPs
can be matched to the correct symbol. The conversion adjusts each
symbol's st_value:

sym->st_value -= shdr->sh_addr - shdr->sh_offset;

For vmlinux (ET_EXEC), st_value is a virtual address and sh_addr
is the section's virtual base, so subtracting sh_addr and adding
sh_offset correctly yields a file offset.

For kernel modules (ET_REL), st_value is a section-relative
offset. The module loader ignores sh_addr entirely and places
symbols at module_base + st_value. Converting to file offset
requires only adding sh_offset; subtracting sh_addr introduces an
error equal to sh_addr bytes.

When .text has sh_addr == 0 -- the historical norm for simple
modules -- both formulas produce the same result and the bug is
latent. As modules gain more metadata sections before .text (.note,
.static_call.text, etc.), the linker assigns .text a non-zero
sh_addr, exposing the defect. For example, nfsd.ko on this kernel
has sh_addr=0xa80, kvm-intel.ko has sh_addr=0x1e90.

The effect is that all .text symbols in affected modules
shift by sh_addr bytes relative to sample IPs, causing perf
report to attribute samples to incorrect, nearby symbols. This
was observed as 13% of LLC-load-miss samples misattributed
to nfsd_file_get_dio_attrs when the actual hot function was
nfsd_cache_lookup, approximately 0xa80 bytes away in the symbol
table.

Use the existing dso__rel() flag (already set for ET_REL modules)
to select the correct adjustment: add sh_offset for ET_REL,
subtract (sh_addr - sh_offset) for ET_EXEC/ET_DYN.

Fixes: 0131c4ec794a ("perf tools: Make it possible to read object code from kernel modules")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf trace: Skip unnecessary synthesis for summary-only mode

It needs to synthesize task info for the comm name.  The mmap
information is only needed for callchain symbolization which is not used
by the summary mode.  Also total or cgroup summary mode don't require
the task info.  Let's skip the processing if possible.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf stat: Fix crash on arm64

Perf stat is crashing on arm64 hosts with the following issue:

  # make -C tools/perf DEBUG=1
  # perf stat sleep 1
  perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
  [1]    1220794 IOT instruction (core dumped)  ./perf stat

The sorting function introduced by commit a745c0831c15c ("perf stat:
Sort default events/metrics") compares events based on their individual
properties. This can cause events from different groups to be
interleaved, resulting in group members appearing before their leaders
in the sorted evlist.

When the iterator opens events in list order, a group member may be
processed before its leader has been opened.

For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
could be sorted before its leader, causing the crash when CPU_CYCLES
tries to get its group fd from the not-yet-opened leader.

Fix this by comparing events based on their leader's attributes instead
of their own attributes when the events are in different groups. This
ensures all members of a group share the same sort key as their leader,
keeping groups together and guaranteeing leaders are opened before their
members.

Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
Reported-by: Denis Yaroshevskiy <dyaroshev@meta.com>
Tested-by: Dmitry Ilvokhin <d@ilvokhin.com>
Tested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf test: Fix perf stat --bpf-counters on hybrid machines

The test constantly fails on my Intel hybrid machine.  The issue was it
has two events in the output even if I only gave it one event.

  $ perf stat -e instructions -- perf test -w sqrtloop

   Performance counter stats for 'perf test -w sqrtloop':

         910,856,421      cpu_atom/instructions/                (28.05%)
      14,852,865,997      cpu_core/instructions/                (96.79%)

         1.014313341 seconds time elapsed

         1.004114000 seconds user
         0.008174000 seconds sys

Let's modify the awk script to add the values for each line and print
the total.  The variable 'i' has a number of input lines that have valid
output and variable 'c' has the sum of actual counter values.  That way
it should work on any platforms.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tests: Write test files to tmpdir

Writing to the test output files in the current working directory can
fail in various contexts such as continual test. Other tests write to
a mktemp-ed file, make the "perf script task-analyszer tests" follow
this convention too. Currently this isn't possible for the perf.data
file due to a lack of perf script support, add a variable for when
this support is available.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

libperf cpumap: Make index and nr types unsigned

The index into the cpumap array and the number of entries within the
array can never be negative, so let's make them unsigned. This is
prompted by reports that gcc 13 with -O6 is giving a
alloc-size-larger-than errors. The change makes the cpumap changes and
then updates the declaration of index variables throughout perf and
libperf to be unsigned. The two things are hard to separate as
compiler warnings about mixing signed and unsigned types breaks the
build.

Reported-by: Chingbin Li <liqb365@163.com>
Closes: https://lore.kernel.org/lkml/20260212025127.841090-1-liqb365@163.com/
Tested-by: Chingbin Li <liqb365@163.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf beauty: Move copy of fadvise.h from tools/include/ to tools/perf/trace/beauty/include/

As it is not really used when compiling anything, just being parsed to
collect number->string tables for 'perf trace'.

  $ git grep fadvise.h tools/
  tools/perf/Makefile.perf:$(fadvise_advice_array): $(beauty_uapi_linux_dir)/fadvise.h $(fadvise_advice_tbl)
  tools/perf/check-headers.sh:  "include/uapi/linux/fadvise.h"
  tools/perf/trace/beauty/fadvise.sh:grep -E $regex ${header_dir}/fadvise.h | \
  tools/perf/trace/beauty/fadvise.sh:# tools/include/uapi/linux/fadvise.h for details.
  $

Link: https://lore.kernel.org/r/CAP-5=fVBNQVF8k3JUQjH1nkP69ZVp8BqP+uwygcx=xO0zC4xrg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf beauty: Move tools/include/uapi/drm to tools/perf/trace/beauty/include/uapi

As it is used only to parse ioctl numbers, not to build perf and so far
no other tools/ living tool uses it, so to clean up tools/include/ to be
used just for building tools, to have access to things available in the
kernel and not yet in the system headers, move it to the directory where
just the tools/perf/trace/beauty/ scripts can use to generate tables
used by perf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf build: Add -funsigned-char to default CFLAGS

Commit 3bc753c06dd0 ("kbuild: treat char as always unsigned") made
chars unsigned by default in the Linux kernel. To avoid similar kinds
of bugs and warnings, make unsigned chars the default for the perf tool.

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Add --pmu-filter option for filtering PMUs

This patch adds a new --pmu-filter option to perf-stat command to allow
filtering events on specific PMUs. This is useful when there are
multiple PMUs with same type (e.g. hisi_sicl2_cpa0 and hisi_sicl0_cpa0).

[root@localhost tmp]# perf stat -M cpa_p0_avg_bw
Performance counter stats for 'system wide':

    19,417,779,115      hisi_sicl0_cpa0/cpa_cycles/      #     0.00 cpa_p0_avg_bw
                 0      hisi_sicl0_cpa0/cpa_p0_wr_dat/
                 0      hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
                 0      hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
    19,417,751,103      hisi_sicl10_cpa0/cpa_cycles/     #     0.00 cpa_p0_avg_bw
                 0      hisi_sicl10_cpa0/cpa_p0_wr_dat/
                 0      hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
                 0      hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
    19,417,730,679      hisi_sicl2_cpa0/cpa_cycles/      #     0.31 cpa_p0_avg_bw
        75,635,749      hisi_sicl2_cpa0/cpa_p0_wr_dat/
        18,520,640      hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
                 0      hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
    19,417,674,227      hisi_sicl8_cpa0/cpa_cycles/      #     0.00 cpa_p0_avg_bw
                 0      hisi_sicl8_cpa0/cpa_p0_wr_dat/
                 0      hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
                 0      hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/

      19.417734480 seconds time elapsed

[root@localhost tmp]# perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw
Performance counter stats for 'system wide':

     6,234,093,559      cpa_cycles                       #     0.60 cpa_p0_avg_bw
        50,548,465      cpa_p0_wr_dat
         7,552,182      cpa_p0_rd_dat_64b
                 0      cpa_p0_rd_dat_32b

       6.234139320 seconds time elapsed

Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf report: Add comm_nodigit sort key

The "comm" column allows grouping events by the process command. It is
intended to group like programs, despite having different PIDs. But some
workloads may adjust their own command, so that a unique identifier
(e.g. a PID or some other numeric value) is part of the command name.
This destroys the utility of "comm", forcing perf to place each unique
process name into its own bucket, which can contribute to a
combinatorial explosion of memory use in perf report.

Create a less strict version of this column, which ignores digits when
comparing command names. Commands whose names are the same (ignoring
digits) are sorted into the same histogram buckets, and displayed with
the placeholder value "<N>" in the place of digits. For example,
hypothetical command names "kworker/1" "kworker/2" "kworker/3" would
sort into the same bucket and be represented as "kworker/<N>".

Committer testing:

  $ perf report -s comm,comm_nodigit | grep -F "<N>"
       0.01%  CPU 6/TCG        CPU <N>/TCG
       0.01%  kworker/53:2-mm  kworker/<N>:<N>-mm
       0.01%  migration/24     migration/<N>
       0.01%  kworker/24:1-ev  kworker/<N>:<N>-ev
       0.01%  llvmpipe-8       llvmpipe-<N>

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf stat: Fix opt->value type for parse_cache_level

Commit f5803651b4a4 ("perf stat: Choose the most disaggregate command
line option") changed aggregation option handling for `perf stat` but
not `perf stat report` leading to parse_cache_level being passed a
struct in the `perf stat` case but erroneously an aggr_mode enum value
for `perf stat report`. Change the `perf stat report` aggregation
handling to use the same opt_aggr_mode as `perf stat`. Also, just pass
the boolean for consistency with other boolean argument handling.

Fixes: f5803651b4a4 ("perf stat: Choose the most disaggregate command line option")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf lock: Fix option value type in parse_max_stack

The value is a void* and the address of an int, max_stack_depth, is
set up in the perf lock options. The parse_max_stack function treats
the int* as a long*, make this more correct by declaring the value to
be an int*.

Fixes: 0a277b622670 ("perf lock contention: Check --max-stack option")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf record: Add support for arch_sdt_arg_parse_op() on s390

commit e5e66adfe45a6 ("perf regs: Remove __weak attributive arch_sdt_arg_parse_op() function")
removes arch_sdt_arg_parse_op() functions and reveals missing s390 support.
The following warning is printed:

Unknown ELF machine 22, standard arguments parse will be skipped.

ELF machine 22 is the EM_S390 host. This happens with command
# ./perf record -v -- stress-ng -t 1s --matrix 0
when the event is not specified.

Add s390 specific __perf_sdt_arg_parse_op_s390() function to support
-architecture calls to arch_sdt_arg_parse_op() for s390.
The warning disappears.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Jan Polensky <japo@linux.ibm.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tools build: Correct link flags for libopenssl

The perf static build reports that the BPF skeleton is disabled due to
the missing libopenssl feature.

Use PKG_CONFIG to determine the link flags for libopenssl. Add
"--static" to the PKG_CONFIG command for static linking.

Fixes: 7678523109d1 ("tools/build: Add a feature test for libopenssl")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Merge tag 'perf-tools-fixes-for-v7.0-2-2026-03-23' into perf-tools-next

To get the various fixes for v7.0.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>

tools headers: Synchronize linux/build_bug.h with the kernel sources

To pick up the changes in:

  6ffd853b0b10e1e2 ("build_bug.h: correct function parameters names in kernel-doc")

That just add some comments, addressing this perf tools build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/linux/build_bug.h include/linux/build_bug.h

Please take a look at tools/include/uapi/README for further info on this
synchronization process.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources

To pick the changes in:

  e2ffe85b6d2bb778 ("KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM")

That just rebuilds kvm-stat.c on x86, no change in functionality.

This silences these perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools headers UAPI: Sync linux/kvm.h with the kernel sources

To pick the changes in:

  da142f3d373a6dda ("KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlay")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the 'perf trace' ioctl syscall argument beautifiers.

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

tools arch x86: Sync the msr-index.h copy with the kernel sources

To pick up the changes from these csets:

  9073428bb204d921 ("x86/sev: Allow IBPB-on-Entry feature for SNP guests")

That cause no changes to tooling as it doesn't include a new MSR to be
captured by the tools/perf/trace/beauty/tracepoints/x86_msr.sh script.

Just silences this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h

Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf evlist: Improve default event for s390

Frame pointer callchains are not supported on s390 and dwarf
callchains are only supported on software events.

Switch the default event from the hardware 'cycles' event to the
software 'cpu-clock' or 'task-clock' on s390 if callchains are
enabled. Move some of the target initialization earlier in builtin-top
and builtin-record, so it is ready for use by evlist__new_default.

If frame pointer callchains are requested on s390 show a
warning. Modify the '-g' option of `perf top` and `perf record` to
default to dwarf callchains on s390.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf callchain: Refactor callchain option parsing

record_opts__parse_callchain is shared by builtin-record and
builtin-trace, it is declared in callchain.h. Move the declaration to
callchain.c for consistency with the header. In other cases make the
option callback a small static stub that then calls into callchain.c.

Make the no argument '-g' callchain option just a short-cut for
'--call-graph fp' so that there is consistency in how the arguments
are handled. This requires the const char* string to be strdup-ed in
__parse_callchain_report_opt. For consistency also make
parse_callchain_record use strdup and remove some unnecessary
casts. Also, be more explicit about the '-g' behavior if there is a
.perfconfig file setting.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf evsel: Constify option arguments to config functions

The options are used to configure the evsel but are not themselves
configured. Make the arguments const to better capture this.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf target: Constify simple check functions

Allow the target to be const in callers.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf evsel: Improve falling back from cycles

Switch to using evsel__match rather than comparing perf_event_attr
values, this is robust on hybrid architectures.
Ensure evsel->pmu matches the evsel->core.attr.
Remove exclude bits that get set in other fallback attempts when
switching the event.
Log the event name with modifiers when switching the event on fallback.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf dwarf-aux: Collect all variable locations for insn tracking

Previously, only the first DWARF location entry was collected for each
variable. This was based on the assumption that instruction tracking
could reconstruct the remaining state. However, variables may have
different locations across different address ranges, and relying solely
on instruction tracking can miss valid type information.

Change __die_collect_vars_cb() to iterate over all location entries
using dwarf_getlocations() in a loop. This ensures that variables with
multiple location ranges are properly tracked, improving type coverage.

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf annotate-data: Use DWARF location ranges to preserve reg state

When a function call occurs, caller-saved registers are typically
invalidated since the callee may clobber them. However, DWARF debug info
provides location ranges that indicate exactly where a variable is valid
in a register.

Track the DWARF location range end address in type_state_reg and use it
to determine if a caller-saved register should be preserved across a
call. If the current call address is within the DWARF-specified lifetime
of the variable, keep the register state valid instead of invalidating
it.

This improves type annotation for code where the compiler knows a
register value survives across calls (e.g., when the callee is known not
to clobber certain registers or when the value is reloaded after the
call at the same logical location).

Changes:
- Add `end` and `has_range` fields to die_var_type to capture DWARF
location range information
- Add `lifetime_active` and `lifetime_end` fields to type_state_reg
- Check location lifetime before invalidating caller-saved registers

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf annotate-data: Invalidate caller-saved regs for all calls

Previously, the x86 call handler returned early without invalidating
caller-saved registers when the call target symbol could not be resolved
(func == NULL). This violated the ABI which requires caller-saved
registers to be considered clobbered after any call instruction.

Fix this by:
1. Always invalidating caller-saved registers for any call instruction
(except __fentry__ which preserves registers)
2. Using dl->ops.target.name as fallback when func->name is unavailable,
allowing return type lookup for more call targets

This is a conservative change that may reduce type coverage for indirect
calls (e.g., callq *(%rax)) where we cannot determine the return type
but it ensures correctness.

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf annotate-data: Add invalidate_reg_state() helper for x86

Add a helper function to consistently invalidate register state instead
of field assignments. This ensures kind, ok, and copied_from are all
properly cleared when a register becomes invalid.

The helper sets:
- kind = TSR_KIND_INVALID
- ok = false
- copied_from = -1

Replace all invalidation patterns with calls to this helper. No
functional change and this removes some incorrect annotations that were
caused by incomplete invalidation (e.g. a obsolete copied_from from an
invalidated register).

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf annotate-data: Handle global variable access with const register

When a register holds a constant value (TSR_KIND_CONST) and is used with
a negative offset, treat it as a potential global variable access
instead of falling through to CFA (frame) handling.

This fixes cases like array indexing with computed offsets:

movzbl -0x7d72725a(%rax), %eax # array[%rax]

Where %rax contains a computed index and the negative offset points to a
global array. Previously this fell through to the CFA path which doesn't
handle global variables, resulting in "no type information".

The fix redirects such accesses to check_kernel which calls
get_global_var_type() to resolve the type from the global variable
cache. This is only done for kernel DSOs since the pattern relies on
kernel-specific global variable resolution. We could also treat
registers with integer types to the global variable path, but this
requires more changes.

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf annotate-data: Collect global variables without name

Previously, global_var__collect() required get_global_var_info() to
succeed (i.e., the variable must have a symbol name) before caching a
global variable. This prevented variables that exist in DWARF but lack
symbol table coverage from being cached.

Remove the symbol table requirement since DW_OP_addr already provides
the variable's address directly from DWARF. The symbol table lookup is
now optional to obtain the variable name when available.

Also remove the var_offset != 0 check, which was intended to skip
variables where the access address doesn't match the symbol start. The
symbol table lookup is now optional and I found removing this check has
no effect on the annotation results for both kernel and userspace
programs.

Test results show improved annotation coverage especially for userspace
programs with RIP-relative addressing instructions.

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf dwarf-aux: Handle array types in die_get_member_type

When a struct member is an array type, die_get_member_type() would stop
iterating since array types weren't handled in the loop. This caused
accesses to array elements within structs to not resolve properly.

Add array type handling by resolving the array to its element type and
calculating the offset within an element using modulo arithmetic

This improves type annotation coverage for struct members that are
arrays.

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf annotate-data: Improve type comparison from different scopes

When comparing types from different scopes, first compare their type
offsets. A larger offset means the field belongs to an outer (enclosing)
struct. This helps resolve cases where a pointer is found in an inner
scope, but a struct containing that pointer exists in an outer scope.
Previously, is_better_type would prefer the pointer type, but the struct
type is actually more complete and should be chosen.

Prefer types from outer scopes when is_better_type cannot determine a
better type. This is a heuristic for the case `struct A { struct B; }`
where A and B have the same size but I think in most cases A is in the
outer scope and should be preferred.

Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf dwarf-aux: Skip check_variable for variable lookup

Both die_find_variable_by_reg and die_find_variable_by_addr call
match_var_offset which already performs sufficient checking and type
matching. The additional check_variable call is redundant, and its
need_pointer logic is only a heuristic. Since DWARF encodes accurate
type information, which match_var_offset verifies, skipping
check_variable improves both coverage and accuracy.

Return the matched type from die_find_variable_by_reg and
die_find_variable_by_addr via the existing `type` field in
find_var_data, removing the need for check_variable in
find_data_type_die.

Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf dwarf-aux: Preserve typedefs in match_var_offset

Preserve typedefs in match_var_offset to match the results by
__die_get_real_type. Also move the (offset == 0) branch after the
is_pointer check to ensure the correct type is used, fixing cases where
an incorrect pointer type was chosen when the access offset was 0.

Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf dwarf-aux: Add die_get_pointer_type to get pointer types

When a variable type is wrapped in typedef/qualifiers, callers may need
to first resolve it to the underlying DW_TAG_pointer_type or
DW_TAG_array_type. A simple tag check is not enough and directly calling
__die_get_real_type() can stop at the pointer type (e.g. typedef ->
pointer) instead of the pointee type.

Add die_get_pointer_type() helper that follows typedef/qualifier chains
and returns the underlying pointer DIE. Use it in annotate-data.c so
pointer checks and dereference work correctly for typedef'd pointers.

Signed-off-by: Zecheng Li <zli94@ncsu.edu>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf symbol: Add RISCV case in get_plt_sizes

According to RISC-V psABI specification, the PLT (Program Linkage Table)
has the following layout:
- The first PLT entry occupies two 16-byte entries (32 bytes total)
- Subsequent PLT entries take up 16 bytes each

This aligns with the binutils-gdb implementation which defines the same
PLT sizes for RISC-V architecture.

Update get_plt_sizes() to set plt_header_size=32 and plt_entry_size=16
for EM_RISCV, matching the architecture's standard ABI.

Since AARCH64, LOONGARCH, and RISCV have the same PLT size definition,
they are merged together.

Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-riscv.c
Signed-off-by: Chen Pei <cp0613@linux.alibaba.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Remove duplicate include of stat.h

Remove duplicate inclusion of stat.h in intel-tpebs.c to clean up
redundant code.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf tools: Remove duplicate include of debug.h

Remove duplicate inclusion of debug.h in symbol.c to clean up redundant
code.

Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf: tools: cs-etm: Enhance raw Coresight trace debug display

When compiling perf with CORESIGHT=1, an additional build option may
be used: CSTRACE_RAW=1, which will cause the CoreSight formatted trace
frames to be printed out during a perf --dump command. This is useful
when investigating issues with trace generation, decode or possible
data corruption.

e.g. for ETMv4 trace source into a formatted ETR sink a dump -

. ... CoreSight ETMV4I Trace data: size 0x28c150 bytes
Idx:0; ID:14; I_ASYNC : Alignment Synchronisation.
Idx:12; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 };
                Decoder Sync point TINFO
Idx:17; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
                Addr=0x0000000000000000;

becomes with CSTRACE_RAW=1:

. ... CoreSight ETMV4I Trace data: size 0x28c150 bytes
Frame Data; Index 0; ID_DATA[0x14]; 00 00 00 00 00 00 00 00
                                    00 00 00 80 01 01
Idx:0; ID:14; I_ASYNC : Alignment Synchronisation.
Frame Data; Index 16; ID_DATA[0x14]; 00 9d 00 00 00 00 00 00
                                     00 00 04 85 57 08 f2
Idx:12; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 };
                Decoder Sync point TINFO
Idx:17; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
                Addr=0x0000000000000000;

CSTRACE_RAW=1 has no effect on ETE + TRBE trace as there is no trace
formatting in the TRBE buffer.

This patch enhances the output so that for each packet the individual
bytes associated with the packet are printed.

Thus for ETMv4 this now becomes:

. ... CoreSight ETMV4I Trace data: size 0x28c150 bytes
Frame Data; Index 0; ID_DATA[0x14]; 00 00 00 00 00 00 00 00
                                    00 00 00 80 01 01
Idx:0; ID:14;[0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x80];
             I_ASYNC : Alignment Synchronisation.
Frame Data; Index 16; ID_DATA[0x14]; 00 9d 00 00 00 00 00 00
                                     00 00 04 85 57 08 f2
Idx:12; ID:14; [0x01 0x01 0x00 ]; I_TRACE_INFO : Trace Info.; INFO=0x0
                                  { CC.0 }; Decoder Sync point TINFO
Idx:17; ID:14; [0x9d 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 ];
               I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
       Addr=0x0000000000000000;

ETE trace output changes from:

Idx:0; ID:14; I_ASYNC : Alignment Synchronisation.
Idx:12; ID:14; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0, TSTATE.0 };
               Decoder Sync point TINFO
Idx:15; ID:14; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
               Addr=0xFFFF80007CF7F56C;
becoming:

Idx:0; ID:14;[0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x80];
              I_ASYNC : Alignment Synchronisation.
Idx:12; ID:14; [0x01 0x01 0x00 ]; I_TRACE_INFO : Trace Info.; INFO=0x0
               { CC.0, TSTATE.0 }; Decoder Sync point TINFO
Idx:15; ID:14; [0x9d 0x5b 0x7a 0xf7 0x7c 0x00 0x80 0xff 0xff ];
               I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
       Addr=0xFFFF80007CF7F56C;

Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Mike Leach <mike.leach@arm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

perf: tools: cs-etm: Fix print issue for Coresight debug in ETE/TRBE trace

Building perf with CORESIGHT=1 and the optional CSTRACE_RAW=1 enables
additional debug printing of raw trace data when using command:-
perf report --dump.

This raw trace prints the CoreSight formatted trace frames, which may be
used to investigate suspected issues with trace quality / corruption /
decode.

These frames are not present in ETE + TRBE trace.
This fix removes the unnecessary call to print these frames.

This fix also rationalises implementation - original code had helper
function that unnecessarily repeated initialisation calls that had
already been made.

Due to an addtional fault with the OpenCSD library, this call when ETE/TRBE
are being decoded will cause a segfault in perf. This fix also prevents
that problem for perf using older (<= 1.8.0 version) OpenCSD libraries.

Fixes: 68ffe3902898 ("perf tools: Add decoder mechanic to support dumping trace data")
Reported-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Mike Leach <mike.leach@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Merge remote-tracking branch 'torvalds/master' into perf-tools

To pick up some extra files that need to be sync'ed with the kernel
sources to try and reduce the number of PRs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge tag 'soc-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
"The firmware drivers for ARM SCMI, FF-A and the Tee subsystem, as
  well as the reset controller and cache controller subsystem all see
  small bugfixes for reference ounting errors, ABI correctness, and
  NULL pointer dereferences.

  Similarly, there are multiple reference counting fixes in drivers/soc/
  for vendor specific drivers (rockchips, microchip), while the
  freescale drivers get a fix for a race condition and error handling.

  The devicetree fixes for Rockchips and NXP got held up, so for
  the moment there is only Renesas fixing problesm with SD card
  initialization, a boot hang on one board and incorrect descriptions
  for interrupts and clock registers on some SoCs. The Microchip
  polarfire gets a dts fix for a boot time warning.

  A defconfig fix avoids a warning about a conflicting assignment"

* tag 'soc-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (21 commits)
  ARM: multi_v7_defconfig: Drop duplicate CONFIG_TI_PRUSS=m
  firmware: arm_scmi: Spelling s/mulit/multi/, s/currenly/currently/
  firmware: arm_scmi: Fix NULL dereference on notify error path
  firmware: arm_scpi: Fix device_node reference leak in probe path
  firmware: arm_ffa: Remove vm_id argument in ffa_rxtx_unmap()
  arm64: dts: renesas: r8a78000: Fix out-of-range SPI interrupt numbers
  arm64: dts: renesas: rzg3s-smarc-som: Set bypass for Versa3 PLL2
  arm64: dts: renesas: r9a09g087: Fix CPG register region sizes
  arm64: dts: renesas: r9a09g077: Fix CPG register region sizes
  arm64: dts: renesas: r9a09g057: Remove wdt{0,2,3} nodes
  arm64: dts: renesas: rzv2-evk-cn15-sd: Add ramp delay for SD0 regulator
  arm64: dts: renesas: rzt2h-n2h-evk: Add ramp delay for SD0 card regulator
  tee: shm: Remove refcounting of kernel pages
  reset: rzg2l-usbphy-ctrl: Check pwrrdy is valid before using it
  soc: fsl: cpm1: qmc: Fix error check for devm_ioremap_resource() in qmc_qe_init_resources()
  soc: fsl: qbman: fix race condition in qman_destroy_fq
  soc: rockchip: grf: Add missing of_node_put() when returning
  cache: ax45mp: Fix device node reference leak in ax45mp_cache_init()
  cache: starfive: fix device node leak in starlink_cache_init()
  riscv: dts: microchip: add can resets to mpfs
  ...