Ian Rogers [Thu, 1 May 2025 07:00:03 +0000 (00:00 -0700)]
perf symbol-minimal: Fix double free in filename__read_build_id
Running the "perf script task-analyzer tests" with address sanitizer
showed a double free:
```
FAIL: "test_csv_extended_times" Error message: "Failed to find required string:'Out-Out;'."
=================================================================
==19190==ERROR: AddressSanitizer: attempting double-free on 0x50b000017b10 in thread T0:
#0 0x55da9601c78a in free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a)
#1 0x55da96640c63 in filename__read_build_id tools/perf/util/symbol-minimal.c:221:2
0x50b000017b10 is located 0 bytes inside of 112-byte region [0x50b000017b10,0x50b000017b80)
freed by thread T0 here:
#0 0x55da9601ce40 in realloc (perf+0x260e40) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a)
#1 0x55da96640ad6 in filename__read_build_id tools/perf/util/symbol-minimal.c:204:10
previously allocated by thread T0 here:
#0 0x55da9601ca23 in malloc (perf+0x260a23) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a)
#1 0x55da966407e7 in filename__read_build_id tools/perf/util/symbol-minimal.c:181:9
The buf_size if always set to phdr->p_filesz, but that may be 0
causing a free and realloc to return NULL. This is treated in
filename__read_build_id like a failure and the buffer is freed again.
To avoid this problem only grow buf, meaning the buf_size will never
be 0. This also reduces the number of memory (re)allocations.
Fixes: b691f64360ecec49 ("perf symbols: Implement poor man's ELF parser") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250501070003.22251-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# perf report --header | grep cpudesc
# cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
# perf mem report -F overhead,dtlb,dso --stdio | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2K of event 'cycles:P'
# Total weight : 2637
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
#
# ---------- D-TLB -----------
# Overhead L1-Hit L2-Hit Miss Other Shared Object
# ........ ............................ .................................
#
77.47% 18.4% 0.1% 0.6% 80.9% [kernel.kallsyms]
5.61% 36.5% 0.7% 1.4% 61.5% libxul.so
2.77% 39.7% 0.0% 12.3% 47.9% libc.so.6
2.01% 34.0% 1.9% 1.9% 62.3% libglib-2.0.so.0.8400.1
1.93% 31.4% 2.0% 2.0% 64.7% [amdgpu]
1.63% 48.8% 0.0% 0.0% 51.2% [JIT] tid 60168
1.14% 3.3% 0.0% 0.0% 96.7% [vdso]
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:46 +0000 (13:55 -0700)]
perf mem: Add 'cache' and 'memory' output fields
This is a breakdown of perf_mem_data_src.mem_lvl_num. But it's also
divided into two parts because the combination is bigger than 8.
Since there are many entries for different cache levels, 'cache' field
focuses on them. I generalized buffers like LFB, MAB and MHB to L1-buf
and L2-buf.
The rest goes to 'memory' field which can be RAM, CXL, PMEM, IO, etc.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:45 +0000 (13:55 -0700)]
perf hist: Hide unused mem stat columns
Some mem_stat types don't use all 8 columns. And there are cases only
samples in certain kinds of mem_stat types are available only. For that
case hide columns which has no samples.
# grep "model name" -m1 /proc/cpuinfo
model name : AMD Ryzen 9 9950X3D 16-Core Processo
# perf mem report -F overhead,op,comm --stdio
# Total Lost Samples: 0
#
# Samples: 2K of event 'cycles:P'
# Total weight : 2637
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
#
# ------ Mem Op -------
# Overhead Load Store Other Command
# ........ ..................... ...............
#
61.02% 14.4% 25.5% 60.1% swapper
5.61% 26.4% 13.5% 60.1% Isolated Web Co
5.50% 21.4% 29.7% 49.0% perf
4.74% 27.2% 15.2% 57.6% gnome-shell
4.63% 33.6% 11.5% 54.9% mdns_service
4.29% 28.3% 12.4% 59.3% ptyxis
2.16% 24.6% 19.3% 56.1% DOM Worker
0.99% 23.1% 34.6% 42.3% firefox
0.72% 26.3% 15.8% 57.9% IPC I/O Parent
0.61% 12.5% 12.5% 75.0% kworker/u130:20
0.61% 37.5% 18.8% 43.8% podman
0.57% 33.3% 6.7% 60.0% Timer
0.53% 14.3% 7.1% 78.6% KMS thread
0.49% 30.8% 7.7% 61.5% kworker/u130:3-
0.46% 41.7% 33.3% 25.0% IPDL Background
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:44 +0000 (13:55 -0700)]
perf mem: Add 'op' output field
This is an actual example of the he_mem_stat based sample breakdown. It
uses 'mem_op' field of union perf_mem_data_src which means memory
operations.
It'd have basically 'load' or 'store' which can be useful if PMU doesn't
have separate events for them like IBS or SPE. In addition, there's an
entry in case load and store happen at the same time. Also adds entries
for prefetching and execution.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:43 +0000 (13:55 -0700)]
perf hist: Implement output fields for mem stats
This is a preparation for later changes to support mem_stat output. The
new fields will need two lines for the header - the first line will show
type of mem stat and the second line will show the name of each item
which is returned by mem_stat_name().
Each element in the mem_stat array will be printed in percentage for the
hist_entry and their sum would be 100%.
Add new output field dimension only for SORT_MODE__MEM using mem_stat.
To handle possible name conflict with existing sort keys, move the order
of checking output field dimensions after the sort dimensions when it
looks for sort keys.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:42 +0000 (13:55 -0700)]
perf hist: Basic support for mem_stat accounting
Add a logic to account he->mem_stat based on mem_stat_type in hists.
Each mem_stat entry will have different meaning based on the type so the
index in the array is calculated at runtime using the corresponding
value in the sample.data_src.
Still hists has no mem_stat_types yet so this code won't work for now.
Later hists->mem_stat_types will be allocated based on what users want
in the output actually.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:41 +0000 (13:55 -0700)]
perf hist: Add struct he_mem_stat
The 'struct he_mem_stat' is to save detailed information about memory
instruction. It'll be used to show breakdown of various data from
PERF_SAMPLE_DATA_SRC. Note that this structure is generic and the
contents will be different depending on actual data it'll use later.
The information about the actual data will be saved in 'struct hists'
and its length is in nr_mem_stats. This commit just adds ground works
and does nothing since hists->nr_mem_stats is 0 for now.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 20:55:40 +0000 (13:55 -0700)]
perf hist: Support multi-line header
This is a preparation to support multi-line headers in 'perf mem report'.
Normal sort keys and output fields that don't have contents for multi-
line will print the header string at the last line only.
As we don't use multi-line headers normally, it should not have any
changes in the output.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 30 Apr 2025 18:03:21 +0000 (11:03 -0700)]
perf hist: Remove output field from sort-list properly
When it removes an output format for cancelled children or latency, it
should delete itself from the sort list as well. Otherwise assertion
in fmt_free() will fire.
Super simple test to check that at least we're not segfaulting when
trying to use 'perf report --hierarchy', more subtests should be added
to make sure the output is the expected one.
This is being merged right before a fix for that that this test detects:
# perf test hierarchy
83: perf report --hierarchy : FAILED!
# perf test -v hierarchy
--- start ---
test child forked, pid 102242
perf report --hierarchy
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.025 MB /tmp/perf-test-report.HX0N85TlPq/perf-report-hierarchy-perf.data (6 samples) ]
perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed.
/home/acme/libexec/perf-core/tests/shell/perf-report-hierarchy.sh: line 34: 102250 Aborted (core dumped) perf report --hierarchy > /dev/null
--- Cleaning up ---
---- end(-1) ----
83: perf report --hierarchy : FAILED!
#
Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/20250430180321.736939-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Tue, 29 Apr 2025 03:59:38 +0000 (03:59 +0000)]
perf test amd ibs: Add sample period unit test
IBS Fetch and IBS Op PMUs has various constraints on supported sample
periods. Add perf unit tests to test those.
Running it in parallel with other tests causes intermittent failures.
Mark it exclusive to force it to run sequentially. Sample output on a
Zen5 machine:
Without kernel fixes:
$ sudo ./perf test -vv 112
112: AMD IBS sample period:
--- start ---
test child forked, pid 8774
Using CPUID AuthenticAMD-26-2-1
IBS config tests:
-----------------
Fetch PMU tests:
0xffff : Ok (nr samples: 1078)
0x1000 : Ok (nr samples: 17030)
0xff : Ok (nr samples: 41068)
0x1 : Ok (nr samples: 40543)
0x0 : Ok
0x10000 : Ok
Op PMU tests:
0x0 : Ok
0x1 : Fail
0x8 : Fail
0x9 : Ok (nr samples: 40543)
0xf : Ok (nr samples: 40543)
0x1000 : Ok (nr samples: 18736)
0xffff : Ok (nr samples: 1168)
0x10000 : Ok
0x100000 : Fail (nr samples: 14)
0xf00000 : Fail (nr samples: 1)
0xf0ffff : Fail (nr samples: 1)
0x1f0ffff : Fail (nr samples: 1)
0x7f0ffff : Fail (nr samples: 0)
0x8f0ffff : Ok
0x17f0ffff : Ok
IBS L3MissOnly test: (takes a while)
--------------------
Fetch L3MissOnly: Fail (nr_samples: 1213)
Op L3MissOnly: Ok (nr_samples: 1193)
---- end(-1) ----
112: AMD IBS sample period : FAILED!
With kernel fixes:
$ sudo ./perf test -vv 112
112: AMD IBS sample period:
--- start ---
test child forked, pid 6939
Using CPUID AuthenticAMD-26-2-1
IBS config tests:
-----------------
Fetch PMU tests:
0xffff : Ok (nr samples: 969)
0x1000 : Ok (nr samples: 15540)
0xff : Ok (nr samples: 40555)
0x1 : Ok (nr samples: 40543)
0x0 : Ok
0x10000 : Ok
Op PMU tests:
0x0 : Ok
0x1 : Ok
0x8 : Ok
0x9 : Ok (nr samples: 40543)
0xf : Ok (nr samples: 40543)
0x1000 : Ok (nr samples: 19156)
0xffff : Ok (nr samples: 1169)
0x10000 : Ok
0x100000 : Ok (nr samples: 1151)
0xf00000 : Ok (nr samples: 76)
0xf0ffff : Ok (nr samples: 73)
0x1f0ffff : Ok (nr samples: 33)
0x7f0ffff : Ok (nr samples: 10)
0x8f0ffff : Ok
0x17f0ffff : Ok
IBS sample period constraint tests:
-----------------------------------
Fetch PMU test:
freq 0, sample_freq 0: Ok
freq 0, sample_freq 1: Ok
freq 0, sample_freq 15: Ok
freq 0, sample_freq 16: Ok (nr samples: 1203)
freq 0, sample_freq 17: Ok (nr samples: 1604)
freq 0, sample_freq 143: Ok (nr samples: 1604)
freq 0, sample_freq 144: Ok (nr samples: 1604)
freq 0, sample_freq 145: Ok (nr samples: 1604)
freq 0, sample_freq 1234: Ok (nr samples: 1604)
freq 0, sample_freq 4103: Ok (nr samples: 1343)
freq 0, sample_freq 65520: Ok (nr samples: 2254)
freq 0, sample_freq 65535: Ok (nr samples: 2136)
freq 0, sample_freq 65552: Ok (nr samples: 1158)
freq 0, sample_freq 8388607: Ok (nr samples: 257)
freq 0, sample_freq 268435455: Ok (nr samples: 8)
freq 1, sample_freq 0: Ok
freq 1, sample_freq 1: Ok (nr samples: 4)
freq 1, sample_freq 15: Ok (nr samples: 4)
freq 1, sample_freq 16: Ok (nr samples: 4)
freq 1, sample_freq 17: Ok (nr samples: 4)
freq 1, sample_freq 143: Ok (nr samples: 5)
freq 1, sample_freq 144: Ok (nr samples: 5)
freq 1, sample_freq 145: Ok (nr samples: 5)
freq 1, sample_freq 1234: Ok (nr samples: 8)
freq 1, sample_freq 4103: Ok (nr samples: 34)
freq 1, sample_freq 65520: Ok (nr samples: 458)
freq 1, sample_freq 65535: Ok (nr samples: 628)
freq 1, sample_freq 65552: Ok (nr samples: 396)
freq 1, sample_freq 8388607: Ok
Op PMU test:
freq 0, sample_freq 0: Ok
freq 0, sample_freq 1: Ok
freq 0, sample_freq 15: Ok
freq 0, sample_freq 16: Ok
freq 0, sample_freq 17: Ok
freq 0, sample_freq 143: Ok
freq 0, sample_freq 144: Ok (nr samples: 1604)
freq 0, sample_freq 145: Ok (nr samples: 1604)
freq 0, sample_freq 1234: Ok (nr samples: 1604)
freq 0, sample_freq 4103: Ok (nr samples: 1604)
freq 0, sample_freq 65520: Ok (nr samples: 2250)
freq 0, sample_freq 65535: Ok (nr samples: 2158)
freq 0, sample_freq 65552: Ok (nr samples: 2296)
freq 0, sample_freq 8388607: Ok (nr samples: 243)
freq 0, sample_freq 268435455: Ok (nr samples: 6)
freq 1, sample_freq 0: Ok
freq 1, sample_freq 1: Ok (nr samples: 4)
freq 1, sample_freq 15: Ok (nr samples: 4)
freq 1, sample_freq 16: Ok (nr samples: 4)
freq 1, sample_freq 17: Ok (nr samples: 4)
freq 1, sample_freq 143: Ok (nr samples: 4)
freq 1, sample_freq 144: Ok (nr samples: 5)
freq 1, sample_freq 145: Ok (nr samples: 4)
freq 1, sample_freq 1234: Ok (nr samples: 6)
freq 1, sample_freq 4103: Ok (nr samples: 27)
freq 1, sample_freq 65520: Ok (nr samples: 542)
freq 1, sample_freq 65535: Ok (nr samples: 550)
freq 1, sample_freq 65552: Ok (nr samples: 552)
freq 1, sample_freq 8388607: Ok
IBS ioctl() tests:
------------------
Fetch PMU tests
ioctl(period = 0x0 ): Ok
ioctl(period = 0x1 ): Ok
ioctl(period = 0xf ): Ok
ioctl(period = 0x10 ): Ok
ioctl(period = 0x11 ): Ok
ioctl(period = 0x1f ): Ok
ioctl(period = 0x20 ): Ok
ioctl(period = 0x80 ): Ok
ioctl(period = 0x8f ): Ok
ioctl(period = 0x90 ): Ok
ioctl(period = 0x91 ): Ok
ioctl(period = 0x100 ): Ok
ioctl(period = 0xfff0 ): Ok
ioctl(period = 0xffff ): Ok
ioctl(period = 0x10000 ): Ok
ioctl(period = 0x1fff0 ): Ok
ioctl(period = 0x1fff5 ): Ok
ioctl(freq = 0x0 ): Ok
ioctl(freq = 0x1 ): Ok
ioctl(freq = 0xf ): Ok
ioctl(freq = 0x10 ): Ok
ioctl(freq = 0x11 ): Ok
ioctl(freq = 0x1f ): Ok
ioctl(freq = 0x20 ): Ok
ioctl(freq = 0x80 ): Ok
ioctl(freq = 0x8f ): Ok
ioctl(freq = 0x90 ): Ok
ioctl(freq = 0x91 ): Ok
ioctl(freq = 0x100 ): Ok
Op PMU tests
ioctl(period = 0x0 ): Ok
ioctl(period = 0x1 ): Ok
ioctl(period = 0xf ): Ok
ioctl(period = 0x10 ): Ok
ioctl(period = 0x11 ): Ok
ioctl(period = 0x1f ): Ok
ioctl(period = 0x20 ): Ok
ioctl(period = 0x80 ): Ok
ioctl(period = 0x8f ): Ok
ioctl(period = 0x90 ): Ok
ioctl(period = 0x91 ): Ok
ioctl(period = 0x100 ): Ok
ioctl(period = 0xfff0 ): Ok
ioctl(period = 0xffff ): Ok
ioctl(period = 0x10000 ): Ok
ioctl(period = 0x1fff0 ): Ok
ioctl(period = 0x1fff5 ): Ok
ioctl(freq = 0x0 ): Ok
ioctl(freq = 0x1 ): Ok
ioctl(freq = 0xf ): Ok
ioctl(freq = 0x10 ): Ok
ioctl(freq = 0x11 ): Ok
ioctl(freq = 0x1f ): Ok
ioctl(freq = 0x20 ): Ok
ioctl(freq = 0x80 ): Ok
ioctl(freq = 0x8f ): Ok
ioctl(freq = 0x90 ): Ok
ioctl(freq = 0x91 ): Ok
ioctl(freq = 0x100 ): Ok
IBS freq (negative) tests:
--------------------------
freq 1, sample_freq 200000: Ok
IBS L3MissOnly test: (takes a while)
--------------------
Fetch L3MissOnly: Ok (nr_samples: 1301)
Op L3MissOnly: Ok (nr_samples: 1590)
---- end(0) ----
112: AMD IBS sample period : Ok
Committer notes:
Avoid using PAGE_SIZE as that define is also in sys/user.h
Make it a variable not to call sysconf() multiple times.
Also cast func to void * when passing it as the first arg to memcpy to
avoid this with some versions of clang:
arch/x86/tests/amd-ibs-period.c:81:3: error: no matching function for call to 'memcpy'
memcpy(func, insn1, sizeof(insn1));
^~~~~~
/usr/include/string.h:27:7: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *' for 1st argument
void *memcpy (void *__restrict, const void *__restrict, size_t);
^
/usr/include/fortify/string.h:40:27: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *const' for 1st argument
_FORTIFY_FN(memcpy) void *memcpy(void * _FORTIFY_POS0 __od,
^
arch/x86/tests/amd-ibs-period.c:87:3: error: no matching function for call to 'memcpy'
This one, for instance:
Alpine clang version 19.1.4
Target: x86_64-alpine-linux-musl
Thread model: posix
InstalledDir: /usr/lib/llvm19/bin
Configuration file: /etc/clang19/x86_64-alpine-linux-musl.cfg
System configuration file directory: /etc/clang19
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-5-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# grep CONFIG_RUST /boot/config-6.14.3-300.fc42.x86_64
CONFIG_RUSTC_VERSION=108600
CONFIG_RUST_IS_AVAILABLE=y
CONFIG_RUSTC_LLVM_VERSION=200101
CONFIG_RUSTC_HAS_COERCE_POINTEE=y
CONFIG_RUST=y
CONFIG_RUSTC_VERSION_TEXT="rustc 1.86.0 (05f9846f8 2025-03-31) (Fedora 1.86.0-1.fc42)"
CONFIG_RUST_FW_LOADER_ABSTRACTIONS=y
CONFIG_RUST_PHYLIB_ABSTRACTIONS=y
# CONFIG_RUST_DEBUG_ASSERTIONS is not set
CONFIG_RUST_OVERFLOW_CHECKS=y
# CONFIG_RUST_BUILD_ASSERT_ALLOW is not set
#
Looking at the reason for the failure:
# perf test -v vmlinux |& grep ^ERR
ERR : 0xffffffff99efc7d0: __pfx__RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ not on kallsyms
ERR : 0xffffffff99efc7e0: _RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ not on kallsyms
#
But:
# grep -w u /proc/kallsyms ffffffff99efc7d0 u __pfx__RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_ ffffffff99efc7e0 u _RNCINvNtNtNtCsf5tcb0XGUW4_4core4iter8adapters3map12map_try_foldjNtCsagR6JbSOIa9_12drm_panic_qr7VersionuINtNtNtBa_3ops12control_flow11ControlFlowB10_ENcB10_0NCINvNvNtNtNtB8_6traits8iterator8Iterator4find5checkB10_NCNvMB12_B10_13from_segments0E0E0B12_
#
The test checks that "vmlinux symtab matches kallsyms", so it finds those two
symbols in vmlinux:
"u" The symbol is a unique global symbol. This is a GNU extension to the
standard set of ELF symbol bindings. For such a symbol the dynamic
linker will make sure that in the entire process there is just one
symbol with this name and type in use.
So lets consider 'u' symbols in /proc/kallsyms when loading it to cover this case.
Fedora:40 shows this as a 'l' symbol, so consider that as well.
With this patch 'perf test 1' is happy again:
# perf test vmlinux
1: vmlinux symtab matches kallsyms : Ok
#
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/aBE_n0PGl3g6h-cS@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jakub Brnak [Mon, 24 Mar 2025 14:45:23 +0000 (15:45 +0100)]
perf test probe_vfs_getname: Skip if no suitable line detected
In some cases when calling function add_probe_vfs_getname, line number
can't be detected by 'perf probe -L getname_flags':
78 atomic_set(&result->refcnt, 1);
// one of the following lines should have line number
// but sometimes it does not because of optimization
result->uptr = filename;
result->aname = NULL;
81 audit_getname(result);
To prevent false failures, skip the affected tests if no suitable line
numbers can be detected.
Signed-off-by: Jakub Brnak <jbrnak@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20250324144523.597557-1-jbrnak@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 1 Apr 2025 06:30:55 +0000 (23:30 -0700)]
perf lock contention: Symbolize zone->lock using BTF
The struct zone is embedded in struct pglist_data which can be allocated
for each NUMA node early in the boot process. As it's not a slab object
nor a global lock, this was not symbolized.
Since the zone->lock is often contended, it'd be nice if we can
symbolize it. On NUMA systems, node_data array will have pointers for
struct pglist_data. By following the pointer, it can calculate the
address of each zone and its lock using BTF. On UMA, it can just use
contig_page_data and its zones.
The following example shows the zone lock contention at the end.
$ sudo ./perf lock con -abl -E 5 -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.038 [sec]
contended total wait max wait avg wait address symbol
5167 18.17 ms 10.27 us 3.52 us ffff953340052d00 &kmem_cache_node (spinlock)
38 11.75 ms 465.49 us 309.13 us ffff95334060c480 &sock_inode_cache (spinlock)
3916 10.13 ms 10.43 us 2.59 us ffff953342aecb40 &kmem_cache_node (spinlock)
2963 10.02 ms 13.75 us 3.38 us ffff9533d2344098 &kmalloc-rnd-08-2k (spinlock)
216 5.05 ms 99.49 us 23.39 us ffff9542bf7d65d0 zone_lock (spinlock)
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20250401063055.7431-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20250326044001.3503432-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 26 Mar 2025 04:40:00 +0000 (21:40 -0700)]
perf trace: Implement syscall summary in BPF
When -s/--summary option is used, it doesn't need (augmented) arguments
of syscalls. Let's skip the augmentation and load another small BPF
program to collect the statistics in the kernel instead of copying the
data to the ring-buffer to calculate the stats in userspace. This will
be much more light-weight than the existing approach and remove any lost
events.
Let's add a new option --bpf-summary to control this behavior. I cannot
make it default because there's no way to get e_machine in the BPF which
is needed for detecting different ABIs like 32-bit compat mode.
No functional changes intended except for no more LOST events. :)
Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250326044001.3503432-1-namhyung@kernel.org
[ Added fixup sent from Namhyung in response to my report to make it also dependent on CONFIG_TRACE ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Junhao He [Fri, 18 Apr 2025 07:08:12 +0000 (15:08 +0800)]
MAINTAINERS: Add hisilicon PMU JSON events under its entry
The all hisilicon PMU JSON events were missing to be listed there.
Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Junhao He <hejunhao3@huawei.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250418070812.3771441-4-hejunhao3@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Junhao He [Fri, 18 Apr 2025 07:08:11 +0000 (15:08 +0800)]
perf vendor events arm64: Drop hip08 PublicDescription if same as BriefDescription
If BriefDescription and PublicDescription are the same, only
BriefDescription is needed. It will be used for both long and short
format outputs.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Junhao He <hejunhao3@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250418070812.3771441-3-hejunhao3@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Junhao He [Fri, 18 Apr 2025 07:08:10 +0000 (15:08 +0800)]
perf vendor events arm64: Fill up Desc field for Hisi hip08 hha pmu
In the same PMU, when some JSON events have the "BriefDescription" field
populated while others do not, the cmp_sevent() function will split these
two types of events into separate groups. As a result, when using perf
list to display events, the two types of events cannot be grouped together
in the output.
before patch:
$ perf list pmu
...
uncore hha:
hisi_sccl1_hha2/sdir-hit/
hisi_sccl1_hha2/sdir-lookup/
...
uncore hha:
edir-hit
[Count of The number of HHA E-Dir hit operations. Unit: hisi_sccl1_hha2]
...
after patch:
$ perf list pmu
...
uncore hha:
edir-hit
[Count of The number of HHA E-Dir hit operations. Unit: hisi_sccl1_hha2]
sdir-hit
[Count of The number of HHA S-Dir hit operations. Unit: hisi_sccl1_hha2]
sdir-lookup
[Count of the number of HHA S-Dir lookup operations. Unit: hisi_sccl1_hha2]
...
Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Junhao He <hejunhao3@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250418070812.3771441-2-hejunhao3@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 10 Apr 2025 17:36:21 +0000 (10:36 -0700)]
perf bench evlist-open-close: Reduce scope of 2 variables
Make 2 global variables local. Reduces ELF binary size by removing
relocations. For a no flags build, the perf binary size is reduced by
4,144 bytes on x86-64.
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Hao Ge <gehao@kylinos.cn> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tengda Wu <wutengda@huaweicloud.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20250410173631.1713627-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 10 Apr 2025 17:36:20 +0000 (10:36 -0700)]
perf tests record: Cleanup improvements
Remove the script output file. Add a trap debug message. Minor style
consistency changes.
Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250410173631.1713627-2-irogers@google.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Tengda Wu <wutengda@huaweicloud.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Hao Ge <gehao@kylinos.cn> Cc: James Clark <james.clark@linaro.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: bpf@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390x KVM and z/VM machines the CPU Measurement Facility is not
available. Events cycles and instructions do not exist. Running above
tests on s390 KVM and z/VM guests always fail with this error:
# ./perf test 84 86
84: perf stat JSON output linter : FAILED!
86: perf stat STD output linter : FAILED!
#
Ian Rogers [Wed, 23 Apr 2025 05:03:58 +0000 (22:03 -0700)]
perf tool_pmu: Fix aggregation on duration_time
evsel__count_has_error() fails counters when the enabled or running time
are 0. The duration_time event reads 0 when the cpu_map_idx != 0 to
avoid aggregating time over CPUs. Change the enable and running time
to always have a ratio of 100% so that evsel__count_has_error won't
fail.
Before:
```
$ sudo /tmp/perf/perf stat --per-core -a -M UNCORE_FREQ sleep 1
Note: the metric reads the value a different way and isn't impacted.
Fixes: 240505b2d0adcdc8 ("perf tool_pmu: Factor tool events into their own PMU") Reported-by: Stephane Eranian <eranian@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20250423050358.94310-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 31 Mar 2025 07:37:22 +0000 (00:37 -0700)]
perf hist: Allow custom output fields in hierarchy mode
Now it can handle multiple output fields and sort keys in separate
levels, so it should be ok to use it in the hierarchy mode. This
allows fully customized output format.
Namhyung Kim [Mon, 31 Mar 2025 07:37:21 +0000 (00:37 -0700)]
perf hist: Set levels in output_field_add()
It turns out that the output fields didn't consider the hierarchy mode
and put all the fields in the same level. To support hierarchy, each
non-output field should be in a separate level.
Pass a pointer to level to output_field_add() and make it increase the
level when it sees non-output fields.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250331073722.4695-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 31 Mar 2025 07:37:19 +0000 (00:37 -0700)]
perf hist: Remove formats in hierarchy when cancel children
This is to support hierarchy options with custom output fields.
Currently perf_hpp__cancel_cumulate() only removes accumulated
overhead and latency fields from the global perf_hpp_list.
This is not used in the hierarchy mode because each evsel's hist
has its own separate hpp_list. So it needs to remove the fields
from the lists too. Pass evlist to the function so that it can
iterate the evsels.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250331073722.4695-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 14 Apr 2025 17:41:34 +0000 (10:41 -0700)]
perf record: Retirement latency cleanup in evsel__config
'perf record' will fail with retirement latency events as the open
doesn't do a perf_event_open system call.
Use evsel__config() to set up such events for recording by removing the
flag and enabling sample weights - the sample weights containing the
retirement latency.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This change makes those values available within an alias/event within a
PMU and saves them into the evsel at event parse time.
When no TPEBS data is available the default values are substituted in
for TMA metrics that are using retirement latency events - currently
just those on graniterapids.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 14 Apr 2025 17:41:30 +0000 (10:41 -0700)]
perf intel-tpebs: Don't close record on read
Factor sending record control fd code into its own function.
Rather than killing the record process send it a ping when reading.
Timeouts were witnessed if done too frequently, so only ping for the
first tpebs events.
Don't kill the record command send it a stop command.
As close isn't reliably called also close on evsel__exit.
Add extra checks on the record being terminated to avoid warnings.
Adjust the locking as needed and incorporate extra -Wthread-safety
checks.
Check to do six 500ms poll timeouts when sending commands, rather than
the larger 3000ms, to allow the record process terminating to be better
witnessed.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 14 Apr 2025 17:41:27 +0000 (10:41 -0700)]
perf intel-tpebs: Refactor tpebs_results list
evsel names and metric-ids are used for matching but this can be
problematic, for example, multiple occurrences of the same retirement
latency event become a single event for the record.
Change the name of the record events so they are unique and reflect the
evsel of the retirement latency event that opens them (the retirement
latency event's evsel address is embedded within them).
This allows an evsel based close to close the event when the retirement
latency event is closed.
This is important as 'perf stat' has an evlist and the session listen to
the record events has an evlist, knowing which event should remove the
tpebs_retire_lat can't be tied to an evlist list as there is more than
1, so closing which evlist should cause the tpebs to stop?
Using the evsel and the last one out doing the tpebs_stop is cleaner.
Committer notes:
Fix the build on 32-bit systems by using unsigned long when converting
pointers to integers instead of uint64_t. Fixes:
20 4.97 debian:experimental-x-mips : FAIL gcc version 14.2.0 (Debian 14.2.0-13)
util/intel-tpebs.c: In function 'tpebs_retire_lat__find':
util/intel-tpebs.c:377:21: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
377 | if ((uint64_t)t->evsel == num)
| ^
cc1: all warnings being treated as errors
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 14 Apr 2025 17:41:21 +0000 (10:41 -0700)]
perf intel-tpebs: Rename tpebs_start to evsel__tpebs_open
Try to add more consistency to evsel by having tpebs_start renamed to
evsel__tpebs_open, passing the evsel that is being opened. The unusual
behavior of evsel__tpebs_open opening all events on the evlist is kept
and will be cleaned up further in later patches. The comments are
cleaned up as tpebs_start isn't called from evlist.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Mon, 14 Apr 2025 17:41:20 +0000 (10:41 -0700)]
perf intel-tpebs: Simplify tpebs_cmd
No need to dynamically allocate when there is 1. tpebs_pid duplicates
tpebs_cmd.pid, so remove. Use 0 as the uninitialized value (PID == 0
is reserved for the kernel) rather than -1.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250414174134.3095492-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update event topics, addition of PDIST counter into descriptions,
metrics to be generated from the TMA spreadsheet and other small clean
ups. The use of the spreadsheet for conversion has added thresholds.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-29-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update events from v1.12 to v1.13.
Update event topics, addition of PDIST counter into descriptions,
metrics to be generated from the TMA spreadsheet and other small clean
ups.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-23-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add retirement latencies for use in place of retirement latency events.
Update events from v1.06 to v1.08.
Update event topics, addition of PDIST counter into descriptions,
metrics to be generated from the TMA spreadsheet and other small clean
ups.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update events from v1.05 to v1.07. Update event topics, addition of
PDIST counter into descriptions, metrics to be generated from the TMA
spreadsheet and other small clean ups.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update events from v1.07 to v1.08. Update event topics, addition of
PIST counter into descriptions, metrics to be generated from the TMA
spreadsheet and other small clean ups.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update events from v1.28 to v1.29. Update event topics, addition of
PDIST counter into descriptions, metrics to be generated from the TMA
spreadsheet and other small clean ups.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update events from v1.28 to v1.29. Update event topics, addition of
PDIST counter into descriptions, metrics to be generated from the TMA
spreadsheet and other small clean ups.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andreas Färber <afaerber@suse.de> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250328175006.43110-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
Sync with upstream to pick up the perf-tools patches that updates the
header files copies to address the check_header.sh warnings. There are
also some libbpf updates, better pick those to be on the same page with
libbpf since perf uses it in various places.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'edac_urgent_for_v6.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Two fixes to the AMD translation library for the MI300 side of things:
- Use the row[13] bit when calculating the memory row to retire
- Mask the physical row address in order to avoid creating duplicate
error records"
* tag 'edac_urgent_for_v6.15_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS/AMD/FMPM: Get masked address
RAS/AMD/ATL: Include row[13] bit in row retirement
Merge tag 'vfs-6.15-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix NULL pointer dereference in virtiofs
- Fix slab OOB access in hfs/hfsplus
- Only create /proc/fs/netfs when CONFIG_PROC_FS is set
- Fix getname_flags() to initialize pointer correctly
- Convert dentry flags to enum
- Don't allow datadir without lowerdir in overlayfs
- Use namespace_{lock,unlock} helpers in dissolve_on_fput() instead of
plain namespace_sem so unmounted mounts are properly cleaned up
- Skip unnecessary ifs_block_is_uptodate check in iomap
- Remove an unused forward declaration in overlayfs
- Fix devpts uid/gid handling after converting to the new mount api
- Fix afs_dynroot_readdir() to not use the RCU read lock
- Fix mount_setattr() and open_tree_attr() to not pointlessly do path
lookup or walk the mount tree if no mount option change has been
requested
* tag 'vfs-6.15-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: use namespace_{lock,unlock} in dissolve_on_fput()
iomap: skip unnecessary ifs_block_is_uptodate check
fs: Fix filename init after recent refactoring
netfs: Only create /proc/fs/netfs with CONFIG_PROC_FS
mount: ensure we don't pointlessly walk the mount tree
dcache: convert dentry flag macros to enum
afs: Fix afs_dynroot_readdir() to not use the RCU read lock
hfs/hfsplus: fix slab-out-of-bounds in hfs_bnode_read_key
virtiofs: add filesystem context source name check
devpts: Fix type for uid and gid params
ovl: remove unused forward declaration
ovl: don't allow datadir only
Merge tag 'perf-tools-fixes-for-v6.15-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"A couple of fixes and the usual tooling header updates:
- fix a build error on ARM64 when libunwind is requested
- fix an infinite loop with branch stack on AMD Zen3
- sync tooling headers with the kernel source"
* tag 'perf-tools-fixes-for-v6.15-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf tools: Remove evsel__handle_error_quirks()
perf libunwind arm64: Fix missing close parens in an if statement
tools headers: Update the arch/x86/lib/memset_64.S copy with the kernel sources
tools headers: Update the x86 headers with the kernel sources
tools headers: Update the linux/unaligned.h copy with the kernel sources
tools headers: Update the uapi/asm-generic/mman-common.h copy with the kernel sources
tools headers: Update the uapi/linux/prctl.h copy with the kernel sources
tools headers: Update the syscall table with the kernel sources
tools headers: Update the VFS headers with the kernel sources
tools headers: Update the uapi/linux/perf_event.h copy with the kernel sources
tools headers: Update the socket headers with the kernel sources
tools headers: Update the KVM headers with the kernel sources
Merge tag 'erofs-for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Properly handle errors when file-backed I/O fails
- Fix compilation issues on ARM platform (arm-linux-gnueabi)
- Fix parsing of encoded extents
- Minor cleanup
* tag 'erofs-for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: remove duplicate code
erofs: fix encoded extents handling
erofs: add __packed annotation to union(__le16..)
erofs: set error to bio if file-backed IO fails
Merge tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A few more miscellaneous ext4 bug fixes and cleanups including some
syzbot failures and fixing a stale file handing refeencing an inode
previously used as a regular file, but which has been deleted and
reused as an ea_inode would result in ext4 erroneously considering
this a case of fs corruption"
* tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix off-by-one error in do_split
ext4: make block validity check resistent to sb bh corruption
ext4: avoid -Wflex-array-member-not-at-end warning
Documentation: ext4: Add fields to ext4_super_block documentation
ext4: don't treat fhandle lookup of ea_inode as FS corruption
Syzkaller detected a use-after-free issue in ext4_insert_dentry that was
caused by out-of-bounds access due to incorrect splitting in do_split.
BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847
The following loop is located right above 'if' statement.
for (i = count-1; i >= 0; i--) {
/* is more than half of this entry in 2nd half of the block? */
if (size + map[i].size/2 > blocksize/2)
break;
size += map[i].size;
move++;
}
'i' in this case could go down to -1, in which case sum of active entries
wouldn't exceed half the block size, but previous behaviour would also do
split in half if sum would exceed at the very last block, which in case of
having too many long name files in a single block could lead to
out-of-bounds access and following use-after-free.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org Fixes: 5872331b3d91 ("ext4: fix potential negative array index in do_split()") Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ojaswin Mujoo [Fri, 28 Mar 2025 06:24:52 +0000 (11:54 +0530)]
ext4: make block validity check resistent to sb bh corruption
Block validity checks need to be skipped in case they are called
for journal blocks since they are part of system's protected
zone.
Currently, this is done by checking inode->ino against
sbi->s_es->s_journal_inum, which is a direct read from the ext4 sb
buffer head. If someone modifies this underneath us then the
s_journal_inum field might get corrupted. To prevent against this,
change the check to directly compare the inode with journal->j_inode.
**Slight change in behavior**: During journal init path,
check_block_validity etc might be called for journal inode when
sbi->s_journal is not set yet. In this case we now proceed with
ext4_inode_block_valid() instead of returning early. Since systems zones
have not been set yet, it is okay to proceed so we can perform basic
checks on the blocks.
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
So, with these changes, fix the following warning:
fs/ext4/mballoc.c:3041:40: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/Z-SF97N3AxcIMlSi@kspp Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tom Vierjahn [Mon, 24 Mar 2025 22:09:30 +0000 (23:09 +0100)]
Documentation: ext4: Add fields to ext4_super_block documentation
Documentation and implementation of the ext4 super block have
slightly diverged: Padding has been removed in order to make room for
new fields that are still missing in the documentation.
Add the new fields s_encryption_level, s_first_error_errorcode,
s_last_error_errorcode to the documentation of the ext4 super block.
Fixes: f542fbe8d5e8 ("ext4 crypto: reserve codepoints used by the ext4 encryption feature") Fixes: 878520ac45f9 ("ext4: save the error code which triggered an ext4_error() in the superblock") Signed-off-by: Tom Vierjahn <tom.vierjahn@acm.org> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20250324221004.5268-1-tom.vierjahn@acm.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Hide get_vm_area() from MMUless builds
The function get_vm_area() is not defined when CONFIG_MMU is not
defined. Hide that function within #ifdef CONFIG_MMU.
- Fix output of synthetic events when they have dynamic strings
The print fmt of the synthetic event's format file use to have "%.*s"
for dynamic size strings even though the user space exported
arguments had only __get_str() macro that provided just a nul
terminated string. This was fixed so that user space could parse this
properly.
But the reason that it had "%.*s" was because internally it provided
the maximum size of the string as one of the arguments. The fix that
replaced "%.*s" with "%s" caused the trace output (when the kernel
reads the event) to write "(efault)" as it would now read the length
of the string as "%s".
As the string provided is always nul terminated, there's no reason
for the internal code to use "%.*s" anyway. Just remove the length
argument to match the "%s" that is now in the format.
- Fix the ftrace subops hash logic of the manager ops hash
The function_graph uses the ftrace subops code. The subops code is a
way to have a single ftrace_ops registered with ftrace to determine
what functions will call the ftrace_ops callback. More than one user
of function graph can register a ftrace_ops with it. The function
graph infrastructure will then add this ftrace_ops as a subops with
the main ftrace_ops it registers with ftrace. This is because the
functions will always call the function graph callback which in turn
calls the subops ftrace_ops callbacks.
The main ftrace_ops must add a callback to all the functions that the
subops want a callback from. When a subops is registered, it will
update the main ftrace_ops hash to include the functions it wants.
This is the logic that was broken.
The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
all the functions in the filter_hash but not in the notrace_hash are
attached by ftrace. The original logic would have the main ftrace_ops
filter_hash be a union of all the subops filter_hashes and the main
notrace_hash would be a intersect of all the subops filter hashes.
But this was incorrect because the notrace hash depends on the
filter_hash it is associated to and not the union of all
filter_hashes.
Instead, when a subops is added, just include all the functions of
the subops hash that are in its filter_hash but not in its
notrace_hash. The main subops hash should not use its notrace hash,
unless all of its subops hashes have an empty filter_hash (which
means to attach to all functions), and then, and only then, the main
ftrace_ops notrace hash can be the intersect of all the subops
hashes.
This not only fixes the bug, but also simplifies the code.
- Add a selftest to better test the subops filtering
Add a selftest that would catch the bug fixed by the above change.
- Fix extra newline printed in function tracing with retval
The function parameter code changed the output logic slightly and
called print_graph_retval() and also printed a newline. The
print_graph_retval() also prints a newline which caused blank lines
to be printed in the function graph tracer when retval was added.
This caused one of the selftests to fail if retvals were enabled.
Instead remove the new line output from print_graph_retval() and have
the callers always print the new line so that it doesn't have to do
special logic if it calls print_graph_retval() or not.
- Fix out-of-bound memory access in the runtime verifier
When rv_is_container_monitor() is called on the last entry on the
link list it references the next entry, which is the list head and
causes an out-of-bound memory access.
* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix out-of-bound memory access in rv_is_container_monitor()
ftrace: Do not have print_graph_retval() add a newline
tracing/selftest: Add test to better test subops filtering of function graph
ftrace: Fix accounting of subop hashes
ftrace: Properly merge notrace hashes
tracing: Do not add length to print format in synthetic events
tracing: Hide get_vm_area() from MMUless builds