git.ipfire.org Git - thirdparty/kernel/linux.git/commitdiff
perf mem: Document new output fields (op, cache, mem, dtlb, snoop)
author Namhyung Kim <namhyung@kernel.org>
Tue, 10 Jun 2025 00:57:42 +0000 (17:57 -0700)
committer Arnaldo Carvalho de Melo <acme@redhat.com>
Mon, 16 Jun 2025 17:05:10 +0000 (14:05 -0300)
Update the documentation of the new fields with examples and caveats.

Also update the related documentation for AMD IBS.

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250610005742.2173050-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools/perf/Documentation/perf-amd-ibs.txt
tools/perf/Documentation/perf-mem.txt

index 55f80beae0375a72e58fd4bdb571b29b018ac45a..54854993576070c3e5e6260fef4c85417f90528f 100644
@@ -171,23 +171,48 @@ Below is a simple example of the perf mem tool.
        # perf mem report
 
 A normal perf mem report output will provide detailed memory access profile.
-However, it can also be aggregated based on output fields. For example:
-
-       # perf mem report -F mem,sample,snoop
-       Samples: 3M of event 'ibs_op//', Event count (approx.): 23524876
-       Memory access                                 Samples  Snoop
-       N/A                                           1903343  N/A
-       L1 hit                                        1056754  N/A
-       L2 hit                                          75231  N/A
-       L3 hit                                           9496  HitM
-       L3 hit                                           2270  N/A
-       RAM hit                                          8710  N/A
-       Remote node, same socket RAM hit                 3241  N/A
-       Remote core, same node Any cache hit             1572  HitM
-       Remote core, same node Any cache hit              514  N/A
-       Remote node, same socket Any cache hit           1216  HitM
-       Remote node, same socket Any cache hit            350  N/A
-       Uncached hit                                       18  N/A
+New output fields will show related access info together.  For example:
+
+       # perf mem report -F overhead,cache,snoop,comm
+       ...
+       # Samples: 92K of event 'ibs_op//'
+       # Total weight : 531104
+       #
+       #           ---------- Cache -----------  --- Snoop ----
+       # Overhead       L1     L2 L1-buf  Other     HitM  Other  Command
+       # ........  ............................  ..............  ..........
+       #
+           76.07%     5.8%  35.7%   0.0%  34.6%    23.3%  52.8%  cc1
+            5.79%     0.2%   0.0%   0.0%   5.6%     0.1%   5.7%  make
+            5.78%     0.1%   4.4%   0.0%   1.2%     0.5%   5.3%  gcc
+            5.33%     0.3%   3.9%   0.0%   1.1%     0.2%   5.2%  as
+            5.00%     0.1%   3.8%   0.0%   1.0%     0.3%   4.7%  sh
+            1.56%     0.1%   0.1%   0.0%   1.4%     0.6%   0.9%  ld
+            0.28%     0.1%   0.0%   0.0%   0.2%     0.1%   0.2%  pkg-config
+            0.09%     0.0%   0.0%   0.0%   0.1%     0.0%   0.1%  git
+            0.03%     0.0%   0.0%   0.0%   0.0%     0.0%   0.0%  rm
+            ...
+
+Also, it can be aggregated based on various memory access info using the
+sort keys.  For example:
+
+       # perf mem report -s mem,snoop
+       ...
+       # Samples: 92K of event 'ibs_op//'
+       # Total weight : 531104
+       # Sort order   : mem,snoop
+       #
+       # Overhead       Samples  Memory access                            Snoop
+       # ........  ............  .......................................  ............
+       #
+           47.99%          1509  L2 hit                                   N/A
+           25.08%           338  core, same node Any cache hit            HitM
+           10.24%         54374  N/A                                      N/A
+            6.77%         35938  L1 hit                                   N/A
+            6.39%           101  core, same node Any cache hit            N/A
+            3.50%            69  RAM hit                                  N/A
+            0.03%           158  LFB/MAB hit                              N/A
+            0.00%             2  Uncached hit                             N/A
 
 Please refer to their man page for more detail.
 
index 965e73d377724607ebd21f54d452aba174697852..4d164836d0943119e987a6e613aebad3458f74d3 100644
@@ -119,6 +119,22 @@ REPORT OPTIONS
        And the default sort keys are changed to local_weight, mem, sym, dso,
        symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
 
+-F::
+--fields=::
+       Specify output field - multiple keys can be specified in CSV format.
+       Please see linkperf:perf-report[1] for details.
+
+       In addition to the default fields, 'perf mem report' will provide the
+       following fields to break down sample periods.
+
+       - op: operation in the sample instruction (load, store, prefetch, ...)
+       - cache: location in CPU cache (L1, L2, ...) where the sample hit
+       - mem: location in memory or other places the sample hit
+       - dtlb: location in Data TLB (L1, L2) where the sample hit
+       - snoop: snoop result for the sampled data access
+
+       Please take a look at the OUTPUT FIELD SELECTION section for caveats.
+
 -T::
 --type-profile::
        Show data-type profile result instead of code symbols.  This requires
@@ -156,6 +172,40 @@ but one sample with weight 180 and the other with weight 20:
   90%   [k] memcpy
   10%   [.] strcmp
 
+OUTPUT FIELD SELECTION
+----------------------
+"perf mem report" adds a number of new output fields specific to data source
+information in the sample.  Some of them have the same name as the existing
+sort keys ("mem" and "snoop").  Unlike other fields and sort keys, they behave
+differently depending on whether they are used with -F/--fields or -s/--sort.
+
+Using those two as output fields will aggregate all samples together and show
+a breakdown.
+
+  $ perf mem report -F mem,snoop
+  ...
+  # ------ Memory -------  --- Snoop ----
+  #     RAM Uncach  Other     HitM  Other
+  # .....................  ..............
+  #
+       3.5%   0.0%  96.5%    25.1%  74.9%
+
+But using the same names as sort keys will aggregate samples for each type
+separately.
+
+  $ perf mem report -s mem,snoop
+  # Overhead       Samples  Memory access                            Snoop
+  # ........  ............  .......................................  ............
+  #
+      47.99%          1509  L2 hit                                   N/A
+      25.08%           338  core, same node Any cache hit            HitM
+      10.24%         54374  N/A                                      N/A
+       6.77%         35938  L1 hit                                   N/A
+       6.39%           101  core, same node Any cache hit            N/A
+       3.50%            69  RAM hit                                  N/A
+       0.03%           158  LFB/MAB hit                              N/A
+       0.00%             2  Uncached hit                             N/A
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
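
Reviewer note: the -F versus -s distinction in the OUTPUT FIELD SELECTION hunk above can be illustrated with a small Python sketch. This is not perf's implementation. The row weights reuse the overhead percentages from the "-s mem,snoop" example as stand-in values, and the bucket mapping onto the RAM/Uncach/Other and HitM/Other columns is an assumption inferred from the column headers shown in the "-F mem,snoop" example.

```python
from collections import defaultdict

# (memory access, snoop, weight) rows.  The weights are the overhead
# percentages from the "-s mem,snoop" example, used as stand-in values.
ROWS = [
    ("L2 hit", "N/A", 47.99),
    ("core, same node Any cache hit", "HitM", 25.08),
    ("N/A", "N/A", 10.24),
    ("L1 hit", "N/A", 6.77),
    ("core, same node Any cache hit", "N/A", 6.39),
    ("RAM hit", "N/A", 3.50),
    ("LFB/MAB hit", "N/A", 0.03),
    ("Uncached hit", "N/A", 0.00),
]


def mem_bucket(mem):
    # Assumed mapping onto the "RAM / Uncach / Other" columns.
    if mem == "RAM hit":
        return "RAM"
    if mem == "Uncached hit":
        return "Uncach"
    return "Other"


def snoop_bucket(snoop):
    # Assumed mapping onto the "HitM / Other" columns.
    return "HitM" if snoop == "HitM" else "Other"


def as_fields(rows):
    """-F style: all samples collapse into one line, broken down per field."""
    total = sum(w for _, _, w in rows)
    mem, snoop = defaultdict(float), defaultdict(float)
    for m, s, w in rows:
        mem[mem_bucket(m)] += w
        snoop[snoop_bucket(s)] += w

    def pct(buckets):
        return {k: round(100 * v / total, 1) for k, v in buckets.items()}

    return pct(mem), pct(snoop)


def as_sort_keys(rows):
    """-s style: one line per distinct (mem, snoop) pair, largest first."""
    total = sum(w for _, _, w in rows)
    groups = defaultdict(float)
    for m, s, w in rows:
        groups[(m, s)] += w
    return sorted(
        ((round(100 * w / total, 2), m, s) for (m, s), w in groups.items()),
        reverse=True,
    )


if __name__ == "__main__":
    print(as_fields(ROWS))
    for row in as_sort_keys(ROWS):
        print(row)
```

Under these assumptions, the field-style breakdown comes out to the same 3.5% / 0.0% / 96.5% and 25.1% / 74.9% split shown in the "-F mem,snoop" example, while the sort-key style keeps eight separate rows.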