From 0e3b74e26280f2cf8753717a950b97d424da6046 Mon Sep 17 00:00:00 2001
From: Kim Phillips <kim.phillips@amd.com>
Date: Thu, 2 May 2019 15:29:47 +0000
Subject: perf/x86/amd: Update generic hardware cache events for Family 17h
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Kim Phillips <kim.phillips@amd.com>

commit 0e3b74e26280f2cf8753717a950b97d424da6046 upstream.

Add a new amd_hw_cache_event_ids_f17h assignment structure set
for AMD families 17h and above, since a lot has changed. Specifically:

L1 Data Cache

The data cache access counter remains the same on Family 17h.

For DC misses, PMCx041's definition changes with Family 17h,
so instead we use the L2 cache accesses from L1 data cache
misses counter (PMCx060,umask=0xc8).

For DC hardware prefetch events, Family 17h breaks compatibility
for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
Prefetch DC Fills."
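
For reference, the amd_hw_cache_event_ids_f17h values introduced below
pack the unit mask into bits 15:8 and the event select into bits 7:0 of
each table entry. A minimal sketch of that packing (the AMD_PMC_EVENT()
helper here is illustrative only, not something this patch adds):

    /* Illustrative packing of AMD event select + unit mask, matching
     * the hw_cache_event_ids encoding used in the table below. */
    #define AMD_PMC_EVENT(event, umask)  (((umask) << 8) | (event))

    /* PMCx060, umask 0xc8 -> 0xc860 (L2 cache accesses from DC misses) */
    /* PMCx05a, umask 0xff -> 0xff5a (hardware prefetch DC fills)       */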

L1 Instruction Cache

PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
compatible on Family 17h.

For prefetches, we remove the erroneous PMCx04B assignment which
counts how many software data cache prefetch load instructions were
dispatched.

LL - Last Level Cache

Remove the PMCx07D, 7E, and 7F assignments, as those counters do not
exist on Family 17h, where the last level cache is L3. L3 counters
can be accessed using the existing AMD Uncore driver.

Data TLB

On Intel machines, data TLB accesses ("dTLB-loads") are assigned
to counters that count load/store instructions retired. This
is inconsistent with instruction TLB accesses, where Intel
implementations report iTLB misses that hit in the STLB.

Ideally, dTLB-loads would count higher level dTLB misses that hit
in lower level TLBs, and dTLB-load-misses would report those
that also missed in those lower-level TLBs, therefore causing
a page table walk. That would be consistent with instruction
TLB operation, remove the redundancy between dTLB-loads and
L1-dcache-loads, and prevent perf from producing artificially
low percentage ratios, i.e. the "0.01%" below:

   42,550,869      L1-dcache-loads
   41,591,860      dTLB-loads
        4,802      dTLB-load-misses       # 0.01% of all dTLB cache hits
    7,283,682      L1-dcache-stores
    7,912,392      dTLB-stores
          310      dTLB-store-misses

On AMD Families prior to 17h, the "Data Cache Accesses" counter is
used, which is slightly better than load/store instructions retired,
but still counts in terms of individual load/store operations
instead of TLB operations.

So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
to a counter for L1 dTLB misses that hit in the L2 dTLB, and
"dTLB-load-misses" to a counter for L1 DTLB misses that caused
L2 DTLB misses and therefore also caused page table walks. This
results in a much more accurate view of data TLB performance:

   60,961,781      L1-dcache-loads
        4,601      dTLB-loads
          963      dTLB-load-misses       # 20.93% of all dTLB cache hits

Note that for all AMD families, data loads and stores are combined
in a single accesses counter, so no 'L1-dcache-stores' are reported
separately, and stores are counted with loads in 'L1-dcache-loads'.

Also note that the "% of all dTLB cache hits" string is misleading
because (a) "dTLB cache": although TLBs can be considered caches for
page tables, in this context it can be misread as data cache hits,
since the figures are similar (at least on Intel), and (b) not all
of those loads (more precisely, accesses) actually "hit" at that
hardware level. "% of all dTLB accesses" would be clearer and more
accurate.

Instruction TLB

On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
the STLB and completed a page table walk.

For AMD Family 17h and above, for 'iTLB-loads' we replace the
erroneous instruction cache fetches counter with PMCx084
"L1 ITLB Miss, L2 ITLB Hit".

For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
L2 ITLB Miss", but set a 0xff umask because without it the event
does not get counted.
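
As an illustration of how a user reaches this remapped event through
the generic cache interface, here is a sketch of the standard
perf_event_open(2) attribute encoding for 'iTLB-load-misses'
(user-space illustration, not code added by this patch):

    struct perf_event_attr attr = {
            .type   = PERF_TYPE_HW_CACHE,
            .size   = sizeof(attr),
            /* (cache id) | (op id << 8) | (result id << 16) */
            .config = PERF_COUNT_HW_CACHE_ITLB |
                      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
    };
    /* With this patch, Family 17h resolves the above to PMCx085,
     * umask 0xff (table entry 0xff85). */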

Branch Predictor (BPU)

PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.

Node Level Events

Family 17h does not have a PMCx0e9 counter, and corresponding counters
have not been made available publicly, so for now, we mark them as
unsupported for Families 17h and above.
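
For context, the 0 and -1 entries in the new table keep their usual
meaning in the generic x86 cache-event lookup: 0 reports the
combination as unsupported and -1 rejects it as invalid. A simplified
sketch of that lookup, modelled loosely on set_ext_hw_attr() in
arch/x86/events/core.c (the helper name below is illustrative; this
patch does not touch that code):

    static int lookup_cache_event(unsigned int type, unsigned int op,
                                  unsigned int result, u64 *config)
    {
            u64 val = hw_cache_event_ids[type][op][result];

            if (val == 0)           /* e.g. the Family 17h NODE reads */
                    return -ENOENT; /* combination not supported */
            if (val == -1)
                    return -EINVAL; /* combination not valid */

            *config |= val;         /* event select + unit mask */
            return 0;
    }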

Reference:

"Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
Released 7/17/2018, Publication #56255, Revision 3.03:
https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf

[ mingo: tidied up the line breaks. ]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/events/amd/core.c | 111 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 108 insertions(+), 3 deletions(-)

--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -116,6 +116,110 @@ static __initconst const u64 amd_hw_cach
 },
 };
 
+static __initconst const u64 amd_hw_cache_event_ids_f17h
+				[PERF_COUNT_HW_CACHE_MAX]
+				[PERF_COUNT_HW_CACHE_OP_MAX]
+				[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+[C(L1D)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0x0040, /* Data Cache Accesses */
+		[C(RESULT_MISS)] = 0xc860, /* L2$ access from DC Miss */
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = 0xff5a, /* h/w prefetch DC Fills */
+		[C(RESULT_MISS)] = 0,
+	},
+},
+[C(L1I)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0x0080, /* Instruction cache fetches */
+		[C(RESULT_MISS)] = 0x0081, /* Instruction cache misses */
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+},
+[C(LL)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+},
+[C(DTLB)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0xff45, /* All L2 DTLB accesses */
+		[C(RESULT_MISS)] = 0xf045, /* L2 DTLB misses (PT walks) */
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+},
+[C(ITLB)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0x0084, /* L1 ITLB misses, L2 ITLB hits */
+		[C(RESULT_MISS)] = 0xff85, /* L1 ITLB misses, L2 misses */
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+},
+[C(BPU)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0x00c2, /* Retired Branch Instr. */
+		[C(RESULT_MISS)] = 0x00c3, /* Retired Mispredicted BI */
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+},
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = 0,
+		[C(RESULT_MISS)] = 0,
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)] = -1,
+		[C(RESULT_MISS)] = -1,
+	},
+},
+};
+
 /*
  * AMD Performance Monitor K7 and later, up to and including Family 16h:
  */
@@ -861,9 +965,10 @@ __init int amd_pmu_init(void)
 		x86_pmu.amd_nb_constraints = 0;
 	}
 
-	/* Events are common for all AMDs */
-	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
-	       sizeof(hw_cache_event_ids));
+	if (boot_cpu_data.x86 >= 0x17)
+		memcpy(hw_cache_event_ids, amd_hw_cache_event_ids_f17h, sizeof(hw_cache_event_ids));
+	else
+		memcpy(hw_cache_event_ids, amd_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 
 	return 0;
 }