From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Subject: Kernel Tracepoints

Implementation of kernel tracepoints. Inspired by the Linux Kernel Markers.
Allows complete typing verification by declaring both the tracing statement
inline functions and the probe registration/unregistration static inline
functions within the same macro "DEFINE_TRACE". No format string is required.
See the tracepoint Documentation and Samples patches for usage examples.

Taken from the documentation patch:

"A tracepoint placed in code provides a hook to call a function (probe) that you
can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
"off" (no probe is attached). When a tracepoint is "off" it has no effect,
except for adding a tiny time penalty (checking a condition for a branch) and
a space penalty (adding a few bytes for the function call at the end of the
instrumented function and adding a data structure in a separate section). When a
tracepoint is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function provided
ends its execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are lightweight
hooks that can pass an arbitrary number of parameters, whose prototypes are
described in a tracepoint declaration placed in a header file."
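
As a quick illustration, a minimal sketch of a declaration and an instrumented
call site with this API (the subsystem name, prototype and arguments below are
hypothetical; see the Samples patch for real examples):

	/* In a header file: */
	#include <linux/tracepoint.h>

	DEFINE_TRACE(subsys_eventname,
		TPPROTO(struct inode *inode, struct file *file),
		TPARGS(inode, file));

	/* In the instrumented subsystem code: */
	void somefct(struct inode *inode, struct file *file)
	{
		/* ... subsystem work ... */
		trace_subsys_eventname(inode, file);
	}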

Addition and removal of tracepoints is synchronized by RCU, using the scheduler
(and preempt_disable) as the guarantee that a quiescent state has been reached
(this is really "classic" RCU). The update side uses rcu_barrier_sched() with
call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".

We make sure the previous array containing probes, which has been scheduled for
deletion by the RCU callback, is indeed freed before we proceed to the next
update. This therefore limits the rate of modification of a single tracepoint to
one update per RCU period. The objective here is to permit fast batch
add/removal of probes on _different_ tracepoints.
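
From the probe writer's side this batching is invisible; a probe with a
matching prototype is simply connected and disconnected through the functions
generated by DEFINE_TRACE. A minimal sketch, reusing the hypothetical
tracepoint above:

	/* Probe matching the prototype given in DEFINE_TRACE. */
	static void probe_subsys_eventname(struct inode *inode,
					   struct file *file)
	{
		/* Runs at the tracepoint site, in the caller's context. */
	}

	/* Arm: adds the probe to the tracepoint's RCU-managed array. */
	int ret = register_trace_subsys_eventname(probe_subsys_eventname);

	/* Disarm: the old probe array is freed only after an RCU period. */
	unregister_trace_subsys_eventname(probe_subsys_eventname);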

Changelog:
- Use #name ":" #proto as the string identifying the tracepoint in the
  tracepoint table. This makes sure no type mismatch happens due to
  connection of a probe with the wrong type to a tracepoint declared with
  the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com>:
Tested on x86-64.

Performance impact of a tracepoint: same as markers, except that it adds about
70 bytes of instructions in an unlikely branch of each instrumented function
(the for loop, the stack setup and the function call). It currently adds a
memory read, a test and a conditional branch at the instrumentation site (in the
hot path). Immediate values will eventually change this into a load immediate,
test and branch, which removes the memory read and makes the i-cache impact
smaller (changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64; it also
saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to markers),
even without the immediate values optimizations, tests done by Hideo Aoki on
ia64 show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in the scheduler code) was added.


Quoting Hideo Aoki about markers:

I evaluated the overhead of kernel markers using the linux-2.6-sched-fixes
git tree, which includes several markers for LTTng, using an ia64
server.

While the immediate trace mark feature isn't implemented on ia64,
there is no major performance regression. So, I think that we
don't have any issue in proposing to merge the marker patches
into Linus's tree from the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled
without CONFIG_MARKERS. The second one was compiled with
CONFIG_MARKERS enabled.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times under each condition and calculated the
average and the difference between the kernels.

The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8

Below are the results. As you can see, no major performance
regression was found in any case. Even if the number of processes
increases, the differences between the marker-enabled kernel and the
marker-disabled kernel don't increase. Moreover, if the number of CPUs
increases, the differences don't increase either.

Curiously, the marker-enabled kernel is better than the marker-disabled
kernel in more than half of the cases, although I guess it comes from
differences in memory access patterns.


* 2 CPUs

Number of |      without |         with |   diff |  diff |
processes | Marker [Sec] | Marker [Sec] |  [Sec] |   [%] |
----------------------------------------------------------
       50 |        4.811 |        4.872 | +0.061 | +1.27 |
      100 |        9.854 |       10.309 | +0.454 | +4.61 |
      150 |       15.602 |       15.040 | -0.562 |  -3.6 |
      200 |       20.489 |       20.380 | -0.109 | -0.53 |
      250 |       25.798 |       25.652 | -0.146 | -0.56 |
      300 |       31.260 |       30.797 | -0.463 | -1.48 |
      350 |       36.121 |       35.770 | -0.351 | -0.97 |
      400 |       42.288 |       42.102 | -0.186 | -0.44 |
      450 |       47.778 |       47.253 | -0.526 |  -1.1 |
      500 |       51.953 |       52.278 | +0.325 | +0.63 |
      550 |       58.401 |       57.700 | -0.701 |  -1.2 |
      600 |       63.334 |       63.222 | -0.112 | -0.18 |
      650 |       68.816 |       68.511 | -0.306 | -0.44 |
      700 |       74.667 |       74.088 | -0.579 | -0.78 |
      750 |       78.612 |       79.582 | +0.970 | +1.23 |
      800 |       85.431 |       85.263 | -0.168 |  -0.2 |
----------------------------------------------------------

* 4 CPUs

Number of |      without |         with |   diff |  diff |
processes | Marker [Sec] | Marker [Sec] |  [Sec] |   [%] |
----------------------------------------------------------
       50 |        2.586 |        2.584 | -0.003 |  -0.1 |
      100 |        5.254 |        5.283 | +0.030 | +0.56 |
      150 |        8.012 |        8.074 | +0.061 | +0.76 |
      200 |       11.172 |       11.000 | -0.172 | -1.54 |
      250 |       13.917 |       14.036 | +0.119 | +0.86 |
      300 |       16.905 |       16.543 | -0.362 | -2.14 |
      350 |       19.901 |       20.036 | +0.135 | +0.68 |
      400 |       22.908 |       23.094 | +0.186 | +0.81 |
      450 |       26.273 |       26.101 | -0.172 | -0.66 |
      500 |       29.554 |       29.092 | -0.461 | -1.56 |
      550 |       32.377 |       32.274 | -0.103 | -0.32 |
      600 |       35.855 |       35.322 | -0.533 | -1.49 |
      650 |       39.192 |       38.388 | -0.804 | -2.05 |
      700 |       41.744 |       41.719 | -0.025 | -0.06 |
      750 |       45.016 |       44.496 | -0.520 | -1.16 |
      800 |       48.212 |       47.603 | -0.609 | -1.26 |
----------------------------------------------------------

* 8 CPUs

Number of |      without |         with |   diff |  diff |
processes | Marker [Sec] | Marker [Sec] |  [Sec] |   [%] |
----------------------------------------------------------
       50 |        2.094 |        2.072 | -0.022 | -1.07 |
      100 |        4.162 |        4.273 | +0.111 | +2.66 |
      150 |        6.485 |        6.540 | +0.055 | +0.84 |
      200 |        8.556 |        8.478 | -0.078 | -0.91 |
      250 |       10.458 |       10.258 | -0.200 | -1.91 |
      300 |       12.425 |       12.750 | +0.325 | +2.62 |
      350 |       14.807 |       14.839 | +0.032 | +0.22 |
      400 |       16.801 |       16.959 | +0.158 | +0.94 |
      450 |       19.478 |       19.009 | -0.470 | -2.41 |
      500 |       21.296 |       21.504 | +0.208 | +0.98 |
      550 |       23.842 |       23.979 | +0.137 | +0.57 |
      600 |       26.309 |       26.111 | -0.198 | -0.75 |
      650 |       28.705 |       28.446 | -0.259 |  -0.9 |
      700 |       31.233 |       31.394 | +0.161 | +0.52 |
      750 |       34.064 |       33.720 | -0.344 | -1.01 |
      800 |       36.320 |       36.114 | -0.206 | -0.57 |
----------------------------------------------------------

Best regards,
Hideo


P.S. When I compiled the linux-2.6-sched-fixes tree on ia64, I
had to revert the following git commit since pteval_t is defined
on x86 only.

commit 8686f2b37e7394b51dd6593678cbfd85ecd28c65
Date: Tue May 6 15:42:40 2008 -0700

    generic, x86, PAT: fix mprotect


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: 'Ingo Molnar' <mingo@elte.hu>
CC: 'Hideo AOKI' <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: 'Steven Rostedt' <rostedt@goodmis.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Jan Blunck <jblunck@suse.de>
---
 include/asm-generic/vmlinux.lds.h |    6
 include/linux/module.h            |   18 +
 include/linux/tracepoint.h        |  127 ++++++++++
 init/Kconfig                      |    7
 kernel/Makefile                   |    1
 kernel/module.c                   |   67 +++++
 kernel/tracepoint.c               |  476 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 700 insertions(+), 2 deletions(-)

--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -52,7 +52,10 @@
 	. = ALIGN(8);						\
 	VMLINUX_SYMBOL(__start___markers) = .;			\
 	*(__markers)						\
-	VMLINUX_SYMBOL(__stop___markers) = .;
+	VMLINUX_SYMBOL(__stop___markers) = .;			\
+	VMLINUX_SYMBOL(__start___tracepoints) = .;		\
+	*(__tracepoints)					\
+	VMLINUX_SYMBOL(__stop___tracepoints) = .;
 
 #define RO_DATA(align)						\
 	. = ALIGN((align));					\
@@ -61,6 +64,7 @@
 		*(.rodata) *(.rodata.*)				\
 		*(__vermagic)		/* Kernel version magic */ \
 		*(__markers_strings)	/* Markers: strings */	\
+		*(__tracepoints_strings)/* Tracepoints: strings */ \
 	}							\
 								\
 	.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -16,6 +16,7 @@
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
 #include <linux/marker.h>
+#include <linux/tracepoint.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -332,6 +333,11 @@ struct module
 	unsigned int num_markers;
 #endif
 
+#ifdef CONFIG_TRACEPOINTS
+	struct tracepoint *tracepoints;
+	unsigned int num_tracepoints;
+#endif
+
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
 	struct list_head modules_which_use_me;
@@ -453,6 +459,9 @@ extern void print_modules(void);
 
 extern void module_update_markers(void);
 
+extern void module_update_tracepoints(void);
+extern int module_get_iter_tracepoints(struct tracepoint_iter *iter);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -557,6 +566,15 @@ static inline void module_update_markers
 {
 }
 
+static inline void module_update_tracepoints(void)
+{
+}
+
+static inline int module_get_iter_tracepoints(struct tracepoint_iter *iter)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
--- /dev/null
+++ b/include/linux/tracepoint.h
@@ -0,0 +1,127 @@
+#ifndef _LINUX_TRACEPOINT_H
+#define _LINUX_TRACEPOINT_H
+
+/*
+ * Kernel Tracepoint API.
+ *
+ * See Documentation/tracepoint.txt.
+ *
+ * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * Heavily inspired from the Linux Kernel Markers.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/types.h>
+#include <linux/rcupdate.h>
+
+struct module;
+struct tracepoint;
+
+struct tracepoint {
+	const char *name;		/* Tracepoint name */
+	int state;			/* State. */
+	void **funcs;
+} __attribute__((aligned(8)));
+
+
+#define TPPROTO(args...)	args
+#define TPARGS(args...)		args
+
+#ifdef CONFIG_TRACEPOINTS
+
+/*
+ * it_func[0] is never NULL because there is at least one element in the array
+ * when the array itself is non NULL.
+ */
+#define __DO_TRACE(tp, proto, args)					\
+	do {								\
+		void **it_func;						\
+									\
+		rcu_read_lock_sched();					\
+		it_func = rcu_dereference((tp)->funcs);			\
+		if (it_func) {						\
+			do {						\
+				((void(*)(proto))(*it_func))(args);	\
+			} while (*(++it_func));				\
+		}							\
+		rcu_read_unlock_sched();				\
+	} while (0)
+
+/*
+ * Make sure the alignment of the structure in the __tracepoints section will
+ * not add unwanted padding between the beginning of the section and the
+ * structure. Force alignment to the same alignment as the section start.
+ */
+#define DEFINE_TRACE(name, proto, args)					\
+	static inline void trace_##name(proto)				\
+	{								\
+		static const char __tpstrtab_##name[]			\
+		__attribute__((section("__tracepoints_strings")))	\
+		= #name ":" #proto;					\
+		static struct tracepoint __tracepoint_##name		\
+		__attribute__((section("__tracepoints"), aligned(8))) =	\
+		{ __tpstrtab_##name, 0, NULL };				\
+		if (unlikely(__tracepoint_##name.state))		\
+			__DO_TRACE(&__tracepoint_##name,		\
+				TPPROTO(proto), TPARGS(args));		\
+	}								\
+	static inline int register_trace_##name(void (*probe)(proto))	\
+	{								\
+		return tracepoint_probe_register(#name ":" #proto,	\
+			(void *)probe);					\
+	}								\
+	static inline void unregister_trace_##name(void (*probe)(proto))\
+	{								\
+		tracepoint_probe_unregister(#name ":" #proto,		\
+			(void *)probe);					\
+	}
+
+extern void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end);
+
+#else /* !CONFIG_TRACEPOINTS */
+#define DEFINE_TRACE(name, proto, args)					\
+	static inline void _do_trace_##name(struct tracepoint *tp, proto) \
+	{ }								\
+	static inline void trace_##name(proto)				\
+	{ }								\
+	static inline int register_trace_##name(void (*probe)(proto))	\
+	{								\
+		return -ENOSYS;						\
+	}								\
+	static inline void unregister_trace_##name(void (*probe)(proto))\
+	{ }
+
+static inline void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end)
+{ }
+#endif /* CONFIG_TRACEPOINTS */
+
+/*
+ * Connect a probe to a tracepoint.
+ * Internal API, should not be used directly.
+ */
+extern int tracepoint_probe_register(const char *name, void *probe);
+
+/*
+ * Disconnect a probe from a tracepoint.
+ * Internal API, should not be used directly.
+ */
+extern int tracepoint_probe_unregister(const char *name, void *probe);
+
+struct tracepoint_iter {
+	struct module *module;
+	struct tracepoint *tracepoint;
+};
+
+extern void tracepoint_iter_start(struct tracepoint_iter *iter);
+extern void tracepoint_iter_next(struct tracepoint_iter *iter);
+extern void tracepoint_iter_stop(struct tracepoint_iter *iter);
+extern void tracepoint_iter_reset(struct tracepoint_iter *iter);
+extern int tracepoint_get_iter_range(struct tracepoint **tracepoint,
+	struct tracepoint *begin, struct tracepoint *end);
+
+#endif
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -782,6 +782,13 @@ config PROFILING
 	  Say Y here to enable the extended profiling support mechanisms used
 	  by profilers such as OProfile.
 
+config TRACEPOINTS
+	bool "Activate tracepoints"
+	default y
+	help
+	  Place an empty function call at each tracepoint site. Can be
+	  dynamically changed for a probe function.
+
 config MARKERS
 	bool "Activate markers"
 	help
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -84,6 +84,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_MARKERS) += marker.o
+obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
 obj-$(CONFIG_FTRACE) += trace/
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -46,6 +46,7 @@
 #include <asm/cacheflush.h>
 #include <linux/license.h>
 #include <asm/sections.h>
+#include <linux/tracepoint.h>
 
 #if 0
 #define DEBUGP printk
@@ -1915,6 +1916,8 @@ static noinline struct module *load_modu
 #endif
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
+	unsigned int tracepointsindex;
+	unsigned int tracepointsstringsindex;
 	unsigned int verboseindex;
 	struct module *mod;
 	long err = 0;
@@ -2206,6 +2209,10 @@ static noinline struct module *load_modu
 					"__markers_strings");
 	verboseindex = find_sec(hdr, sechdrs, secstrings, "__verbose");
 
+	tracepointsindex = find_sec(hdr, sechdrs, secstrings, "__tracepoints");
+	tracepointsstringsindex = find_sec(hdr, sechdrs, secstrings,
+					"__tracepoints_strings");
+
 	/* Now do relocations. */
 	for (i = 1; i < hdr->e_shnum; i++) {
 		const char *strtab = (char *)sechdrs[strindex].sh_addr;
@@ -2232,6 +2239,12 @@ static noinline struct module *load_modu
 	mod->num_markers =
 		sechdrs[markersindex].sh_size / sizeof(*mod->markers);
 #endif
+#ifdef CONFIG_TRACEPOINTS
+	mod->tracepoints = (void *)sechdrs[tracepointsindex].sh_addr;
+	mod->num_tracepoints =
+		sechdrs[tracepointsindex].sh_size / sizeof(*mod->tracepoints);
+#endif
+
 
 	/* Find duplicate symbols */
 	err = verify_export_symbols(mod);
@@ -2250,11 +2263,16 @@ static noinline struct module *load_modu
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	if (!mod->taints) {
 #ifdef CONFIG_MARKERS
-	if (!mod->taints)
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_TRACEPOINTS
+		tracepoint_update_probe_range(mod->tracepoints,
+			mod->tracepoints + mod->num_tracepoints);
+#endif
+	}
 	dynamic_printk_setup(sechdrs, verboseindex);
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
@@ -2842,3 +2860,50 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_TRACEPOINTS
+void module_update_tracepoints(void)
+{
+	struct module *mod;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list)
+		if (!mod->taints)
+			tracepoint_update_probe_range(mod->tracepoints,
+				mod->tracepoints + mod->num_tracepoints);
+	mutex_unlock(&module_mutex);
+}
+
+/*
+ * Returns 0 if current not found.
+ * Returns 1 if current found.
+ */
+int module_get_iter_tracepoints(struct tracepoint_iter *iter)
+{
+	struct module *iter_mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(iter_mod, &modules, list) {
+		if (!iter_mod->taints) {
+			/*
+			 * Sorted module list
+			 */
+			if (iter_mod < iter->module)
+				continue;
+			else if (iter_mod > iter->module)
+				iter->tracepoint = NULL;
+			found = tracepoint_get_iter_range(&iter->tracepoint,
+				iter_mod->tracepoints,
+				iter_mod->tracepoints
+					+ iter_mod->num_tracepoints);
+			if (found) {
+				iter->module = iter_mod;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+#endif
--- /dev/null
+++ b/kernel/tracepoint.c
@@ -0,0 +1,476 @@
+/*
+ * Copyright (C) 2008 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/jhash.h>
+#include <linux/list.h>
+#include <linux/rcupdate.h>
+#include <linux/tracepoint.h>
+#include <linux/err.h>
+#include <linux/slab.h>
+
+extern struct tracepoint __start___tracepoints[];
+extern struct tracepoint __stop___tracepoints[];
+
+/* Set to 1 to enable tracepoint debug output */
+static const int tracepoint_debug;
+
+/*
+ * tracepoints_mutex nests inside module_mutex. Tracepoints mutex protects the
+ * builtin and module tracepoints and the hash table.
+ */
+static DEFINE_MUTEX(tracepoints_mutex);
+
+/*
+ * Tracepoint hash table, containing the active tracepoints.
+ * Protected by tracepoints_mutex.
+ */
+#define TRACEPOINT_HASH_BITS 6
+#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
+
+/*
+ * Note about RCU:
+ * It is used to delay the free of the multiple probes array until a quiescent
+ * state is reached.
+ * Tracepoint entry modifications are protected by the tracepoints_mutex.
+ */
+struct tracepoint_entry {
+	struct hlist_node hlist;
+	void **funcs;
+	int refcount;	/* Number of times armed. 0 if disarmed. */
+	struct rcu_head rcu;
+	void *oldptr;
+	unsigned char rcu_pending:1;
+	char name[0];
+};
+
+static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
+
+static void free_old_closure(struct rcu_head *head)
+{
+	struct tracepoint_entry *entry = container_of(head,
+		struct tracepoint_entry, rcu);
+	kfree(entry->oldptr);
+	/* Make sure we free the data before setting the pending flag to 0 */
+	smp_wmb();
+	entry->rcu_pending = 0;
+}
+
+static void tracepoint_entry_free_old(struct tracepoint_entry *entry, void *old)
+{
+	if (!old)
+		return;
+	entry->oldptr = old;
+	entry->rcu_pending = 1;
+	/* write rcu_pending before calling the RCU callback */
+	smp_wmb();
+#ifdef CONFIG_PREEMPT_RCU
+	synchronize_sched();	/* Until we have the call_rcu_sched() */
+#endif
+	call_rcu(&entry->rcu, free_old_closure);
+}
+
+static void debug_print_probes(struct tracepoint_entry *entry)
+{
+	int i;
+
+	if (!tracepoint_debug)
+		return;
+
+	for (i = 0; entry->funcs[i]; i++)
+		printk(KERN_DEBUG "Probe %d : %p\n", i, entry->funcs[i]);
+}
+
+static void *
+tracepoint_entry_add_probe(struct tracepoint_entry *entry, void *probe)
+{
+	int nr_probes = 0;
+	void **old, **new;
+
+	WARN_ON(!probe);
+
+	debug_print_probes(entry);
+	old = entry->funcs;
+	if (old) {
+		/* (N -> N+1), (N != 0, 1) probes */
+		for (nr_probes = 0; old[nr_probes]; nr_probes++)
+			if (old[nr_probes] == probe)
+				return ERR_PTR(-EEXIST);
+	}
+	/* + 2 : one for new probe, one for NULL func */
+	new = kzalloc((nr_probes + 2) * sizeof(void *), GFP_KERNEL);
+	if (new == NULL)
+		return ERR_PTR(-ENOMEM);
+	if (old)
+		memcpy(new, old, nr_probes * sizeof(void *));
+	new[nr_probes] = probe;
+	entry->refcount = nr_probes + 1;
+	entry->funcs = new;
+	debug_print_probes(entry);
+	return old;
+}
+
+static void *
+tracepoint_entry_remove_probe(struct tracepoint_entry *entry, void *probe)
+{
+	int nr_probes = 0, nr_del = 0, i;
+	void **old, **new;
+
+	old = entry->funcs;
+
+	debug_print_probes(entry);
+	/* (N -> M), (N > 1, M >= 0) probes */
+	for (nr_probes = 0; old[nr_probes]; nr_probes++) {
+		if ((!probe || old[nr_probes] == probe))
+			nr_del++;
+	}
+
+	if (nr_probes - nr_del == 0) {
+		/* N -> 0, (N > 1) */
+		entry->funcs = NULL;
+		entry->refcount = 0;
+		debug_print_probes(entry);
+		return old;
+	} else {
+		int j = 0;
+		/* N -> M, (N > 1, M > 0) */
+		/* + 1 for NULL */
+		new = kzalloc((nr_probes - nr_del + 1)
+			* sizeof(void *), GFP_KERNEL);
+		if (new == NULL)
+			return ERR_PTR(-ENOMEM);
+		for (i = 0; old[i]; i++)
+			if ((probe && old[i] != probe))
+				new[j++] = old[i];
+		entry->refcount = nr_probes - nr_del;
+		entry->funcs = new;
+	}
+	debug_print_probes(entry);
+	return old;
+}
+
+/*
+ * Get tracepoint if the tracepoint is present in the tracepoint hash table.
+ * Must be called with tracepoints_mutex held.
+ * Returns NULL if not present.
+ */
+static struct tracepoint_entry *get_tracepoint(const char *name)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct tracepoint_entry *e;
+	u32 hash = jhash(name, strlen(name), 0);
+
+	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(name, e->name))
+			return e;
+	}
+	return NULL;
+}
+
+/*
+ * Add the tracepoint to the tracepoint hash table. Must be called with
+ * tracepoints_mutex held.
+ */
+static struct tracepoint_entry *add_tracepoint(const char *name)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct tracepoint_entry *e;
+	size_t name_len = strlen(name) + 1;
+	u32 hash = jhash(name, name_len-1, 0);
+
+	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(name, e->name)) {
+			printk(KERN_NOTICE
+				"tracepoint %s busy\n", name);
+			return ERR_PTR(-EEXIST);	/* Already there */
+		}
+	}
+	/*
+	 * Using kmalloc here to allocate a variable length element. Could
+	 * cause some memory fragmentation if overused.
+	 */
+	e = kmalloc(sizeof(struct tracepoint_entry) + name_len, GFP_KERNEL);
+	if (!e)
+		return ERR_PTR(-ENOMEM);
+	memcpy(&e->name[0], name, name_len);
+	e->funcs = NULL;
+	e->refcount = 0;
+	e->rcu_pending = 0;
+	hlist_add_head(&e->hlist, head);
+	return e;
+}
+
+/*
+ * Remove the tracepoint from the tracepoint hash table. Must be called with
+ * mutex_lock held.
+ */
+static int remove_tracepoint(const char *name)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct tracepoint_entry *e;
+	int found = 0;
+	size_t len = strlen(name) + 1;
+	u32 hash = jhash(name, len-1, 0);
+
+	head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)];
+	hlist_for_each_entry(e, node, head, hlist) {
+		if (!strcmp(name, e->name)) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found)
+		return -ENOENT;
+	if (e->refcount)
+		return -EBUSY;
+	hlist_del(&e->hlist);
+	/* Make sure the call_rcu has been executed */
+	if (e->rcu_pending)
+		rcu_barrier();
+	kfree(e);
+	return 0;
+}
+
+/*
+ * Sets the probe callback corresponding to one tracepoint.
+ */
+static void set_tracepoint(struct tracepoint_entry **entry,
+	struct tracepoint *elem, int active)
+{
+	WARN_ON(strcmp((*entry)->name, elem->name) != 0);
+
+	/*
+	 * rcu_assign_pointer has a smp_wmb() which makes sure that the new
+	 * probe callbacks array is consistent before setting a pointer to it.
+	 * This array is referenced by __DO_TRACE from
+	 * include/linux/tracepoints.h. A matching smp_read_barrier_depends()
+	 * is used.
+	 */
+	rcu_assign_pointer(elem->funcs, (*entry)->funcs);
+	elem->state = active;
+}
+
+/*
+ * Disable a tracepoint and its probe callback.
+ * Note: only waiting an RCU period after setting elem->call to the empty
+ * function ensures that the original callback is not used anymore. This is
+ * ensured by the preempt_disable around the call site.
+ */
+static void disable_tracepoint(struct tracepoint *elem)
+{
+	elem->state = 0;
+}
+
+/**
+ * tracepoint_update_probe_range - Update a probe range
+ * @begin: beginning of the range
+ * @end: end of the range
+ *
+ * Updates the probe callback corresponding to a range of tracepoints.
+ */
+void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end)
+{
+	struct tracepoint *iter;
+	struct tracepoint_entry *mark_entry;
+
+	mutex_lock(&tracepoints_mutex);
+	for (iter = begin; iter < end; iter++) {
+		mark_entry = get_tracepoint(iter->name);
+		if (mark_entry) {
+			set_tracepoint(&mark_entry, iter,
+					!!mark_entry->refcount);
+		} else {
+			disable_tracepoint(iter);
+		}
+	}
+	mutex_unlock(&tracepoints_mutex);
+}
+
+/*
+ * Update probes, removing the faulty probes.
+ */
+static void tracepoint_update_probes(void)
+{
+	/* Core kernel tracepoints */
+	tracepoint_update_probe_range(__start___tracepoints,
+		__stop___tracepoints);
+	/* tracepoints in modules. */
+	module_update_tracepoints();
+}
+
+/**
+ * tracepoint_probe_register - Connect a probe to a tracepoint
+ * @name: tracepoint name
+ * @probe: probe handler
+ *
+ * Returns 0 if ok, error value on error.
+ * The probe address must at least be aligned on the architecture pointer size.
+ */
+int tracepoint_probe_register(const char *name, void *probe)
+{
+	struct tracepoint_entry *entry;
+	int ret = 0;
+	void *old;
+
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	if (!entry) {
+		entry = add_tracepoint(name);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry);
+			goto end;
+		}
+	}
+	/*
+	 * If we detect that a call_rcu is pending for this tracepoint,
+	 * make sure it's executed now.
+	 */
+	if (entry->rcu_pending)
+		rcu_barrier();
+	old = tracepoint_entry_add_probe(entry, probe);
+	if (IS_ERR(old)) {
+		ret = PTR_ERR(old);
+		goto end;
+	}
+	mutex_unlock(&tracepoints_mutex);
+	tracepoint_update_probes();	/* may update entry */
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	WARN_ON(!entry);
+	tracepoint_entry_free_old(entry, old);
+end:
+	mutex_unlock(&tracepoints_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_register);
+
+/**
+ * tracepoint_probe_unregister - Disconnect a probe from a tracepoint
+ * @name: tracepoint name
+ * @probe: probe function pointer
+ *
+ * We do not need to call synchronize_sched() to make sure the probes have
+ * finished running before doing a module unload, because the module unload
+ * itself uses stop_machine(), which ensures that every preempt-disabled
+ * section has finished.
+ */
+int tracepoint_probe_unregister(const char *name, void *probe)
+{
+	struct tracepoint_entry *entry;
+	void *old;
+	int ret = -ENOENT;
+
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	if (!entry)
+		goto end;
+	if (entry->rcu_pending)
+		rcu_barrier();
+	old = tracepoint_entry_remove_probe(entry, probe);
+	mutex_unlock(&tracepoints_mutex);
+	tracepoint_update_probes();	/* may update entry */
+	mutex_lock(&tracepoints_mutex);
+	entry = get_tracepoint(name);
+	if (!entry)
+		goto end;
+	tracepoint_entry_free_old(entry, old);
+	remove_tracepoint(name);	/* Ignore busy error message */
+	ret = 0;
+end:
+	mutex_unlock(&tracepoints_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
+
+/**
+ * tracepoint_get_iter_range - Get a next tracepoint iterator given a range.
+ * @tracepoint: current tracepoints (in), next tracepoint (out)
+ * @begin: beginning of the range
+ * @end: end of the range
+ *
+ * Returns whether a next tracepoint has been found (1) or not (0).
+ * Will return the first tracepoint in the range if the input tracepoint is
+ * NULL.
+ */
+int tracepoint_get_iter_range(struct tracepoint **tracepoint,
+	struct tracepoint *begin, struct tracepoint *end)
+{
+	if (!*tracepoint && begin != end) {
+		*tracepoint = begin;
+		return 1;
+	}
+	if (*tracepoint >= begin && *tracepoint < end)
+		return 1;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tracepoint_get_iter_range);
+
+static void tracepoint_get_iter(struct tracepoint_iter *iter)
+{
+	int found = 0;
+
+	/* Core kernel tracepoints */
+	if (!iter->module) {
+		found = tracepoint_get_iter_range(&iter->tracepoint,
+			__start___tracepoints, __stop___tracepoints);
+		if (found)
+			goto end;
+	}
+	/* tracepoints in modules. */
+	found = module_get_iter_tracepoints(iter);
+end:
+	if (!found)
+		tracepoint_iter_reset(iter);
+}
+
+void tracepoint_iter_start(struct tracepoint_iter *iter)
+{
+	tracepoint_get_iter(iter);
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_start);
+
+void tracepoint_iter_next(struct tracepoint_iter *iter)
+{
+	iter->tracepoint++;
+	/*
+	 * iter->tracepoint may be invalid because we blindly incremented it.
+	 * Make sure it is valid by marshalling on the tracepoints, getting the
+	 * tracepoints from following modules if necessary.
+	 */
+	tracepoint_get_iter(iter);
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_next);
+
+void tracepoint_iter_stop(struct tracepoint_iter *iter)
+{
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_stop);
+
+void tracepoint_iter_reset(struct tracepoint_iter *iter)
+{
+	iter->module = NULL;
+	iter->tracepoint = NULL;
+}
+EXPORT_SYMBOL_GPL(tracepoint_iter_reset);