commit 2cb7cef9
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Subject: Kernel Tracepoints

Implementation of kernel tracepoints. Inspired by the Linux Kernel Markers.
Allows complete typing verification by declaring both the tracing statement
inline functions and the probe registration/unregistration static inline
functions within the same macro "DEFINE_TRACE". No format string is required.
See the tracepoint Documentation and Samples patches for usage examples.

Taken from the documentation patch:

"A tracepoint placed in code provides a hook to call a function (probe) that you
can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
"off" (no probe is attached). When a tracepoint is "off" it has no effect,
except for adding a tiny time penalty (checking a condition for a branch) and a
space penalty (adding a few bytes for the function call at the end of the
instrumented function and adding a data structure in a separate section). When a
tracepoint is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function provided
ends its execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are lightweight
hooks that can pass an arbitrary number of parameters, whose prototypes are
described in a tracepoint declaration placed in a header file."

Addition and removal of tracepoints is synchronized by RCU, using the
scheduler (and preempt_disable) as the guarantee to find a quiescent state
(this is really "classic" RCU). The update side uses rcu_barrier_sched()
with call_rcu_sched(), and the read/execute side uses
preempt_disable()/preempt_enable().

We make sure the previous array containing probes, which has been scheduled for
deletion by the RCU callback, is indeed freed before we proceed to the next
update. This limits the rate of modification of a single tracepoint to
one update per RCU period. The objective here is to permit fast batch
add/removal of probes on _different_ tracepoints.

Changelog:
- Use #name ":" #proto as the string identifying the tracepoint in the
  tracepoint table. This makes sure no type mismatch happens due to
  connection of a probe with the wrong type to a tracepoint declared with
  the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com>:
Tested on x86-64.

Performance impact of a tracepoint: same as markers, except that it adds about
70 bytes of instructions in an unlikely branch of each instrumented function
(the for loop, the stack setup and the function call). It currently adds a
memory read, a test and a conditional branch at the instrumentation site (in the
hot path). Immediate values will eventually change this into a load immediate,
test and branch, which removes the memory read and makes the i-cache impact
smaller (changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64; it also
saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to markers),
even without the immediate values optimization, tests done by Hideo Aoki on ia64
show no regression. His test case was using hackbench on a kernel where
scheduler instrumentation (about 5 events in scheduler code) was added.


Quoting Hideo Aoki about Markers:

I evaluated the overhead of kernel markers using the linux-2.6-sched-fixes
git tree, which includes several markers for LTTng, using an ia64
server.

While the immediate trace mark feature isn't implemented on ia64,
there is no major performance regression. So, I think that we
don't have any issues to propose merging the marker point patches
into Linus's tree from the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled
without CONFIG_MARKERS. The second one was compiled with CONFIG_MARKERS
enabled.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the
average and the difference between the kernels.

The parameters of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8

Below are the results. As you can see, no major performance
regression was found in any case. Even if the number of processes
increases, the difference between the marker-enabled kernel and the
marker-disabled kernel doesn't increase. Moreover, if the number of
CPUs increases, the differences don't increase either.

Curiously, the marker-enabled kernel is better than the marker-disabled
kernel in more than half of the cases, although I guess this comes from
a difference in memory access patterns.


* 2 CPUs

Number of | without      | with         | diff   | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
--------------------------------------------------------------
       50 |        4.811 |        4.872 | +0.061 | +1.27 |
      100 |        9.854 |       10.309 | +0.454 | +4.61 |
      150 |       15.602 |       15.040 | -0.562 | -3.6  |
      200 |       20.489 |       20.380 | -0.109 | -0.53 |
      250 |       25.798 |       25.652 | -0.146 | -0.56 |
      300 |       31.260 |       30.797 | -0.463 | -1.48 |
      350 |       36.121 |       35.770 | -0.351 | -0.97 |
      400 |       42.288 |       42.102 | -0.186 | -0.44 |
      450 |       47.778 |       47.253 | -0.526 | -1.1  |
      500 |       51.953 |       52.278 | +0.325 | +0.63 |
      550 |       58.401 |       57.700 | -0.701 | -1.2  |
      600 |       63.334 |       63.222 | -0.112 | -0.18 |
      650 |       68.816 |       68.511 | -0.306 | -0.44 |
      700 |       74.667 |       74.088 | -0.579 | -0.78 |
      750 |       78.612 |       79.582 | +0.970 | +1.23 |
      800 |       85.431 |       85.263 | -0.168 | -0.2  |
--------------------------------------------------------------

* 4 CPUs

Number of | without      | with         | diff   | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
--------------------------------------------------------------
       50 |        2.586 |        2.584 | -0.003 | -0.1  |
      100 |        5.254 |        5.283 | +0.030 | +0.56 |
      150 |        8.012 |        8.074 | +0.061 | +0.76 |
      200 |       11.172 |       11.000 | -0.172 | -1.54 |
      250 |       13.917 |       14.036 | +0.119 | +0.86 |
      300 |       16.905 |       16.543 | -0.362 | -2.14 |
      350 |       19.901 |       20.036 | +0.135 | +0.68 |
      400 |       22.908 |       23.094 | +0.186 | +0.81 |
      450 |       26.273 |       26.101 | -0.172 | -0.66 |
      500 |       29.554 |       29.092 | -0.461 | -1.56 |
      550 |       32.377 |       32.274 | -0.103 | -0.32 |
      600 |       35.855 |       35.322 | -0.533 | -1.49 |
      650 |       39.192 |       38.388 | -0.804 | -2.05 |
      700 |       41.744 |       41.719 | -0.025 | -0.06 |
      750 |       45.016 |       44.496 | -0.520 | -1.16 |
      800 |       48.212 |       47.603 | -0.609 | -1.26 |
--------------------------------------------------------------

* 8 CPUs

Number of | without      | with         | diff   | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
--------------------------------------------------------------
       50 |        2.094 |        2.072 | -0.022 | -1.07 |
      100 |        4.162 |        4.273 | +0.111 | +2.66 |
      150 |        6.485 |        6.540 | +0.055 | +0.84 |
      200 |        8.556 |        8.478 | -0.078 | -0.91 |
      250 |       10.458 |       10.258 | -0.200 | -1.91 |
      300 |       12.425 |       12.750 | +0.325 | +2.62 |
      350 |       14.807 |       14.839 | +0.032 | +0.22 |
      400 |       16.801 |       16.959 | +0.158 | +0.94 |
      450 |       19.478 |       19.009 | -0.470 | -2.41 |
      500 |       21.296 |       21.504 | +0.208 | +0.98 |
      550 |       23.842 |       23.979 | +0.137 | +0.57 |
      600 |       26.309 |       26.111 | -0.198 | -0.75 |
      650 |       28.705 |       28.446 | -0.259 | -0.9  |
      700 |       31.233 |       31.394 | +0.161 | +0.52 |
      750 |       34.064 |       33.720 | -0.344 | -1.01 |
      800 |       36.320 |       36.114 | -0.206 | -0.57 |
--------------------------------------------------------------

Best regards,
Hideo


P.S. When I compiled the linux-2.6-sched-fixes tree on ia64, I
had to revert the following git commit, since pteval_t is defined
on x86 only.

commit 8686f2b37e7394b51dd6593678cbfd85ecd28c65
Date:   Tue May 6 15:42:40 2008 -0700

    generic, x86, PAT: fix mprotect


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Hideo AOKI <haoki@redhat.com>
CC: Takashi Nishiie <t-nishiie@np.css.fujitsu.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Jan Blunck <jblunck@suse.de>
---
 include/asm-generic/vmlinux.lds.h |    6
 include/linux/module.h            |   18 +
 include/linux/tracepoint.h        |  127 ++++++++++
 init/Kconfig                      |    7
 kernel/Makefile                   |    1
 kernel/module.c                   |   67 +++++
 kernel/tracepoint.c               |  476 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 700 insertions(+), 2 deletions(-)

--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -52,7 +52,10 @@
 	. = ALIGN(8);						\
 	VMLINUX_SYMBOL(__start___markers) = .;			\
 	*(__markers)						\
-	VMLINUX_SYMBOL(__stop___markers) = .;
+	VMLINUX_SYMBOL(__stop___markers) = .;			\
+	VMLINUX_SYMBOL(__start___tracepoints) = .;		\
+	*(__tracepoints)					\
+	VMLINUX_SYMBOL(__stop___tracepoints) = .;

 #define RO_DATA(align)						\
 	. = ALIGN((align));					\
@@ -61,6 +64,7 @@
 		*(.rodata) *(.rodata.*)				\
 		*(__vermagic)		/* Kernel version magic */ \
 		*(__markers_strings)	/* Markers: strings */	\
+		*(__tracepoints_strings)/* Tracepoints: strings */ \
 	}							\
 								\
 	.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -16,6 +16,7 @@
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
 #include <linux/marker.h>
+#include <linux/tracepoint.h>
 #include <asm/local.h>

 #include <asm/module.h>
@@ -332,6 +333,11 @@ struct module
 	unsigned int num_markers;
 #endif

+#ifdef CONFIG_TRACEPOINTS
+	struct tracepoint *tracepoints;
+	unsigned int num_tracepoints;
+#endif
+
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
 	struct list_head modules_which_use_me;
@@ -453,6 +459,9 @@ extern void print_modules(void);

 extern void module_update_markers(void);

+extern void module_update_tracepoints(void);
+extern int module_get_iter_tracepoints(struct tracepoint_iter *iter);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -557,6 +566,15 @@ static inline void module_update_markers
 {
 }

+static inline void module_update_tracepoints(void)
+{
+}
+
+static inline int module_get_iter_tracepoints(struct tracepoint_iter *iter)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MODULES */

 struct device_driver;
--- /dev/null
+++ b/include/linux/tracepoint.h
@@ -0,0 +1,127 @@
+#ifndef _LINUX_TRACEPOINT_H
+#define _LINUX_TRACEPOINT_H
+
+/*
+ * Kernel Tracepoint API.
+ *
+ * See Documentation/tracepoint.txt.
+ *
+ * (C) Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * Heavily inspired from the Linux Kernel Markers.
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/types.h>
+#include <linux/rcupdate.h>
+
+struct module;
+struct tracepoint;
+
+struct tracepoint {
+	const char *name;		/* Tracepoint name */
+	int state;			/* State. */
+	void **funcs;
+} __attribute__((aligned(8)));
+
+
+#define TPPROTO(args...)	args
+#define TPARGS(args...)		args
+
+#ifdef CONFIG_TRACEPOINTS
+
+/*
+ * it_func[0] is never NULL because there is at least one element in the array
+ * when the array itself is non NULL.
+ */
+#define __DO_TRACE(tp, proto, args)					\
+	do {								\
+		void **it_func;						\
+									\
+		rcu_read_lock_sched();					\
+		it_func = rcu_dereference((tp)->funcs);			\
+		if (it_func) {						\
+			do {						\
+				((void(*)(proto))(*it_func))(args);	\
+			} while (*(++it_func));				\
+		}							\
+		rcu_read_unlock_sched();				\
+	} while (0)
+
+/*
+ * Make sure the alignment of the structure in the __tracepoints section will
+ * not add unwanted padding between the beginning of the section and the
+ * structure. Force alignment to the same alignment as the section start.
+ */
+#define DEFINE_TRACE(name, proto, args)					\
+	static inline void trace_##name(proto)				\
+	{								\
+		static const char __tpstrtab_##name[]			\
+		__attribute__((section("__tracepoints_strings")))	\
+		= #name ":" #proto;					\
+		static struct tracepoint __tracepoint_##name		\
+		__attribute__((section("__tracepoints"), aligned(8))) =	\
+		{ __tpstrtab_##name, 0, NULL };				\
+		if (unlikely(__tracepoint_##name.state))		\
+			__DO_TRACE(&__tracepoint_##name,		\
+				TPPROTO(proto), TPARGS(args));		\
+	}								\
+	static inline int register_trace_##name(void (*probe)(proto))	\
+	{								\
+		return tracepoint_probe_register(#name ":" #proto,	\
+			(void *)probe);					\
+	}								\
+	static inline void unregister_trace_##name(void (*probe)(proto))\
+	{								\
+		tracepoint_probe_unregister(#name ":" #proto,		\
+			(void *)probe);					\
+	}
+
+extern void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end);
+
+#else /* !CONFIG_TRACEPOINTS */
+#define DEFINE_TRACE(name, proto, args)					\
+	static inline void _do_trace_##name(struct tracepoint *tp, proto) \
+	{ }								\
+	static inline void trace_##name(proto)				\
+	{ }								\
+	static inline int register_trace_##name(void (*probe)(proto))	\
+	{								\
+		return -ENOSYS;						\
+	}								\
+	static inline void unregister_trace_##name(void (*probe)(proto))\
+	{ }
+
+static inline void tracepoint_update_probe_range(struct tracepoint *begin,
+	struct tracepoint *end)
+{ }
+#endif /* CONFIG_TRACEPOINTS */
+
+/*
+ * Connect a probe to a tracepoint.
+ * Internal API, should not be used directly.
+ */
+extern int tracepoint_probe_register(const char *name, void *probe);
+
+/*
+ * Disconnect a probe from a tracepoint.
+ * Internal API, should not be used directly.
+ */
+extern int tracepoint_probe_unregister(const char *name, void *probe);
+
+struct tracepoint_iter {
+	struct module *module;
+	struct tracepoint *tracepoint;
+};
+
+extern void tracepoint_iter_start(struct tracepoint_iter *iter);
+extern void tracepoint_iter_next(struct tracepoint_iter *iter);
+extern void tracepoint_iter_stop(struct tracepoint_iter *iter);
+extern void tracepoint_iter_reset(struct tracepoint_iter *iter);
+extern int tracepoint_get_iter_range(struct tracepoint **tracepoint,
+	struct tracepoint *begin, struct tracepoint *end);
+
+#endif
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -782,6 +782,13 @@ config PROFILING
 	  Say Y here to enable the extended profiling support mechanisms used
 	  by profilers such as OProfile.

+config TRACEPOINTS
+	bool "Activate tracepoints"
+	default y
+	help
+	  Place an empty function call at each tracepoint site. Can be
+	  dynamically changed for a probe function.
+
 config MARKERS
 	bool "Activate markers"
 	help
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -84,6 +84,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_MARKERS) += marker.o
+obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
 obj-$(CONFIG_FTRACE) += trace/
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -46,6 +46,7 @@
 #include <asm/cacheflush.h>
 #include <linux/license.h>
 #include <asm/sections.h>
+#include <linux/tracepoint.h>

 #if 0
 #define DEBUGP printk
@@ -1915,6 +1916,8 @@ static noinline struct module *load_modu
 #endif
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
+	unsigned int tracepointsindex;
+	unsigned int tracepointsstringsindex;
 	unsigned int verboseindex;
 	struct module *mod;
 	long err = 0;
@@ -2206,6 +2209,10 @@ static noinline struct module *load_modu
 					 "__markers_strings");
 	verboseindex = find_sec(hdr, sechdrs, secstrings, "__verbose");

+	tracepointsindex = find_sec(hdr, sechdrs, secstrings, "__tracepoints");
+	tracepointsstringsindex = find_sec(hdr, sechdrs, secstrings,
+					"__tracepoints_strings");
+
 	/* Now do relocations. */
 	for (i = 1; i < hdr->e_shnum; i++) {
 		const char *strtab = (char *)sechdrs[strindex].sh_addr;
@@ -2232,6 +2239,12 @@ static noinline struct module *load_modu
 	mod->num_markers =
 		sechdrs[markersindex].sh_size / sizeof(*mod->markers);
 #endif
+#ifdef CONFIG_TRACEPOINTS
+	mod->tracepoints = (void *)sechdrs[tracepointsindex].sh_addr;
+	mod->num_tracepoints =
+		sechdrs[tracepointsindex].sh_size / sizeof(*mod->tracepoints);
+#endif
+

 	/* Find duplicate symbols */
 	err = verify_export_symbols(mod);
@@ -2250,11 +2263,16 @@ static noinline struct module *load_modu

 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);

+	if (!mod->taints) {
 #ifdef CONFIG_MARKERS
-	if (!mod->taints)
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_TRACEPOINTS
+		tracepoint_update_probe_range(mod->tracepoints,
+			mod->tracepoints + mod->num_tracepoints);
+#endif
+	}
 	dynamic_printk_setup(sechdrs, verboseindex);
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
@@ -2842,3 +2860,50 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_TRACEPOINTS
+void module_update_tracepoints(void)
+{
+	struct module *mod;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(mod, &modules, list)
+		if (!mod->taints)
+			tracepoint_update_probe_range(mod->tracepoints,
+				mod->tracepoints + mod->num_tracepoints);
+	mutex_unlock(&module_mutex);
+}
+
+/*
+ * Returns 0 if current not found.
+ * Returns 1 if current found.
+ */
+int module_get_iter_tracepoints(struct tracepoint_iter *iter)
+{
+	struct module *iter_mod;
+	int found = 0;
+
+	mutex_lock(&module_mutex);
+	list_for_each_entry(iter_mod, &modules, list) {
+		if (!iter_mod->taints) {
+			/*
+			 * Sorted module list
+			 */
+			if (iter_mod < iter->module)
+				continue;
+			else if (iter_mod > iter->module)
+				iter->tracepoint = NULL;
+			found = tracepoint_get_iter_range(&iter->tracepoint,
+				iter_mod->tracepoints,
+				iter_mod->tracepoints
+					+ iter_mod->num_tracepoints);
+			if (found) {
+				iter->module = iter_mod;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&module_mutex);
+	return found;
+}
+#endif
542 | --- /dev/null | |
543 | +++ b/kernel/tracepoint.c | |
544 | @@ -0,0 +1,476 @@ | |
545 | +/* | |
546 | + * Copyright (C) 2008 Mathieu Desnoyers | |
547 | + * | |
548 | + * This program is free software; you can redistribute it and/or modify | |
549 | + * it under the terms of the GNU General Public License as published by | |
550 | + * the Free Software Foundation; either version 2 of the License, or | |
551 | + * (at your option) any later version. | |
552 | + * | |
553 | + * This program is distributed in the hope that it will be useful, | |
554 | + * but WITHOUT ANY WARRANTY; without even the implied warranty of | |
555 | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
556 | + * GNU General Public License for more details. | |
557 | + * | |
558 | + * You should have received a copy of the GNU General Public License | |
559 | + * along with this program; if not, write to the Free Software | |
560 | + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. | |
561 | + */ | |
562 | +#include <linux/module.h> | |
563 | +#include <linux/mutex.h> | |
564 | +#include <linux/types.h> | |
565 | +#include <linux/jhash.h> | |
566 | +#include <linux/list.h> | |
567 | +#include <linux/rcupdate.h> | |
568 | +#include <linux/tracepoint.h> | |
569 | +#include <linux/err.h> | |
570 | +#include <linux/slab.h> | |
571 | + | |
572 | +extern struct tracepoint __start___tracepoints[]; | |
573 | +extern struct tracepoint __stop___tracepoints[]; | |
574 | + | |
575 | +/* Set to 1 to enable tracepoint debug output */ | |
576 | +static const int tracepoint_debug; | |
577 | + | |
578 | +/* | |
579 | + * tracepoints_mutex nests inside module_mutex. Tracepoints mutex protects the | |
580 | + * builtin and module tracepoints and the hash table. | |
581 | + */ | |
582 | +static DEFINE_MUTEX(tracepoints_mutex); | |
583 | + | |
584 | +/* | |
585 | + * Tracepoint hash table, containing the active tracepoints. | |
586 | + * Protected by tracepoints_mutex. | |
587 | + */ | |
588 | +#define TRACEPOINT_HASH_BITS 6 | |
589 | +#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS) | |
590 | + | |
591 | +/* | |
592 | + * Note about RCU : | |
593 | + * It is used to to delay the free of multiple probes array until a quiescent | |
594 | + * state is reached. | |
595 | + * Tracepoint entries modifications are protected by the tracepoints_mutex. | |
596 | + */ | |
597 | +struct tracepoint_entry { | |
598 | + struct hlist_node hlist; | |
599 | + void **funcs; | |
600 | + int refcount; /* Number of times armed. 0 if disarmed. */ | |
601 | + struct rcu_head rcu; | |
602 | + void *oldptr; | |
603 | + unsigned char rcu_pending:1; | |
604 | + char name[0]; | |
605 | +}; | |
606 | + | |
607 | +static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE]; | |
608 | + | |
609 | +static void free_old_closure(struct rcu_head *head) | |
610 | +{ | |
611 | + struct tracepoint_entry *entry = container_of(head, | |
612 | + struct tracepoint_entry, rcu); | |
613 | + kfree(entry->oldptr); | |
614 | + /* Make sure we free the data before setting the pending flag to 0 */ | |
615 | + smp_wmb(); | |
616 | + entry->rcu_pending = 0; | |
617 | +} | |
618 | + | |
619 | +static void tracepoint_entry_free_old(struct tracepoint_entry *entry, void *old) | |
620 | +{ | |
621 | + if (!old) | |
622 | + return; | |
623 | + entry->oldptr = old; | |
624 | + entry->rcu_pending = 1; | |
625 | + /* write rcu_pending before calling the RCU callback */ | |
626 | + smp_wmb(); | |
627 | +#ifdef CONFIG_PREEMPT_RCU | |
628 | + synchronize_sched(); /* Until we have the call_rcu_sched() */ | |
629 | +#endif | |
630 | + call_rcu(&entry->rcu, free_old_closure); | |
631 | +} | |
632 | + | |
633 | +static void debug_print_probes(struct tracepoint_entry *entry) | |
634 | +{ | |
635 | + int i; | |
636 | + | |
637 | + if (!tracepoint_debug) | |
638 | + return; | |
639 | + | |
640 | + for (i = 0; entry->funcs[i]; i++) | |
641 | + printk(KERN_DEBUG "Probe %d : %p\n", i, entry->funcs[i]); | |
642 | +} | |
643 | + | |
644 | +static void * | |
645 | +tracepoint_entry_add_probe(struct tracepoint_entry *entry, void *probe) | |
646 | +{ | |
647 | + int nr_probes = 0; | |
648 | + void **old, **new; | |
649 | + | |
650 | + WARN_ON(!probe); | |
651 | + | |
652 | + debug_print_probes(entry); | |
653 | + old = entry->funcs; | |
654 | + if (old) { | |
655 | + /* (N -> N+1), (N != 0, 1) probes */ | |
656 | + for (nr_probes = 0; old[nr_probes]; nr_probes++) | |
657 | + if (old[nr_probes] == probe) | |
658 | + return ERR_PTR(-EEXIST); | |
659 | + } | |
660 | + /* + 2 : one for new probe, one for NULL func */ | |
661 | + new = kzalloc((nr_probes + 2) * sizeof(void *), GFP_KERNEL); | |
662 | + if (new == NULL) | |
663 | + return ERR_PTR(-ENOMEM); | |
664 | + if (old) | |
665 | + memcpy(new, old, nr_probes * sizeof(void *)); | |
666 | + new[nr_probes] = probe; | |
667 | + entry->refcount = nr_probes + 1; | |
668 | + entry->funcs = new; | |
669 | + debug_print_probes(entry); | |
670 | + return old; | |
671 | +} | |
672 | + | |
673 | +static void * | |
674 | +tracepoint_entry_remove_probe(struct tracepoint_entry *entry, void *probe) | |
675 | +{ | |
676 | + int nr_probes = 0, nr_del = 0, i; | |
677 | + void **old, **new; | |
678 | + | |
679 | + old = entry->funcs; | |
680 | + | |
681 | + debug_print_probes(entry); | |
682 | + /* (N -> M), (N > 1, M >= 0) probes */ | |
683 | + for (nr_probes = 0; old[nr_probes]; nr_probes++) { | |
684 | + if ((!probe || old[nr_probes] == probe)) | |
685 | + nr_del++; | |
686 | + } | |
687 | + | |
688 | + if (nr_probes - nr_del == 0) { | |
689 | + /* N -> 0, (N > 1) */ | |
690 | + entry->funcs = NULL; | |
691 | + entry->refcount = 0; | |
692 | + debug_print_probes(entry); | |
693 | + return old; | |
694 | + } else { | |
695 | + int j = 0; | |
696 | + /* N -> M, (N > 1, M > 0) */ | |
697 | + /* + 1 for NULL */ | |
698 | + new = kzalloc((nr_probes - nr_del + 1) | |
699 | + * sizeof(void *), GFP_KERNEL); | |
700 | + if (new == NULL) | |
701 | + return ERR_PTR(-ENOMEM); | |
702 | + for (i = 0; old[i]; i++) | |
703 | + if ((probe && old[i] != probe)) | |
704 | + new[j++] = old[i]; | |
705 | + entry->refcount = nr_probes - nr_del; | |
706 | + entry->funcs = new; | |
707 | + } | |
708 | + debug_print_probes(entry); | |
709 | + return old; | |
710 | +} | |
711 | + | |
712 | +/* | |
713 | + * Get tracepoint if the tracepoint is present in the tracepoint hash table. | |
714 | + * Must be called with tracepoints_mutex held. | |
715 | + * Returns NULL if not present. | |
716 | + */ | |
717 | +static struct tracepoint_entry *get_tracepoint(const char *name) | |
718 | +{ | |
719 | + struct hlist_head *head; | |
720 | + struct hlist_node *node; | |
721 | + struct tracepoint_entry *e; | |
722 | + u32 hash = jhash(name, strlen(name), 0); | |
723 | + | |
724 | + head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)]; | |
725 | + hlist_for_each_entry(e, node, head, hlist) { | |
726 | + if (!strcmp(name, e->name)) | |
727 | + return e; | |
728 | + } | |
729 | + return NULL; | |
730 | +} | |
731 | + | |
732 | +/* | |
733 | + * Add the tracepoint to the tracepoint hash table. Must be called with | |
734 | + * tracepoints_mutex held. | |
735 | + */ | |
736 | +static struct tracepoint_entry *add_tracepoint(const char *name) | |
737 | +{ | |
738 | + struct hlist_head *head; | |
739 | + struct hlist_node *node; | |
740 | + struct tracepoint_entry *e; | |
741 | + size_t name_len = strlen(name) + 1; | |
742 | + u32 hash = jhash(name, name_len-1, 0); | |
743 | + | |
744 | + head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)]; | |
745 | + hlist_for_each_entry(e, node, head, hlist) { | |
746 | + if (!strcmp(name, e->name)) { | |
747 | + printk(KERN_NOTICE | |
748 | + "tracepoint %s busy\n", name); | |
749 | + return ERR_PTR(-EEXIST); /* Already there */ | |
750 | + } | |
751 | + } | |
752 | + /* | |
753 | + * Using kmalloc here to allocate a variable length element. Could | |
754 | + * cause some memory fragmentation if overused. | |
755 | + */ | |
756 | + e = kmalloc(sizeof(struct tracepoint_entry) + name_len, GFP_KERNEL); | |
757 | + if (!e) | |
758 | + return ERR_PTR(-ENOMEM); | |
759 | + memcpy(&e->name[0], name, name_len); | |
760 | + e->funcs = NULL; | |
761 | + e->refcount = 0; | |
762 | + e->rcu_pending = 0; | |
763 | + hlist_add_head(&e->hlist, head); | |
764 | + return e; | |
765 | +} | |
766 | + | |
767 | +/* | |
768 | + * Remove the tracepoint from the tracepoint hash table. Must be called with | |
769 | + * mutex_lock held. | |
770 | + */ | |
771 | +static int remove_tracepoint(const char *name) | |
772 | +{ | |
773 | + struct hlist_head *head; | |
774 | + struct hlist_node *node; | |
775 | + struct tracepoint_entry *e; | |
776 | + int found = 0; | |
777 | + size_t len = strlen(name) + 1; | |
778 | + u32 hash = jhash(name, len-1, 0); | |
779 | + | |
780 | + head = &tracepoint_table[hash & ((1 << TRACEPOINT_HASH_BITS)-1)]; | |
781 | + hlist_for_each_entry(e, node, head, hlist) { | |
782 | + if (!strcmp(name, e->name)) { | |
783 | + found = 1; | |
784 | + break; | |
785 | + } | |
786 | + } | |
787 | + if (!found) | |
788 | + return -ENOENT; | |
789 | + if (e->refcount) | |
790 | + return -EBUSY; | |
791 | + hlist_del(&e->hlist); | |
792 | + /* Make sure the call_rcu has been executed */ | |
793 | + if (e->rcu_pending) | |
794 | + rcu_barrier(); | |
795 | + kfree(e); | |
796 | + return 0; | |
797 | +} | |
798 | + | |
799 | +/* | |
800 | + * Sets the probe callback corresponding to one tracepoint. | |
801 | + */ | |
802 | +static void set_tracepoint(struct tracepoint_entry **entry, | |
803 | + struct tracepoint *elem, int active) | |
804 | +{ | |
805 | + WARN_ON(strcmp((*entry)->name, elem->name) != 0); | |
806 | + | |
807 | + /* | |
808 | + * rcu_assign_pointer has a smp_wmb() which makes sure that the new | |
809 | + * probe callbacks array is consistent before setting a pointer to it. | |
810 | + * This array is referenced by __DO_TRACE from | |
811 | + * include/linux/tracepoints.h. A matching smp_read_barrier_depends() | |
812 | + * is used. | |
813 | + */ | |
814 | + rcu_assign_pointer(elem->funcs, (*entry)->funcs); | |
815 | + elem->state = active; | |
816 | +} | |
817 | + | |
818 | +/* | |
819 | + * Disable a tracepoint and its probe callback. | |
820 | + * Note: waiting only one RCU grace period after setting elem->state to 0 | |
821 | + * ensures that the original callback is no longer used. This is guaranteed | |
822 | + * by the preempt_disable around the call site. | |
823 | + */ | |
824 | +static void disable_tracepoint(struct tracepoint *elem) | |
825 | +{ | |
826 | + elem->state = 0; | |
827 | +} | |
828 | + | |
829 | +/** | |
830 | + * tracepoint_update_probe_range - Update a probe range | |
831 | + * @begin: beginning of the range | |
832 | + * @end: end of the range | |
833 | + * | |
834 | + * Updates the probe callback corresponding to a range of tracepoints. | |
835 | + */ | |
836 | +void tracepoint_update_probe_range(struct tracepoint *begin, | |
837 | + struct tracepoint *end) | |
838 | +{ | |
839 | + struct tracepoint *iter; | |
840 | + struct tracepoint_entry *mark_entry; | |
841 | + | |
842 | + mutex_lock(&tracepoints_mutex); | |
843 | + for (iter = begin; iter < end; iter++) { | |
844 | + mark_entry = get_tracepoint(iter->name); | |
845 | + if (mark_entry) { | |
846 | + set_tracepoint(&mark_entry, iter, | |
847 | + !!mark_entry->refcount); | |
848 | + } else { | |
849 | + disable_tracepoint(iter); | |
850 | + } | |
851 | + } | |
852 | + mutex_unlock(&tracepoints_mutex); | |
853 | +} | |
854 | + | |
855 | +/* | |
856 | + * Update probes, disabling tracepoints that have no probes attached. | |
857 | + */ | |
858 | +static void tracepoint_update_probes(void) | |
859 | +{ | |
860 | + /* Core kernel tracepoints */ | |
861 | + tracepoint_update_probe_range(__start___tracepoints, | |
862 | + __stop___tracepoints); | |
863 | + /* tracepoints in modules. */ | |
864 | + module_update_tracepoints(); | |
865 | +} | |
866 | + | |
867 | +/** | |
868 | + * tracepoint_probe_register - Connect a probe to a tracepoint | |
869 | + * @name: tracepoint name | |
870 | + * @probe: probe handler | |
871 | + * | |
872 | + * Returns 0 if ok, error value on error. | |
873 | + * The probe address must at least be aligned on the architecture pointer size. | |
874 | + */ | |
875 | +int tracepoint_probe_register(const char *name, void *probe) | |
876 | +{ | |
877 | + struct tracepoint_entry *entry; | |
878 | + int ret = 0; | |
879 | + void *old; | |
880 | + | |
881 | + mutex_lock(&tracepoints_mutex); | |
882 | + entry = get_tracepoint(name); | |
883 | + if (!entry) { | |
884 | + entry = add_tracepoint(name); | |
885 | + if (IS_ERR(entry)) { | |
886 | + ret = PTR_ERR(entry); | |
887 | + goto end; | |
888 | + } | |
889 | + } | |
890 | + /* | |
891 | + * If we detect that a call_rcu is pending for this tracepoint, | |
892 | + * make sure it's executed now. | |
893 | + */ | |
894 | + if (entry->rcu_pending) | |
895 | + rcu_barrier(); | |
896 | + old = tracepoint_entry_add_probe(entry, probe); | |
897 | + if (IS_ERR(old)) { | |
898 | + ret = PTR_ERR(old); | |
899 | + goto end; | |
900 | + } | |
901 | + mutex_unlock(&tracepoints_mutex); | |
902 | + tracepoint_update_probes(); /* may update entry */ | |
903 | + mutex_lock(&tracepoints_mutex); | |
904 | + entry = get_tracepoint(name); | |
905 | + WARN_ON(!entry); | |
906 | + tracepoint_entry_free_old(entry, old); | |
907 | +end: | |
908 | + mutex_unlock(&tracepoints_mutex); | |
909 | + return ret; | |
910 | +} | |
911 | +EXPORT_SYMBOL_GPL(tracepoint_probe_register); | |
912 | + | |
913 | +/** | |
914 | + * tracepoint_probe_unregister - Disconnect a probe from a tracepoint | |
915 | + * @name: tracepoint name | |
916 | + * @probe: probe function pointer | |
917 | + * | |
918 | + * We do not need to call a synchronize_sched to make sure the probes have | |
919 | + * finished running before doing a module unload, because the module unload | |
920 | + * itself uses stop_machine(), which ensures that every preempt-disabled | |
921 | + * section has finished. | |
922 | + */ | |
923 | +int tracepoint_probe_unregister(const char *name, void *probe) | |
924 | +{ | |
925 | + struct tracepoint_entry *entry; | |
926 | + void *old; | |
927 | + int ret = -ENOENT; | |
928 | + | |
929 | + mutex_lock(&tracepoints_mutex); | |
930 | + entry = get_tracepoint(name); | |
931 | + if (!entry) | |
932 | + goto end; | |
933 | + if (entry->rcu_pending) | |
934 | + rcu_barrier(); | |
935 | + old = tracepoint_entry_remove_probe(entry, probe); | |
936 | + mutex_unlock(&tracepoints_mutex); | |
937 | + tracepoint_update_probes(); /* may update entry */ | |
938 | + mutex_lock(&tracepoints_mutex); | |
939 | + entry = get_tracepoint(name); | |
940 | + if (!entry) | |
941 | + goto end; | |
942 | + tracepoint_entry_free_old(entry, old); | |
943 | + remove_tracepoint(name); /* Ignore busy error message */ | |
944 | + ret = 0; | |
945 | +end: | |
946 | + mutex_unlock(&tracepoints_mutex); | |
947 | + return ret; | |
948 | +} | |
949 | +EXPORT_SYMBOL_GPL(tracepoint_probe_unregister); | |
950 | + | |
951 | +/** | |
952 | + * tracepoint_get_iter_range - Get a next tracepoint iterator given a range. | |
953 | + * @tracepoint: current tracepoint (in), next tracepoint (out) | |
954 | + * @begin: beginning of the range | |
955 | + * @end: end of the range | |
956 | + * | |
957 | + * Returns whether a next tracepoint has been found (1) or not (0). | |
958 | + * Will return the first tracepoint in the range if the input tracepoint is | |
959 | + * NULL. | |
960 | + */ | |
961 | +int tracepoint_get_iter_range(struct tracepoint **tracepoint, | |
962 | + struct tracepoint *begin, struct tracepoint *end) | |
963 | +{ | |
964 | + if (!*tracepoint && begin != end) { | |
965 | + *tracepoint = begin; | |
966 | + return 1; | |
967 | + } | |
968 | + if (*tracepoint >= begin && *tracepoint < end) | |
969 | + return 1; | |
970 | + return 0; | |
971 | +} | |
972 | +EXPORT_SYMBOL_GPL(tracepoint_get_iter_range); | |
973 | + | |
974 | +static void tracepoint_get_iter(struct tracepoint_iter *iter) | |
975 | +{ | |
976 | + int found = 0; | |
977 | + | |
978 | + /* Core kernel tracepoints */ | |
979 | + if (!iter->module) { | |
980 | + found = tracepoint_get_iter_range(&iter->tracepoint, | |
981 | + __start___tracepoints, __stop___tracepoints); | |
982 | + if (found) | |
983 | + goto end; | |
984 | + } | |
985 | + /* tracepoints in modules. */ | |
986 | + found = module_get_iter_tracepoints(iter); | |
987 | +end: | |
988 | + if (!found) | |
989 | + tracepoint_iter_reset(iter); | |
990 | +} | |
991 | + | |
992 | +void tracepoint_iter_start(struct tracepoint_iter *iter) | |
993 | +{ | |
994 | + tracepoint_get_iter(iter); | |
995 | +} | |
996 | +EXPORT_SYMBOL_GPL(tracepoint_iter_start); | |
997 | + | |
998 | +void tracepoint_iter_next(struct tracepoint_iter *iter) | |
999 | +{ | |
1000 | + iter->tracepoint++; | |
1001 | + /* | |
1002 | + * iter->tracepoint may be invalid because we blindly incremented it. | |
1003 | + * Make sure it is valid by checking it against the tracepoint ranges, | |
1004 | + * moving on to the following modules' tracepoints if necessary. | |
1005 | + */ | |
1006 | + tracepoint_get_iter(iter); | |
1007 | +} | |
1008 | +EXPORT_SYMBOL_GPL(tracepoint_iter_next); | |
1009 | + | |
1010 | +void tracepoint_iter_stop(struct tracepoint_iter *iter) | |
1011 | +{ | |
1012 | +} | |
1013 | +EXPORT_SYMBOL_GPL(tracepoint_iter_stop); | |
1014 | + | |
1015 | +void tracepoint_iter_reset(struct tracepoint_iter *iter) | |
1016 | +{ | |
1017 | + iter->module = NULL; | |
1018 | + iter->tracepoint = NULL; | |
1019 | +} | |
1020 | +EXPORT_SYMBOL_GPL(tracepoint_iter_reset); |