From: Julian Seward Date: Mon, 14 May 2007 14:06:30 +0000 (+0000) Subject: Merge r6734 (Callgrind: improve documentation) X-Git-Tag: svn/VALGRIND_3_2_3~10 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=decf2f7f579626d712aefa10ff29e2c0b7a68188;p=thirdparty%2Fvalgrind.git Merge r6734 (Callgrind: improve documentation) git-svn-id: svn://svn.valgrind.org/valgrind/branches/VALGRIND_3_2_BRANCH@6740 --- diff --git a/callgrind/docs/cl-manual.xml b/callgrind/docs/cl-manual.xml index 3add8078cf..a19c702c72 100644 --- a/callgrind/docs/cl-manual.xml +++ b/callgrind/docs/cl-manual.xml @@ -10,11 +10,13 @@ Overview -Callgrind is a Valgrind tool for profiling programs. -The collected data consists of -the number of instructions executed on a run, their relationship +Callgrind is a Valgrind tool for profiling programs +with the ability to construct a call graph from the execution. +By default, the collected data consists of +the number of instructions executed, their attribution to source lines, and -call relationship among functions together with call counts. +call relationship among functions together with number of +actually executed calls. Optionally, a cache simulator (similar to cachegrind) can produce further information about the memory access behavior of the application. @@ -27,7 +29,7 @@ of the profiling, two command line tools are provided: callgrind_annotate This command reads in the profile data, and prints a - sorted lists of functions, optionally with annotation. + sorted lists of functions, optionally with source annotation. @@ -58,86 +60,47 @@ of the profiling, two command line tools are provided: command line or use the supplied script callgrind. + + Functionality + +Cachegrind provides a flat profile: event counts (reads, misses etc.) +attributed to functions exactly represent events which happened while the +function itself was running, which also is called self +or exclusive cost. 
In addition, Callgrind further +attributes call sites inside functions with event counts for events which +happened while the call was active, i.e. while code was executed which actually +was called from the given call site. Adding these call costs to the self cost of +a function gives the so-called inclusive cost. +As an example, the inclusive cost of main() should +be almost 100 percent (apart from any cost spent in startup before main, such as +initialization of the run-time linker or construction of global C++ objects). + + +Together with the call graph, this allows you to see the call chains starting +from main(), inside which most of the +events were happening. This is especially useful for functions called from +multiple call sites, and where any optimization makes sense only by changing +code in the caller (e.g. by reducing the call count). + Callgrind's cache simulation is based on the -Cachegrind tool of the -Valgrind package. Read +Cachegrind tool. Read Cachegrind's documentation first; this page describes the features supported in addition to Cachegrind's features. - - - - -Purpose - - - - Profiling as part of Application Development - - With application development, a common step is - to improve runtime performance. To not waste time on - optimizing functions which are rarely used, one needs to know - in which parts of the program most of the time is spent. - - This is done with a technique called profiling. The program - is run under control of a profiling tool, which gives the time - distribution of executed functions in the run. After examination - of the program's profile, it should be clear if and where optimization - is useful. Afterwards, one should verify any runtime changes by another - profile run. - - - - - - Profiling Tools - - Most widely known is the GCC profiling tool GProf: - one needs to compile an application with the compiler option - -pg. 
Running the program generates - a file gmon.out, which can be - transformed into human readable form with the command line tool - gprof. A disadvantage here is the - the need to recompile everything, and also the need to statically link the - executable. - - Another profiling tool is Cachegrind, part - of Valgrind. It uses the processor - emulation of Valgrind to run the executable, and catches all memory - accesses, which are used to drive a cache simulator. - The program does not need to be - recompiled, it can use shared libraries and plugins, and the profile - measurement doesn't influence the memory access behaviour. - The trace includes - the number of instruction/data memory accesses and 1st/2nd level - cache misses, and relates it to source lines and functions of the - run program. A disadvantage is the slowdown involved in the - processor emulation, around 50 times slower. - - Cachegrind can only deliver a flat profile. There is no call - relationship among the functions of an application stored. Thus, - inclusive costs, i.e. costs of a function including the cost of all - functions called from there, cannot be calculated. Callgrind extends - Cachegrind by including call relationship and exact event counts - spent while doing a call. - - Because Callgrind (and Cachegrind) is based on simulation, the - slowdown due to processing the synthetic runtime events does not - influence the results. See for more - details on the possibilities. +Callgrind's ability to trace function calls varies with the ISA of the +platform it is run on. Its usage was specially tailored for x86 and amd64, +and unfortunately, it currently happens to show quite bad call/return detection +in PPC32/64 code (this is because there are only jump/branch instructions +in the PPC ISA, and Callgrind has to rely on heuristics). - - + + Basic Usage - -Usage - - - Basics + As with Cachegrind, you probably want to compile with debugging info + (the -g flag), but with optimization turned on. 
To start a profile run for a program, execute: callgrind [callgrind options] your-program [program options] @@ -145,7 +108,7 @@ Cachegrind's features. While the simulation is running, you can observe execution with callgrind_control -b - This will print out a current backtrace. To annotate the backtrace with + This will print out the current backtrace. To annotate the backtrace with event counts, run callgrind_control -e -b @@ -153,26 +116,73 @@ Cachegrind's features. After program termination, a profile data file named callgrind.out.pid is generated with pid being the process ID - of the execution of this profile run. - - The data file contains information about the calls made in the + of the execution of this profile run. + The data file contains information about the calls made in the program among the functions executed, together with events of type Instruction Read Accesses (Ir). + To generate a function-by-function summary from the profile + data file, use + callgrind_annotate [options] callgrind.out.pid + This summary is similar to the output you get from a Cachegrind + run with cg_annotate: the list + of functions is ordered by exclusive cost of functions, which also + are the ones that are shown. + Important for the additional features of Callgrind are + the following two options: + + + + : Instead of using + exclusive cost of functions as sorting order, use and show + inclusive cost. + + + + : Interleaved into the + ordered list of functions, show the callers and the callees + of each function. In these lines, which represent executed + calls, the cost gives the number of events spent in the call. + Indented, above each given function, there is the list of callers, + and below, the list of callees. The sum of events in calls to + a given function (caller lines), as well as the sum of events in + calls from the function (callee lines) together with the self + cost, gives the total inclusive cost of the function. 
+ + + + Use to get annotated source code + for all relevant functions for which the source can be found. In + addition to source annotation as produced by + cg_annotate, you will see the + annotated call sites with call counts. For all other options, look + up the manual for cg_annotate. + + + For a better call graph browsing experience, it is highly recommended + to use KCachegrind. If your code happens + to spend a relevant fraction of cost in cycles (sets + of functions calling each other in a recursive manner), you have to + use KCachegrind, as callgrind_annotate + currently does not do any cycle detection, which is important to get correct + results in this case. + If you are additionally interested in measuring the - cache behaviour of your + cache behavior of your program, use Callgrind with the option - This will further slow down the run approximately by a factor of 2. + However, expect a further slowdown of approximately a factor of 2. If the program section you want to profile is somewhere in the middle of the run, it is beneficial to fast forward to this section without any - profiling at all, and switch it on later. This is achieved by using + profiling at all, and switch profiling on later. This is achieved by using and interactively use callgrind_control -i on before the - interesting code section is about to be executed. + interesting code section is about to be executed. To exactly specify + the code position where profiling should start, use the client request + CALLGRIND_START_INSTRUMENTATION. If you want to be able to see assembler annotation, specify . This will produce @@ -185,12 +195,16 @@ Cachegrind's features. + + + +Advanced Usage Multiple profiling dumps from one program run - Often, you aren't interested in time characteristics of a full + Often, you are not interested in characteristics of a full program run, but only of a small part of it (e.g. execution of one algorithm). 
If there are multiple algorithms or one algorithm running with different input data, it's even useful to get different