From: Nicholas Nethercote
Date: Thu, 6 Aug 2009 02:30:26 +0000 (+0000)
Subject: Clean up Callgrind docs. Josef, I added brief entries for --collect-systime,
X-Git-Tag: svn/VALGRIND_3_5_0~122
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=d570c487939e4fcfddd92fe18937597453cfe267;p=thirdparty%2Fvalgrind.git

Clean up Callgrind docs. Josef, I added brief entries for --collect-systime,
--cacheuse and --simulate-wb but you might like to expand them.

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@10728
---

diff --git a/callgrind/clo.c b/callgrind/clo.c
index 881f3ea118..a6aaba4ea0 100644
--- a/callgrind/clo.c
+++ b/callgrind/clo.c
@@ -580,10 +580,10 @@ void CLG_(print_usage)(void)
 "\n cost entity separation options:\n"
 " --separate-threads=no|yes Separate data per thread [no]\n"
 " --separate-callers= Separate functions by call chain length [0]\n"
-" --separate-recs= Separate function recursions upto level [2]\n"
-" --skip-plt=no|yes Ignore calls to/from PLT sections? [yes]\n"
-" --separate-recs= Separate recursions for function \n"
 " --separate-callers= Separate callers for function \n"
+" --separate-recs= Separate function recursions up to level [2]\n"
+" --separate-recs= Separate recursions for function \n"
+" --skip-plt=no|yes Ignore calls to/from PLT sections? [yes]\n"
 " --skip-direct-rec=no|yes Ignore direct recursions? [yes]\n"
 " --fn-skip= Ignore calls to/from function?\n"
 #if CLG_EXPERIMENTAL
diff --git a/callgrind/docs/cl-manual.xml b/callgrind/docs/cl-manual.xml
index 0ff92cfe1b..a247c0c2e2 100644
--- a/callgrind/docs/cl-manual.xml
+++ b/callgrind/docs/cl-manual.xml
@@ -20,7 +20,7 @@ By default, the collected data consists of
 the number of instructions executed, their relationship
 to source lines, the caller/callee relationship between functions,
 and the numbers of such calls.
-Optionally, a cache simulator (similar to cachegrind) can produce
+Optionally, a cache simulator (similar to Cachegrind) can produce
 further information about the memory access behavior of the application.
@@ -60,10 +60,6 @@ of the profiling, two command line tools are provided:
-To use Callgrind, you must specify
- on the Valgrind
-command line.
-
 Functionality
@@ -74,24 +70,24 @@ called self or exclusive attribution.
 Callgrind extends this functionality by propagating costs
-across function call boundaries. If function foo calls
-bar, the costs from bar are added into
-foo's costs. When applied to the program as a whole,
+across function call boundaries. If function foo calls
+bar, the costs from bar are added into
+foo's costs. When applied to the program as a whole,
 this builds up a picture of so called inclusive
 costs, that is, where the cost of each function includes the costs of
 all functions it called, directly or indirectly.
 As an example, the inclusive cost of
-main should be almost 100 percent
+main should be almost 100 percent
 of the total program cost. Because of costs arising before
-main is run, such as
+main is run, such as
 initialization of the run time linker and construction of global C++
-objects, the inclusive cost of main
+objects, the inclusive cost of main
 is not exactly 100 percent of the total program cost.
 Together with the call graph, this allows you to find the specific
 call chains starting from
-main in which the majority of the
+main in which the majority of the
 program's costs occur. Caller/callee cost attribution is also useful
 for profiling functions called from multiple call sites, and where
 optimization opportunities depend on changing code in the callers, in
@@ -115,13 +111,13 @@ on heuristics to detect calls and returns.
 Basic Usage
 As with Cachegrind, you probably want to compile with debugging info
- (the -g flag), but with optimization turned on.
+ (the flag) and with optimization turned on.
 To start a profile run for a program, execute:
 valgrind --tool=callgrind [callgrind options] your-program [program options]
- While the simulation is running, you can observe execution with
+ While the simulation is running, you can observe execution with:
 callgrind_control -b
 This will print out the current backtrace. To annotate the backtrace with
 event counts, run
@@ -133,14 +129,14 @@ on heuristics to detect calls and returns.
 is generated, where pid is the
 process ID of the program being profiled.
 The data file contains information about the calls made in the
- program among the functions executed, together with events of type
- Instruction Read Accesses (Ir).
+ program among the functions executed, together with
+ Instruction Read (Ir) event counts.
 To generate a function-by-function summary from the profile
 data file, use
 callgrind_annotate [options] callgrind.out.<pid>
 This summary is similar to the output you get from a Cachegrind
- run with cg_annotate: the list
+ run with cg_annotate: the list
 of functions is ordered by exclusive cost of functions, which also
 are the ones that are shown.
 Important for the additional features of Callgrind are
@@ -193,10 +189,10 @@ on heuristics to detect calls and returns.
 If the program section you want to profile is somewhere in the
 middle of the run, it is beneficial to
 fast forward to this section without any
- profiling, and then switch on profiling. This is achieved by using
+ profiling, and then enable profiling. This is achieved by using
 the command line option
- and running, in a shell,
+ and running, in a shell:
 callgrind_control -i on just before the
 interesting code section is executed. To exactly specify
 the code position where profiling should start, use the client request
@@ -208,7 +204,7 @@ on heuristics to detect calls and returns.
 data can only be viewed with KCachegrind. For assembly annotation, it also is
 interesting to see more details of the control flow inside of functions,
- ie. (conditional) jumps. This will be collected by further specifying
+ i.e. (conditional) jumps. This will be collected by further specifying
 .
@@ -287,7 +283,7 @@ callgrind.out.pid.part-threa
 To zero cost counters before entering a function, use
 .
 You can specify these options multiple times for different
- functions. Function specifications support wildcards: eg. use
+ functions. Function specifications support wildcards: e.g. use
 to generate dumps before entering any function starting with
 foo.
@@ -323,17 +319,17 @@ callgrind.out.pid.part-threa
 For aggregating events (function enter/leave,
 instruction execution, memory access) into event numbers,
 first, the events must be recognizable by Callgrind, and second,
- the collection state must be switched on.
+ the collection state must be enabled.
 Event collection is only possible if instrumentation
- for program code is switched on. This is the default, but for faster
+ for program code is enabled. This is the default, but for faster
 execution (identical to valgrind --tool=none),
- it can be switched off until the program reaches a state in which
+ it can be disabled until the program reaches a state in which
 you want to start collecting profiling data.
 Callgrind can start without instrumentation
 by specifying option .
- Instrumentation can be switched on interactively
- with callgrind_control -i on
+ Instrumentation can be enabled interactively
+ with: callgrind_control -i on
 and off by specifying "off" instead of "on".
 Furthermore, instrumentation state can be programatically changed with
 the macros
 ;
@@ -353,15 +349,15 @@ callgrind.out.pid.part-threa
 inside of the given function will be collected. Recursive calls of
 the given function do not trigger any action.
- It is important to note that with instrumentation switched off, the
+ It is important to note that with instrumentation disabled, the
 cache simulator cannot see any memory access events, and thus, any
 simulated cache state will be frozen and wrong without instrumentation.
 Therefore, to get useful cache events (hits/misses) after switching on
 instrumentation, the cache first must warm up,
 probably leading to many cold misses
 which would not have happened in reality. If you do not want to see these,
- start event collection a few million instructions after you have switched
- on instrumentation.
+ start event collection a few million instructions after you have enabled
+ instrumentation.
@@ -391,7 +387,7 @@ callgrind.out.pid.part-threa
 Cycles are not bad in itself, but tend to make performance
 analysis of your code harder. This is because inclusive costs
 for calls inside of a cycle are meaningless. The definition of
- inclusive cost, ie. self cost of a function plus inclusive cost
+ inclusive cost, i.e. self cost of a function plus inclusive cost
 of its callees, needs a topological order among functions. For
 cycles, this does not hold true: callees of a function in a cycle include
 the function itself. Therefore, KCachegrind does cycle detection
@@ -401,10 +397,10 @@ callgrind.out.pid.part-threa
 Now, when a program exposes really big cycles (as is
 true for some GUI code, or in general code using event or callback based
- programming style), you loose the nice property to let you pinpoint
+ programming style), you lose the nice property to let you pinpoint
 the bottlenecks by following call chains from
- main(), guided via
- inclusive cost. In addition, KCachegrind looses its ability to show
+ main, guided via
+ inclusive cost. In addition, KCachegrind loses its ability to show
 interesting parts of the call graph, as it uses inclusive costs to
 cut off uninteresting areas.
@@ -477,7 +473,7 @@ callgrind.out.pid.part-threa
 counter values in the child, the client request
 ;
 can be inserted into code to be executed by the child, directly after
- fork().
+ fork.
 However, you will have to make sure that the output file format string
 (controlled by ) does contain
@@ -539,27 +535,28 @@ These options influence the name and format of the profile data files.
-
+
-
+
 This specifies that event counting should be performed at
- per-instruction granularity.
- This allows for assembly code
- annotation. Currently the results can only be
- displayed by KCachegrind.
+ source line granularity. This allows source annotation for sources
+ which are compiled with debug information
+ ().
-
+
-
+
 This specifies that event counting should be performed at
- source line granularity. This allows source
- annotation for sources which are compiled with debug information ("-g").
+ per-instruction granularity.
+ This allows for assembly code
+ annotation. Currently the results can only be
+ displayed by KCachegrind.
@@ -584,7 +581,7 @@ These options influence the name and format of the profile data files.
 This option influences the output format of the profile data.
 It specifies whether numerical positions are always specified as absolute
 values or are allowed to be relative to previous numbers.
- This shrinks the file size,
+ This shrinks the file size.
@@ -593,9 +590,9 @@ These options influence the name and format of the profile data files.
- When multiple profile data parts are to be generated, these
- parts are appended to the same output file if this option is set to
- "yes". Not recommended.
+ When enabled, when multiple profile data parts are to be
+ generated these parts are appended to the same output file.
+ Not recommended.
@@ -619,7 +616,7 @@ be executed. For interactive control use
- Dump profile data every <count> basic blocks.
+ Dump profile data every basic blocks.
 Whether a dump is needed is only checked when Valgrind's internal
 scheduler is run.
 Therefore, the minimum setting useful is about 100000.
 The count is a 64-bit value to make long dump periods possible.
@@ -632,7 +629,7 @@ be executed. For interactive control use
- Dump when entering <function>
+ Dump when entering .
@@ -641,7 +638,7 @@ be executed. For interactive control use
- Zero all costs when entering <function>
+ Zero all costs when entering .
@@ -650,7 +647,7 @@ be executed. For interactive control use
- Dump when leaving <function>
+ Dump when leaving .
@@ -678,14 +675,14 @@ Also see .
 Callgrind will not be able to collect any information, including
 calls, but it will have at most a slowdown of around 4, which is the
 minimum Valgrind
- overhead. Instrumentation can be interactively switched on via
+ overhead. Instrumentation can be interactively enabled via
 callgrind_control -i on.
 Note that the resulting call graph will most probably not
- contain main, but will contain all the
- functions executed after instrumentation was switched on.
- Instrumentation can also programatically switched on/off. See the
+ contain main, but will contain all the
+ functions executed after instrumentation was enabled.
+ Instrumentation can also programatically enabled/disabled. See the
 Callgrind include file
- <callgrind.h> for the macro
+ callgrind.h for the macro
 you have to use in your source code.
 For cache simulation, results will be less accurate when
 switching on instrumentation later in the program run, as the
 simulator starts
@@ -699,7 +696,7 @@ Also see .
- Specify whether event collection is switched on at beginning
+ Specify whether event collection is enabled at beginning
 of the profile run.
 To only look at parts of your program, you have two
 possibilities:
@@ -720,9 +717,9 @@ Also see .
 dumps is not practical here.
 Collection state can be toggled at entry and exit of a given function
 with the
- option . If you use this flag,
- collection
- state should be switched off at the beginning. Note that the
+ option . If you
+ use this flag, collection
+ state should be disabled at the beginning. Note that the
 specification of implicitly sets .
@@ -737,7 +734,7 @@ Also see .
- Toggle collection on entry/exit of <function>.
+ Toggle collection on entry/exit of .
@@ -753,6 +750,16 @@ Also see .
+
+
+
+
+
+ This specifies whether information for system call times
+ should be collected.
+
+
+
@@ -781,23 +788,43 @@ Also see .
+
+
+
+
+
+ Separate contexts by at most <callers> functions in the
+ call chain. See .
+
+
+
+
+
+
+
+
+
+ Separate callers for .
+ See .
+
+
+
- Separate function recursions by at most <level> levels.
+ Separate function recursions by at most levels.
 See .
-
+
-
+
- Separate contexts by at most <callers> functions in the
- call chain. See .
+ Separate recursions for .
+ See .
@@ -810,6 +837,15 @@ Also see .
+
+
+
+
+
+ Ignore direct recursions.
+
+
+
@@ -827,9 +863,13 @@ Also see .
+
@@ -880,6 +901,15 @@ Also see .
+
+
+
+
+
+ Specify whether write-back events should be counted.
+
+
+
@@ -895,6 +925,45 @@ Also see .
+
+
+
+
+
+ Specify whether cache block use should be collected.
+
+
+
+
+
+
+
+
+
+
+ Specify the size, associativity and line size of the level 1
+ instruction cache.
+
+
+
+
+
+
+
+
+
+ Specify the size, associativity and line size of the level 1
+ data cache.
+
+
+
+
+
+
+
+
+
+ Specify the size, associativity and line size of the level 2
+ cache.
+
+
@@ -904,16 +973,9 @@ Also see .
 Callgrind specific client requests
-In Valgrind terminology, a client request is a C macro which
-can be inserted into your code to request specific functionality when
-run under Valgrind. For this, special instruction patterns resulting
-in NOPs are used, but which can be detected by Valgrind.
-
-Callgrind provides the following specific client requests.
-To use them, add the line
-]]>
-into your code for the macro definitions.
-.
+Callgrind provides the following specific client requests in
+callgrind.h. See that file for the exact details of
+their arguments.
@@ -933,8 +995,9 @@ into your code for the macro definitions.
 CALLGRIND_DUMP_STATS_AT(string)
- Same as CALLGRIND_DUMP_STATS, but allows to specify a string
- to be able to distinguish profile dumps.
+ Same as CALLGRIND_DUMP_STATS,
+ but allows to specify a string to be able to distinguish profile
+ dumps.
@@ -954,8 +1017,8 @@ into your code for the macro definitions.
 Toggle the collection state. This allows to ignore events
 with regard to profile counters. See also options
- and
- .
+ and
+ .
@@ -964,11 +1027,11 @@ into your code for the macro definitions.
 CALLGRIND_START_INSTRUMENTATION
- Start full Callgrind instrumentation if not already switched on.
+ Start full Callgrind instrumentation if not already enabled.
 When cache simulation is done, this will flush the simulated cache
 and lead to an artifical cache warmup phase afterwards with
- cache misses which would not have happened in reality.
- See also option .
+ cache misses which would not have happened in reality. See also
+ option .
@@ -977,13 +1040,14 @@ into your code for the macro definitions.
 CALLGRIND_STOP_INSTRUMENTATION
- Stop full Callgrind instrumentation if not already switched off.
+ Stop full Callgrind instrumentation if not already disabled.
 This flushes Valgrinds translation cache, and does no additional
 instrumentation afterwards: it effectivly will run at the same
- speed as the "none" tool, ie. at minimal slowdown. Use this to
+ speed as Nulgrind, i.e. at minimal slowdown. Use this to
 speed up the Callgrind run for uninteresting code parts. Use
- to switch on instrumentation again.
- See also option .
+ to
+ enable instrumentation again. See also option
+ .
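
Illustrative note, not part of the commit: the client requests documented in the last hunks (CALLGRIND_START_INSTRUMENTATION, CALLGRIND_TOGGLE_COLLECT, CALLGRIND_DUMP_STATS_AT, CALLGRIND_STOP_INSTRUMENTATION) could be combined roughly as in the sketch below to profile a single region. The include path <valgrind/callgrind.h>, the no-op fallback macros, and the functions sum_of_squares and profiled_sum_of_squares are assumptions made for this example; see callgrind.h for the exact macro arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Use the real client-request macros when the Valgrind headers are
 * installed; otherwise fall back to no-ops so the file still builds.
 * The <valgrind/callgrind.h> install path is an assumption. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#  define CALLGRIND_STOP_INSTRUMENTATION  do { } while (0)
#  define CALLGRIND_TOGGLE_COLLECT        do { } while (0)
#  define CALLGRIND_DUMP_STATS_AT(s)      do { (void)(s); } while (0)
#endif

/* Hypothetical workload standing in for the code of interest. */
static uint64_t sum_of_squares(uint32_t n)
{
    uint64_t total = 0;
    for (uint32_t i = 1; i <= n; i++)
        total += (uint64_t)i * i;
    return total;
}

/* Hypothetical wrapper: instrument and collect only around the workload. */
uint64_t profiled_sum_of_squares(uint32_t n)
{
    /* Arbitrary bound that keeps the sum well inside uint64_t. */
    assert(n <= 1000000u);

    CALLGRIND_START_INSTRUMENTATION;  /* enable instrumentation if disabled */
    CALLGRIND_TOGGLE_COLLECT;         /* flip the collection state */
    uint64_t result = sum_of_squares(n);
    CALLGRIND_TOGGLE_COLLECT;         /* flip the collection state back */
    CALLGRIND_DUMP_STATS_AT("after sum_of_squares");
    CALLGRIND_STOP_INSTRUMENTATION;   /* back to minimal slowdown */
    return result;
}
```

The sketch assumes the program is started with both instrumentation and collection disabled, for example with valgrind --tool=callgrind --instr-atstart=no --collect-atstart=no, so that the first toggle turns collection on. Outside Valgrind the requests have no effect and the function simply returns its result.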