From: Josef Weidendorfer
Date: Mon, 20 Mar 2006 10:29:30 +0000 (+0000)
Subject: Callgrind merge: documentation
X-Git-Tag: svn/VALGRIND_3_2_0~178
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=1cdac21bd90854116726b80868c92490a796956a;p=thirdparty%2Fvalgrind.git

Callgrind merge: documentation

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5781
---

diff --git a/callgrind/docs/Makefile.am b/callgrind/docs/Makefile.am
index d539a6ecd5..540f9313a7 100644
--- a/callgrind/docs/Makefile.am
+++ b/callgrind/docs/Makefile.am
@@ -1 +1,8 @@
-EXTRA_DIST =
+EXTRA_DIST = \
+ cl-entities.xml \
+ cl-manual.xml \
+ cl-format.xml \
+ index.xml \
+ man-annotate.xml \
+ man-control.xml \
+ man-callgrind.xml
diff --git a/callgrind/docs/cl-entities.xml b/callgrind/docs/cl-entities.xml
new file mode 100644
index 0000000000..727962fa67
--- /dev/null
+++ b/callgrind/docs/cl-entities.xml
@@ -0,0 +1,22 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/callgrind/docs/cl-format.xml b/callgrind/docs/cl-format.xml
new file mode 100644
index 0000000000..6777551d10
--- /dev/null
+++ b/callgrind/docs/cl-format.xml
@@ -0,0 +1,551 @@
+
+ %cl-entities; ]>
+
+
+Callgrind Format Specification
+
+This chapter describes the Callgrind Profile Format, Version 1.
+
+A synonymous name is "Calltree Profile Format". These names actually mean
+the same since Callgrind was previously named Calltree.
+
+The format description is meant for the user to be able to understand the
+file contents; but, more importantly, it is given for authors of measurement or
+visualization tools to be able to write and read this format.
+
+
+Overview
+
+The profile data format is ASCII based.
+It is written by Callgrind, and it is upwards compatible
+to the format used by Cachegrind (i.e. Cachegrind uses a subset). It can
+be read by callgrind_annotate and KCachegrind.
+
+This chapter gives an overview of format features and examples.
+For detailed syntax, look at the format reference.
+
+
+Basic Structure
+
+Each file has a header part of an arbitrary number of lines of the
+format "key: value". The lines with key "positions" and "events" define
+the meaning of cost lines in the second part of the file: the value of
+"positions" is a list of subpositions, and the value of "events" is a list
+of event type names. Cost lines consist of subpositions followed by 64-bit
+counters for the events, in the order specified by the "positions" and "events"
+header lines.
+
+The "events" header line is always required in contrast to the optional
+line for "positions", which defaults to "line", i.e. a line number of some
+source file. In addition, the second part of the file contains position
+specifications of the form "spec=name". "spec" can be e.g. "fn" for a
+function name or "fl" for a file name. Cost lines are always related to
+the function/file specifications given directly before.
+
+
+
+
+Simple Example
+
+
+events: Cycles Instructions Flops
+fl=file.f
+fn=main
+15 90 14 2
+16 20 12
+
+The above example gives profile information for event types "Cycles",
+"Instructions", and "Flops". Thus, cost lines give the number of CPU cycles
+passed by, number of executed instructions, and number of floating point
+operations executed while running code corresponding to some source
+position. As there is no line specifying the value of "positions", it defaults
+to "line", which means that the first number of a cost line is always a line
+number.
+
+Thus, the first cost line specifies that in line 15 of source file
+"file.f" there is code belonging to function "main". While running, 90 CPU
+cycles passed by, and 2 of the 14 instructions executed were floating point
+operations. Similarly, the next line specifies that there were 12 instructions
+executed in the context of function "main" which can be related to line 16 in
+file "file.f", taking 20 CPU cycles.
If a cost line specifies fewer event counts
+than given in the "events" line, the rest is assumed to be zero. I.e., there
+was no floating point instruction executed relating to line 16.
+
+Note that regular cost lines always give self (also called exclusive)
+cost of code at a given position. If you specify multiple cost lines for the
+same position, these will be summed up. On the other hand, in the example above
+there is no specification of how many times function "main" actually was
+called: profile data only contains sums.
+
+
+
+
+
+Associations
+
+The most important extension to the original format of Cachegrind is the
+ability to specify call relationship among functions. More generally, you
+specify associations among positions. For this, the second part of the
+file also can contain association specifications. These look similar to
+position specifications, but consist of 2 lines. For calls, the format
+looks like
+
+ calls=(Call Count) (Destination position)
+ (Source position) (Inclusive cost of call)
+
+
+The destination only specifies subpositions like line number. Therefore,
+to be able to specify a call to another function in another source file, you
+have to precede the above lines with a "cfn=" specification for the name of the
+called function, and a "cfl=" specification if the function is in another
+source file. The 2nd line looks like a regular cost line with the difference
+that inclusive cost spent inside of the function call has to be specified.
+
+Other associations are, for example, (conditional) jumps. See the
+reference below for details.
+
+
+
+
+
+Extended Example
+
+The following example shows 3 functions, "main", "func1", and
+"func2". Function "main" calls "func1" once and "func2" 3 times. "func1" calls
+"func2" 2 times.
+events: Instructions
+
+fl=file1.c
+fn=main
+16 20
+cfn=func1
+calls=1 50
+16 400
+cfl=file2.c
+cfn=func2
+calls=3 20
+16 400
+
+fn=func1
+51 100
+cfl=file2.c
+cfn=func2
+calls=2 20
+51 300
+
+fl=file2.c
+fn=func2
+20 700
+
+One can see that in "main" only code from line 16 is executed where also
+the other functions are called. Inclusive cost of "main" is 820, which is the
+sum of self cost 20 and costs spent in the calls: 400 for the single call to
+"func1" and 400 for the three calls to "func2".
+
+Function "func1" is located in "file1.c", the same as "main". Therefore,
+a "cfl=" specification for the call to "func1" is not needed. The function
+"func1" only consists of code at line 51 of "file1.c", where "func2" is called.
+
+
+
+
+
+Name Compression
+
+With the introduction of association specifications like calls, it is
+necessary to specify the same function or same file name multiple times. As
+absolute filenames or symbol names in C++ can be quite long, it is advantageous
+to be able to specify integer IDs for position specifications.
+
+To support name compression, a position specification can be not only of
+the format "spec=name", but also "spec=(ID) name" to specify a mapping of an
+integer ID to a name, and "spec=(ID)" to reference a previously defined ID
+mapping. There is a separate ID mapping for each position specification,
+i.e. you can use ID 1 for both a file name and a symbol name.
+
+With string compression, the example from 1.4 looks like this:
+events: Instructions
+
+fl=(1) file1.c
+fn=(1) main
+16 20
+cfn=(2) func1
+calls=1 50
+16 400
+cfl=(2) file2.c
+cfn=(3) func2
+calls=3 20
+16 400
+
+fn=(2)
+51 100
+cfl=(2)
+cfn=(3)
+calls=2 20
+51 300
+
+fl=(2)
+fn=(3)
+20 700
+
+As position specifications carry no information themselves, but only change
+the meaning of subsequent cost lines or associations, they can appear
+anywhere in the file without any negative consequence. In particular, you can
+define name compression mappings directly after the header, and before any cost
+lines.
Thus, the above example can also be written as
+events: Instructions
+
+# define file ID mapping
+fl=(1) file1.c
+fl=(2) file2.c
+# define function ID mapping
+fn=(1) main
+fn=(2) func1
+fn=(3) func2
+
+fl=(1)
+fn=(1)
+16 20
+...
+
+
+
+
+Subposition Compression
+
+If a Calltree data file should hold costs for each assembler instruction
+of a program, you specify subposition "instr" in the "positions:" header line,
+and each cost line has to include the address of some instruction. Addresses
+are allowed to have a size of 64 bits to support 64-bit architectures. This
+motivates subposition compression: instead of every cost line starting with
+a 16 character long address, one is allowed to specify relative subpositions.
+
+A relative subposition always is based on the corresponding subposition
+of the last cost line, and starts with a "+" to specify a positive difference,
+a "-" to specify a negative difference, or consists of "*" to specify the same
+subposition. Assume the following example (subpositions can always be specified
+as hexadecimal numbers, beginning with "0x"):
+positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
+0x80001237 90 5
+0x80001238 91 6
+
+With subposition compression, this looks like
+positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
++3 * 5
++1 +1 6
+
+Remark: For assembler annotation to work, instruction addresses have to
+be corrected to correspond to addresses found in the original binary. I.e. for
+relocatable shared objects, often a load offset has to be subtracted.
+
+
+
+
+
+Miscellaneous
+
+
+Cost Summary Information
+
+For the visualization to be able to show cost percentage, a sum of the
+cost of the full run has to be known. Usually, it is assumed that this is the
+sum of all cost lines in a file. But sometimes, this is not correct. Thus, you
+can specify a "summary:" line in the header giving the full cost for the
+profile run.
This has another effect: an import filter can show a progress bar
+while loading a large data file if it knows the cost sum in advance.
+
+
+
+
+Long Names for Event Types and inherited Types
+
+Event types for cost lines are specified in the "events:" line with an
+abbreviated name. For visualization, it makes sense to be able to specify some
+longer, more descriptive name. For an event type "Ir" which means "Instruction
+Fetches", this can be specified with the header line
+event: Ir : Instruction Fetches
+events: Ir Dr
+
+In this example, "Dr" itself has no long name associated. The order of
+"event:" lines and the "events:" line is of no importance. Additionally,
+inherited event types can be introduced for which no raw data is available, but
+which are calculated from given types. Extending the last example, you could add
+event: Sum = Ir + Dr
+to specify an additional event type "Sum", which is calculated by adding costs
+for "Ir" and "Dr".
+
+
+
+
+
+
+
+
+Reference
+
+
+Grammar
+
+
+ProfileDataFile := FormatVersion? Creator? PartData*
+FormatVersion := "version:" Space* Number "\n"
+Creator := "creator:" NoNewLineChar* "\n"
+PartData := (HeaderLine "\n")+ (BodyLine "\n")+
+HeaderLine := (empty line)
+ | ('#' NoNewLineChar*)
+ | PartDetail
+ | Description
+ | EventSpecification
+ | CostLineDef
+PartDetail := TargetCommand | TargetID
+TargetCommand := "cmd:" Space* NoNewLineChar*
+TargetID := ("pid"|"thread"|"part") ":" Space* Number
+Description := "desc:" Space* Name Space* ":" NoNewLineChar*
+EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?
+InheritedDef := "=" InheritedExpr
+InheritedExpr := Name
+ | Number Space* ("*" Space*)? Name
+ | InheritedExpr Space* "+" Space* InheritedExpr
+LongNameDef := ":" NoNewLineChar*
+CostLineDef := "events:" Space* Name (Space+ Name)*
+ | "positions:" "instr"? (Space+ "line")?
+BodyLine := (empty line)
+ | ('#' NoNewLineChar*)
+ | CostLine
+ | PositionSpecification
+ | AssociationSpecification
+CostLine := SubPositionList Costs?
+SubPositionList := (SubPosition+ Space+)+
+SubPosition := Number | "+" Number | "-" Number | "*"
+Costs := (Number Space+)+
+PositionSpecification := Position "=" Space* PositionName
+Position := CostPosition | CalledPosition
+CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"
+CalledPosition := "cob" | "cfl" | "cfn"
+PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?
+AssociationSpecification := CallSpecification
+ | JumpSpecification
+CallSpecification := CallLine "\n" CostLine
+CallLine := "calls=" Space* Number Space+ SubPositionList
+JumpSpecification := ...
+Space := " " | "\t"
+Number := HexNumber | (Digit)+
+Digit := "0" | ... | "9"
+HexNumber := "0x" (Digit | HexChar)+
+HexChar := "a" | ... | "f" | "A" | ... | "F"
+Name := Alpha (Digit | Alpha)*
+Alpha := "a" | ... | "z" | "A" | ... | "Z"
+NoNewLineChar := all characters without "\n"
+
+
+
+
+
+Description of Header Lines
+
+The header has an arbitrary number of lines of the format
+"key: value". Possible key values for the header are:
+
+
+
+
+ version: number [Callgrind]
+ This is used to distinguish future profile data formats. A
+ major version of 0 or 1 is supposed to be upwards compatible with
+ Cachegrind's format. It is optional; if not appearing, version 1
+ is assumed. Otherwise, this has to be the first header line.
+
+
+
+ pid: process id [Callgrind]
+ This specifies the process ID of the supervised application
+ for which this profile was generated.
+
+
+
+ cmd: program name + args [Cachegrind]
+ This specifies the full command line of the supervised
+ application for which this profile was generated.
+
+
+
+ part: number [Callgrind]
+ This specifies a sequentially incremented number for each dump
+ generated, starting at 1.
+
+
+
+ desc: type: value [Cachegrind]
+ This specifies various information for this dump.
For some
+ types, the semantics are defined, but any description type is allowed.
+ Unknown types should be ignored.
+ There are the types "I1 cache", "D1 cache", "L2 cache", which
+ specify parameters used for the cache simulator. These are the only
+ types originally used by Cachegrind. Additionally, Callgrind uses
+ the following types: "Timerange" gives a rough range of the basic
+ block counter, for which the cost of this dump was collected.
+ Type "Trigger" states the reason why this trace was generated,
+ e.g. program termination or forced interactive dump.
+
+
+
+ positions: [instr] [line] [Callgrind]
+ For cost lines, this defines the semantics of the first numbers.
+ Any combination of "instr", "bb" and "line" is allowed, but has to be
+ in this order which corresponds to position numbers at the start of
+ the cost lines later in the file.
+ If "instr" is specified, the position is the address of an
+ instruction whose execution raised the events given later on the
+ line. This address is relative to the offset of the binary/shared
+ library file to not have to specify relocation info. For "line",
+ the position is the line number of a source file, which is
+ responsible for the events raised. Note that the mapping of "instr"
+ and "line" positions is given by the debugging line information
+ produced by the compiler.
+ This field is optional. If not specified, only "line" is
+ assumed.
+
+
+
+ events: event type abbreviations [Cachegrind]
+ A list of short names of the event types logged in this file.
+ The order is the same as in cost lines. The first event type is the
+ second or third number in a cost line, depending on the value of
+ "positions". Callgrind does not add additional cost types. Specify
+ exactly once.
+ Cost types from original Cachegrind are:
+
+
+ Ir: Instruction read access
+
+
+ I1mr: Instruction Level 1 read cache miss
+
+
+ I2mr: Instruction Level 2 read cache miss
+
+
+ ...
+
+
+
+
+
+
+ summary: costs [Callgrind]
+ totals: costs [Cachegrind]
+ The value is the total number of events covered by this trace
+ file. Both keys have the same meaning, but the "totals:" line
+ happens to be at the end of the file, while "summary:" appears in
+ the header. This was added to allow postprocessing tools to know
+ the total cost in advance. The two lines always give the same cost
+ counts.
+
+
+
+
+
+
+
+Description of Body Lines
+
+There exist lines
+spec=position. The values for position
+specifications are arbitrary strings. When starting with "(" and a
+digit, it's a string in compressed format. Otherwise it's the real
+position string. This allows for file and symbol names as position
+strings, as these never start with "(" + digit.
+The compressed format is either "(" number ")"
+space position or only
+"(" number ")". The first relates
+position to number in the
+context of the given format specification from this line to the end of
+the file; it makes the (number) an alias for
+position. Compressed format is always
+optional.
+
+Position specifications allowed:
+
+
+
+ ob= [Callgrind]
+ The ELF object where the cost of next cost lines happens.
+
+
+
+ fl= [Cachegrind]
+
+
+
+ fi= [Cachegrind]
+
+
+
+ fe= [Cachegrind]
+ The source file including the code which is responsible for
+ the cost of next cost lines. "fi="/"fe=" is used when the source
+ file changes inside of a function, i.e. for inlined code.
+
+
+
+ fn= [Cachegrind]
+ The name of the function where the cost of next cost lines
+ happens.
+
+
+
+ cob= [Callgrind]
+ The ELF object of the target of the next call cost lines.
+
+
+
+ cfl= [Callgrind]
+ The source file including the code of the target of the
+ next call cost lines.
+
+
+
+ cfn= [Callgrind]
+ The name of the target function of the next call cost
+ lines.
+
+
+
+ calls= [Callgrind]
+ The number of nonrecursive calls which are responsible for the
+ cost specified by the next call cost line.
This is the cost spent
+ inside of the called function.
+ After "calls=" there MUST be a cost line. This is the cost
+ spent in the called function. The first number is the source line
+ from where the call happened.
+
+
+
+ jump=count target position [Callgrind]
+ Unconditional jump, executed count times, to the given target
+ position.
+
+
+
+ jcnd=exe.count jumpcount target position [Callgrind]
+ Conditional jump, executed exe.count times with jumpcount
+ jumps to the given target position.
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/callgrind/docs/cl-manual.xml b/callgrind/docs/cl-manual.xml
new file mode 100644
index 0000000000..6c8797f422
--- /dev/null
+++ b/callgrind/docs/cl-manual.xml
@@ -0,0 +1,810 @@
+
+ %cl-entities; ]>
+
+
+Callgrind Manual
+
+
+
+Overview
+
+Callgrind is a Valgrind tool, able to run applications under
+supervision to generate profiling data. By default, this data consists of
+number of instructions executed on a run, related to source lines, and
+call relationship among functions together with call counts.
+Optionally, a cache simulator (similar to cachegrind) can produce
+further information about the memory access behavior of the application.
+
+
+The profile data is written out to a file at program
+termination. For presentation of the data, and interactive control
+of the profiling, two command line tools are provided:
+
+
+ callgrind_annotate
+
+ This command reads in the profile data, and prints a
+ sorted list of functions, optionally with annotation.
+ You can read the manpage here: .
+ For graphical visualization of the data, check out
+ KCachegrind.
+
+
+
+
+
+ callgrind_control
+
+ This command enables you to interactively observe and control
+ the status of applications currently running under supervision. You can
+ get statistics, the current stack trace, and request
+ zeroing of counters, and dumping of profiles.
+ You can read the manpage here: .
+
+
+
+
+To use Callgrind, you must specify
+--tool=callgrind on the Valgrind
+command line or use the supplied script
+callgrind.
+
+Callgrind's cache simulation is based on the
+Cachegrind tool of the
+Valgrind package. Read
+Cachegrind's documentation first;
+this page describes the features supported in addition to
+Cachegrind's features.
+
+
+
+
+
+Purpose
+
+
+
+ Profiling as part of Application Development
+
+ With application development, usually, one of the last steps is
+ to improve the runtime performance. To not waste time on
+ optimizing functions which are rarely used, one needs to know
+ in which part of the program most of the time is spent.
+
+ This is done with a technique called profiling. The program
+ is run under control of a profiling tool, which gives the time
+ distribution of executed functions in the run. After examination
+ of the program's profile, it should be clear if and where optimization
+ is useful. Afterwards, one should verify any runtime changes by another
+ profile run.
+
+
+
+
+
+ Profiling Tools
+
+ The best known is the GCC profiling tool GProf:
+ one needs to compile an application with the compiler option
+ -pg; running the program generates
+ a file gmon.out, which can be
+ transformed into human readable form with the command line tool
+ gprof. A disadvantage here is the
+ required compilation step for preparing the executable; additionally, the
+ application should be statically linked.
+
+ Another profiling tool is Cachegrind, part
+ of Valgrind. It uses the processor
+ emulation of Valgrind to run the executable, and catches all memory
+ accesses for the trace. The user program does not need to be
+ recompiled; it can use shared libraries and plugins, and the profile
+ measuring doesn't influence the trace results. The trace includes
+ the number of instruction/data memory accesses and 1st/2nd level
+ cache misses, and relates them to source lines and functions of the
+ run program.
A disadvantage is the slowdown involved in the
+ processor emulation: the program runs around 50 times slower.
+
+ Cachegrind can only deliver a flat profile. There is no call
+ relationship among the functions of an application stored. Thus,
+ inclusive costs, i.e. costs of a function including the cost of all
+ functions called from there, cannot be calculated. Callgrind extends
+ Cachegrind by including call relationship and exact event counts
+ spent while doing a call.
+
+ Because Callgrind (and Cachegrind) is based on simulation, the
+ slowdown due to processing the synthetic runtime events does not
+ influence the results. See for more
+ details on the possibilities.
+
+
+
+
+
+
+
+Usage
+
+
+ Basics
+
+ To start a profile run for a program, execute:
+ callgrind [callgrind options] your-program [program options]
+
+
+ While the simulation is running, you can observe execution with
+ callgrind_control -b
+ This will print out a current backtrace. To annotate the backtrace with
+ event counts, run
+ callgrind_control -e -b
+
+
+ After program termination, a profile data file named
+ callgrind.out.pid
+ is generated with pid being the process ID
+ of the execution of this profile run.
+
+ The data file contains information about the calls made in the
+ program among the functions executed, together with events of type
+ Instruction Read Accesses (Ir).
+
+ If you are additionally interested in memory accesses of your
+ program, and whether an access can be satisfied by loading from 1st/2nd
+ level cache, use Callgrind with the option
+
+ This will further slow down the run approximately by a factor of 2.
+
+ If the program section you want to profile is somewhere in the
+ middle of the run, it is beneficial to
+ fast forward to this section without any
+ profiling at all, and switch it on later. This is achieved by using
+
+ and interactively use
+ callgrind_control -i on before the
+ interesting code section is about to be executed.
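Because the profile data file is plain text, simple questions about it can be answered with a short script. The following Python sketch is an illustration only and not part of Callgrind: it sums self cost and inclusive cost per function for the first event type. It assumes the default "positions" value ("line", so one subposition per cost line), decimal cost values, and that "fn" and "cfn" share one ID table for name compression, as the examples in the format specification suggest. The names `sum_costs`, `FUNCTION_SPEC` and `EXAMPLE` are made up for this sketch.

```python
import re
from collections import defaultdict

# "fn=name", "fn=(ID) name" or "fn=(ID)"; the same forms for "cfn".
FUNCTION_SPEC = re.compile(r"^(c?fn)=\s*(?:\((\d+)\))?\s*(.*)$")

# The extended example from the format specification.
EXAMPLE = """\
events: Instructions

fl=file1.c
fn=main
16 20
cfn=func1
calls=1 50
16 400
cfl=file2.c
cfn=func2
calls=3 20
16 400

fn=func1
51 100
cfl=file2.c
cfn=func2
calls=2 20
51 300

fl=file2.c
fn=func2
20 700
"""


def sum_costs(profile_text):
    """Return (self_cost, inclusive_cost) dicts keyed by function name."""
    names = {}                      # compression ID -> function name
    self_cost = defaultdict(int)
    call_cost = defaultdict(int)    # inclusive cost of calls made by a function
    current_fn = None
    after_calls = False             # True if a "calls=" line awaits its cost line

    for raw in profile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = FUNCTION_SPEC.match(line)
        if match:
            spec, ident, name = match.groups()
            if ident is not None:
                if name:
                    names[ident] = name      # "(ID) name" defines a mapping
                else:
                    name = names[ident]      # "(ID)" refers to an earlier one
            if spec == "fn":
                current_fn = name
        elif line.startswith("calls="):
            after_calls = True
        elif line[0].isdigit() or line[0] in "+-*":
            # Cost line: one subposition, then the event counts.
            fields = line.split()
            cost = int(fields[1]) if len(fields) > 1 else 0
            if after_calls:
                call_cost[current_fn] += cost
                after_calls = False
            else:
                self_cost[current_fn] += cost
        # Everything else (header lines, fl=, cfl=, ob=, ...) is skipped.

    functions = set(self_cost) | set(call_cost)
    inclusive = {fn: self_cost[fn] + call_cost[fn] for fn in functions}
    return dict(self_cost), inclusive


self_cost, inclusive = sum_costs(EXAMPLE)
print(self_cost["main"], inclusive["main"])  # prints: 20 820
```

For real profiles with "positions: instr line" or several event types, the index of the cost field would have to be derived from the header lines; the sketch deliberately leaves that out.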
+
+ If you want to be able to see assembler annotation, specify
+ . This will produce
+ profile data at instruction granularity. Note that this type of annotation
+ is only available with KCachegrind. For assembler annotation, it also is
+ interesting to see more details of the control flow inside of functions,
+ i.e. (conditional) jumps. This will be collected by further specifying
+ .
+
+
+
+
+
+ Multiple dumps from one program run
+
+ Often, you aren't interested in time characteristics of a full
+ program run, but only of a small part of it (e.g. execution of one
+ algorithm). If there are multiple algorithms or one algorithm
+ running with different input data, it's even useful to get different
+ profile information for multiple parts of one program run.
+
+ In full detail, a generated profile data file is named
+
+callgrind.out.pid.part-threadID
+
+
+ where pid is the PID of the running
+ program, part is a number incremented on each
+ dump (".part" is skipped for the dump at program termination), and
+ threadID is a thread identification
+ ("-threadID" is only used if you request dumps of individual
+ threads with ).
+
+ There are different ways to generate multiple profile dumps
+ while a program is running under Callgrind's supervision. Still,
+ all methods trigger the same action, viz. "dump all profile
+ information since the last dump or program start, and zero cost
+ counters afterwards". To allow for zeroing cost counters without
+ dumping, there is a second action "zero all cost counters now".
+ The different methods are:
+
+
+
+ Dump on program termination.
+ This method is the standard way and doesn't need any special
+ action from your side.
+
+
+
+ Spontaneous, interactive dumping. Use
+ callgrind_control -d [hint [PID/Name]] to
+ request the dumping of profile information of the supervised
+ application with PID or Name. hint is an
+ arbitrary string you can optionally specify to later be able to
+ distinguish profile dumps.
The control program will not terminate
+ before the dump is completely written. Note that the application
+ must be actively running for detection of the dump command. So,
+ for a GUI application, resize the window, or for a server, send a
+ request.
+ If you are using KCachegrind
+ for browsing of profile information, you can use the toolbar
+ button Force dump. This will request a dump
+ and trigger a reload after the dump is written.
+
+
+
+ Periodic dumping after execution of a specified
+ number of basic blocks. For this, use the command line
+ option .
+ The resolution of the internal basic block counter of Valgrind is
+ only rough, so you should at least specify an interval of 50000
+ basic blocks.
+
+
+
+ Dumping at enter/leave of all functions whose name
+ starts with funcprefix. Use the
+ option
+ and .
+ To zero cost counters before entering a function, use
+ .
+ The prefix method for specifying function names was chosen to
+ ease the use with C++: you don't have to specify full
+ signatures. You can specify these options multiple
+ times for different function prefixes.
+
+
+
+ Program controlled dumping.
+ Put an #include of valgrind/callgrind.h
+ into your source and add
+ CALLGRIND_DUMP_STATS; when you
+ want a dump to happen. Use
+ CALLGRIND_ZERO_STATS; to only
+ zero cost counters.
+ In Valgrind terminology, this way is called "Client
+ requests". The given macros generate a special instruction
+ pattern with no effect at all (i.e. a NOP). Only when run under
+ Valgrind, the CPU simulation engine detects the special
+ instruction pattern and triggers special actions like the ones
+ described above.
+
+
+
+ If you are running a multi-threaded application and specify the
+ command line option ,
+ every thread will be profiled on its own and will create its own
+ profile dump. Thus, the last two methods will only generate one dump
+ of the currently running thread. With the other methods, you will get
+ multiple dumps (one for each thread) on a dump request.
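When a run produces many dumps, it can help to sort them by process, part and thread. The following Python sketch splits a dump file name of the form callgrind.out.pid.part-threadID into its components. It is an illustration under assumptions: the default base name "callgrind.out" is in use, and both the part number and the thread identification are written as decimal digits. The names `DumpName` and `parse_dump_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# callgrind.out.<pid>[.<part>][-<threadID>]
# ".part" is absent for the dump at program termination, and "-threadID"
# only appears when per-thread dumps were requested.
DUMP_NAME = re.compile(
    r"callgrind\.out\.(?P<pid>\d+)(?:\.(?P<part>\d+))?(?:-(?P<thread>\d+))?"
)


class DumpName(NamedTuple):
    pid: int
    part: Optional[int]
    thread: Optional[int]


def parse_dump_name(filename):
    """Return the components of a dump file name, or None if it does not match."""
    match = DUMP_NAME.fullmatch(filename)
    if match is None:
        return None

    def to_int(text):
        return int(text) if text is not None else None

    return DumpName(
        pid=int(match.group("pid")),
        part=to_int(match.group("part")),
        thread=to_int(match.group("thread")),
    )


print(parse_dump_name("callgrind.out.1234.2-3"))
```

Sorting a directory listing by the resulting tuples would need a key that replaces None with a number, since None and integers do not compare in Python 3.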
+
+
+
+
+
+
+ Limiting range of event collection
+
+ For aggregating events (function enter/leave,
+ instruction execution, memory access) into event numbers,
+ first, the events must be recognizable by Callgrind, and second,
+ the collection state must be switched on.
+
+ Event recognition is only possible if instrumentation
+ for program code is switched on. This is the default, but for faster
+ execution (identical to valgrind --tool=none),
+ it can be temporarily switched off until the program reaches parts which
+ are interesting to be profiled. Callgrind can start without instrumentation
+ by specifying option .
+ The instrumentation state can be switched on interactively
+ with callgrind_control -i on
+ and off by specifying "off" instead of "on".
+ Furthermore, instrumentation state can be programmatically changed with
+ the macros CALLGRIND_START_INSTRUMENTATION;
+ and CALLGRIND_STOP_INSTRUMENTATION;.
+
+
+ In addition to instrumentation, events must be allowed to be collected
+ to be counted. This, too, is by default the case.
+ You can explicitly control for which part of your program you want to
+ collect events by using
+ .
+ This will toggle the collection state on entering and leaving a
+ function. When specifying this option, the default collection state
+ at program start is "off". Thus, only events happening while running
+ inside of functions starting with funcprefix will
+ be collected. Recursive
+ calls of functions with funcprefix do not trigger
+ any action.
+
+ It is important to note that with instrumentation switched off, the
+ cache simulator cannot see any memory access events, and thus, any
+ simulated cache state will be frozen and wrong without instrumentation.
+ Therefore, to get useful cache events (hits/misses) after switching on
+ instrumentation, the cache first must warm up,
+ probably leading to many cold misses
+ which would not have happened in reality.
If you do not want to see these,
+ start actual collection a few million instructions after you have switched
+ on instrumentation.
+
+
+
+
+
+
+
+ Avoiding cycles
+
+ Each group of functions in which any two of them happen to have a
+ call chain from one to the other is called a cycle. For example,
+ with A calling B, B calling C, and C calling A, the three functions
+ A, B, C build up one cycle.
+
+ If a call chain goes multiple times around inside of a cycle,
+ with profiling, you cannot distinguish event counts coming from the
+ first round or the second. Thus, it makes no sense to attach any inclusive
+ cost to a call among functions inside of one cycle.
+ If "A > B" appears multiple times in a call chain, you
+ have no way to partition the one big sum of all appearances of "A >
+ B". Thus, for profile data presentation, all functions of a cycle are
+ seen as one big virtual function.
+
+ Unfortunately, if you have an application using some callback
+ mechanism (like any GUI program), or even with normal polymorphism (as
+ in OO languages like C++), it's quite possible to get large cycles.
+ As it is often impossible to say anything about performance behaviour
+ inside of cycles, it is useful to introduce some mechanisms to avoid
+ cycles in call graphs altogether. This is done by treating the same
+ function in different ways, depending on the current execution
+ context: either by giving them different names, or by ignoring calls to
+ functions altogether.
+
+ There is an option to ignore calls to a function with
+ . E.g., you
+ usually do not want to see the trampoline functions in the PLT sections
+ for calls to functions in shared libraries. You can see the difference
+ if you profile with .
+ If a call is ignored, cost events happening will be attached to the
+ enclosing function.
+
+ If you have a recursive function, you can distinguish the first
+ 10 recursion levels by specifying
+ .
+ Or for all functions with
+ , but this will
+ give you much bigger profile data files. In the profile data, you will see
+ the recursion levels of "func" as the different functions with names
+ "func", "func'2", "func'3" and so on.
+
+ If you have call chains "A > B > C" and "A > C > B"
+ in your program, you usually get a "false" cycle "B <> C". Use
+
+ ,
+ and functions "B" and "C" will be treated as different functions
+ depending on the direct caller. Using the apostrophe for appending
+ this "context" to the function name, you get "A > B'A > C'B"
+ and "A > C'A > B'C", and there will be no cycle. Use
+ to get a 2-caller
+ dependency for all functions. Again, this will multiply the
+ profile data size.
+
+
+
+
+
+
+
+Command line option reference
+
+
+This reference groups options into classes, and uses the same order as
+the output of callgrind --help.
+
+
+
+Miscellaneous options
+
+
+
+
+
+
+ Show summary of options. This is a short version of this
+ manual section.
+
+
+
+
+
+
+ Show version of callgrind.
+
+
+
+
+
+
+
+Dump creation options
+
+
+These options influence the name and format of the profile data files.
+
+
+
+
+
+
+
+
+
+ Specify another base name for the dump file names. To
+ distinguish different profile runs of the same application,
+ .<pid> is appended to the
+ base dump file name with
+ <pid> being the process ID
+ of the profile run (with multiple dumps happening, the file name
+ is modified further; see below). This option is
+ especially useful if your application changes its working
+ directory. Usually, the dump file is generated in the current
+ working directory of the application at program termination. By
+ giving an absolute path with the base specification, you can force
+ a fixed directory for the dump files.
+
+
+
+
+
+
+
+
+ This specifies that event count relation at instruction granularity
+ should be available in the profile data file.
This allows assembler + annotation, but currently can only be shown with KCachegrind. + + + + + + + + + This specifies that event count relation at source line granularity + should be available in the profile data file. This allows source + annotation for source which was compiled with debug information ("-g"). + This should always be enabled. + + + + + + + + + This option influences the output format of the profile data. + It specifies whether strings (file and function names) should be + identified by numbers. This shrinks the file size, but makes it more difficult + for humans to read (which is not recommended either way). + However, this currently has to be switched off if + the files are to be read by + callgrind_annotate! + + + + + + + + + This option influences the output format of the profile data. + It specifies whether numerical positions are always specified as absolute + values or are allowed to be relative to previous numbers. + This shrinks the file size. + However, this currently has to be switched off if + the files are to be read by + callgrind_annotate! + + + + + + + + + When multiple profile data parts are to be generated, these + parts are appended to the same output file if this option is set to + "yes". Not recommended. + + + + + + + +Activity options + + +These options specify when different actions regarding event counts are to +be executed. For interactive control use +callgrind_control. + + + + + + + + + + Dump profile data every <count> basic blocks + + + + + + + + + Dump when entering a function starting with <prefix> + + + + + + + + + Zero all costs when entering a function starting with <prefix> + + + + + + + + + Dump when leaving a function starting with <prefix> + + + + + + + +Data collection options + + +These options specify when events are to be aggregated into event counts. +Also see . + + + + + + + + + Specify if you want Callgrind to start simulation and + profiling from the beginning. 
If not, Callgrind will not be able + to collect any information, including calls, but it will have at + most a slowdown factor of around 4, which is the minimum Valgrind + overhead. Instrumentation can be interactively switched on via + callgrind_control -i on. + Note that the resulting call graph will most probably not + contain main, but all the + functions executed after instrumentation was switched on. + Instrumentation can also be switched on/off programmatically. See the + Callgrind include file + <callgrind.h> for the macro + you have to use in your source code. For cache + simulation, results will be a little bit off when switching on + instrumentation later in the program run, as the simulator starts + with an empty cache at that moment. Switch on event collection + later to cope with this error. + + + + + + + + + Specify whether event collection is switched on at beginning + of the profile run. + To only look at parts of your program, you have two + possibilities: + + + Zero event counters before entering the program part you + want to profile, and dump the event counters to a file after + leaving that program part. + + + Switch on/off collection state as needed to only see + event counters happening while inside of the program part you + want to profile. + + + The second option can be used if the program part you want to + profile is called many times. Option 1, i.e. creating a lot of + dumps, is not practical here. Collection state can be + toggled on entering and leaving a given function with the + option . For this, collection + state should be switched off at the beginning. Note that the + specification of --toggle-collect + implicitly sets + --collect-state=no. + Collection state can also be toggled by using a Valgrind + User Request in your application. For this, include + valgrind/callgrind.h and specify + the macro + CALLGRIND_TOGGLE_COLLECT at the + needed positions. This will only have an effect when run under + supervision of the Callgrind tool. 
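The user-request approach described above can be sketched in C as follows. This is only an illustration under stated assumptions: the functions sum_of_squares and profile_region are hypothetical stand-ins for the program part of interest, and the fallback definition of CALLGRIND_TOGGLE_COLLECT as a no-op is an addition here so that the sketch still builds when the Valgrind headers are not installed. It also assumes the run was started with collection switched off, so that the first toggle switches it on.

```c
#include <assert.h>

/* Assumption: the Valgrind headers may be missing on the build machine.
   If so, the macro falls back to a no-op (this fallback is not part of
   Callgrind itself). */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT do { } while (0)
#endif

/* Hypothetical workload standing in for the code to be profiled. */
static long sum_of_squares(long n)
{
    assert(n >= 0);
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i * i;
    return sum;
}

/* Toggles collection around the workload only. With collection off at
   program start, the first toggle switches it on and the second one
   switches it off again. */
long profile_region(long n)
{
    CALLGRIND_TOGGLE_COLLECT;            /* off -> on  */
    long result = sum_of_squares(n);
    CALLGRIND_TOGGLE_COLLECT;            /* on  -> off */
    return result;
}
```

Outside of Valgrind the macro has no effect, so the same binary can be run natively without changes.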
+ + + + + + + + + Toggle collection on entering/leaving a function starting with + <prefix>. + + + + + + + + + This specifies whether information for (conditional) jumps + should be collected. Same as above, callgrind_annotate currently is not + able to show you the data. You have to use KCachegrind to get jump + arrows in the annotated code. + + + + + + + +Cost entity separation options + + +These options specify how event counts should be related to execution +contexts. More specifically, they specify e.g. whether the recursion level or the +call chain leading to a function should be accounted for, or whether the +thread ID should be remembered. +Also see . + + + + + + + + + This option specifies whether profile data should be generated + separately for every thread. If yes, the file names get "-threadID" + appended. + + + + + + + + + Separate function recursions, maximal <level>. + See . + + + + + + + + + Separate contexts by maximal <callers> functions in the + call chain. See . + + + + + + + + + Ignore calls to/from PLT sections. + + + + + + + + + Ignore calls to/from a given function. E.g. if you have a + call chain A > B > C, and you specify function B to be + ignored, you will only see A > C. + This is very convenient to skip functions handling callback + behaviour. E.g. for the SIGNAL/SLOT mechanism in Qt, you only want + to see the function emitting a signal to call the slots connected + to that signal. First, determine the real call chain to see the + functions that need to be skipped, then use this option. + + + + + + + + + Put a function into a separation group. This influences the + context name for cycle avoidance. All functions inside of such a + group are treated as being the same for context name building, which + resembles the call chain leading to a context. By specifying function + groups with this option, you can shorten the context name, as functions + in the same group will not appear in sequence in the name. 
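The effect of ignoring a function, as described above for the call chain A > B > C, can be sketched in C. This is a simplified model and not Callgrind's implementation: the Frame type and the skip_function helper are hypothetical names, and the cost handling only follows the statement in this manual that events of an ignored call are attached to the enclosing function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one entry in a call chain. */
typedef struct {
    const char   *name;       /* function name                  */
    unsigned long self_cost;  /* events counted in this function */
} Frame;

/* Copies chain[0..len) to out, leaving out every frame named `skipped`
   and adding its cost to the enclosing (preceding) frame. A skipped
   function at the very start of the chain has no enclosing frame and
   is therefore kept. Returns the number of frames written to out,
   which must have room for len entries. */
size_t skip_function(const Frame *chain, size_t len,
                     const char *skipped, Frame *out)
{
    assert(chain != NULL && skipped != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (n > 0 && strcmp(chain[i].name, skipped) == 0) {
            /* Ignored call: attribute its events to the caller. */
            out[n - 1].self_cost += chain[i].self_cost;
        } else {
            out[n++] = chain[i];
        }
    }
    return n;
}
```

For a chain A (10 events) > B (5) > C (7) with B ignored, this model yields A (15) > C (7), matching the A > C chain described in the text.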
+ + + + + + + + + Separate <number> recursions for <function>. + See . + + + + + + + + + Separate <number> callers for <function>. + See . + + + + + + + +Cache simulation options + + + + + + + + + Specify if you want to do full cache simulation. Disabled by + default; only instruction read accesses will be profiled. + Note, however, that it is impossible to estimate how much real time your + program will need from the instruction read counts + alone. Use it if you want to find out how many times + different functions are called and their call relation. + + + + + + + + + + + + diff --git a/callgrind/docs/index.xml b/callgrind/docs/index.xml new file mode 100644 index 0000000000..45d2f8e94f --- /dev/null +++ b/callgrind/docs/index.xml @@ -0,0 +1,120 @@ + + %cl-entities; ]> + + + + + + Callgrind Documentation + A call-graph generating Cache Simulator and Profiler + Release &cl-version; &cl-date; + + &cl-lifespan; + + + + + + &cl-email; + + + Permission is granted to copy, distribute and/or modify this + document under the terms of the GNU Free Documentation License, + Version 1.2 or any later version published by the Free Software Foundation; + with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover + Texts. A copy of the license is included in the section entitled + . 
+ + + + + + + + + + Callgrind Annotate (1) + + + + + Callgrind Control (1) + + + + + + Callgrind (1) + + + + + + + AUTHORS + + + + + + + README + + + + + + + ChangeLog + + + + + + + + + + INSTALL + + + + + + + The GNU General Public License + + + + + + + The GNU Free Documentation License + + + + + + + diff --git a/callgrind/docs/man-annotate.xml b/callgrind/docs/man-annotate.xml new file mode 100644 index 0000000000..dd668e26e5 --- /dev/null +++ b/callgrind/docs/man-annotate.xml @@ -0,0 +1,163 @@ + + %cl-entities; ]> + + + + + + Callgrind Annotate + 1 + May 13, 2003 + + + + callgrind_annotate + produces human readable ASCII output from profile + information in cachegrind.out files + + + + + callgrind_annotate + options + source-files + + + + + +Description + +This manual page documents briefly the +callgrind_annotate command. This manual page was +written for the Debian distribution because the original program does +not have a manual page. + + + + + +Options + +This program follows the usual GNU command line syntax, with long +options starting with two dashes ('--'). A summary of options is +included below. + + + + + + + Show summary of options. + + + + + + + Show version of callgrind_annotate. 
+ + + + + + + + + only show figures for events A,B,C + + + + + + + + + sort columns by events A,B,C [event column order] + + + + + + + + + percentage of counts (of primary sort event) we are + interested in + + + + + + + + + annotate all source files containing functions that helped + reach the event count threshold + + + + + + + + + print N lines of context before and after annotated + lines + + + + + + + + + add subroutine costs to function calls + + + + + + + + + print, for each function, its callers, the called functions, + or both + + + + + + + + + add <dir> to the list of directories to search for source + files + + + + + + + + + +See Also + +&cl-doc-path; + + + + + +Author + +This manual page was written by +Philipp Frauenfelder <pfrauenf@debian.org>, for the Debian +GNU/Linux system (but may be used by others). + + + + diff --git a/callgrind/docs/man-callgrind.xml b/callgrind/docs/man-callgrind.xml new file mode 100644 index 0000000000..152543fc67 --- /dev/null +++ b/callgrind/docs/man-callgrind.xml @@ -0,0 +1,100 @@ + + %cl-entities; ]> + + + + + Callgrind + 1 + November 18, 2005 + + + + callgrind + calls valgrind with the callgrind tool + + + + + callgrind + options + progs-and-args + + + + + +Description + +Callgrind is a profiling tool similar to gprof, +but by being able to observe a program run in great detail - using +Valgrind - it can give much more information. The binary does not have +to be prepared for profiling with callgrind in any +special way. Still, it is recommended to compile with debug information. + +Callgrind builds up the call graph of a program +while it is running, and optionally does cache simulation. The collected +profiling data can be stored into an output file multiple times in a +program run, optionally separately for every thread in the case of +multithreaded code. For interactive inspection and control, see +callgrind_control. 
The data produced +(callgrind.out.PID) can be analysed with +callgrind_annotate or better with the graphical profile +visualization KCachegrind. Further documentation can +be found in HTML format either on your filesystem: +&cl-doc-path; or online at +&cl-doc-url;. + + + + + +Options + +This program follows the usual GNU command line syntax, with long +options starting with two dashes ('--'). + + + + + + + + + +See Also + +callgrind_control, +callgrind_annotate, +&cl-doc-path; + + + + + + +Author + +This manual page was written by Josef Weidendorfer <&cl-email;>. + + + + + + +Copyright + +Copyright © &cl-lifespan; Josef Weidendorfer +This is free software; see the source for copying conditions. +There is NO warranty; not even for MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + diff --git a/callgrind/docs/man-control.xml b/callgrind/docs/man-control.xml new file mode 100644 index 0000000000..ca3edde6e8 --- /dev/null +++ b/callgrind/docs/man-control.xml @@ -0,0 +1,132 @@ + + %cl-entities; ]> + + + + + Callgrind Control + 1 + October, 2005 + + + + callgrind_control + observe and control applications currently running under + supervision of callgrind + + + + + callgrind_control + options + pid/program-name + + + + + +Description + +This manual page documents briefly the +callgrind_control command. When not specifying a +pid/program name argument, all applications run +by callgrind on this system will be used for actions given by the +specified option(s). The default action is to give short information +for the applications run by callgrind. + + + + + +Options + +This program follows the usual GNU command line syntax, with long +options starting with two dashes ("--"). A summary of options is +included below. + + + + + + + Show summary of options. + + + + + + + Show version of callgrind_control. 
+ + + + + + + Show statistics + + + + + + + Show stack trace + + + + + + + Only show figures for events A,B,C + + + + + + + Zero cost counters + + + + + + + Request the dumping of profile information. Optionally, a + string can be specified which is written into the dump as part of + the Trigger reason. This can be used to distinguish multiple dumps. + + + + + + + Kill + + + + + + + + + +See Also + +&cl-doc-path; + + + + + +Author + +This manual page was written by Josef Weidendorfer <&cl-email;>. + + + + + + + diff --git a/docs/xml/manual.xml b/docs/xml/manual.xml index 1599aedb9b..0d996488c2 100644 --- a/docs/xml/manual.xml +++ b/docs/xml/manual.xml @@ -28,6 +28,8 @@ xmlns:xi="http://www.w3.org/2001/XInclude" /> + + diff --git a/docs/xml/valgrind-manpage.xml b/docs/xml/valgrind-manpage.xml index e86ebb37df..ffe33646a1 100644 --- a/docs/xml/valgrind-manpage.xml +++ b/docs/xml/valgrind-manpage.xml @@ -76,6 +76,14 @@ leaks. instructions executed and cache misses incurred. + + adds call graph tracing to cachegrind. It can be + used to get call counts and inclusive cost for each call happening in your + program. In addition to cachegrind, callgrind can annotate threads separately, + and every instruction of disassembler output of your program with the number of + instructions executed and cache misses incurred. + + spots potential race conditions in your program. @@ -198,6 +206,17 @@ leaks. + +Callgrind Options + + + + + + + Massif Options