-EXTRA_DIST =
+EXTRA_DIST = \
+ cl-entities.xml \
+ cl-manual.xml \
+ cl-format.xml \
+ index.xml \
+ man-annotate.xml \
+ man-control.xml \
+ man-callgrind.xml
--- /dev/null
+<!-- callgrind release + version stuff -->
+<!ENTITY cl-version "0.10.1">
+<!ENTITY cl-date "November 25 2005">
+
+<!-- copyright length of years -->
+<!ENTITY cl-lifespan "2000-2005">
+
+<!-- website + email -->
+<!ENTITY cl-email "Josef.Weidendorfer@gmx.de">
+<!ENTITY cl-url "http://www.valgrind.org/info/developers.html">
+
+<!-- external urls used in the docs. kept in here because when -->
+<!-- they change it's a real pain tracking them down in the docs -->
+<!ENTITY vg-url "http://www.valgrind.org/">
+<!ENTITY cg-doc-url "http://www.valgrind.org/docs/manual/cg-manual.html">
+<!ENTITY cg-tool-url "http://www.valgrind.org/info/tools.html#cachegrind">
+<!ENTITY cl-gui "http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex">
+
+<!-- path/to/callgrind/docs in valgrind install tree -->
+<!-- only used in the manpages -->
+<!ENTITY cl-doc-path "/usr/share/doc/valgrind/html/callgrind.html">
+<!ENTITY cl-doc-url "http://www.valgrind.org/docs/manual/cl-manual.html">
--- /dev/null
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+<chapter id="cl-format" xreflabel="Callgrind Format Specification">
+<title>Callgrind Format Specification</title>
+
+<para>This chapter describes the Callgrind Profile Format, Version 1.</para>
+
+<para>A synonymous name is "Calltree Profile Format". These names actually mean
+the same since Callgrind was previously named Calltree.</para>
+
+<para>The format description is meant to help users understand the
+file contents; more importantly, it enables authors of measurement or
+visualization tools to write and read this format.</para>
+
+<sect1 id="cl-format.overview" xreflabel="Overview">
+<title>Overview</title>
+
+<para>The profile data format is ASCII based.
+It is written by Callgrind, and it is upward compatible
+with the format used by Cachegrind (i.e. Cachegrind uses a subset). It can
+be read by callgrind_annotate and KCachegrind.</para>
+
+<para>This chapter gives an overview of format features and examples.
+For detailed syntax, look at the format reference.</para>
+
+<sect2 id="cl-format.overview.basics" xreflabel="Basic Structure">
+<title>Basic Structure</title>
+
+<para>Each file has a header part of an arbitrary number of lines of the
+format "key: value". The lines with key "positions" and "events" define
+the meaning of cost lines in the second part of the file: the value of
+"positions" is a list of subpositions, and the value of "events" is a list
+of event type names. Cost lines consist of subpositions followed by 64-bit
+counters for the events, in the order specified by the "positions" and "events"
+header lines.</para>
+
+<para>The "events" header line is always required in contrast to the optional
+line for "positions", which defaults to "line", i.e. a line number of some
+source file. In addition, the second part of the file contains position
+specifications of the form "spec=name". "spec" can be e.g. "fn" for a
+function name or "fl" for a file name. Cost lines are always related to
+the function/file specifications given directly before.</para>
+
+</sect2>
+
+<sect2 id="cl-format.overview.example1" xreflabel="Simple Example">
+<title>Simple Example</title>
+
+<para>
+<screen>events: Cycles Instructions Flops
+fl=file.f
+fn=main
+15 90 14 2
+16 20 12</screen></para>
+
+<para>The above example gives profile information for event types "Cycles",
+"Instructions", and "Flops". Thus, cost lines give the number of CPU cycles
+passed by, number of executed instructions, and number of floating point
+operations executed while running code corresponding to some source
+position. As there is no line specifying the value of "positions", it defaults
+to "line", which means that the first number of a cost line is always a line
+number.</para>
+
+<para>Thus, the first cost line specifies that in line 15 of source file
+"file.f" there is code belonging to function "main". While running, 90 CPU
+cycles passed by, and 2 of the 14 instructions executed were floating point
+operations. Similarly, the next line specifies that there were 12 instructions
+executed in the context of function "main" which can be related to line 16 in
+file "file.f", taking 20 CPU cycles. If a cost line specifies fewer event counts
+than given in the "events" line, the rest are assumed to be zero. That is, no
+floating point instruction was executed relating to line 16.</para>
+
+<para>Note that regular cost lines always give self (also called exclusive)
+cost of code at a given position. If you specify multiple cost lines for the
+same position, these will be summed up. On the other hand, in the example above
+there is no specification of how many times function "main" actually was
+called: profile data only contains sums.</para>
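+
+<para>As an illustration of this summing, consider a variant of the example
+above in which a second, hypothetical cost line for line 15 has been added
+(it is not part of the original example):
+<screen>events: Cycles Instructions Flops
+fl=file.f
+fn=main
+15 90 14 2
+15 10 6
+16 20 12</screen>
+A reader of this file would attribute 100 cycles (90 + 10), 20 instructions
+(14 + 6) and 2 floating point operations (2 + 0, as the missing count is taken
+as zero) to line 15 of "file.f".</para>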
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.associations" xreflabel="Associations">
+<title>Associations</title>
+
+<para>The most important extension to the original format of Cachegrind is the
+ability to specify call relationships among functions. More generally, you
+specify associations among positions. For this, the second part of the
+file can also contain association specifications. These look similar to
+position specifications, but consist of 2 lines. For calls, the format
+looks like
+<screen>
+ calls=(Call Count) (Destination position)
+ (Source position) (Inclusive cost of call)
+</screen></para>
+
+<para>The destination only specifies subpositions like line number. Therefore,
+to be able to specify a call to another function in another source file, you
+have to precede the above lines with a "cfn=" specification for the name of the
+called function, and a "cfl=" specification if the function is in another
+source file. The 2nd line looks like a regular cost line with the difference
+that inclusive cost spent inside of the function call has to be specified.</para>
+
+<para>Other associations are, for example, (conditional) jumps. See the
+reference below for details.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.example2" xreflabel="Extended Example">
+<title>Extended Example</title>
+
+<para>The following example shows 3 functions, "main", "func1", and
+"func2". Function "main" calls "func1" once and "func2" 3 times. "func1" calls
+"func2" 2 times.
+<screen>events: Instructions
+
+fl=file1.c
+fn=main
+16 20
+cfn=func1
+calls=1 50
+16 400
+cfl=file2.c
+cfn=func2
+calls=3 20
+16 400
+
+fn=func1
+51 100
+cfl=file2.c
+cfn=func2
+calls=2 20
+51 300
+
+fl=file2.c
+fn=func2
+20 700</screen></para>
+
+<para>One can see that in "main" only code from line 16 is executed, which is
+also where the other functions are called. The inclusive cost of "main" is 820,
+which is the sum of its self cost of 20 and the costs spent in the calls: 400
+for the single call to "func1" and 400 for the three calls to "func2".</para>
+
+<para>Function "func1" is located in "file1.c", the same as "main". Therefore,
+a "cfl=" specification for the call to "func1" is not needed. The function
+"func1" only consists of code at line 51 of "file1.c", where "func2" is called.</para>
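+
+<para>The numbers in this example can be cross-checked by hand. The inclusive
+cost of "func1" is its self cost plus the cost of its calls, which matches
+the cost given for the call from "main"; and since "func2" makes no calls, its
+self cost equals the sum of the inclusive costs at its two call sites:
+<screen>func1:  100 (self) + 300 (2 calls to func2)              = 400
+func2:  400 (3 calls from main) + 300 (2 calls from func1) = 700</screen>
+A tool writing this format might use such a check as a sanity test; this
+is a suggestion, not a requirement stated by the format.</para>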
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.compression1" xreflabel="Name Compression">
+<title>Name Compression</title>
+
+<para>With the introduction of association specifications like calls, it becomes
+necessary to specify the same function or file name multiple times. As
+absolute filenames or symbol names in C++ can be quite long, it is advantageous
+to be able to specify integer IDs for position specifications.</para>
+
+<para>To support name compression, a position specification can be not only of
+the format "spec=name", but also "spec=(ID) name" to specify a mapping of an
+integer ID to a name, and "spec=(ID)" to reference a previously defined ID
+mapping. There is a separate ID mapping for each position specification,
+i.e. you can use ID 1 for both a file name and a symbol name.</para>
+
+<para>With name compression, the extended example above looks like this:
+<screen>events: Instructions
+
+fl=(1) file1.c
+fn=(1) main
+16 20
+cfn=(2) func1
+calls=1 50
+16 400
+cfl=(2) file2.c
+cfn=(3) func2
+calls=3 20
+16 400
+
+fn=(2)
+51 100
+cfl=(2)
+cfn=(3)
+calls=2 20
+51 300
+
+fl=(2)
+fn=(3)
+20 700</screen></para>
+
+<para>As position specifications carry no information themselves, but only change
+the meaning of subsequent cost lines or associations, they can appear
+anywhere in the file without any negative consequence. In particular, you can
+define name compression mappings directly after the header, and before any cost
+lines. Thus, the above example can also be written as
+<screen>events: Instructions
+
+# define file ID mapping
+fl=(1) file1.c
+fl=(2) file2.c
+# define function ID mapping
+fn=(1) main
+fn=(2) func1
+fn=(3) func2
+
+fl=(1)
+fn=(1)
+16 20
+...</screen></para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.compression2" xreflabel="Subposition Compression">
+<title>Subposition Compression</title>
+
+<para>If a Callgrind data file should hold costs for each assembler instruction
+of a program, you specify subposition "instr" in the "positions:" header line,
+and each cost line has to include the address of some instruction. Addresses
+are allowed to have a size of 64 bits to support 64-bit architectures. This
+motivates subposition compression: instead of every cost line starting with
+a 16 character long address, one is allowed to specify relative subpositions.</para>
+
+<para>A relative subposition always is based on the corresponding subposition
+of the last cost line, and starts with a "+" to specify a positive difference,
+a "-" to specify a negative difference, or consists of "*" to specify the same
+subposition. Assume the following example (subpositions can always be specified
+as hexadecimal numbers, beginning with "0x"):
+<screen>positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
+0x80001237 90 5
+0x80001238 91 6</screen></para>
+
+<para>With subposition compression, this looks like
+<screen>positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
++3 * 5
++1 +1 6</screen></para>
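+
+<para>To make the decoding explicit, a reader would resolve the two
+compressed cost lines above as follows, each relative to the cost line
+before it:
+<screen>+3 *    ->  0x80001234 + 3 = 0x80001237, line 90 (unchanged)
++1 +1   ->  0x80001237 + 1 = 0x80001238, line 90 + 1 = 91</screen></para>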
+
+<para>Remark: For assembler annotation to work, instruction addresses have to
+be corrected to correspond to addresses found in the original binary. I.e. for
+relocatable shared objects, often a load offset has to be subtracted.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.misc" xreflabel="Miscellaneous">
+<title>Miscellaneous</title>
+
+<sect3 id="cl-format.overview.misc.summary" xreflabel="Cost Summary Information">
+<title>Cost Summary Information</title>
+
+<para>For the visualization to be able to show cost percentage, a sum of the
+cost of the full run has to be known. Usually, it is assumed that this is the
+sum of all cost lines in a file. But sometimes, this is not correct. Thus, you
+can specify a "summary:" line in the header giving the full cost for the
+profile run. This has another effect: an import filter can show a progress bar
+while loading a large data file if it knows the cost sum in advance.</para>
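+
+<para>As a sketch, a header for the extended example shown earlier could
+carry the total cost like this, assuming the total is simply the sum of all
+self costs in that file (20 + 100 + 700):
+<screen>events: Instructions
+summary: 820</screen></para>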
+
+</sect3>
+
+<sect3 id="cl-format.overview.misc.events" xreflabel="Long Names for Event Types and inherited Types">
+<title>Long Names for Event Types and inherited Types</title>
+
+<para>Event types for cost lines are specified in the "events:" line with an
+abbreviated name. For visualization, it makes sense to be able to specify some
+longer, more descriptive name. For an event type "Ir" which means "Instruction
+Fetches", this can be specified with the header line
+<screen>event: Ir : Instruction Fetches
+events: Ir Dr</screen></para>
+
+<para>In this example, "Dr" itself has no long name associated. The order of
+"event:" lines and the "events:" line is of no importance. Additionally,
+inherited event types can be introduced for which no raw data is available, but
+which are calculated from given types. Continuing the last example, you could add
+<screen>event: Sum = Ir + Dr</screen>
+to specify an additional event type "Sum", which is calculated by adding costs
+for "Ir" and "Dr".</para>
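+
+<para>For illustration, take a hypothetical file with these headers and a
+single cost line (the file and function names are made up):
+<screen>event: Ir : Instruction Fetches
+event: Sum = Ir + Dr
+events: Ir Dr
+
+fl=file.c
+fn=main
+16 20 5</screen>
+Here line 16 has 20 events of type "Ir" and 5 of type "Dr", so a
+visualization tool would derive 25 for the inherited type "Sum".</para>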
+
+</sect3>
+
+</sect2>
+
+</sect1>
+
+<sect1 id="cl-format.reference" xreflabel="Reference">
+<title>Reference</title>
+
+<sect2 id="cl-format.reference.grammar" xreflabel="Grammar">
+<title>Grammar</title>
+
+<para>
+<screen>ProfileDataFile := FormatVersion? Creator? PartData*</screen>
+<screen>FormatVersion := "version:" Space* Number "\n"</screen>
+<screen>Creator := "creator:" NoNewLineChar* "\n"</screen>
+<screen>PartData := (HeaderLine "\n")+ (BodyLine "\n")+</screen>
+<screen>HeaderLine := (empty line)
+ | ('#' NoNewLineChar*)
+ | PartDetail
+ | Description
+ | EventSpecification
+ | CostLineDef</screen>
+<screen>PartDetail := TargetCommand | TargetID</screen>
+<screen>TargetCommand := "cmd:" Space* NoNewLineChar*</screen>
+<screen>TargetID := ("pid"|"thread"|"part") ":" Space* Number</screen>
+<screen>Description := "desc:" Space* Name Space* ":" NoNewLineChar*</screen>
+<screen>EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?</screen>
+<screen>InheritedDef := "=" InheritedExpr</screen>
+<screen>InheritedExpr := Name
+ | Number Space* ("*" Space*)? Name
+ | InheritedExpr Space* "+" Space* InheritedExpr</screen>
+<screen>LongNameDef := ":" NoNewLineChar*</screen>
+<screen>CostLineDef := "events:" Space* Name (Space+ Name)*
+ | "positions:" "instr"? (Space+ "line")?</screen>
+<screen>BodyLine := (empty line)
+ | ('#' NoNewLineChar*)
+ | CostLine
+ | PositionSpecification
+ | AssociationSpecification</screen>
+<screen>CostLine := SubPositionList Costs?</screen>
+<screen>SubPositionList := (SubPosition+ Space+)+</screen>
+<screen>SubPosition := Number | "+" Number | "-" Number | "*"</screen>
+<screen>Costs := (Number Space+)+</screen>
+<screen>PositionSpecification := Position "=" Space* PositionName</screen>
+<screen>Position := CostPosition | CalledPosition</screen>
+<screen>CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"</screen>
+<screen>CalledPosition := "cob" | "cfl" | "cfn"</screen>
+<screen>PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?</screen>
+<screen>AssociationSpecification := CallSpecification
+ | JumpSpecification</screen>
+<screen>CallSpecification := CallLine "\n" CostLine</screen>
+<screen>CallLine := "calls=" Space* Number Space+ SubPositionList</screen>
+<screen>JumpSpecification := ...</screen>
+<screen>Space := " " | "\t"</screen>
+<screen>Number := HexNumber | (Digit)+</screen>
+<screen>Digit := "0" | ... | "9"</screen>
+<screen>HexNumber := "0x" (Digit | HexChar)+</screen>
+<screen>HexChar := "a" | ... | "f" | "A" | ... | "F"</screen>
+<screen>Name := Alpha (Digit | Alpha)*</screen>
+<screen>Alpha := "a" | ... | "z" | "A" | ... | "Z"</screen>
+<screen>NoNewLineChar := all characters without "\n"</screen>
+</para>
+
+</sect2>
+
+<sect2 id="cl-format.reference.header" xreflabel="Description of Header Lines">
+<title>Description of Header Lines</title>
+
+<para>The header has an arbitrary number of lines of the format
+"key: value". Possible <emphasis>key</emphasis> values for the header are:</para>
+
+<itemizedlist>
+
+ <listitem>
+ <para><computeroutput>version: number</computeroutput> [Callgrind]</para>
+ <para>This is used to distinguish future profile data formats. A
+ major version of 0 or 1 is supposed to be upward compatible with
+ Cachegrind's format. It is optional; if absent, version 1
+ is assumed. Otherwise, this has to be the first header line.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>pid: process id</computeroutput> [Callgrind]</para>
+ <para>This specifies the process ID of the supervised application
+ for which this profile was generated.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>cmd: program name + args</computeroutput> [Cachegrind]</para>
+ <para>This specifies the full command line of the supervised
+ application for which this profile was generated.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>part: number</computeroutput> [Callgrind]</para>
+ <para>This specifies a sequentially incremented number for each dump
+ generated, starting at 1.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>desc: type: value</computeroutput> [Cachegrind]</para>
+ <para>This specifies various information for this dump. For some
+ types, the semantics are defined, but any description type is allowed.
+ Unknown types should be ignored.</para>
+ <para>There are the types "I1 cache", "D1 cache", "L2 cache", which
+ specify parameters used for the cache simulator. These are the only
+ types originally used by Cachegrind. Additionally, Callgrind uses
+ the following types: "Timerange" gives a rough range of the basic
+ block counter, for which the cost of this dump was collected.
+ Type "Trigger" states why this trace was generated,
+ e.g. program termination or a forced interactive dump.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>positions: [instr] [line]</computeroutput> [Callgrind]</para>
+ <para>For cost lines, this defines the semantics of the first numbers.
+ Any combination of "instr", "bb" and "line" is allowed, but has to be
+ in this order which corresponds to position numbers at the start of
+ the cost lines later in the file.</para>
+ <para>If "instr" is specified, the position is the address of an
+ instruction whose execution raised the events given later on the
+ line. This address is relative to the offset of the binary/shared
+ library file, so that no relocation info has to be specified. For "line",
+ the position is the line number of a source file, which is
+ responsible for the events raised. Note that the mapping between "instr"
+ and "line" positions is given by the debugging line information
+ produced by the compiler.</para>
+ <para>This field is optional. If not specified, only "line" is
+ assumed.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>events: event type abbreviations</computeroutput> [Cachegrind]</para>
+ <para>A list of short names of the event types logged in this file.
+ The order is the same as in cost lines. The first event type is the
+ second or third number in a cost line, depending on the value of
+ "positions". Callgrind does not add additional cost types. Specify
+ exactly once.</para>
+ <para>Cost types from original Cachegrind are:
+ <itemizedlist>
+ <listitem>
+ <para><command>Ir</command>: Instruction read access</para>
+ </listitem>
+ <listitem>
+ <para><command>I1mr</command>: Instruction Level 1 read cache miss</para>
+ </listitem>
+ <listitem>
+ <para><command>I2mr</command>: Instruction Level 2 read cache miss</para>
+ </listitem>
+ <listitem>
+ <para>...</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>summary: costs</computeroutput> [Callgrind]</para>
+ <para><computeroutput>totals: costs</computeroutput> [Cachegrind]</para>
+ <para>The total number of events covered by this trace
+ file. Both keys have the same meaning, but the "totals:" line
+ happens to be at the end of the file, while "summary:" appears in
+ the header. This was added to allow postprocessing tools to know
+ the total cost in advance. The two lines always give the same cost
+ counts.</para>
+ </listitem>
+
+</itemizedlist>
+
+</sect2>
+
+<sect2 id="cl-format.reference.body" xreflabel="Description of Body Lines">
+<title>Description of Body Lines</title>
+
+<para>There exist lines
+<computeroutput>spec=position</computeroutput>. The values for position
+specifications are arbitrary strings. When starting with "(" and a
+digit, it's a string in compressed format. Otherwise it's the real
+position string. This allows for file and symbol names as position
+strings, as these never start with "(" + <emphasis>digit</emphasis>.
+The compressed format is either "(" <emphasis>number</emphasis> ")"
+<emphasis>space</emphasis> <emphasis>position</emphasis> or only
+"(" <emphasis>number</emphasis> ")". The first relates
+<emphasis>position</emphasis> to <emphasis>number</emphasis> in the
+context of the given format specification from this line to the end of
+the file; it makes the (<emphasis>number</emphasis>) an alias for
+<emphasis>position</emphasis>. Compressed format is always
+optional.</para>
+
+<para>Position specifications allowed:</para>
+<itemizedlist>
+
+ <listitem>
+ <para><computeroutput>ob=</computeroutput> [Callgrind]</para>
+ <para>The ELF object where the cost of next cost lines happens.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>fl=</computeroutput> [Cachegrind]</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>fi=</computeroutput> [Cachegrind]</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>fe=</computeroutput> [Cachegrind]</para>
+ <para>The source file including the code which is responsible for
+ the cost of next cost lines. "fi="/"fe=" is used when the source
+ file changes inside of a function, i.e. for inlined code.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>fn=</computeroutput> [Cachegrind]</para>
+ <para>The name of the function where the cost of next cost lines
+ happens.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>cob=</computeroutput> [Callgrind]</para>
+ <para>The ELF object of the target of the next call cost lines.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>cfl=</computeroutput> [Callgrind]</para>
+ <para>The source file including the code of the target of the
+ next call cost lines.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>cfn=</computeroutput> [Callgrind]</para>
+ <para>The name of the target function of the next call cost
+ lines.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>calls=</computeroutput> [Callgrind]</para>
+ <para>The number of nonrecursive calls which are responsible for the
+ cost specified by the next call cost line. This is the cost spent
+ inside of the called function.</para>
+ <para>After "calls=" there MUST be a cost line. This is the cost
+ spent in the called function. The first number is the source line
+ from where the call happened.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>jump=count target position</computeroutput> [Callgrind]</para>
+ <para>Unconditional jump, executed count times, to the given target
+ position.</para>
+ </listitem>
+
+ <listitem>
+ <para><computeroutput>jcnd=exe.count jumpcount target position</computeroutput> [Callgrind]</para>
+ <para>Conditional jump, executed exe.count times with jumpcount
+ jumps to the given target position.</para>
+ </listitem>
+
+</itemizedlist>
+
+</sect2>
+
+</sect1>
+
+</chapter>
\ No newline at end of file
--- /dev/null
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+<chapter id="cl-manual" xreflabel="Callgrind Manual">
+<title>Callgrind Manual</title>
+
+
+<sect1 id="cl-manual.use" xreflabel="Overview">
+<title>Overview</title>
+
+<para>Callgrind is a Valgrind tool, able to run applications under
+supervision to generate profiling data. By default, this data consists of
+the number of instructions executed on a run, related to source lines, and
+the call relationships among functions together with call counts.
+Optionally, a cache simulator (similar to Cachegrind) can produce
+further information about the memory access behavior of the application.
+</para>
+
+<para>The profile data is written out to a file at program
+termination. For presentation of the data, and interactive control
+of the profiling, two command line tools are provided:</para>
+<variablelist>
+ <varlistentry>
+ <term><command>callgrind_annotate</command></term>
+ <listitem>
+ <para>This command reads in the profile data, and prints a
+ sorted list of functions, optionally with annotation.</para>
+ <para>You can read the manpage here: <xref
+ linkend="callgrind-annotate"/>.</para>
+ <para>For graphical visualization of the data, check out
+ <ulink url="&cl-gui;">KCachegrind</ulink>.</para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><command>callgrind_control</command></term>
+ <listitem>
+ <para>This command enables you to interactively observe and control
+ the status of supervised applications that are currently running. You can
+ get statistical information and the current stack trace, and request
+ zeroing of counters and dumping of profiles.</para>
+ <para>You can read the manpage here: <xref linkend="callgrind-control"/>.</para>
+ </listitem>
+ </varlistentry>
+</variablelist>
+
+<para>To use Callgrind, you must specify
+<computeroutput>--tool=callgrind</computeroutput> on the Valgrind
+command line or use the supplied script
+<computeroutput>callgrind</computeroutput>.</para>
+
+<para>Callgrind's cache simulation is based on the
+<ulink url="&cg-tool-url;">Cachegrind tool</ulink> of the
+<ulink url="&vg-url;">Valgrind</ulink> package. Read
+<ulink url="&cg-doc-url;">Cachegrind's documentation</ulink> first;
+this page describes the features supported in addition to
+Cachegrind's features.</para>
+
+</sect1>
+
+
+<sect1 id="cl-manual.purpose" xreflabel="Purpose">
+<title>Purpose</title>
+
+
+ <sect2 id="cl-manual.devel"
+ xreflabel="Profiling as part of Application Development">
+ <title>Profiling as part of Application Development</title>
+
+ <para>With application development, usually, one of the last steps is
+ to improve the runtime performance. To not waste time on
+ optimizing functions which are rarely used, one needs to know
+ in which part of the program most of the time is spent.</para>
+
+ <para>This is done with a technique called profiling. The program
+ is run under control of a profiling tool, which gives the time
+ distribution of executed functions in the run. After examination
+ of the program's profile, it should be clear if and where optimization
+ is useful. Afterwards, one should verify any runtime changes by another
+ profile run.</para>
+
+ </sect2>
+
+
+ <sect2 id="cl-manual.tools" xreflabel="Profiling Tools">
+ <title>Profiling Tools</title>
+
+ <para>The best known is the GCC profiling tool <command>GProf</command>:
+ one needs to compile an application with the compiler option
+ <computeroutput>-pg</computeroutput>; running the program generates
+ a file <computeroutput>gmon.out</computeroutput>, which can be
+ transformed into human readable form with the command line tool
+ <computeroutput>gprof</computeroutput>. A disadvantage here is the
+ required compilation step for preparing the executable; additionally, the
+ application should be statically linked.</para>
+
+ <para>Another profiling tool is <command>Cachegrind</command>, part
+ of <ulink url="&vg-url;">Valgrind</ulink>. It uses the processor
+ emulation of Valgrind to run the executable, and catches all memory
+ accesses for the trace. The user program does not need to be
+ recompiled; it can use shared libraries and plugins, and the profile
+ measuring doesn't influence the trace results. The trace includes
+ the number of instruction/data memory accesses and 1st/2nd level
+ cache misses, and relates it to source lines and functions of the
+ run program. A disadvantage is the slowdown involved in the
+ processor emulation: execution is around 50 times slower.</para>
+
+ <para>Cachegrind can only deliver a flat profile. There is no call
+ relationship among the functions of an application stored. Thus,
+ inclusive costs, i.e. costs of a function including the cost of all
+ functions called from there, cannot be calculated. Callgrind extends
+ Cachegrind by including call relationship and exact event counts
+ spent while doing a call.</para>
+
+ <para>Because Callgrind (and Cachegrind) is based on simulation, the
+ slowdown due to processing the synthetic runtime events does not
+ influence the results. See <xref linkend="cl-manual.usage"/> for more
+ details on the possibilities.</para>
+
+ </sect2>
+
+</sect1>
+
+
+<sect1 id="cl-manual.usage" xreflabel="Usage">
+<title>Usage</title>
+
+ <sect2 id="cl-manual.basics" xreflabel="Basics">
+ <title>Basics</title>
+
+ <para>To start a profile run for a program, execute:
+ <screen>callgrind [callgrind options] your-program [program options]</screen>
+ </para>
+
+ <para>While the simulation is running, you can observe execution with
+ <screen>callgrind_control -b</screen>
+ This will print out a current backtrace. To annotate the backtrace with
+ event counts, run
+ <screen>callgrind_control -e -b</screen>
+ </para>
+
+ <para>After program termination, a profile data file named
+ <computeroutput>callgrind.out.pid</computeroutput>
+ is generated with <emphasis>pid</emphasis> being the process ID
+ of the execution of this profile run.</para>
+
+ <para>The data file contains information about the calls made in the
+ program among the functions executed, together with events of type
+ <command>Instruction Read Accesses</command> (Ir).</para>
+
+ <para>If you are additionally interested in memory accesses of your
+ program, and if an access can be satisfied by loading from 1st/2nd
+ level cache, use Callgrind with the option
+ <option><xref linkend="opt.simulate-cache"/>=yes</option>.
+ This will further slow down the run approximately by a factor of 2.</para>
+
+ <para>If the program section you want to profile is somewhere in the
+ middle of the run, it is beneficial to
+ <emphasis>fast forward</emphasis> to this section without any
+ profiling at all, and switch it on later. This is achieved by using
+ <option><xref linkend="opt.instr-atstart"/>=no</option>
+ and interactively use
+ <computeroutput>callgrind_control -i on</computeroutput> before the
+ interesting code section is about to be executed.</para>
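+
+ <para>A possible session, sketched with the placeholder
+ <computeroutput>your-program</computeroutput>, could look like this:
+ <screen>valgrind --tool=callgrind --instr-atstart=no your-program
+
+# from a second terminal, shortly before the interesting section starts:
+callgrind_control -i on</screen>
+ </para>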
+
+ <para>If you want to be able to see assembler annotation, specify
+ <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
+ profile data at instruction granularity. Note that this type of annotation
+ is only available with KCachegrind. For assembler annotation, it is also
+ interesting to see more details of the control flow inside of functions,
+ i.e. (conditional) jumps. This will be collected by further specifying
+ <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>
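+
+ <para>Both options can be combined on one command line; as a sketch, with
+ <computeroutput>your-program</computeroutput> as a placeholder:
+ <screen>valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes your-program</screen>
+ </para>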
+
+ </sect2>
+
+
+ <sect2 id="cl-manual.dumps"
+ xreflabel="Multiple dumps from one program run">
+ <title>Multiple dumps from one program run</title>
+
+ <para>Often, you aren't interested in time characteristics of a full
+ program run, but only of a small part of it (e.g. execution of one
+ algorithm). If there are multiple algorithms or one algorithm
+ running with different input data, it's even useful to get different
+ profile information for multiple parts of one program run.</para>
+
+ <para>In full detail, a generated profile data file is named
+<screen>
+callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
+</screen>
+ </para>
+ <para>where <emphasis>pid</emphasis> is the PID of the running
+ program, <emphasis>part</emphasis> is a number incremented on each
+ dump (".part" is skipped for the dump at program termination), and
+ <emphasis>threadID</emphasis> is a thread identification
+ ("-threadID" is only used if you request dumps of individual
+ threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>
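+
+ <para>For example, assuming a hypothetical PID of 12345 and two dumps
+ requested during the run, the following files would be expected
+ (without per-thread dumps):
+ <screen>callgrind.out.12345.1
+callgrind.out.12345.2
+callgrind.out.12345</screen>
+ The last file is the dump written at program termination.</para>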
+
+ <para>There are different ways to generate multiple profile dumps
+ while a program is running under Callgrind's supervision. Still,
+ all methods trigger the same action, viz. "dump all profile
+ information since the last dump or program start, and zero cost
+ counters afterwards". To allow for zeroing cost counters without
+ dumping, there is a second action "zero all cost counters now".
+ The different methods are:</para>
+ <itemizedlist>
+
+ <listitem>
+ <para><command>Dump on program termination.</command>
+ This method is the standard way and doesn't need any special
+ action from your side.</para>
+ </listitem>
+
+ <listitem>
+ <para><command>Spontaneous, interactive dumping.</command> Use
+ <screen>callgrind_control -d [hint [PID/Name]]</screen> to
+ request the dumping of profile information of the supervised
+ application with PID or Name. <emphasis>hint</emphasis> is an
+ arbitrary string you can optionally specify to later be able to
+ distinguish profile dumps. The control program will not terminate
+ before the dump is completely written. Note that the application
+ must be actively running for detection of the dump command. So,
+ for a GUI application, resize the window or for a server send a
+ request.</para>
+ <para>If you are using <ulink url="&cl-gui;">KCachegrind</ulink>
+ for browsing of profile information, you can use the toolbar
+ button <command>Force dump</command>. This will request a dump
+ and trigger a reload after the dump is written.</para>
+ </listitem>
+
+ <listitem>
+ <para><command>Periodic dumping after execution of a specified
+ number of basic blocks</command>. For this, use the command line
+ option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
+ The resolution of Valgrind's internal basic block counter is
+ only rough, so you should specify an interval of at least 50000
+ basic blocks.</para>
+ </listitem>
+
+ <listitem>
+ <para><command>Dumping at enter/leave of all functions whose name
+ starts with</command> <emphasis>funcprefix</emphasis>. Use the
+ option <option><xref linkend="opt.dump-before"/>=funcprefix</option>
+ and <option><xref linkend="opt.dump-after"/>=funcprefix</option>.
+ To zero cost counters before entering a function, use
+ <option><xref linkend="opt.zero-before"/>=funcprefix</option>.
+ The prefix method for specifying function names was chosen to
+ ease the use with C++: you don't have to specify full
+ signatures.</para> <para>You can specify these options multiple
+ times for different function prefixes.</para>
+ </listitem>
+
+ <listitem>
+ <para><command>Program controlled dumping.</command>
+ Put <screen><![CDATA[#include <valgrind/callgrind.h>]]></screen>
+ into your source and add
+ <computeroutput>CALLGRIND_DUMP_STATS;</computeroutput> when you
+ want a dump to happen. Use
+ <computeroutput>CALLGRIND_ZERO_STATS;</computeroutput> to only
+ zero the cost counters.</para>
+ <para>In Valgrind terminology, this method is called "Client
+ requests". The given macros generate a special instruction
+ pattern with no effect at all (i.e. a NOP). Only when the program
+ runs under Valgrind does the CPU simulation engine detect the
+ special instruction pattern and trigger special actions like the
+ ones described above.</para>
+ </listitem>
+ </itemizedlist>
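+
+ <para>The command line methods above might be used as in the following
+ sketch. The program name <computeroutput>./myprog</computeroutput>, the
+ function prefix <computeroutput>render</computeroutput>, the hint
+ <computeroutput>after-login</computeroutput> and the PID 12345 are
+ placeholders assumed for illustration only:</para>
+<screen>
+# periodic dump every 1000000 basic blocks
+callgrind --dump-every-bb=1000000 ./myprog
+
+# zero the counters on entering, and dump on leaving, functions
+# whose name starts with "render"
+callgrind --zero-before=render --dump-after=render ./myprog
+
+# request a dump from a second terminal, tagged with a hint
+callgrind_control -d after-login 12345
+</screen>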
+
+ <para>If you are running a multi-threaded application and specify the
+ command line option <option><xref linkend="opt.separate-threads"/>=yes</option>,
+ every thread will be profiled on its own and will create its own
+ profile dump. Thus, the last two methods will only generate one dump
+ of the currently running thread. With the other methods, you will get
+ multiple dumps (one for each thread) on a dump request.</para>
+
+ </sect2>
+
+
+
+ <sect2 id="cl-manual.limits"
+ xreflabel="Limiting range of event collection">
+ <title>Limiting range of event collection</title>
+
+ <para>For aggregating events (function enter/leave,
+ instruction execution, memory access) into event numbers,
+ first, the events must be recognizable by Callgrind, and second,
+ the collection state must be switched on.</para>
+
+ <para>Event recognition is only possible if <emphasis>instrumentation</emphasis>
+ for program code is switched on. This is the default, but for faster
+ execution (identical to <computeroutput>valgrind --tool=none</computeroutput>),
+ it can be temporarily switched off until the program reaches the parts
+ which are of interest for profiling. Callgrind can start without instrumentation
+ by specifying option <option><xref linkend="opt.instr-atstart"/>=no</option>.
+ The instrumentation state can be switched on interactively
+ with <screen>callgrind_control -i on</screen>
+ and off by specifying "off" instead of "on".
+ Furthermore, instrumentation state can be programmatically changed with
+ the macros <computeroutput>CALLGRIND_START_INSTRUMENTATION;</computeroutput>
+ and <computeroutput>CALLGRIND_STOP_INSTRUMENTATION;</computeroutput>.
+ </para>
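+
+ <para>A minimal sketch of this workflow, assuming a placeholder program
+ <computeroutput>./myprog</computeroutput> and a second terminal for the
+ control commands:</para>
+<screen>
+# terminal 1: start without instrumentation
+callgrind --instr-atstart=no ./myprog
+
+# terminal 2: switch instrumentation on for the interesting part,
+# then off again
+callgrind_control -i on
+callgrind_control -i off
+</screen>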
+
+ <para>In addition to instrumentation, event collection must be enabled
+ for events to be counted. This, too, is the case by default.
+ You can explicitly control for which part of your program you want to
+ collect events by using
+ <option><xref linkend="opt.toggle-collect"/>=funcprefix</option>.
+ This will toggle the collection state on entering and leaving a
+ function. When specifying this option, the default collection state
+ at program start is "off". Thus, only events happening while running
+ inside of functions starting with <emphasis>funcprefix</emphasis> will
+ be collected. Recursive
+ calls of functions with <emphasis>funcprefix</emphasis> do not trigger
+ any action.</para>
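+
+ <para>For example, to collect events only while running inside of
+ functions whose name starts with a hypothetical prefix
+ <computeroutput>compute</computeroutput> (the program name is likewise
+ a placeholder), one could use:</para>
+<screen>
+callgrind --toggle-collect=compute ./myprog
+</screen>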
+
+ <para>It is important to note that with instrumentation switched off, the
+ cache simulator cannot see any memory access events, and thus, any
+ simulated cache state will be frozen and wrong without instrumentation.
+ Therefore, to get useful cache events (hits/misses) after switching on
+ instrumentation, the cache first must warm up,
+ probably leading to many <emphasis>cold misses</emphasis>
+ which would not have happened in reality. If you do not want to see these,
+ start actual collection a few million instructions after you have switched
+ on instrumentation.</para>
+
+
+ </sect2>
+
+
+
+ <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
+ <title>Avoiding cycles</title>
+
+ <para>A group of functions in which a call chain exists from any
+ function to any other is called a cycle. For example,
+ with A calling B, B calling C, and C calling A, the three functions
+ A, B and C build up one cycle.</para>
+
+ <para>If a call chain goes around a cycle multiple times,
+ profiling cannot distinguish event counts coming from the
+ first round from those of the second. Thus, it makes no sense to attach any inclusive
+ cost to a call among functions inside of one cycle.
+ If "A > B" appears multiple times in a call chain, you
+ have no way to partition the one big sum of all appearances of "A >
+ B". Thus, for profile data presentation, all functions of a cycle are
+ seen as one big virtual function.</para>
+
+ <para>Unfortunately, if you have an application using some callback
+ mechanism (like any GUI program), or even with normal polymorphism (as
+ in OO languages like C++), it's quite possible to get large cycles.
+ As it is often impossible to say anything about performance behaviour
+ inside of cycles, it is useful to introduce some mechanisms to avoid
+ cycles in call graphs altogether. This is done by treating the same
+ function in different ways, depending on the current execution
+ context: either by giving it different names, or by ignoring calls to
+ the function altogether.</para>
+
+ <para>There is an option to ignore calls to a function with
+ <option><xref linkend="opt.fn-skip"/>=funcprefix</option>. E.g., you
+ usually do not want to see the trampoline functions in the PLT sections
+ for calls to functions in shared libraries. You can see the difference
+ if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
+ If a call is ignored, cost events happening will be attached to the
+ enclosing function.</para>
+
+ <para>If you have a recursive function, you can distinguish the first
+ 10 recursion levels by specifying
+ <option><xref linkend="opt.fn-recursion-num"/>=funcprefix</option>.
+ To do this for all functions, use
+ <option><xref linkend="opt.fn-recursion"/>=10</option>, but this will
+ give you much bigger profile data files. In the profile data, you will see
+ the recursion levels of "func" as the different functions with names
+ "func", "func'2", "func'3" and so on.</para>
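+
+ <para>As a sketch, with a hypothetical recursive function
+ <computeroutput>fib</computeroutput> in a placeholder program
+ <computeroutput>./myprog</computeroutput>:</para>
+<screen>
+# separate 10 recursion levels only for functions starting with "fib"
+callgrind --fn-recursion10=fib ./myprog
+
+# separate 10 recursion levels for all functions (bigger profile data)
+callgrind --fn-recursion=10 ./myprog
+</screen>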
+
+ <para>If you have call chains "A > B > C" and "A > C > B"
+ in your program, you usually get a "false" cycle "B <> C". Use
+ <option><xref linkend="opt.fn-caller-num"/>=B</option>
+ <option><xref linkend="opt.fn-caller-num"/>=C</option>,
+ and functions "B" and "C" will be treated as different functions
+ depending on the direct caller. Using the apostrophe for appending
+ this "context" to the function name, you get "A > B'A > C'B"
+ and "A > C'A > B'C", and there will be no cycle. Use
+ <option><xref linkend="opt.fn-caller"/>=2</option> to get a 2-caller
+ dependency for all functions. Again, this will multiply the
+ profile data size.</para>
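+
+ <para>Put together, the "false" cycle from the example above could be
+ avoided with a command line like the following sketch (the program name
+ <computeroutput>./myprog</computeroutput> is a placeholder):</para>
+<screen>
+callgrind --fn-caller2=B --fn-caller2=C ./myprog
+</screen>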
+
+ </sect2>
+
+</sect1>
+
+
+<sect1 id="cl-manual.options" xreflabel="Command line option reference">
+<title>Command line option reference</title>
+
+<para>
+This reference groups options into classes, and uses the same order as
+the output of <computeroutput>callgrind --help</computeroutput>.
+</para>
+
+<sect2 id="cl-manual.options.misc"
+ xreflabel="Miscellaneous options">
+<title>Miscellaneous options</title>
+
+<variablelist id="cmd-options.misc">
+
+ <varlistentry>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>Show summary of options. This is a short version of this
+ manual section.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>Show version of callgrind.</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.creation"
+ xreflabel="Dump creation options">
+<title>Dump creation options</title>
+
+<para>
+These options influence the name and format of the profile data files.
+</para>
+
+<variablelist id="cmd-options.creation">
+
+ <varlistentry id="opt.base">
+ <term>
+ <option><![CDATA[--base=<prefix> [default: callgrind.out] ]]></option>
+ </term>
+ <listitem>
+ <para>Specify another base name for the dump file names. To
+ distinguish different profile runs of the same application,
+ <computeroutput>.<pid></computeroutput> is appended to the
+ base dump file name with
+ <computeroutput><pid></computeroutput> being the process ID
+ of the profile run (with multiple dumps happening, the file name
+ is modified further; see below).</para> <para>This option is
+ especially useful if your application changes its working
+ directory. Usually, the dump file is generated in the current
+ working directory of the application at program termination. By
+ giving an absolute path with the base specification, you can force
+ a fixed directory for the dump files.</para>
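+ <para>A sketch with a made-up absolute path and a placeholder program
+ name; the process ID is then appended to the given base name as
+ described above:</para>
+<screen>
+callgrind --base=/tmp/profiles/myprog ./myprog
+</screen>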
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
+ <term>
+ <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>This specifies that event count relation at instruction granularity
+ should be available in the profile data file. This allows assembler
+ annotation, but currently can only be shown with KCachegrind.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.dump-line" xreflabel="--dump-line">
+ <term>
+ <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
+ </term>
+ <listitem>
+ <para>This specifies that event count relation at source line granularity
+ should be available in the profile data file. This allows source
+ annotation for source which was compiled with debug information ("-g").
+ This should always be enabled.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
+ <term>
+ <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
+ </term>
+ <listitem>
+ <para>This option influences the output format of the profile data.
+ It specifies whether strings (file and function names) should be
+ identified by numbers. This shrinks the file size, but makes the file more
+ difficult for humans to read (which is not recommended either way).</para>
+ <para>However, this currently has to be switched off if
+ the files are to be read by
+ <computeroutput>callgrind_annotate</computeroutput>!</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
+ <term>
+ <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
+ </term>
+ <listitem>
+ <para>This option influences the output format of the profile data.
+ It specifies whether numerical positions are always specified as absolute
+ values or are allowed to be relative to previous numbers.
+ Allowing relative positions shrinks the file size.</para>
+ <para>However, this currently has to be switched off if
+ the files are to be read by
+ <computeroutput>callgrind_annotate</computeroutput>!</para>
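+ <para>For a profile that is meant to be read by
+ <computeroutput>callgrind_annotate</computeroutput>, both compression
+ options could be switched off as in this sketch (the program name
+ <computeroutput>./myprog</computeroutput> is a placeholder):</para>
+<screen>
+callgrind --compress-strings=no --compress-pos=no ./myprog
+</screen>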
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
+ <term>
+ <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>When multiple profile data parts are to be generated, these
+ parts are appended to the same output file if this option is set to
+ "yes". Not recommended.</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.activity"
+ xreflabel="Activity options">
+<title>Activity options</title>
+
+<para>
+These options specify when different actions regarding event counts are to
+be executed. For interactive control use
+<computeroutput>callgrind_control</computeroutput>.
+</para>
+
+<variablelist id="cmd-options.activity">
+
+ <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
+ <term>
+ <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
+ </term>
+ <listitem>
+ <para>Dump profile data each <count> basic blocks</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.dump-before" xreflabel="--dump-before">
+ <term>
+ <option><![CDATA[--dump-before=<prefix> ]]></option>
+ </term>
+ <listitem>
+ <para>Dump when entering a function starting with <prefix></para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.zero-before" xreflabel="--zero-before">
+ <term>
+ <option><![CDATA[--zero-before=<prefix> ]]></option>
+ </term>
+ <listitem>
+ <para>Zero all costs when entering a function starting with <prefix></para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.dump-after" xreflabel="--dump-after">
+ <term>
+ <option><![CDATA[--dump-after=<prefix> ]]></option>
+ </term>
+ <listitem>
+ <para>Dump when leaving a function starting with <prefix></para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.collection"
+ xreflabel="Data collection options">
+<title>Data collection options</title>
+
+<para>
+These options specify when events are to be aggregated into event counts.
+Also see <xref linkend="cl-manual.limits"/>.</para>
+
+<variablelist id="cmd-options.collection">
+
+ <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
+ <term>
+ <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
+ </term>
+ <listitem>
+ <para>Specify if you want Callgrind to start simulation and
+ profiling from the beginning. If not, Callgrind will not be able
+ to collect any information, including calls, but it will have at
+ most a slowdown factor of around 4, which is the minimum Valgrind
+ overhead. Instrumentation can be interactively switched on via
+ <computeroutput>callgrind_control -i on</computeroutput>.</para>
+ <para>Note that the resulting call graph will most probably not
+ contain <computeroutput>main</computeroutput>, but all the
+ functions executed after instrumentation was switched on.
+ Instrumentation can also be programmatically switched on/off. See the
+ Callgrind include file
+ <computeroutput><callgrind.h></computeroutput> for the macro
+ you have to use in your source code.</para> <para>For cache
+ simulation, results will be a little bit off when switching on
+ instrumentation later in the program run, as the simulator starts
+ with an empty cache at that moment. Switch on event collection
+ later to cope with this error.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.collect-atstart">
+ <term>
+ <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
+ </term>
+ <listitem>
+ <para>Specify whether event collection is switched on at beginning
+ of the profile run.</para>
+ <para>To only look at parts of your program, you have two
+ possibilities:</para>
+ <orderedlist>
+ <listitem>
+ <para>Zero event counters before entering the program part you
+ want to profile, and dump the event counters to a file after
+ leaving that program part.</para>
+ </listitem>
+ <listitem>
+ <para>Switch on/off collection state as needed to only see
+ event counters happening while inside of the program part you
+ want to profile.</para>
+ </listitem>
+ </orderedlist>
+ <para>The second option can be used if the program part you want to
+ profile is called many times. Option 1, i.e. creating a lot of
+ dumps, is not practical here.</para> <para>Collection state can be
+ toggled at entering and leaving of a given function with the
+ option <xref linkend="opt.toggle-collect"/>. For this, collection
+ state should be switched off at the beginning. Note that the
+ specification of <computeroutput>--toggle-collect</computeroutput>
+ implicitly sets
+ <computeroutput>--collect-atstart=no</computeroutput>.</para>
+ <para>Collection state can also be toggled by using a Valgrind
+ client request in your application. For this, include
+ <computeroutput>valgrind/callgrind.h</computeroutput> and specify
+ the macro
+ <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput> at the
+ needed positions. This will only have an effect if run under
+ supervision of the Callgrind tool.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
+ <term>
+ <option><![CDATA[--toggle-collect=<prefix> ]]></option>
+ </term>
+ <listitem>
+ <para>Toggle collection on entering/leaving a function starting with
+ <prefix>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
+ <term>
+ <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>This specifies whether information for (conditional) jumps
+ should be collected. As with <option>--dump-instr</option>,
+ callgrind_annotate currently is not able to show you the data.
+ You have to use KCachegrind to get jump arrows in the annotated code.</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.separation"
+ xreflabel="Cost entity separation options">
+<title>Cost entity separation options</title>
+
+<para>
+These options specify how event counts are attributed to execution contexts.
+More specifically, they specify e.g. whether the recursion level or the
+call chain leading to a function should be taken into account, or whether the
+thread ID should be remembered.
+Also see <xref linkend="cl-manual.cycles"/>.</para>
+
+<variablelist id="cmd-options.separation">
+
+ <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
+ <term>
+ <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>This option specifies whether profile data should be generated
+ separately for every thread. If yes, the file names get "-threadID"
+ appended.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.fn-recursion" xreflabel="--fn-recursion">
+ <term>
+ <option><![CDATA[--fn-recursion=<level> [default: 2] ]]></option>
+ </term>
+ <listitem>
+ <para>Separate function recursions, maximal <level>.
+ See <xref linkend="cl-manual.cycles"/>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.fn-caller" xreflabel="--fn-caller">
+ <term>
+ <option><![CDATA[--fn-caller=<callers> [default: 0] ]]></option>
+ </term>
+ <listitem>
+ <para>Separate contexts by maximal <callers> functions in the
+ call chain. See <xref linkend="cl-manual.cycles"/>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
+ <term>
+ <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
+ </term>
+ <listitem>
+ <para>Ignore calls to/from PLT sections.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
+ <term>
+ <option><![CDATA[--fn-skip=<function> ]]></option>
+ </term>
+ <listitem>
+ <para>Ignore calls to/from a given function. E.g. if you have a
+ call chain A > B > C, and you specify function B to be
+ ignored, you will only see A > C.</para>
+ <para>This is very convenient to skip functions handling callback
+ behaviour. E.g. for the SIGNAL/SLOT mechanism in Qt, you only want
+ to see the function emitting a signal to call the slots connected
+ to that signal. First, determine the real call chain to see the
+ functions needed to be skipped, then use this option.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.fn-group">
+ <term>
+ <option><![CDATA[--fn-group<number>=<function> ]]></option>
+ </term>
+ <listitem>
+ <para>Put a function into a separation group. This influences the
+ context name for cycle avoidance. All functions inside of such a
+ group are treated as being the same for context name building, which
+ resembles the call chain leading to a context. By specifying function
+ groups with this option, you can shorten the context name, as functions
+ in the same group will not appear in sequence in the name. </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.fn-recursion-num" xreflabel="--fn-recursion10">
+ <term>
+ <option><![CDATA[--fn-recursion<number>=<function> ]]></option>
+ </term>
+ <listitem>
+ <para>Separate <number> recursions for <function>.
+ See <xref linkend="cl-manual.cycles"/>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="opt.fn-caller-num" xreflabel="--fn-caller2">
+ <term>
+ <option><![CDATA[--fn-caller<number>=<function> ]]></option>
+ </term>
+ <listitem>
+ <para>Separate <number> callers for <function>.
+ See <xref linkend="cl-manual.cycles"/>.</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.simulation"
+ xreflabel="Cache simulation options">
+<title>Cache simulation options</title>
+
+<variablelist id="cmd-options.simulation">
+
+ <varlistentry id="opt.simulate-cache" xreflabel="--simulate-cache">
+ <term>
+ <option><![CDATA[--simulate-cache=<yes|no> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>Specify if you want to do full cache simulation. Disabled by
+ default; only instruction read accesses will be profiled.</para>
+ <para>Note, however, that estimating how much real time your
+ program will need is impossible using only the instruction read
+ counts. Use them if you want to find out how many times
+ different functions are called and their call relations.</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+
+</sect2>
+
+</sect1>
+
+</chapter>
+
+
--- /dev/null
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<book id="cl-docs" xreflabel="Callgrind Documentation">
+
+<bookinfo>
+ <title>Callgrind Documentation</title>
+ <subtitle>A call-graph generating Cache Simulator and Profiler</subtitle>
+ <releaseinfo>Release &cl-version; &cl-date;</releaseinfo>
+ <copyright>
+ <year>&cl-lifespan;</year>
+ <holder>
+ <link linkend="dist.authors" endterm="dist.authors.title"></link>
+ </holder>
+ </copyright>
+ <author>
+ <email><ulink url="mailto:&cl-email;">&cl-email;</ulink></email>
+ </author>
+ <legalnotice>
+ <para>Permission is granted to copy, distribute and/or modify this
+ document under the terms of the GNU Free Documentation License,
+ Version 1.2 or any later version published by the Free Software Foundation;
+ with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+ Texts. A copy of the license is included in the section entitled
+ <xref linkend="dist.license-gfdl"/>.</para>
+ </legalnotice>
+</bookinfo>
+
+
+<xi:include href="cl-manual.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+<xi:include href="cl-format.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+<chapter id="man-annotate" xreflabel="Callgrind Annotate (1)">
+ <title>Callgrind Annotate (1)</title>
+ <xi:include href="man-annotate.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+</chapter>
+
+<chapter id="man-control" xreflabel="Callgrind Control (1)">
+ <title>Callgrind Control (1)</title>
+ <xi:include href="man-control.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+</chapter>
+
+<!-- included for the sake of completeness -->
+<chapter id="man-callgrind" xreflabel="Callgrind (1)">
+ <title>Callgrind (1)</title>
+ <xi:include href="man-callgrind.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+</chapter>
+
+<!-- Because these are all text files, we have to wrap -->
+<!-- them in suitable XML. Hence the chapter/title stuff -->
+<chapter id="dist.authors" xreflabel="Authors">
+ <title id="dist.authors.title">AUTHORS</title>
+ <literallayout>
+ <xi:include href="../../AUTHORS" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+
+<chapter id="dist.readme" xreflabel="Readme">
+ <title>README</title>
+ <literallayout>
+ <xi:include href="../../README" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+
+<chapter id="dist.changelog" xreflabel="ChangeLog">
+ <title>ChangeLog</title>
+ <literallayout>
+ <xi:include href="../../ChangeLog" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+
+<!-- NEWS is empty, so comment it out -->
+<!--
+<chapter id="dist.news" xreflabel="News">
+ <title>NEWS</title>
+ <literallayout>
+ <xi:include href="../../NEWS" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+-->
+
+<chapter id="dist.install" xreflabel="Install">
+ <title>INSTALL</title>
+ <literallayout>
+ <xi:include href="../../INSTALL" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+
+<chapter id="dist.license-gpl" xreflabel=" The GNU General Public License">
+ <title>The GNU General Public License</title>
+ <literallayout>
+ <xi:include href="../../COPYING" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+
+<chapter id="dist.license-gfdl" xreflabel=" The GNU Free Documentation License">
+ <title>The GNU Free Documentation License</title>
+ <literallayout>
+ <xi:include href="../COPYING.DOCS" parse="text"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+ </literallayout>
+</chapter>
+
+
+</book>
--- /dev/null
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<refentry id="callgrind-annotate">
+
+<refmeta>
+ <refentrytitle>Callgrind Annotate</refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo class="a-source">May 13, 2003</refmiscinfo>
+</refmeta>
+
+<refnamediv id="a-name">
+ <refname>callgrind_annotate</refname>
+ <refpurpose>produces human readable ASCII output from profile
+ information in <command>callgrind.out</command> files</refpurpose>
+</refnamediv>
+
+<refsynopsisdiv id="a-synopsis">
+ <cmdsynopsis>
+ <command>callgrind_annotate</command>
+ <arg choice="opt"><replaceable>options</replaceable></arg>
+ <arg choice="opt"><replaceable>source-files</replaceable></arg>
+ </cmdsynopsis>
+</refsynopsisdiv>
+
+
+<refsect1 id="a-description">
+<title>Description</title>
+
+<para>This manual page briefly documents the
+<command>callgrind_annotate</command> command. This manual page was
+written for the Debian distribution because the original program does
+not have a manual page.</para>
+
+</refsect1>
+
+
+<refsect1 id="a-options">
+<title>Options</title>
+
+<para>This program follows the usual GNU command line syntax, with long
+options starting with two dashes ('--'). A summary of options is
+included below.</para>
+
+<variablelist remap="TP">
+
+ <varlistentry>
+ <term><option>-h, --help</option></term>
+ <listitem>
+ <para>Show summary of options.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>Show version of callgrind_annotate.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>--show=A,B,C [default: all]</option>
+ </term>
+ <listitem>
+ <para>only show figures for events A,B,C</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>--sort=A,B,C</option>
+ </term>
+ <listitem>
+ <para>sort columns by events A,B,C [event column order]</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
+ </term>
+ <listitem>
+ <para>percentage of counts (of primary sort event) we are
+ interested in</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--auto=<yes|no> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>annotate all source files containing functions that helped
+ reach the event count threshold</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>--context=N [default: 8] </option>
+ </term>
+ <listitem>
+ <para>print N lines of context before and after annotated
+ lines</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--cumulative=<yes|no> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>add subroutine costs to function calls</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
+ </term>
+ <listitem>
+ <para>print for each function its callers, the called functions
+ or both</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option><![CDATA[-I, --include=<dir> ]]></option>
+ </term>
+ <listitem>
+ <para>add <dir> to the list of directories to search for source
+ files</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
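+
+<para>As a sketch using only options listed above (the include directory
+<filename>src</filename> is a made-up example, and how the profile data
+file is selected is not covered by this page):</para>
+<screen>
+callgrind_annotate --auto=yes --tree=caller --include=src
+</screen>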
+
+</refsect1>
+
+
+<refsect1 id="a-see_also">
+<title>See Also</title>
+
+<para><filename>&cl-doc-path;</filename></para>
+
+</refsect1>
+
+
+<refsect1 id="a-author">
+<title>Author</title>
+
+<para>This manual page was written by
+Philipp Frauenfelder <pfrauenf@debian.org>, for the Debian
+GNU/Linux system (but may be used by others).</para>
+</refsect1>
+
+
+</refentry>
--- /dev/null
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<refentry id="callgrind">
+<refmeta>
+ <refentrytitle>Callgrind</refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo class="source">November 18, 2005</refmiscinfo>
+</refmeta>
+
+<refnamediv id="name">
+ <refname>callgrind</refname>
+ <refpurpose>calls <command>valgrind</command> with the callgrind tool</refpurpose>
+</refnamediv>
+
+<refsynopsisdiv id="synopsis">
+ <cmdsynopsis>
+ <command>callgrind</command>
+ <arg choice="opt"><replaceable>options</replaceable></arg>
+ <arg choice="plain"><replaceable>progs-and-args</replaceable></arg>
+ </cmdsynopsis>
+</refsynopsisdiv>
+
+
+<refsect1 id="description">
+<title>Description</title>
+
+<para><command>Callgrind</command> is a profiling tool similar to gprof.
+Because it uses Valgrind to observe a program run in great detail, it can
+give much more information. The binary does not have
+to be prepared for profiling with <command>callgrind</command> in any
+special way. Still, it is recommended to compile with debug information.</para>
+
+<para><command>Callgrind</command> builds up the call graph of a program
+while it is running, and optionally does cache simulation. The collected
+profiling data can be stored into an output file multiple times in a
+program run, optionally separately for every thread in the case of
+multithreaded code. For interactive inspection and control, see
+<command>callgrind_control</command>. The data produced
+(<filename>callgrind.out.PID</filename>) can be analysed with
+<command>callgrind_annotate</command> or, better, with the graphical profile
+visualization tool <command>KCachegrind</command>. Further documentation can
+be found in HTML format either on your filesystem at
+<filename>&cl-doc-path;</filename> or online at
+<filename>&cl-doc-url;</filename>.</para>
+
+</refsect1>
+
+
+<refsect1 id="options">
+<title>Options</title>
+
+<para>This program follows the usual GNU command line syntax, with long
+options starting with two dashes ('--').</para>
+
+
+<xi:include href="cl-manual.xml" xpointer="cmd-options"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+</refsect1>
+
+
+
+<refsect1 id="see_also">
+<title>See Also</title>
+
+<para><command>callgrind_control</command>,
+<command>callgrind_annotate</command>,
+<filename>&cl-doc-path;</filename>
+</para>
+
+</refsect1>
+
+
+<refsect1 id="author">
+<title>Author</title>
+
+<para>This manual page was written by Josef Weidendorfer &lt;&cl-email;&gt;.</para>
+
+
+</refsect1>
+
+
+<refsect1 id="copyright">
+<title>Copyright</title>
+
+<para>Copyright © &cl-lifespan; Josef Weidendorfer</para>
+<para>This is free software; see the source for copying conditions.
+There is NO warranty; not even for MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.</para>
+
+</refsect1>
+
+
+
+</refentry>
+
--- /dev/null
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<refentry id="callgrind-control">
+<refmeta>
+ <refentrytitle>Callgrind Control</refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo class="c-source">October, 2005</refmiscinfo>
+</refmeta>
+
+<refnamediv id="c-name">
+ <refname>callgrind_control</refname>
+ <refpurpose>observe and control applications currently running under
+ supervision of <command>callgrind</command></refpurpose>
+</refnamediv>
+
+<refsynopsisdiv id="c-synopsis">
+ <cmdsynopsis>
+ <command>callgrind_control</command>
+ <arg choice="opt"><replaceable>options</replaceable></arg>
+ <arg choice="opt" rep="repeat"><replaceable>pid/program-name</replaceable></arg>
+ </cmdsynopsis>
+</refsynopsisdiv>
+
+
+<refsect1 id="c-description">
+<title>Description</title>
+
+<para>This manual page briefly documents the
+<command>callgrind_control</command> command. If no
+<replaceable>pid/program-name</replaceable> argument is given, the actions
+requested by the specified option(s) apply to all applications run
+by callgrind on this system. The default action is to print short information
+about the applications run by callgrind.</para>
+
+</refsect1>
+
+
+<refsect1 id="c-options">
+<title>Options</title>
+
+<para>This program follows the usual GNU command line syntax, with long
+options starting with two dashes ("--"). A summary of options is
+included below.</para>
+
+<variablelist remap="TP">
+
+ <varlistentry>
+ <term><option>-h, --help</option></term>
+ <listitem>
+ <para>Show summary of options.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>Show version of callgrind_control.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <listitem>
+ <para>Show statistics.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-b</option></term>
+ <listitem>
+ <para>Show stack trace.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e [A,B,C] [default: all] </option></term>
+ <listitem>
+ <para>Only show figures for events A,B,C.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-z</option></term>
+ <listitem>
+ <para>Reset all cost counters to zero.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d, --dump [hint]</option></term>
+ <listitem>
+ <para>Request the dumping of profile information. Optionally, a
+ string can be specified which is written into the dump as part of
+ the Trigger reason. This can be used to distinguish multiple dumps.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-k</option></term>
+ <listitem>
+ <para>Kill the application(s).</para>
+ </listitem>
+ </varlistentry>
+
+</variablelist>
+
+</refsect1>
+
+
+<refsect1 id="c-see_also">
+<title>See Also</title>
+
+<para><filename>&cl-doc-path;</filename></para>
+
+</refsect1>
+
+
+<refsect1 id="c-author">
+<title>Author</title>
+
+<para>This manual page was written by Josef Weidendorfer &lt;&cl-email;&gt;.</para>
+
+
+</refsect1>
+
+
+</refentry>
+
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../cachegrind/docs/cg-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
+ <xi:include href="../../callgrind/docs/cl-manual.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../massif/docs/ms-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../helgrind/docs/hg-manual.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="../../cachegrind/docs/cg-tech-docs.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
+ <xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="writing-tools.xml" parse="xml"
xmlns:xi="http://www.w3.org/2001/XInclude" />
instructions executed and cache misses incurred.</para>
</listitem>
+ <listitem>
+ <para><option>callgrind</option> adds call graph tracing to cachegrind. It can be
+ used to get call counts and inclusive cost for each call happening in your
+ program. Beyond what cachegrind provides, callgrind can annotate threads separately,
+ and can annotate every instruction of the disassembler output of your program with the number of
+ instructions executed and cache misses incurred.</para>
+ </listitem>
+
<listitem>
<para><option>helgrind</option> spots potential race conditions in
your program.</para>
+<refsect1 id="callgrind-options">
+<title>Callgrind Options</title>
+
+<xi:include href="../../callgrind/docs/cl-manual.xml"
+ xpointer="cl.opts.list"
+ xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+</refsect1>
+
+
+
<refsect1 id="massif-options">
<title>Massif Options</title>