From: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
Date: Mon, 20 Mar 2006 10:29:30 +0000 (+0000)
Subject: Callgrind merge: documentation
X-Git-Tag: svn/VALGRIND_3_2_0~178
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=1cdac21bd90854116726b80868c92490a796956a;p=thirdparty%2Fvalgrind.git

Callgrind merge: documentation


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5781
---

diff --git a/callgrind/docs/Makefile.am b/callgrind/docs/Makefile.am
index d539a6ecd5..540f9313a7 100644
--- a/callgrind/docs/Makefile.am
+++ b/callgrind/docs/Makefile.am
@@ -1 +1,8 @@
-EXTRA_DIST = 
+EXTRA_DIST =  \
+	cl-entities.xml \
+	cl-manual.xml \
+	cl-format.xml \
+	index.xml \
+	man-annotate.xml \
+	man-control.xml \
+	man-callgrind.xml
diff --git a/callgrind/docs/cl-entities.xml b/callgrind/docs/cl-entities.xml
new file mode 100644
index 0000000000..727962fa67
--- /dev/null
+++ b/callgrind/docs/cl-entities.xml
@@ -0,0 +1,22 @@
+<!-- callgrind release + version stuff -->
+<!ENTITY cl-version  "0.10.1">
+<!ENTITY cl-date     "November 25 2005">
+
+<!-- copyright length of years -->
+<!ENTITY cl-lifespan "2000-2005">
+
+<!-- website + email -->
+<!ENTITY cl-email    "Josef.Weidendorfer@gmx.de">
+<!ENTITY cl-url      "http://www.valgrind.org/info/developers.html">
+
+<!-- external urls used in the docs.  kept in here because when  -->
+<!-- they change it's a real pain tracking them down in the docs -->
+<!ENTITY vg-url      "http://www.valgrind.org/">
+<!ENTITY cg-doc-url  "http://www.valgrind.org/docs/manual/cg-manual.html">
+<!ENTITY cg-tool-url "http://www.valgrind.org/info/tools.html#cachegrind">
+<!ENTITY cl-gui      "http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex">
+
+<!-- path/to/callgrind/docs in valgrind install tree -->
+<!-- only used in the manpages -->
+<!ENTITY cl-doc-path  "/usr/share/doc/valgrind/html/callgrind.html">
+<!ENTITY cl-doc-url   "http://www.valgrind.org/docs/manual/cl-manual.html">
diff --git a/callgrind/docs/cl-format.xml b/callgrind/docs/cl-format.xml
new file mode 100644
index 0000000000..6777551d10
--- /dev/null
+++ b/callgrind/docs/cl-format.xml
@@ -0,0 +1,551 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+<chapter id="cl-format" xreflabel="Callgrind Format Specification">
+<title>Callgrind Format Specification</title>
+
+<para>This chapter describes the Callgrind Profile Format, Version 1.</para>
+
+<para>A synonymous name is "Calltree Profile Format". These names actually mean
+the same since Callgrind was previously named Calltree.</para>
+
+<para>The format description is meant for the user to be able to understand the
+file contents; but more important, it is given for authors of measurement or
+visualization tools to be able to write and read this format.</para>
+
+<sect1 id="cl-format.overview" xreflabel="Overview">
+<title>Overview</title>
+
+<para>The profile data format is ASCII based.
+It is written by Callgrind, and it is upwards compatible
+with the format used by Cachegrind (i.e. Cachegrind uses a subset). It can
+be read by callgrind_annotate and KCachegrind.</para>
+
+<para>This chapter gives an overview of format features and examples.
+For detailed syntax, look at the format reference.</para>
+
+<sect2 id="cl-format.overview.basics" xreflabel="Basic Structure">
+<title>Basic Structure</title>
+
+<para>Each file has a header part of an arbitrary number of lines of the
+format "key: value". The lines with key "positions" and "events" define
+the meaning of cost lines in the second part of the file: the value of
+"positions" is a list of subpositions, and the value of "events" is a list
+of event type names. Cost lines consist of subpositions followed by 64-bit
+counters for the events, in the order specified by the "positions" and "events"
+header lines.</para>
+
+<para>The "events" header line is always required in contrast to the optional
+line for "positions", which defaults to "line", i.e. a line number of some
+source file. In addition, the second part of the file contains position
+specifications of the form "spec=name". "spec" can be e.g. "fn" for a
+function name or "fl" for a file name. Cost lines are always related to
+the function/file specifications given directly before.</para>
+
+</sect2>
+
+<sect2 id="cl-format.overview.example1" xreflabel="Simple Example">
+<title>Simple Example</title>
+
+<para>
+<screen>events: Cycles Instructions Flops
+fl=file.f
+fn=main
+15 90 14 2
+16 20 12</screen></para>
+
+<para>The above example gives profile information for event types "Cycles",
+"Instructions", and "Flops". Thus, cost lines give the number of CPU cycles
+passed by, number of executed instructions, and number of floating point
+operations executed while running code corresponding to some source
+position. As there is no line specifying the value of "positions", it defaults
+to "line", which means that the first number of a cost line is always a line
+number.</para>
+
+<para>Thus, the first cost line specifies that in line 15 of source file
+"file.f" there is code belonging to function "main". While running, 90 CPU
+cycles passed by, and 2 of the 14 instructions executed were floating point
+operations. Similarly, the next line specifies that there were 12 instructions
+executed in the context of function "main" which can be related to line 16 in
+file "file.f", taking 20 CPU cycles. If a cost line specifies fewer event counts
+than given in the "events" line, the rest is assumed to be zero. I.e., there
+was no floating point instruction executed relating to line 16.</para>
+
+<para>Note that regular cost lines always give self (also called exclusive)
+cost of code at a given position. If you specify multiple cost lines for the
+same position, these will be summed up. On the other hand, in the example above
+there is no specification of how many times function "main" actually was
+called: profile data only contains sums.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.associations" xreflabel="Associations">
+<title>Associations</title>
+
+<para>The most important extension to the original format of Cachegrind is the
+ability to specify call relationships among functions. More generally, you
+specify associations among positions. For this, the second part of the
+file can also contain association specifications. These look similar to
+position specifications, but consist of 2 lines. For calls, the format
+looks like
+<screen>
+ calls=(Call Count) (Destination position)
+ (Source position) (Inclusive cost of call)
+</screen></para>
+
+<para>The destination only specifies subpositions like line number. Therefore,
+to be able to specify a call to another function in another source file, you
+have to precede the above lines with a "cfn=" specification for the name of the
+called function, and a "cfl=" specification if the function is in another
+source file. The 2nd line looks like a regular cost line with the difference
+that inclusive cost spent inside of the function call has to be specified.</para>
+
+<para>Other associations are, for example, (conditional) jumps. See the
+reference below for details.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.example2" xreflabel="Extended Example">
+<title>Extended Example</title>
+
+<para>The following example shows 3 functions, "main", "func1", and
+"func2". Function "main" calls "func1" once and "func2" 3 times. "func1" calls
+"func2" 2 times.
+<screen>events: Instructions
+
+fl=file1.c
+fn=main
+16 20
+cfn=func1
+calls=1 50
+16 400
+cfl=file2.c
+cfn=func2
+calls=3 20
+16 400
+
+fn=func1
+51 100
+cfl=file2.c
+cfn=func2
+calls=2 20
+51 300
+
+fl=file2.c
+fn=func2
+20 700</screen></para>
+
+<para>One can see that in "main" only code from line 16 is executed, where the
+other functions are also called. Inclusive cost of "main" is 820, which is the
+sum of self cost 20 and costs spent in the calls (400 + 400).</para>
+
+<para>Function "func1" is located in "file1.c", the same as "main". Therefore,
+a "cfl=" specification for the call to "func1" is not needed. The function
+"func1" only consists of code at line 51 of "file1.c", where "func2" is called.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.compression1" xreflabel="Name Compression">
+<title>Name Compression</title>
+
+<para>With the introduction of association specifications like calls, it
+becomes necessary to specify the same function or file name multiple times. As
+absolute filenames or symbol names in C++ can be quite long, it is advantageous
+to be able to specify integer IDs for position specifications.</para>
+
+<para>To support name compression, a position specification can be not only of
+the format "spec=name", but also "spec=(ID) name" to specify a mapping of an
+integer ID to a name, and "spec=(ID)" to reference a previously defined ID
+mapping. There is a separate ID mapping for each position specification,
+i.e. you can use ID 1 for both a file name and a symbol name.</para>
+
+<para>With name compression, the example from 1.4 looks like this:
+<screen>events: Instructions
+
+fl=(1) file1.c
+fn=(1) main
+16 20
+cfn=(2) func1
+calls=1 50
+16 400
+cfl=(2) file2.c
+cfn=(3) func2
+calls=3 20
+16 400
+
+fn=(2)
+51 100
+cfl=(2)
+cfn=(3)
+calls=2 20
+51 300
+
+fl=(2)
+fn=(3)
+20 700</screen></para>
+
+<para>As position specifications carry no information themselves, but only change
+the meaning of subsequent cost lines or associations, they can appear
+anywhere in the file without any negative consequence. In particular, you can
+define name compression mappings directly after the header, and before any cost
+lines. Thus, the above example can also be written as
+<screen>events: Instructions
+
+# define file ID mapping
+fl=(1) file1.c
+fl=(2) file2.c
+# define function ID mapping
+fn=(1) main
+fn=(2) func1
+fn=(3) func2
+
+fl=(1)
+fn=(1)
+16 20
+...</screen></para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.compression2" xreflabel="Subposition Compression">
+<title>Subposition Compression</title>
+
+<para>If a Callgrind data file should hold costs for each assembler instruction
+of a program, you specify subposition "instr" in the "positions:" header line,
+and each cost line has to include the address of some instruction. Addresses
+are allowed to have a size of 64 bits to support 64-bit architectures. This
+motivates subposition compression: instead of every cost line starting with
+a 16 character long address, one is allowed to specify relative subpositions.</para>
+
+<para>A relative subposition always is based on the corresponding subposition
+of the last cost line, and starts with a "+" to specify a positive difference,
+a "-" to specify a negative difference, or consists of "*" to specify the same
+subposition. Assume the following example (subpositions can always be specified
+as hexadecimal numbers, beginning with "0x"):
+<screen>positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
+0x80001237 90 5
+0x80001238 91 6</screen></para>
+
+<para>With subposition compression, this looks like
+<screen>positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
++3 * 5
++1 +1 6</screen></para>
+
+<para>Remark: For assembler annotation to work, instruction addresses have to
+be corrected to correspond to addresses found in the original binary. I.e. for
+relocatable shared objects, often a load offset has to be subtracted.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.misc" xreflabel="Miscellaneous">
+<title>Miscellaneous</title>
+
+<sect3 id="cl-format.overview.misc.summary" xreflabel="Cost Summary Information">
+<title>Cost Summary Information</title>
+
+<para>For the visualization to be able to show cost percentage, a sum of the
+cost of the full run has to be known. Usually, it is assumed that this is the
+sum of all cost lines in a file. But sometimes, this is not correct. Thus, you
+can specify a "summary:" line in the header giving the full cost for the
+profile run. This has another effect: an import filter can show a progress bar
+while loading a large data file if it knows the cost sum in advance.</para>
+
+</sect3>
+
+<sect3 id="cl-format.overview.misc.events" xreflabel="Long Names for Event Types and inherited Types">
+<title>Long Names for Event Types and inherited Types</title>
+
+<para>Event types for cost lines are specified in the "events:" line with an
+abbreviated name. For visualization, it makes sense to be able to specify some
+longer, more descriptive name. For an event type "Ir" which means "Instruction
+Fetches", this can be specified with the header line
+<screen>event: Ir : Instruction Fetches
+events: Ir Dr</screen></para>
+
+<para>In this example, "Dr" itself has no long name associated. The order of
+"event:" lines and the "events:" line is of no importance. Additionally,
+inherited event types can be introduced for which no raw data is available, but
+which are calculated from given types. Extending the last example, you could add
+<screen>event: Sum = Ir + Dr</screen>
+to specify an additional event type "Sum", which is calculated by adding costs
+for "Ir" and "Dr".</para>
+
+</sect3>
+
+</sect2>
+
+</sect1>
+
+<sect1 id="cl-format.reference" xreflabel="Reference">
+<title>Reference</title>
+
+<sect2 id="cl-format.reference.grammar" xreflabel="Grammar">
+<title>Grammar</title>
+
+<para>
+<screen>ProfileDataFile := FormatVersion? Creator? PartData*</screen>
+<screen>FormatVersion := "version:" Space* Number "\n"</screen>
+<screen>Creator := "creator:" NoNewLineChar* "\n"</screen>
+<screen>PartData := (HeaderLine "\n")+ (BodyLine "\n")+</screen>
+<screen>HeaderLine := (empty line)
+  | ('#' NoNewLineChar*)
+  | PartDetail
+  | Description
+  | EventSpecification
+  | CostLineDef</screen>
+<screen>PartDetail := TargetCommand | TargetID</screen>
+<screen>TargetCommand := "cmd:" Space* NoNewLineChar*</screen>
+<screen>TargetID := ("pid"|"thread"|"part") ":" Space* Number</screen>
+<screen>Description := "desc:" Space* Name Space* ":" NoNewLineChar*</screen>
+<screen>EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?</screen>
+<screen>InheritedDef := "=" InheritedExpr</screen>
+<screen>InheritedExpr := Name
+  | Number Space* ("*" Space*)? Name
+  | InheritedExpr Space* "+" Space* InheritedExpr</screen>
+<screen>LongNameDef := ":" NoNewLineChar*</screen>
+<screen>CostLineDef := "events:" Space* Name (Space+ Name)*
+  | "positions:" "instr"? (Space+ "line")?</screen>
+<screen>BodyLine := (empty line)
+  | ('#' NoNewLineChar*)
+  | CostLine
+  | PositionSpecification
+  | AssociationSpecification</screen>
+<screen>CostLine := SubPositionList Costs?</screen>
+<screen>SubPositionList := (SubPosition+ Space+)+</screen>
+<screen>SubPosition := Number | "+" Number | "-" Number | "*"</screen>
+<screen>Costs := (Number Space+)+</screen>
+<screen>PositionSpecification := Position "=" Space* PositionName</screen>
+<screen>Position := CostPosition | CalledPosition</screen>
+<screen>CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"</screen>
+<screen>CalledPosition := "cob" | "cfl" | "cfn"</screen>
+<screen>PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?</screen>
+<screen>AssociationSpecification := CallSpecification
+  | JumpSpecification</screen>
+<screen>CallSpecification := CallLine "\n" CostLine</screen>
+<screen>CallLine := "calls=" Space* Number Space+ SubPositionList</screen>
+<screen>JumpSpecification := ...</screen>
+<screen>Space := " " | "\t"</screen>
+<screen>Number := HexNumber | (Digit)+</screen>
+<screen>Digit := "0" | ... | "9"</screen>
+<screen>HexNumber := "0x" (Digit | HexChar)+</screen>
+<screen>HexChar := "a" | ... | "f" | "A" | ... | "F"</screen>
+<screen>Name := Alpha (Digit | Alpha)*</screen>
+<screen>Alpha := "a" | ... | "z" | "A" | ... | "Z"</screen>
+<screen>NoNewLineChar := all characters without "\n"</screen>
+</para>
+
+</sect2>
+
+<sect2 id="cl-format.reference.header" xreflabel="Description of Header Lines">
+<title>Description of Header Lines</title>
+
+<para>The header has an arbitrary number of lines of the format 
+"key: value". Possible <emphasis>key</emphasis> values for the header are:</para>
+
+<itemizedlist>
+
+  <listitem>
+    <para><computeroutput>version: number</computeroutput> [Callgrind]</para>
+    <para>This is used to distinguish future profile data formats.  A 
+    major version of 0 or 1 is supposed to be upwards compatible with 
+    Cachegrind's format.  It is optional; if it is absent, version 1
+    is assumed.  Otherwise, this has to be the first header line.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>pid: process id</computeroutput> [Callgrind]</para>
+    <para>This specifies the process ID of the supervised application 
+    for which this profile was generated.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cmd: program name + args</computeroutput> [Cachegrind]</para>
+    <para>This specifies the full command line of the supervised
+    application for which this profile was generated.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>part: number</computeroutput> [Callgrind]</para>
+    <para>This specifies a sequentially incremented number for each dump 
+    generated, starting at 1.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>desc: type: value</computeroutput> [Cachegrind]</para>
+    <para>This specifies various information for this dump.  For some
+    types, the semantics are defined, but any description type is allowed.
+    Unknown types should be ignored.</para>
+    <para>There are the types "I1 cache", "D1 cache", "L2 cache", which
+    specify parameters used for the cache simulator.  These are the only
+    types originally used by Cachegrind.  Additionally, Callgrind uses
+    the following types:  "Timerange" gives a rough range of the basic
+    block counter, for which the cost of this dump was collected.
+    Type "Trigger" states why this trace was generated,
+    e.g. program termination or a forced interactive dump.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>positions: [instr] [line]</computeroutput> [Callgrind]</para>
+    <para>For cost lines, this defines the semantics of the first numbers.
+    Any combination of "instr", "bb" and "line" is allowed, but has to be 
+    in this order which corresponds to position numbers at the start of 
+    the cost lines later in the file.</para>
+    <para>If "instr" is specified, the position is the address of an 
+    instruction whose execution raised the events given later on the 
+    line.  This address is relative to the offset of the binary/shared 
+    library file to not have to specify relocation info.  For "line", 
+    the position is the line number of a source file, which is 
+    responsible for the events raised. Note that the mapping of "instr"
+    and "line" positions is given by the debugging line information
+    produced by the compiler.</para>
+    <para>This field is optional. If not specified, only "line" is
+    assumed.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>events: event type abbreviations</computeroutput> [Cachegrind]</para>
+    <para>A list of short names of the event types logged in this file. 
+    The order is the same as in cost lines.  The first event type is the
+    second or third number in a cost line, depending on the value of 
+    "positions".  Callgrind does not add additional cost types.  Specify
+    exactly once.</para>
+    <para>Cost types from original Cachegrind are:
+      <itemizedlist>
+        <listitem>
+          <para><command>Ir</command>: Instruction read access</para>
+        </listitem>
+        <listitem>
+          <para><command>I1mr</command>: Instruction Level 1 read cache miss</para>
+        </listitem>
+        <listitem>
+          <para><command>I2mr</command>: Instruction Level 2 read cache miss</para>
+        </listitem>
+        <listitem>
+          <para>...</para>
+        </listitem>
+      </itemizedlist>
+    </para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>summary: costs</computeroutput> [Callgrind]</para>
+    <para><computeroutput>totals: costs</computeroutput> [Cachegrind]</para>
+    <para>The total number of events covered by this trace
+    file.  Both keys have the same meaning, but the "totals:" line
+    happens to be at the end of the file, while "summary:" appears in
+    the header.  This was added to allow postprocessing tools to know
+    the total cost in advance. The two lines always give the same cost
+    counts.</para>
+  </listitem>
+
+</itemizedlist>
+
+</sect2>
+
+<sect2 id="cl-format.reference.body" xreflabel="Description of Body Lines">
+<title>Description of Body Lines</title>
+
+<para>There exist lines
+<computeroutput>spec=position</computeroutput>.  The values for position
+specifications are arbitrary strings.  When starting with "(" and a
+digit, it's a string in compressed format.  Otherwise it's the real
+position string.  This allows for file and symbol names as position
+strings, as these never start with "(" + <emphasis>digit</emphasis>.
+The compressed format is either "(" <emphasis>number</emphasis> ")"
+<emphasis>space</emphasis> <emphasis>position</emphasis> or only 
+"(" <emphasis>number</emphasis> ")".  The first relates
+<emphasis>position</emphasis> to <emphasis>number</emphasis> in the
+context of the given format specification from this line to the end of
+the file; it makes the (<emphasis>number</emphasis>) an alias for
+<emphasis>position</emphasis>.  Compressed format is always
+optional.</para>
+
+<para>Position specifications allowed:</para>
+<itemizedlist>
+
+  <listitem>
+    <para><computeroutput>ob=</computeroutput> [Callgrind]</para>
+    <para>The ELF object where the cost of next cost lines happens.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fl=</computeroutput> [Cachegrind]</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fi=</computeroutput> [Cachegrind]</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fe=</computeroutput> [Cachegrind]</para>
+    <para>The source file including the code which is responsible for
+    the cost of next cost lines. "fi="/"fe=" is used when the source
+    file changes inside of a function, i.e. for inlined code.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fn=</computeroutput> [Cachegrind]</para>
+    <para>The name of the function where the cost of next cost lines 
+    happens.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cob=</computeroutput> [Callgrind]</para>
+    <para>The ELF object of the target of the next call cost lines.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cfl=</computeroutput> [Callgrind]</para>
+    <para>The source file including the code of the target of the
+    next call cost lines.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cfn=</computeroutput> [Callgrind]</para>
+    <para>The name of the target function of the next call cost 
+    lines.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>calls=</computeroutput> [Callgrind]</para>
+    <para>The number of nonrecursive calls which are responsible for the 
+    cost specified by the next call cost line. This is the cost spent 
+    inside of the called function.</para>
+    <para>After "calls=" there MUST be a cost line. This is the cost
+    spent in the called function. The first number is the source line 
+    from where the call happened.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>jump=count target position</computeroutput> [Callgrind]</para>
+    <para>Unconditional jump, executed count times, to the given target
+    position.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>jcnd=exe.count jumpcount target position</computeroutput> [Callgrind]</para>
+    <para>Conditional jump, executed exe.count times with jumpcount 
+    jumps to the given target position.</para>
+  </listitem>
+
+</itemizedlist>
+
+</sect2>
+
+</sect1>
+
+</chapter>
\ No newline at end of file
diff --git a/callgrind/docs/cl-manual.xml b/callgrind/docs/cl-manual.xml
new file mode 100644
index 0000000000..6c8797f422
--- /dev/null
+++ b/callgrind/docs/cl-manual.xml
@@ -0,0 +1,810 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+<chapter id="cl-manual" xreflabel="Callgrind Manual">
+<title>Callgrind Manual</title>
+
+
+<sect1 id="cl-manual.use" xreflabel="Overview">
+<title>Overview</title>
+
+<para>Callgrind is a Valgrind tool, able to run applications under
+supervision to generate profiling data. By default, this data consists of
+the number of instructions executed in a run, related to source lines, and
+call relationships among functions together with call counts.
+Optionally, a cache simulator (similar to Cachegrind) can produce
+further information about the memory access behavior of the application.
+</para>
+
+<para>The profile data is written out to a file at program
+termination. For presentation of the data, and interactive control
+of the profiling, two command line tools are provided:</para>
+<variablelist>
+  <varlistentry>
+  <term><command>callgrind_annotate</command></term>
+  <listitem>
+    <para>This command reads in the profile data, and prints a
+    sorted list of functions, optionally with annotation.</para>
+    <para>You can read the manpage here: <xref
+	      linkend="callgrind-annotate"/>.</para>
+    <para>For graphical visualization of the data, check out
+    <ulink url="&cl-gui;">KCachegrind</ulink>.</para>
+
+  </listitem>
+  </varlistentry>
+
+  <varlistentry>
+  <term><command>callgrind_control</command></term>
+  <listitem>
+    <para>This command enables you to interactively observe and control
+    the status of applications currently running under supervision. You can
+    get statistics, the current stack trace, and request
+    zeroing of counters and dumping of profiles.</para>
+    <para>You can read the manpage here: <xref linkend="callgrind-control"/>.</para>
+  </listitem>
+  </varlistentry>
+</variablelist>
+
+<para>To use Callgrind, you must specify 
+<computeroutput>--tool=callgrind</computeroutput> on the Valgrind 
+command line or use the supplied script 
+<computeroutput>callgrind</computeroutput>.</para>
+
+<para>Callgrind's cache simulation is based on the
+<ulink url="&cg-tool-url;">Cachegrind tool</ulink> of the 
+<ulink url="&vg-url;">Valgrind</ulink> package.  Read 
+<ulink url="&cg-doc-url;">Cachegrind's documentation</ulink> first; 
+this page describes the features supported in addition to 
+Cachegrind's features.</para>
+
+</sect1>
+
+
+<sect1 id="cl-manual.purpose" xreflabel="Purpose">
+<title>Purpose</title>
+
+
+  <sect2 id="cl-manual.devel" 
+         xreflabel="Profiling as part of Application Development">
+  <title>Profiling as part of Application Development</title>
+
+  <para>With application development, usually, one of the last steps is
+  to improve the runtime performance. To not waste time on
+  optimizing functions which are rarely used, one needs to know
+  in which part of the program most of the time is spent.</para>
+
+  <para>This is done with a technique called profiling. The program
+  is run under control of a profiling tool, which gives the time
+  distribution of executed functions in the run. After examination
+  of the program's profile, it should be clear if and where optimization
+  is useful. Afterwards, one should verify any runtime changes by another
+  profile run.</para>
+
+  </sect2>
+
+
+  <sect2 id="cl-manual.tools" xreflabel="Profiling Tools">
+  <title>Profiling Tools</title>
+
+  <para>The best known is the GCC profiling tool <command>GProf</command>:
+  one needs to compile an application with the compiler option
+  <computeroutput>-pg</computeroutput>; running the program generates
+  a file <computeroutput>gmon.out</computeroutput>, which can be
+  transformed into human readable form with the command line tool
+  <computeroutput>gprof</computeroutput>.  A disadvantage here is the
+  required compilation step for preparing the executable; additionally, the
+  application should be statically linked.</para>
+
+  <para>Another profiling tool is <command>Cachegrind</command>, part
+  of <ulink url="&vg-url;">Valgrind</ulink>. It uses the processor
+  emulation of Valgrind to run the executable, and catches all memory
+  accesses for the trace. The user program does not need to be
+  recompiled; it can use shared libraries and plugins, and the profile
+  measuring doesn't influence the trace results. The trace includes 
+  the number of instruction/data memory accesses and 1st/2nd level
+  cache misses, and relates it to source lines and functions of the
+  run program.  A disadvantage is the slowdown involved in the
+  processor emulation: it is around 50 times slower.</para>
+
+  <para>Cachegrind can only deliver a flat profile. No call
+  relationship among the functions of an application is stored.  Thus,
+  inclusive costs, i.e. costs of a function including the cost of all 
+  functions called from there, cannot be calculated. Callgrind extends 
+  Cachegrind by including call relationship and exact event counts
+  spent while doing a call.</para>
+
+  <para>Because Callgrind (and Cachegrind) is based on simulation, the
+  slowdown due to processing the synthetic runtime events does not
+  influence the results.  See <xref linkend="cl-manual.usage"/> for more 
+  details on the possibilities.</para>
+
+  </sect2>
+
+</sect1>
+
+
+<sect1 id="cl-manual.usage" xreflabel="Usage">
+<title>Usage</title>
+
+  <sect2 id="cl-manual.basics" xreflabel="Basics">
+  <title>Basics</title>
+
+  <para>To start a profile run for a program, execute:
+  <screen>callgrind [callgrind options] your-program [program options]</screen>
+  </para>
+
+  <para>While the simulation is running, you can observe execution with
+  <screen>callgrind_control -b</screen>
+  This will print out a current backtrace. To annotate the backtrace with
+  event counts, run
+  <screen>callgrind_control -e -b</screen>
+  </para>
+
+  <para>After program termination, a profile data file named 
+  <computeroutput>callgrind.out.pid</computeroutput>
+  is generated with <emphasis>pid</emphasis> being the process ID 
+  of the execution of this profile run.</para>
+
+  <para>The data file contains information about the calls made in the
+  program among the functions executed, together with events of type
+  <command>Instruction Read Accesses</command> (Ir).</para>
+
+  <para>If you are additionally interested in memory accesses of your 
+  program, and if an access can be satisfied by loading from 1st/2nd
+  level cache, use Callgrind with the option
+  <option><xref linkend="opt.simulate-cache"/>=yes</option>.
+  This will further slow down the run approximately by a factor of 2.</para>
+
+  <para>If the program section you want to profile is somewhere in the
+  middle of the run, it is beneficial to 
+  <emphasis>fast forward</emphasis> to this section without any 
+  profiling at all, and switch it on later. This is achieved by using
+  <option><xref linkend="opt.instr-atstart"/>=no</option>
+  and interactively running
+  <computeroutput>callgrind_control -i on</computeroutput> before the 
+  interesting code section is about to be executed.</para>
+
+  <para>If you want to be able to see assembler annotation, specify
+  <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
+  profile data at instruction granularity. Note that this type of annotation
+  is only available with KCachegrind. For assembler annotation, it is also
+  interesting to see more details of the control flow inside functions,
+  i.e. (conditional) jumps. These are collected by additionally specifying
+  <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>
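+
+  <para>As a sketch, the options described above can be combined on one
+  command line.  Here <computeroutput>./myprog</computeroutput> is a
+  placeholder for your own program:
+  <screen>callgrind --simulate-cache=yes --dump-instr=yes --collect-jumps=yes ./myprog</screen>
+  To fast forward instead, start without instrumentation and switch it on
+  from a second terminal once the interesting section is about to start:
+  <screen>callgrind --instr-atstart=no ./myprog
+callgrind_control -i on</screen>
+  </para>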
+
+  </sect2>
+
+
+  <sect2 id="cl-manual.dumps" 
+         xreflabel="Multiple dumps from one program run">
+  <title>Multiple dumps from one program run</title>
+
+  <para>Often, you aren't interested in the time characteristics of a full
+  program run, but only of a small part of it (e.g. execution of one
+  algorithm).  If there are multiple algorithms, or one algorithm
+  running with different input data, it can even be useful to get separate
+  profile information for multiple parts of one program run.</para>
+
+  <para>In full detail, a generated profile data file is named
+<screen>
+callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
+</screen>
+  </para>
+  <para>where <emphasis>pid</emphasis> is the PID of the running 
+  program, <emphasis>part</emphasis> is a number incremented on each
+  dump (".part" is skipped for the dump at program termination), and 
+  <emphasis>threadID</emphasis> is a thread identification 
+  ("-threadID" is only used if you request dumps of individual 
+  threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>
+
+  <para>There are different ways to generate multiple profile dumps 
+  while a program is running under Callgrind's supervision.  Still, 
+  all methods trigger the same action, viz. "dump all profile 
+  information since the last dump or program start, and zero cost 
+  counters afterwards".  To allow for zeroing cost counters without
+  dumping, there is a second action "zero all cost counters now". 
+  The different methods are:</para>
+  <itemizedlist>
+
+    <listitem>
+      <para><command>Dump on program termination.</command>
+      This method is the standard way and doesn't need any special
+      action from your side.</para>
+    </listitem>
+
+    <listitem>
+      <para><command>Spontaneous, interactive dumping.</command> Use
+      <screen>callgrind_control -d [hint [PID/Name]]</screen> to 
+      request the dumping of profile information of the supervised
+      application with PID or Name.  <emphasis>hint</emphasis> is an
+      arbitrary string you can optionally specify to later be able to
+      distinguish profile dumps.  The control program will not terminate
+      before the dump is completely written.  Note that the application
+      must be actively running for detection of the dump command. So,
+      for a GUI application, resize the window or for a server send a
+      request.</para>
+      <para>If you are using <ulink url="&cl-gui;">KCachegrind</ulink>
+      for browsing of profile information, you can use the toolbar
+      button <command>Force dump</command>. This will request a dump
+      and trigger a reload after the dump is written.</para>
+    </listitem>
+
+    <listitem>
+      <para><command>Periodic dumping after execution of a specified
+      number of basic blocks</command>. For this, use the command line
+      option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
+      The resolution of the internal basic block counter of Valgrind is
+      only rough, so you should specify an interval of at least 50000
+      basic blocks.</para>
+    </listitem>
+
+    <listitem>
+      <para><command>Dumping at enter/leave of all functions whose name
+      starts with</command> <emphasis>funcprefix</emphasis>.  Use the
+      option <option><xref linkend="opt.dump-before"/>=funcprefix</option>
+      and <option><xref linkend="opt.dump-after"/>=funcprefix</option>.
+      To zero cost counters before entering a function, use
+      <option><xref linkend="opt.zero-before"/>=funcprefix</option>.
+      The prefix method for specifying function names was chosen to
+      ease the use with C++: you don't have to specify full
+      signatures.</para> <para>You can specify these options multiple
+      times for different function prefixes.</para>
+    </listitem>
+
+    <listitem>
+      <para><command>Program controlled dumping.</command>
+      Put <screen><![CDATA[#include <valgrind/callgrind.h>]]></screen>
+      into your source and add 
+      <computeroutput>CALLGRIND_DUMP_STATS;</computeroutput> when you
+      want a dump to happen. Use
+      <computeroutput>CALLGRIND_ZERO_STATS;</computeroutput> to only
+      zero cost counters.</para>
+      <para>In Valgrind terminology, this method is called "Client
+      requests".  The given macros generate a special instruction
+      pattern with no effect at all (i.e. a NOP). Only when run under
+      Valgrind does the CPU simulation engine detect the special
+      instruction pattern and trigger special actions like the ones
+      described above.</para>
+    </listitem>
+  </itemizedlist>
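+
+  <para>As a minimal sketch of program controlled dumping, the following C
+  program zeroes the counters before a region of interest and dumps right
+  after it.  The function <computeroutput>run_algorithm</computeroutput> is a
+  hypothetical stand-in for your own code; only the header and the two
+  macros are taken from the description above:</para>
+<screen><![CDATA[
+#include <stdio.h>
+#include <valgrind/callgrind.h>
+
+/* Hypothetical workload standing in for the code you want to measure. */
+static long run_algorithm(long n)
+{
+    long sum = 0;
+    for (long i = 0; i < n; i++)
+        sum += i * i;
+    return sum;
+}
+
+int main(void)
+{
+    /* Discard all costs accumulated so far (program startup). */
+    CALLGRIND_ZERO_STATS;
+
+    long result = run_algorithm(100000);
+
+    /* Write a dump covering only the call above; counters are zeroed again. */
+    CALLGRIND_DUMP_STATS;
+
+    printf("%ld\n", result);
+    return 0;
+}
+]]></screen>
+  <para>Because the macros expand to an instruction pattern without effect,
+  the same binary can also be run outside of Valgrind.</para>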
+
+  <para>If you are running a multi-threaded application and specify the
+  command line option <option><xref linkend="opt.separate-threads"/>=yes</option>, 
+  every thread will be profiled on its own and will create its own
+  profile dump. Thus, the last two methods will only generate one dump
+  of the currently running thread. With the other methods, you will get
+  multiple dumps (one for each thread) on a dump request.</para>
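+
+  <para>For illustration, the command line methods from the list above might
+  be used as follows.  The program name
+  <computeroutput>./myprog</computeroutput>, the function prefix
+  <computeroutput>compute</computeroutput> and the interval are placeholders:
+  <screen>callgrind --dump-every-bb=1000000 ./myprog
+callgrind --zero-before=compute --dump-after=compute ./myprog
+callgrind_control -d</screen>
+  The first line dumps periodically.  The second zeroes the counters on
+  entering, and dumps on leaving, functions whose names start with
+  <computeroutput>compute</computeroutput>.  The third requests a dump from
+  an already running Callgrind process.</para>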
+
+  </sect2>
+
+
+
+  <sect2 id="cl-manual.limits" 
+         xreflabel="Limiting range of event collection">
+  <title>Limiting range of event collection</title>
+
+  <para>For aggregating events (function enter/leave,
+  instruction execution, memory access) into event numbers,
+  first, the events must be recognizable by Callgrind, and second,
+  the collection state must be switched on.</para>
+
+  <para>Event recognition is only possible if <emphasis>instrumentation</emphasis>
+  for program code is switched on. This is the default, but for faster
+  execution (identical to <computeroutput>valgrind --tool=none</computeroutput>),
+  it can be temporarily switched off until the program reaches the parts that
+  are interesting to profile. Callgrind can start without instrumentation
+  by specifying option <option><xref linkend="opt.instr-atstart"/>=no</option>.
+  The instrumentation state can be switched on interactively
+  with <screen>callgrind_control -i on</screen>
+  and off by specifying "off" instead of "on".
+  Furthermore, instrumentation state can be programmatically changed with
+  the macros <computeroutput>CALLGRIND_START_INSTRUMENTATION;</computeroutput>
+  and <computeroutput>CALLGRIND_STOP_INSTRUMENTATION;</computeroutput>.
+  </para>
+  
+  <para>In addition to instrumentation, event collection must be enabled
+  for events to be counted. This, too, is the case by default.
+  You can explicitly control for which part of your program you want to
+  collect events by using 
+  <option><xref linkend="opt.toggle-collect"/>=funcprefix</option>. 
+  This will toggle the collection state on entering and leaving a
+  function.  When specifying this option, the default collection state
+  at program start is "off". Thus, only events happening while running
+  inside of functions starting with <emphasis>funcprefix</emphasis> will
+  be collected. Recursive
+  calls of functions with <emphasis>funcprefix</emphasis> do not trigger
+  any action.</para>
+
+  <para>It is important to note that with instrumentation switched off, the
+  cache simulator cannot see any memory access events, and thus any
+  simulated cache state is frozen and becomes stale.
+  Therefore, to get useful cache events (hits/misses) after switching on
+  instrumentation, the cache first must warm up,
+  probably leading to many <emphasis>cold misses</emphasis>
+  which would not have happened in reality. If you do not want to see these,
+  start actual collection a few million instructions after you have switched
+  on instrumentation.</para>
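+
+  <para>The warm-up advice above can be sketched with client requests.  This
+  sketch assumes the program is started with
+  <option>--instr-atstart=no --collect-atstart=no</option>, and
+  <computeroutput>process_chunk</computeroutput> is a hypothetical workload,
+  not part of Callgrind:</para>
+<screen><![CDATA[
+#include <valgrind/callgrind.h>
+
+/* Hypothetical workload standing in for your own code. */
+static long process_chunk(long n)
+{
+    long sum = 0;
+    for (long i = 0; i < n; i++)
+        sum += i % 7;
+    return sum;
+}
+
+int main(void)
+{
+    long total = 0;
+
+    /* Runs uninstrumented: fast, but no events are recognized. */
+    total += process_chunk(1000);
+
+    CALLGRIND_START_INSTRUMENTATION;
+
+    /* Instrumented but not collected: warms up the simulated cache. */
+    total += process_chunk(1000);
+
+    CALLGRIND_TOGGLE_COLLECT;   /* collection state: off -> on */
+    total += process_chunk(1000);
+    CALLGRIND_TOGGLE_COLLECT;   /* collection state: on -> off */
+
+    CALLGRIND_STOP_INSTRUMENTATION;
+    return total > 0 ? 0 : 1;
+}
+]]></screen>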
+
+
+  </sect2>
+
+
+
+  <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
+  <title>Avoiding cycles</title>
+
+  <para>A group of functions is called a cycle if, for any two functions
+  in the group, there is a call chain from one to the other.  For example,
+  with A calling B, B calling C, and C calling A, the three functions
+  A, B and C form one cycle.</para>
+
+  <para>If a call chain goes around a cycle multiple times,
+  profiling cannot distinguish event counts coming from the
+  first round from those of the second. Thus, it makes no sense to attach any inclusive
+  cost to a call among functions inside of one cycle.
+  If "A &gt; B" appears multiple times in a call chain, you
+  have no way to partition the one big sum of all appearances of "A &gt;
+  B".  Thus, for profile data presentation, all functions of a cycle are
+  seen as one big virtual function.</para>
+
+  <para>Unfortunately, if you have an application using some callback
+  mechanism (like any GUI program), or even with normal polymorphism (as
+  in OO languages like C++), it's quite possible to get large cycles.
+  As it is often impossible to say anything about performance behaviour
+  inside of cycles, it is useful to have mechanisms that avoid
+  cycles in call graphs altogether.  This is done by treating the same
+  function in different ways, depending on the current execution
+  context: either by giving it different names, or by ignoring calls to
+  the function entirely.</para>
+
+  <para>There is an option to ignore calls to a function with
+  <option><xref linkend="opt.fn-skip"/>=funcprefix</option>.  E.g., you
+  usually do not want to see the trampoline functions in the PLT sections
+  for calls to functions in shared libraries. You can see the difference
+  if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
+  If a call is ignored, the cost events happening in it are attributed to the
+  enclosing function.</para>
+
+  <para>If you have a recursive function, you can distinguish the first
+  10 recursion levels by specifying
+  <option><xref linkend="opt.fn-recursion-num"/>=funcprefix</option>.
+  To do the same for all functions, use
+  <option><xref linkend="opt.fn-recursion"/>=10</option>, but this will
+  give you much bigger profile data files.  In the profile data, you will see
+  the recursion levels of "func" as different functions with names
+  "func", "func'2", "func'3" and so on.</para>
+
+  <para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
+  in your program, you usually get a "false" cycle "B &lt;&gt; C". Use 
+  <option><xref linkend="opt.fn-caller-num"/>=B</option> 
+  <option><xref linkend="opt.fn-caller-num"/>=C</option>,
+  and functions "B" and "C" will be treated as different functions 
+  depending on the direct caller. Using the apostrophe for appending 
+  this "context" to the function name, you get "A &gt; B'A &gt; C'B" 
+  and "A &gt; C'A &gt; B'C", and there will be no cycle. Use 
+  <option><xref linkend="opt.fn-caller"/>=2</option> to get a 2-caller
+  dependency for all functions. Again, this will multiply the
+  profile data size.</para>
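+
+  <para>As a sketch that follows the option names as they are rendered in
+  the text above, the false cycle from the example could be broken like
+  this, with <computeroutput>./myprog</computeroutput> as a placeholder
+  program containing functions "B" and "C":
+  <screen>callgrind --fn-caller2=B --fn-caller2=C ./myprog</screen>
+  Similarly, the first 10 recursion levels of a hypothetical function
+  "func" could be separated with:
+  <screen>callgrind --fn-recursion10=func ./myprog</screen>
+  </para>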
+
+  </sect2>
+
+</sect1>
+
+
+<sect1 id="cl-manual.options" xreflabel="Command line option reference">
+<title>Command line option reference</title>
+
+<para>
+This reference groups options into classes, and uses the same order as
+the output of <computeroutput>callgrind --help</computeroutput>.
+</para>
+
+<sect2 id="cl-manual.options.misc" 
+       xreflabel="Miscellaneous options">
+<title>Miscellaneous options</title>
+
+<variablelist id="cmd-options.misc">
+
+  <varlistentry>
+    <term><option>--help</option></term>
+    <listitem>
+      <para>Show summary of options. This is a short version of this
+      manual section.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>--version</option></term>
+    <listitem>
+      <para>Show version of callgrind.</para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.creation" 
+       xreflabel="Dump creation options">
+<title>Dump creation options</title>
+
+<para>
+These options influence the name and format of the profile data files.
+</para>
+
+<variablelist id="cmd-options.creation">
+
+  <varlistentry id="opt.base">
+    <term>
+      <option><![CDATA[--base=<prefix> [default: callgrind.out] ]]></option>
+    </term>
+    <listitem>
+      <para>Specify another base name for the dump file names. To
+      distinguish different profile runs of the same application,
+      <computeroutput>.&lt;pid&gt;</computeroutput> is appended to the
+      base dump file name with
+      <computeroutput>&lt;pid&gt;</computeroutput> being the process ID
+      of the profile run (with multiple dumps happening, the file name
+      is modified further; see below).</para> <para>This option is
+      especially useful if your application changes its working
+      directory.  Usually, the dump file is generated in the current
+      working directory of the application at program termination.  By
+      giving an absolute path with the base specification, you can force
+      a fixed directory for the dump files.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
+    <term>
+      <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>This specifies whether event counts at instruction granularity
+      should be available in the profile data file. This allows assembler
+      annotation, but currently can only be shown with KCachegrind.</para>
+  </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.dump-line" xreflabel="--dump-line">
+    <term>
+      <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>This specifies whether event counts at source line granularity
+      should be available in the profile data file. This allows source
+      annotation for source which was compiled with debug information ("-g").
+      This should always be enabled.</para>
+  </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
+    <term>
+      <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>This option influences the output format of the profile data.
+      It specifies whether strings (file and function names) should be
+      identified by numbers. This shrinks the file size, but makes the file
+      harder for humans to read (which is not recommended either way).</para>
+      <para>However, this currently has to be switched off if
+      the files are to be read by
+      <computeroutput>callgrind_annotate</computeroutput>!</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
+    <term>
+      <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>This option influences the output format of the profile data.
+      It specifies whether numerical positions are always specified as absolute
+      values or are allowed to be relative to previous numbers.
+      This shrinks the file size.</para>
+      <para>However, this currently has to be switched off if
+      the files are to be read by
+      <computeroutput>callgrind_annotate</computeroutput>!</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
+    <term>
+      <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>When multiple profile data parts are to be generated, these
+      parts are appended to the same output file if this option is set to
+      "yes". Not recommended.</para>
+  </listitem>
+  </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.activity" 
+       xreflabel="Activity options">
+<title>Activity options</title>
+
+<para>
+These options specify when different actions regarding event counts are to
+be executed. For interactive control use
+<computeroutput>callgrind_control</computeroutput>.
+</para>
+
+<variablelist id="cmd-options.activity">
+
+  <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
+    <term>
+      <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
+    </term>
+    <listitem>
+      <para>Dump profile data every &lt;count&gt; basic blocks.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.dump-before" xreflabel="--dump-before">
+    <term>
+      <option><![CDATA[--dump-before=<prefix> ]]></option>
+    </term>
+    <listitem>
+      <para>Dump when entering a function starting with &lt;prefix&gt;</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.zero-before" xreflabel="--zero-before">
+    <term>
+      <option><![CDATA[--zero-before=<prefix> ]]></option>
+    </term>
+    <listitem>
+      <para>Zero all costs when entering a function starting with &lt;prefix&gt;</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.dump-after" xreflabel="--dump-after">
+    <term>
+      <option><![CDATA[--dump-after=<prefix> ]]></option>
+    </term>
+    <listitem>
+      <para>Dump when leaving a function starting with &lt;prefix&gt;</para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.collection"
+       xreflabel="Data collection options">
+<title>Data collection options</title>
+
+<para>
+These options specify when events are to be aggregated into event counts.
+Also see <xref linkend="cl-manual.limits"/>.</para>
+
+<variablelist id="cmd-options.collection">
+
+  <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
+    <term>
+      <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>Specify if you want Callgrind to start simulation and
+      profiling from the beginning.  If not, Callgrind will not be able
+      to collect any information, including calls, but it will have at
+      most a slowdown factor of around 4, which is the minimum Valgrind
+      overhead.  Instrumentation can be interactively switched on via
+      <computeroutput>callgrind_control -i on</computeroutput>.</para>
+      <para>Note that the resulting call graph will most probably not
+      contain <computeroutput>main</computeroutput>, but all the
+      functions executed after instrumentation was switched on.
+      Instrumentation can also be programmatically switched on/off. See the
+      Callgrind include file
+      <computeroutput>&lt;callgrind.h&gt;</computeroutput> for the macros
+      you have to use in your source code.</para> <para>For cache
+      simulation, results will be a little bit off when switching on
+      instrumentation later in the program run, as the simulator starts
+      with an empty cache at that moment.  Switch on event collection
+      later to cope with this error.</para>
+    </listitem>
+  </varlistentry>
+  
+  <varlistentry id="opt.collect-atstart">
+    <term>
+      <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>Specify whether event collection is switched on at beginning
+      of the profile run.</para>
+      <para>To only look at parts of your program, you have two
+      possibilities:</para>
+      <orderedlist>
+      <listitem>
+        <para>Zero event counters before entering the program part you
+        want to profile, and dump the event counters to a file after
+        leaving that program part.</para>
+        </listitem>
+        <listitem>
+          <para>Switch on/off collection state as needed to only see
+          event counters happening while inside of the program part you
+          want to profile.</para>
+        </listitem>
+      </orderedlist>
+      <para>The second option can be used if the program part you want to
+      profile is called many times. Option 1, i.e. creating a lot of
+      dumps, is not practical here.</para> <para>Collection state can be
+      toggled at entering and leaving of a given function with the
+      option <xref linkend="opt.toggle-collect"/>.  For this, collection
+      state should be switched off at the beginning.  Note that the
+      specification of <computeroutput>--toggle-collect</computeroutput>
+      implicitly sets
+      <computeroutput>--collect-atstart=no</computeroutput>.</para>
+      <para>Collection state can also be toggled by using a Valgrind
+      Client Request in your application.  For this, include
+      <computeroutput>valgrind/callgrind.h</computeroutput> and specify
+      the macro
+      <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput> at the
+      needed positions. This only has an effect if run under
+      supervision of the Callgrind tool.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
+    <term>
+      <option><![CDATA[--toggle-collect=<prefix> ]]></option>
+    </term>
+    <listitem>
+      <para>Toggle collection on enter/leave a function starting with
+      &lt;prefix&gt;.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
+    <term>
+      <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>This specifies whether information for (conditional) jumps
+      should be collected. As with instruction-level data, callgrind_annotate
+      currently is not able to show you the data. You have to use KCachegrind
+      to get jump arrows in the annotated code.</para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.separation"
+       xreflabel="Cost entity separation options">
+<title>Cost entity separation options</title>
+
+<para>
+These options specify how event counts are related to execution contexts.
+More specifically, they specify e.g. whether the recursion level or the
+call chain leading to a function should be accounted for, or whether the
+thread ID should be remembered.
+Also see <xref linkend="cl-manual.cycles"/>.</para>
+
+<variablelist id="cmd-options.separation">
+
+  <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
+    <term>
+      <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>This option specifies whether profile data should be generated
+      separately for every thread. If yes, the file names get "-threadID"
+      appended.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.fn-recursion" xreflabel="--fn-recursion">
+    <term>
+      <option><![CDATA[--fn-recursion=<level> [default: 2] ]]></option>
+    </term>
+    <listitem>
+      <para>Separate function recursions, maximal &lt;level&gt;.
+      See <xref linkend="cl-manual.cycles"/>.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.fn-caller" xreflabel="--fn-caller">
+    <term>
+      <option><![CDATA[--fn-caller=<callers> [default: 0] ]]></option>
+    </term>
+    <listitem>
+      <para>Separate contexts by maximal &lt;callers&gt; functions in the
+      call chain. See <xref linkend="cl-manual.cycles"/>.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
+    <term>
+      <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>Ignore calls to/from PLT sections.</para>
+    </listitem>
+  </varlistentry>
+  
+  <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
+    <term>
+      <option><![CDATA[--fn-skip=<function> ]]></option>
+    </term>
+    <listitem>
+      <para>Ignore calls to/from a given function.  E.g. if you have a
+      call chain A &gt; B &gt; C, and you specify function B to be
+      ignored, you will only see A &gt; C.</para>
+      <para>This is very convenient to skip functions handling callback
+      behaviour. E.g. for the SIGNAL/SLOT mechanism in Qt, you only want
+      to see the function emitting a signal to call the slots connected
+      to that signal. First, determine the real call chain to see the
+      functions needed to be skipped, then use this option.</para>
+    </listitem>
+  </varlistentry>
+  
+  <varlistentry id="opt.fn-group">
+    <term>
+      <option><![CDATA[--fn-group<number>=<function> ]]></option>
+    </term>
+    <listitem>
+      <para>Put a function into a separation group. This influences the
+      context name for cycle avoidance. All functions inside of such a
+      group are treated as being the same for context name building, which
+      resembles the call chain leading to a context. By specifying function
+      groups with this option, you can shorten the context name, as functions
+      in the same group will not appear in sequence in the name. </para>
+    </listitem>
+  </varlistentry>
+  
+  <varlistentry id="opt.fn-recursion-num" xreflabel="--fn-recursion10">
+    <term>
+      <option><![CDATA[--fn-recursion<number>=<function> ]]></option>
+    </term>
+    <listitem>
+      <para>Separate &lt;number&gt; recursions for &lt;function&gt;.
+      See <xref linkend="cl-manual.cycles"/>.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry id="opt.fn-caller-num" xreflabel="--fn-caller2">
+    <term>
+      <option><![CDATA[--fn-caller<number>=<function> ]]></option>
+    </term>
+    <listitem>
+      <para>Separate &lt;number&gt; callers for &lt;function&gt;.
+      See <xref linkend="cl-manual.cycles"/>.</para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+</sect2>
+
+<sect2 id="cl-manual.options.simulation"
+       xreflabel="Cache simulation options">
+<title>Cache simulation options</title>
+
+<variablelist id="cmd-options.simulation">
+  
+  <varlistentry id="opt.simulate-cache" xreflabel="--simulate-cache">
+    <term>
+      <option><![CDATA[--simulate-cache=<yes|no> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>Specify if you want to do full cache simulation. Disabled by
+      default; only instruction read accesses will be profiled.</para>
+      <para>Note, however, that it is impossible to estimate how much real
+      time your program will need from the instruction read counts
+      alone. Use them if you want to find out how many times
+      different functions are called and what their call relations are.</para>
+    </listitem>
+  </varlistentry>
+  
+</variablelist>
+
+</sect2>
+
+</sect1>
+
+</chapter>
+
+
diff --git a/callgrind/docs/index.xml b/callgrind/docs/index.xml
new file mode 100644
index 0000000000..45d2f8e94f
--- /dev/null
+++ b/callgrind/docs/index.xml
@@ -0,0 +1,120 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<book id="cl-docs" xreflabel="Callgrind Documentation">
+
+<bookinfo>
+  <title>Callgrind Documentation</title>
+  <subtitle>A call-graph generating Cache Simulator and Profiler</subtitle>
+  <releaseinfo>Release &cl-version; &cl-date;</releaseinfo>
+  <copyright>
+    <year>&cl-lifespan;</year>
+    <holder>
+      <link linkend="dist.authors" endterm="dist.authors.title"></link>
+    </holder>
+  </copyright>
+  <author>
+    <email><ulink url="mailto:&cl-email;">&cl-email;</ulink></email>
+  </author>
+  <legalnotice>
+    <para>Permission is granted to copy, distribute and/or modify this 
+    document under the terms of the GNU Free Documentation License,
+    Version 1.2 or any later version published by the Free Software Foundation; 
+    with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+    Texts.  A copy of the license is included in the section entitled 
+    <xref linkend="dist.license-gfdl"/>.</para>
+  </legalnotice>
+</bookinfo>
+
+
+<xi:include href="cl-manual.xml" parse="xml"  
+    xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+<xi:include href="cl-format.xml" parse="xml"  
+    xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+<chapter id="man-annotate" xreflabel="Callgrind Annotate (1)">
+  <title>Callgrind Annotate (1)</title>
+  <xi:include href="man-annotate.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
+</chapter>
+
+<chapter id="man-control" xreflabel="Callgrind Control (1)">
+  <title>Callgrind Control (1)</title>
+  <xi:include href="man-control.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
+</chapter>
+
+<!-- included for the sake of completeness -->
+<chapter id="man-callgrind" xreflabel="Callgrind (1)">
+  <title>Callgrind (1)</title>
+  <xi:include href="man-callgrind.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
+</chapter>
+
+<!-- Because these are all text files, we have to wrap -->
+<!-- them in suitable XML.  Hence the chapter/title stuff  -->
+<chapter id="dist.authors" xreflabel="Authors">
+  <title id="dist.authors.title">AUTHORS</title>
+  <literallayout>
+    <xi:include href="../../AUTHORS" parse="text"  
+        xmlns:xi="http://www.w3.org/2001/XInclude" />
+  </literallayout>
+</chapter>
+
+<chapter id="dist.readme" xreflabel="Readme">
+  <title>README</title>
+  <literallayout>
+    <xi:include href="../../README" parse="text"  
+        xmlns:xi="http://www.w3.org/2001/XInclude" />
+  </literallayout>
+</chapter>
+
+<chapter id="dist.changelog" xreflabel="ChangeLog">
+  <title>ChangeLog</title>
+  <literallayout>
+    <xi:include href="../../ChangeLog" parse="text"  
+        xmlns:xi="http://www.w3.org/2001/XInclude" />
+  </literallayout>
+</chapter>
+
+<!-- NEWS is empty, so comment it out -->
+<!--
+<chapter id="dist.news" xreflabel="News">
+  <title>NEWS</title>
+  <literallayout>
+    <xi:include href="../../NEWS" parse="text"  
+        xmlns:xi="http://www.w3.org/2001/XInclude" />
+  </literallayout>
+</chapter>
+-->
+
+<chapter id="dist.install" xreflabel="Install">
+  <title>INSTALL</title>
+  <literallayout>
+    <xi:include href="../../INSTALL" parse="text"  
+        xmlns:xi="http://www.w3.org/2001/XInclude" />
+  </literallayout>
+</chapter>
+
+<chapter id="dist.license-gpl" xreflabel=" The GNU General Public License">
+  <title>The GNU General Public License</title>
+    <literallayout>
+      <xi:include href="../../COPYING" parse="text"  
+          xmlns:xi="http://www.w3.org/2001/XInclude" />
+    </literallayout>
+</chapter>
+
+<chapter id="dist.license-gfdl" xreflabel=" The GNU Free Documentation License">
+  <title>The GNU Free Documentation License</title>
+    <literallayout>
+      <xi:include href="../COPYING.DOCS" parse="text"  
+          xmlns:xi="http://www.w3.org/2001/XInclude" />
+    </literallayout>
+</chapter>
+
+
+</book>
diff --git a/callgrind/docs/man-annotate.xml b/callgrind/docs/man-annotate.xml
new file mode 100644
index 0000000000..dd668e26e5
--- /dev/null
+++ b/callgrind/docs/man-annotate.xml
@@ -0,0 +1,163 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<refentry id="callgrind-annotate">
+
+<refmeta>
+  <refentrytitle>Callgrind Annotate</refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo class="a-source">May 13, 2003</refmiscinfo>
+</refmeta>
+
+<refnamediv id="a-name">
+  <refname>callgrind_annotate</refname>
+  <refpurpose>produces human readable ASCII output from profile 
+  information in <command>callgrind.out</command> files</refpurpose>
+</refnamediv>
+
+<refsynopsisdiv id="a-synopsis">
+  <cmdsynopsis>
+    <command>callgrind_annotate</command>    
+    <arg choice="opt"><replaceable>options</replaceable></arg>
+    <arg choice="opt"><replaceable>source-files</replaceable></arg>
+  </cmdsynopsis>
+</refsynopsisdiv>
+
+
+<refsect1 id="a-description">
+<title>Description</title>
+
+<para>This manual page briefly documents the
+<command>callgrind_annotate</command> command.  It was written
+for the Debian distribution because the original program does
+not have a manual page.</para>
+
+</refsect1>
+
+
+<refsect1 id="a-options">
+<title>Options</title>
+
+<para>This program follows the usual GNU command line syntax, with long
+options starting with two dashes ('--').  A summary of options is 
+included below.</para>
+
+<variablelist remap="TP">
+
+  <varlistentry>
+    <term><option>-h, --help</option></term>
+    <listitem>
+      <para>Show summary of options.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>--version</option></term>
+    <listitem>
+      <para>Show version of callgrind_annotate.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option>--show=A,B,C [default: all]</option>
+    </term>
+    <listitem>
+      <para>Only show figures for events A,B,C.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option>--sort=A,B,C</option>
+    </term>
+    <listitem>
+      <para>Sort columns by events A,B,C (default: event column order).</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
+    </term>
+    <listitem>
+      <para>Percentage of counts (of the primary sort event) we are
+      interested in.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--auto=<yes|no> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>Annotate all source files containing functions that helped
+      reach the event count threshold.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option>--context=N [default: 8] </option>
+    </term>
+    <listitem>
+      <para>Print N lines of context before and after annotated
+      lines.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--cumulative=<yes|no> [default: no] ]]></option>
+    </term>
+    <listitem>
+      <para>Add subroutine costs to function calls.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
+    </term>
+    <listitem>
+      <para>For each function, print its callers, the functions it
+      calls, or both.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[-I, --include=<dir> ]]></option>
+    </term>
+    <listitem>
+      <para>Add &lt;dir&gt; to the list of directories to search for source
+      files.</para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+
+</refsect1>
+
+
+<refsect1 id="a-see_also">
+<title>See Also</title>
+
+<para><filename>&cl-doc-path;</filename></para>
+
+</refsect1>
+
+
+<refsect1 id="a-author">
+<title>Author</title>
+
+<para>This manual page was written by 
+Philipp Frauenfelder &lt;pfrauenf@debian.org&gt;, for the Debian 
+GNU/Linux system (but may be used by others).</para>
+</refsect1>
+
+
+</refentry>
diff --git a/callgrind/docs/man-callgrind.xml b/callgrind/docs/man-callgrind.xml
new file mode 100644
index 0000000000..152543fc67
--- /dev/null
+++ b/callgrind/docs/man-callgrind.xml
@@ -0,0 +1,100 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<refentry id="callgrind">
+<refmeta>
+  <refentrytitle>Callgrind</refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo class="source">November 18, 2005</refmiscinfo>
+</refmeta>
+
+<refnamediv id="name">
+  <refname>callgrind</refname>
+  <refpurpose>calls <command>valgrind</command> with the callgrind tool</refpurpose>
+</refnamediv>
+
+<refsynopsisdiv id="synopsis">
+  <cmdsynopsis>
+    <command>callgrind</command>    
+    <arg choice="opt"><replaceable>options</replaceable></arg>
+    <arg choice="plain"><replaceable>progs-and-args</replaceable></arg>
+  </cmdsynopsis>
+</refsynopsisdiv>
+
+
+<refsect1 id="description">
+<title>Description</title>
+
+<para><command>Callgrind</command> is a profiling tool similar to gprof.
+Because it uses Valgrind to observe a program run in great detail, it
+can provide much more information. The binary does not have to be
+prepared for profiling with <command>callgrind</command> in any
+special way. Still, compiling with debug information is recommended.</para>
+
+<para><command>Callgrind</command> builds up the call graph of a program 
+while it is running, and optionally does cache simulation. The collected 
+profiling data can be stored into an output file multiple times in a 
+program run, optionally separately for every thread in the case of 
+multithreaded code.  For interactive inspection and control, see
+<command>callgrind_control</command>.  The data produced
+(<filename>callgrind.out.PID</filename>) can be analysed with
+<command>callgrind_annotate</command> or, better, with the graphical
+profile visualization tool <command>KCachegrind</command>.  Further documentation can
+be found in HTML format either on your filesystem: 
+<filename>&cl-doc-path;</filename> or online at 
+<filename>&cl-doc-url;</filename>.</para>
+
+</refsect1>
+
+
+<refsect1 id="options">
+<title>Options</title>
+
+<para>This program follows the usual GNU command line syntax, with long
+options starting with two dashes ('--').</para>
+
+
+<xi:include href="cl-manual.xml" xpointer="cmd-options"
+            xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+</refsect1>
+
+
+
+<refsect1 id="see_also">
+<title>See Also</title>
+
+<para><command>callgrind_control</command>, 
+<command>callgrind_annotate</command>, 
+<filename>&cl-doc-path;</filename>
+</para>
+
+</refsect1>
+
+
+<refsect1 id="author">
+<title>Author</title>
+
+<para>This manual page was written by Josef Weidendorfer &lt;&cl-email;&gt;.</para>
+
+
+</refsect1>
+
+
+<refsect1 id="copyright">
+<title>Copyright</title>
+
+<para>Copyright &copy; &cl-lifespan; Josef Weidendorfer</para>
+<para>This is free software; see the source for copying conditions. 
+There is NO warranty; not even for MERCHANTABILITY or 
+FITNESS FOR A PARTICULAR PURPOSE.</para>
+
+</refsect1>
+
+
+
+</refentry>
+
diff --git a/callgrind/docs/man-control.xml b/callgrind/docs/man-control.xml
new file mode 100644
index 0000000000..ca3edde6e8
--- /dev/null
+++ b/callgrind/docs/man-control.xml
@@ -0,0 +1,132 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+
+<refentry id="callgrind-control">
+<refmeta>
+  <refentrytitle>Callgrind Control</refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo class="c-source">October, 2005</refmiscinfo>
+</refmeta>
+
+<refnamediv id="c-name">
+  <refname>callgrind_control</refname>
+  <refpurpose>observe and control applications currently running under 
+  supervision of <command>callgrind</command></refpurpose>
+</refnamediv>
+
+<refsynopsisdiv id="c-synopsis">
+  <cmdsynopsis>
+    <command>callgrind_control</command>    
+    <arg choice="opt"><replaceable>options</replaceable></arg>
+    <arg choice="opt" rep="repeat"><replaceable>pid/program-name</replaceable></arg>
+  </cmdsynopsis>
+</refsynopsisdiv>
+
+
+<refsect1 id="c-description">
+<title>Description</title>
+
+<para>This manual page briefly documents the
+<command>callgrind_control</command> command. If no
+<replaceable>pid/program-name</replaceable> argument is given, the
+actions requested by the specified option(s) apply to all applications
+run by callgrind on this system. The default action is to print short
+information about the applications run by callgrind.</para>
+
+</refsect1>
+
+
+<refsect1 id="c-options">
+<title>Options</title>
+
+<para>This program follows the usual GNU command line syntax, with long
+options starting with two dashes ("--").  A summary of options is 
+included below.</para>
+
+<variablelist remap="TP">
+
+  <varlistentry>
+    <term><option>-h, --help</option></term>
+    <listitem>
+      <para>Show summary of options.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>--version</option></term>
+    <listitem>
+      <para>Show version of callgrind_control.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>-s</option></term>
+    <listitem>
+      <para>Show statistics.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>-b</option></term>
+    <listitem>
+      <para>Show stack trace.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>-e [A,B,C] [default: all] </option></term>
+    <listitem>
+      <para>Only show figures for events A,B,C</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>-z</option></term>
+    <listitem>
+      <para>Zero the cost counters.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>-d, --dump [hint]</option></term>
+    <listitem>
+      <para>Request the dumping of profile information. Optionally, a 
+      string can be specified which is written into the dump as part of 
+      the Trigger reason. This can be used to distinguish multiple dumps.</para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term><option>-k</option></term>
+    <listitem>
+      <para>Kill the application.</para>
+    </listitem>
+  </varlistentry>
+
+</variablelist>
+
+</refsect1>
+
+
+<refsect1 id="c-see_also">
+<title>See Also</title>
+
+<para><filename>&cl-doc-path;</filename></para>
+
+</refsect1>
+
+
+<refsect1 id="c-author">
+<title>Author</title>
+
+<para>This manual page was written by Josef Weidendorfer &lt;&cl-email;&gt;.</para>
+
+
+</refsect1>
+
+
+</refentry>
+
diff --git a/docs/xml/manual.xml b/docs/xml/manual.xml
index 1599aedb9b..0d996488c2 100644
--- a/docs/xml/manual.xml
+++ b/docs/xml/manual.xml
@@ -28,6 +28,8 @@
       xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="../../cachegrind/docs/cg-manual.xml" parse="xml"  
       xmlns:xi="http://www.w3.org/2001/XInclude" />
+  <xi:include href="../../callgrind/docs/cl-manual.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="../../massif/docs/ms-manual.xml" parse="xml"  
       xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="../../helgrind/docs/hg-manual.xml" parse="xml"  
diff --git a/docs/xml/tech-docs.xml b/docs/xml/tech-docs.xml
index 825c8e14fd..8ce8dfdf0d 100644
--- a/docs/xml/tech-docs.xml
+++ b/docs/xml/tech-docs.xml
@@ -21,6 +21,8 @@
       xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="../../cachegrind/docs/cg-tech-docs.xml" parse="xml"  
       xmlns:xi="http://www.w3.org/2001/XInclude" />
+  <xi:include href="../../callgrind/docs/cl-format.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="writing-tools.xml" parse="xml"  
       xmlns:xi="http://www.w3.org/2001/XInclude" />
 
diff --git a/docs/xml/valgrind-manpage.xml b/docs/xml/valgrind-manpage.xml
index e86ebb37df..ffe33646a1 100644
--- a/docs/xml/valgrind-manpage.xml
+++ b/docs/xml/valgrind-manpage.xml
@@ -76,6 +76,14 @@ leaks.</para>
     instructions executed and cache misses incurred.</para>
   </listitem>
 
+  <listitem>
+    <para><option>callgrind</option> adds call graph tracing to cachegrind.  It can be
+    used to get call counts and inclusive cost for each call happening in your
+    program. Beyond what cachegrind offers, callgrind can annotate threads separately,
+    and can annotate every instruction of your program's disassembler output with the
+    number of instructions executed and cache misses incurred.</para>
+  </listitem>
+
   <listitem>
     <para><option>helgrind</option> spots potential race conditions in
     your program.</para>
@@ -198,6 +206,17 @@ leaks.</para>
 
 
 
+<refsect1 id="callgrind-options">
+<title>Callgrind Options</title>
+
+<xi:include href="../../callgrind/docs/cl-manual.xml" 
+            xpointer="cl.opts.list"
+            xmlns:xi="http://www.w3.org/2001/XInclude" />
+
+</refsect1>
+
+
+
 <refsect1 id="massif-options">
 <title>Massif Options</title>