From: Nicholas Nethercote <n.nethercote@gmail.com>
Date: Thu, 20 Apr 2023 21:20:11 +0000 (+1000)
Subject: Rewrite Cachegrind docs.
X-Git-Tag: VALGRIND_3_21_0~30
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=c2e62127ad8a9b71c4abf4b166ad545988490c32;p=thirdparty%2Fvalgrind.git

Rewrite Cachegrind docs.

This covers all the changes I've made recently, plus various other changes
from the past 20 years that never made it into the docs.

This change also de-emphasises the cache and branch simulation aspects,
because they're no longer that useful. Instead it emphasises the
precision and reproducibility of instruction count profiling.
---
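The rewritten docs stress that instruction counts are reproducible enough to
measure the effect of small program changes. The following is a minimal sketch
of how that might be used, assuming the summary line keeps the format shown in
the new sample output (`==<pid>== I refs: <count>`). The helper names
`parse_i_refs` and `percent_change` are hypothetical and are not part of
Valgrind.

```python
import re

# Matches the summary line Cachegrind prints on exit, for example:
#   ==17942== I refs:          8,195,070
# The \s+ between "I" and "refs" also tolerates the older, wider layout.
I_REFS_RE = re.compile(r"^==\d+==\s+I\s+refs:\s+([\d,]+)\s*$", re.MULTILINE)


def parse_i_refs(log_text):
    """Return the instruction count from a Cachegrind summary, or None."""
    match = I_REFS_RE.search(log_text)
    if match is None:
        return None
    # Strip the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


def percent_change(before, after):
    """Relative change in instruction count between two runs, in percent."""
    return 100.0 * (after - before) / before
```

Feeding the captured stderr of two runs (before and after a code change) to
`parse_i_refs` and passing the results to `percent_change` gives a single
number to track. Exactly how stable that number is depends on the program, as
the Overview section notes.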

diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml
index 92fe086824..35d6a412e3 100644
--- a/cachegrind/docs/cg-manual.xml
+++ b/cachegrind/docs/cg-manual.xml
@@ -5,167 +5,117 @@
 
 <!-- Referenced from both the manual and manpage -->
 <chapter id="&vg-cg-manual-id;" xreflabel="&vg-cg-manual-label;">
-<title>Cachegrind: a cache and branch-prediction profiler</title>
+<title>Cachegrind: a high-precision tracing profiler</title>
 
-<para>To use this tool, you must specify
-<option>--tool=cachegrind</option> on the
-Valgrind command line.</para>
+<para>
+To use this tool, specify <option>--tool=cachegrind</option> on the Valgrind
+command line.
+</para>
 
 <sect1 id="cg-manual.overview" xreflabel="Overview">
 <title>Overview</title>
 
-<para>Cachegrind simulates how your program interacts with a machine's cache
-hierarchy and (optionally) branch predictor.  It simulates a machine with
-independent first-level instruction and data caches (I1 and D1), backed by a
-unified second-level cache (L2).  This exactly matches the configuration of
-many modern machines.</para>
-
-<para>However, some modern machines have three or four levels of cache.  For these
-machines (in the cases where Cachegrind can auto-detect the cache
-configuration) Cachegrind simulates the first-level and last-level caches.
-The reason for this choice is that the last-level cache has the most influence on
-runtime, as it masks accesses to main memory.  Furthermore, the L1 caches
-often have low associativity, so simulating them can detect cases where the
-code interacts badly with this cache (eg. traversing a matrix column-wise
-with the row length being a power of 2).</para>
-
-<para>Therefore, Cachegrind always refers to the I1, D1 and LL (last-level)
-caches.</para>
-
 <para>
-Cachegrind gathers the following statistics (abbreviations used for each statistic
-is given in parentheses):</para>
+Cachegrind is a high-precision tracing profiler. It runs slowly, but collects
+precise and reproducible profiling data. It can merge and diff data from
+different runs. To expand on these characteristics:
+</para>
+
 <itemizedlist>
   <listitem>
-    <para>I cache reads (<computeroutput>Ir</computeroutput>,
-    which equals the number of instructions executed),
-    I1 cache read misses (<computeroutput>I1mr</computeroutput>) and
-    LL cache instruction read misses (<computeroutput>ILmr</computeroutput>).
-    </para>
-  </listitem>
-  <listitem>
-    <para>D cache reads (<computeroutput>Dr</computeroutput>, which
-    equals the number of memory reads),
-    D1 cache read misses (<computeroutput>D1mr</computeroutput>), and
-    LL cache data read misses (<computeroutput>DLmr</computeroutput>).
-    </para>
-  </listitem>
-  <listitem>
-    <para>D cache writes (<computeroutput>Dw</computeroutput>, which equals
-    the number of memory writes),
-    D1 cache write misses (<computeroutput>D1mw</computeroutput>), and
-    LL cache data write misses (<computeroutput>DLmw</computeroutput>).
-    </para>
-  </listitem>
-  <listitem>
-    <para>Conditional branches executed (<computeroutput>Bc</computeroutput>) and
-    conditional branches mispredicted (<computeroutput>Bcm</computeroutput>).
+    <para>
+    <emphasis>Precise.</emphasis> Cachegrind measures the exact number of
+    instructions executed by your program, not an approximation. Furthermore,
+    it presents the gathered data at the file, function, and line level. This
+    is different to many other profilers that measure approximate execution
+    time, using sampling, and only at the function level.
     </para>
   </listitem>
+
   <listitem>
-    <para>Indirect branches executed (<computeroutput>Bi</computeroutput>) and
-    indirect branches mispredicted (<computeroutput>Bim</computeroutput>).
+    <para>
+    <emphasis>Reproducible.</emphasis> In general, execution time is a better
+    metric than instruction counts because it's what users perceive. However,
+    execution time often has high variability. When running the exact same
+    program on the exact same input multiple times, execution time might vary
+    by several percent. Furthermore, small changes in a program can change its
+    memory layout and have even larger effects on runtime. In contrast,
+    instruction counts are highly reproducible; for some programs they are
+    perfectly reproducible. This means the effects of small changes in a
+    program can be measured with high precision.
     </para>
   </listitem>
 </itemizedlist>
 
-<para>Note that D1 total accesses is given by
-<computeroutput>D1mr</computeroutput> +
-<computeroutput>D1mw</computeroutput>, and that LL total
-accesses is given by <computeroutput>ILmr</computeroutput> +
-<computeroutput>DLmr</computeroutput> +
-<computeroutput>DLmw</computeroutput>.
+<para>
+For these reasons, Cachegrind is an excellent complement to time-based profilers.
 </para>
 
-<para>These statistics are presented for the entire program and for each
-function in the program.  You can also annotate each line of source code in
-the program with the counts that were caused directly by it.</para>
-
-<para>On a modern machine, an L1 miss will typically cost
-around 10 cycles, an LL miss can cost as much as 200
-cycles, and a mispredicted branch costs in the region of 10
-to 30 cycles.  Detailed cache and branch profiling can be very useful
-for understanding how your program interacts with the machine and thus how
-to make it faster.</para>
+<para>
+Cachegrind can annotate programs written in any language, so long as debug info
+is present to map machine code back to the original source code. Cachegrind has
+been used successfully on programs written in C, C++, Rust, and assembly.
+</para>
 
-<para>Also, since one instruction cache read is performed per
-instruction executed, you can find out how many instructions are
-executed per line, which can be useful for traditional profiling.</para>
+<para>
+Cachegrind can also simulate how your program interacts with a machine's cache
+hierarchy and branch predictor. This simulation was the original motivation for
+the tool, hence its name. However, the simulations are basic and unlikely to
+reflect the behaviour of a modern machine. For this reason they are off by
+default. If you really want cache and branch information, a profiler like
+<computeroutput>perf</computeroutput> that accesses hardware counters is a
+better choice.
+</para>
 
 </sect1>
 
 
-
 <sect1 id="cg-manual.profile"
-       xreflabel="Using Cachegrind, cg_annotate and cg_merge">
-<title>Using Cachegrind, cg_annotate and cg_merge</title>
+       xreflabel="Using Cachegrind and cg_annotate">
+<title>Using Cachegrind and cg_annotate</title>
+
+<para>
+First, as for normal Valgrind use, you should compile with debugging info (the
+<option>-g</option> option in most compilers). But by contrast with normal
+Valgrind use, you probably do want to turn optimisation on, since you should
+profile your program as it will be normally run.
+</para>
 
-<para>First off, as for normal Valgrind use, you probably want to
-compile with debugging info (the
-<option>-g</option> option).  But by contrast with
-normal Valgrind use, you probably do want to turn
-optimisation on, since you should profile your program as it will
-be normally run.</para>
+<para>
+Second, run Cachegrind itself to gather the profiling data.
+</para>
 
-<para>Then, you need to run Cachegrind itself to gather the profiling
-information, and then run cg_annotate to get a detailed presentation of that
-information.  As an optional intermediate step, you can use cg_merge to sum
-together the outputs of multiple Cachegrind runs into a single file which
-you then use as the input for cg_annotate.  Alternatively, you can use
-cg_diff to difference the outputs of two Cachegrind runs into a single file
-which you then use as the input for cg_annotate.</para>
+<para>
+Third, run cg_annotate to get a detailed presentation of that data. cg_annotate
+can combine the results of multiple Cachegrind output files. It can also
+perform a diff between two Cachegrind output files.
+</para>
 
 
 <sect2 id="cg-manual.running-cachegrind" xreflabel="Running Cachegrind">
 <title>Running Cachegrind</title>
 
-<para>To run Cachegrind on a program <filename>prog</filename>, run:</para>
+<para>
+To run Cachegrind on a program <filename>prog</filename>, run:
 <screen><![CDATA[
 valgrind --tool=cachegrind prog
 ]]></screen>
+</para>
 
-<para>The program will execute (slowly).  Upon completion,
-summary statistics that look like this will be printed:</para>
+<para>
+The program will execute (slowly). Upon completion, summary statistics that
+look like this will be printed:
+</para>
 
 <programlisting><![CDATA[
-==31751== I   refs:      27,742,716
-==31751== I1  misses:           276
-==31751== LLi misses:           275
-==31751== I1  miss rate:        0.0%
-==31751== LLi miss rate:        0.0%
-==31751== 
-==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
-==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
-==31751== LLd misses:        23,085  (     3,987 rd +    19,098 wr)
-==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%)
-==31751== LLd miss rate:        0.1% (       0.0%   +       0.4%)
-==31751== 
-==31751== LL misses:         23,360  (     4,262 rd +    19,098 wr)
-==31751== LL miss rate:         0.0% (       0.0%   +       0.4%)]]></programlisting>
-
-<para>Cache accesses for instruction fetches are summarised
-first, giving the number of fetches made (this is the number of
-instructions executed, which can be useful to know in its own
-right), the number of I1 misses, and the number of LL instruction
-(<computeroutput>LLi</computeroutput>) misses.</para>
-
-<para>Cache accesses for data follow. The information is similar
-to that of the instruction fetches, except that the values are
-also shown split between reads and writes (note each row's
-<computeroutput>rd</computeroutput> and
-<computeroutput>wr</computeroutput> values add up to the row's
-total).</para>
-
-<para>Combined instruction and data figures for the LL cache
-follow that.  Note that the LL miss rate is computed relative to the total
-number of memory accesses, not the number of L1 misses.  I.e.  it is
-<computeroutput>(ILmr + DLmr + DLmw) / (Ir + Dr + Dw)</computeroutput>
-not
-<computeroutput>(ILmr + DLmr + DLmw) / (I1mr + D1mr + D1mw)</computeroutput>
-</para>
-
-<para>Branch prediction statistics are not collected by default.
-To do so, add the option <option>--branch-sim=yes</option>.</para>
+==17942== I refs:          8,195,070
+]]></programlisting>
+
+<para>
+The <computeroutput>I refs</computeroutput> number is short for "Instruction
+cache references", which is equivalent to "instructions executed". If you
+enable the cache and/or branch simulation, additional counts will be shown.
+</para>
 
 </sect2>
 
@@ -173,691 +123,791 @@ To do so, add the option <option>--branch-sim=yes</option>.</para>
 <sect2 id="cg-manual.outputfile" xreflabel="Output File">
 <title>Output File</title>
 
-<para>As well as printing summary information, Cachegrind also writes
-more detailed profiling information to a file.  By default this file is named
-<filename>cachegrind.out.&lt;pid&gt;</filename> (where
-<filename>&lt;pid&gt;</filename> is the program's process ID), but its name
-can be changed with the <option>--cachegrind-out-file</option> option.  This
-file is human-readable, but is intended to be interpreted by the
-accompanying program cg_annotate, described in the next section.</para>
-
-<para>The default <computeroutput>.&lt;pid&gt;</computeroutput> suffix
-on the output file name serves two purposes.  Firstly, it means you 
-don't have to rename old log files that you don't want to overwrite.  
-Secondly, and more importantly, it allows correct profiling with the
-<option>--trace-children=yes</option> option of
-programs that spawn child processes.</para>
+<para>
+Cachegrind also writes more detailed profiling data to a file. By default this
+Cachegrind output file is named <filename>cachegrind.out.&lt;pid&gt;</filename>
+(where <filename>&lt;pid&gt;</filename> is the program's process ID), but its
+name can be changed with the <option>--cachegrind-out-file</option> option.
+This file is human-readable, but is intended to be interpreted by the
+accompanying program cg_annotate, described in the next section.
+</para>
 
-<para>The output file can be big, many megabytes for large applications
-built with full debugging information.</para>
+<para>
+The default <computeroutput>.&lt;pid&gt;</computeroutput> suffix on the output
+file name serves two purposes. First, it means existing Cachegrind output files
+aren't immediately overwritten. Second, and more importantly, it allows correct
+profiling with the <option>--trace-children=yes</option> option of programs
+that spawn child processes.
+</para>
 
 </sect2>
 
-
   
 <sect2 id="cg-manual.running-cg_annotate" xreflabel="Running cg_annotate">
 <title>Running cg_annotate</title>
 
-<para>Before using cg_annotate,
-it is worth widening your window to be at least 120-characters
-wide if possible, as the output lines can be quite long.</para>
-
-<para>To get a function-by-function summary, run:</para>
+<para>
+Before using cg_annotate, it is worth widening your window to be at least 120
+characters wide if possible, because the output lines can be quite long.
+</para>
 
+<para>
+Then run:
 <screen>cg_annotate &lt;filename&gt;</screen>
-
-<para>on a Cachegrind output file.</para>
+on a Cachegrind output file.
+</para>
 
 </sect2>
 
+<!--
+To produce the sample data, I did the following. Note that the single hyphens in
+the valgrind command should be double hyphens, but XML doesn't allow double
+hyphens in comments.
+
+  gcc -g -O concord.c -o concord
+  valgrind -tool=cachegrind -cachegrind-out-file=concord.cgout ./concord ../cg_main.c
+  (to exit, type `q` and hit enter)
+  python ../cg_annotate concord.cgout > concord.cgann
+
+concord.c is a small C program I wrote at university. It's a good size for an example.
+-->
 
-<sect2 id="cg-manual.the-output-preamble" xreflabel="The Output Preamble">
-<title>The Output Preamble</title>
+<sect2 id="cg-manual.the-metadata" xreflabel="The Metadata Section">
+<title>The Metadata Section</title>
 
-<para>The first part of the output looks like this:</para>
+<para>
+The first part of the output looks like this:
+</para>
 
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
-I1 cache:              65536 B, 64 B, 2-way associative
-D1 cache:              65536 B, 64 B, 2-way associative
-LL cache:              262144 B, 64 B, 8-way associative
-Command:               concord vg_to_ucode.c
-Events recorded:       Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Events shown:          Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Event sort order:      Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
-Threshold:             99%
-Chosen for annotation:
-Auto-annotation:       off
+-- Metadata
+--------------------------------------------------------------------------------
+Invocation:       ../cg_annotate concord.cgout
+Command:          ./concord ../cg_main.c
+Events recorded:  Ir
+Events shown:     Ir
+Event sort order: Ir
+Threshold:        0.1%
+Annotation:       on
 ]]></programlisting>
 
-
-<para>This is a summary of the annotation options:</para>
+<para>
+It summarizes how Cachegrind and the profiled program were run.
+</para>
                     
 <itemizedlist>
-
   <listitem>
-    <para>I1 cache, D1 cache, LL cache: cache configuration.  So
-    you know the configuration with which these results were
-    obtained.</para>
+    <para>
+    Invocation: the command line used to produce this output.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Command: the command line invocation of the program
-      under examination.</para>
+    <para>
+    Command: the command line used to run the profiled program.
+    </para>
   </listitem>
 
   <listitem>
-   <para>Events recorded: which events were recorded.</para>
-
- </listitem>
-
- <listitem>
-   <para>Events shown: the events shown, which is a subset of the events
-   gathered.  This can be adjusted with the
-   <option>--show</option> option.</para>
+    <para>
+    Events recorded: which events were recorded. By default, this is
+    <computeroutput>Ir</computeroutput>. More events will be recorded if cache
+    and/or branch simulation is enabled.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Event sort order: the sort order in which functions are
-    shown.  For example, in this case the functions are sorted
-    from highest <computeroutput>Ir</computeroutput> counts to
-    lowest.  If two functions have identical
-    <computeroutput>Ir</computeroutput> counts, they will then be
-    sorted by <computeroutput>I1mr</computeroutput> counts, and
-    so on.  This order can be adjusted with the
-    <option>--sort</option> option.</para>
-
-    <para>Note that this dictates the order the functions appear.
-    It is <emphasis>not</emphasis> the order in which the columns
-    appear; that is dictated by the "events shown" line (and can
-    be changed with the <option>--show</option>
-    option).</para>
+    <para>
+    Events shown: the events shown, which is a subset of the events gathered.
+    This can be adjusted with the <option>--show</option> option.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Threshold: cg_annotate
-    by default omits functions that cause very low counts
-    to avoid drowning you in information.  In this case,
-    cg_annotate shows summaries the functions that account for
-    99% of the <computeroutput>Ir</computeroutput> counts;
-    <computeroutput>Ir</computeroutput> is chosen as the
-    threshold event since it is the primary sort event.  The
-    threshold can be adjusted with the
-    <option>--threshold</option>
-    option.</para>
+    <para>
+    Event sort order: the sort order used for the subsequent sections. For
+    example, in this case those sections are sorted from highest
+    <computeroutput>Ir</computeroutput> counts to lowest. If there are multiple
+    events, one will be the primary sort event, and then there can be a
+    secondary sort event, tertiary sort event, etc., though more than one is
+    rarely needed. This order can be adjusted with the <option>--sort</option>
+    option. Note that this does <emphasis>not</emphasis> specify the order in
+    which the columns appear. That is specified by the "events shown" line (and
+    can be changed with the <option>--show</option> option).
+    </para>
   </listitem>
 
   <listitem>
-    <para>Chosen for annotation: names of files specified
-    manually for annotation; in this case none.</para>
+    <para>
+    Threshold: cg_annotate by default omits files and functions with very low
+    counts to keep the output size reasonable. By default cg_annotate only
+    shows files and functions that account for at least 0.1% of the primary
+    sort event. The threshold can be adjusted with the
+    <option>--threshold</option> option.
+    </para>
   </listitem>
 
   <listitem>
-    <para>Auto-annotation: whether auto-annotation was requested
-    via the <option>--auto=yes</option>
-    option. In this case no.</para>
+    <para>
+    Annotation: whether source file annotation is enabled. Controlled with the
+    <option>--annotate</option> option.
+    </para>
   </listitem>
 
 </itemizedlist>
 
+<para>
+If cache simulation is enabled, details of the cache parameters will be shown
+above the "Invocation" line.
+</para>
+
 </sect2>
 
 
 <sect2 id="cg-manual.the-global"
-       xreflabel="The Global and Function-level Counts">
-<title>The Global and Function-level Counts</title>
+       xreflabel="Global, File, and Function-level Counts">
+<title>Global, File, and Function-level Counts</title>
 
-<para>Then follows summary statistics for the whole
-program:</para>
+<para>
+Next comes the summary for the whole program:
+</para>
   
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
-Ir         I1mr ILmr Dr         D1mr   DLmr  Dw        D1mw   DLmw
+-- Summary
+--------------------------------------------------------------------------------
+Ir________________ 
+
+8,195,070 (100.0%)  PROGRAM TOTALS
+]]></programlisting>
+
+<para>
+The <computeroutput>Ir</computeroutput> column label is suffixed with
+underscores to show the bounds of the columns underneath.
+</para>
+
+<para>
+Then come the file:function counts. Here is the first part of that section:
+</para>
+
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- File:function summary
 --------------------------------------------------------------------------------
-27,742,716  276  275 10,955,517 21,905 3,987 4,474,773 19,280 19,098  PROGRAM TOTALS]]></programlisting>
+  Ir______________________  file:function
+
+< 3,078,746 (37.6%, 37.6%)  /home/njn/grind/ws1/cachegrind/concord.c:
+  1,630,232 (19.9%)           get_word
+    630,918  (7.7%)           hash
+    461,095  (5.6%)           insert
+    130,560  (1.6%)           add_existing
+     91,014  (1.1%)           init_hash_table
+     88,056  (1.1%)           create
+     46,676  (0.6%)           new_word_node
+
+< 1,746,038 (21.3%, 58.9%)  ./malloc/./malloc/malloc.c:
+  1,285,938 (15.7%)           _int_malloc
+    458,225  (5.6%)           malloc
+
+< 1,107,550 (13.5%, 72.4%)  ./libio/./libio/getc.c:getc
+
+<   551,071  (6.7%, 79.1%)  ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2
+
+<   521,228  (6.4%, 85.5%)  ./ctype/../include/ctype.h:
+    260,616  (3.2%)           __ctype_tolower_loc
+    260,612  (3.2%)           __ctype_b_loc
+
+<   468,163  (5.7%, 91.2%)  ???:
+    468,151  (5.7%)           ???
+
+<   456,071  (5.6%, 96.8%)  /usr/include/ctype.h:get_word
+
+]]></programlisting>
+
+<para>
+Each entry covers one file, and one or more functions within that file. If
+there is only one significant function within a file, as in the third entry,
+the file and function are shown on the same line separated by a colon. If there
+are multiple significant functions within a file, as in the first entry, each
+function gets its own line.
+</para>
+
+<para>
+This example involves a small C program, and shows a combination of code from
+the program itself (including functions like <function>get_word</function> and
+<function>hash</function> in the file <filename>concord.c</filename>) as well
+as code from system libraries, such as functions like
+<function>malloc</function> and <function>getc</function>.
+</para>
+
+<para>
+Each entry is preceded with a <computeroutput>&lt;</computeroutput>, which can
+be useful when navigating through the output in an editor, or grepping through
+results.
+</para>
 
 <para>
-These are similar to the summary provided when Cachegrind finishes running.
+The first percentage in each column indicates the proportion of the total event
+count that is covered by this line. The second percentage, which only shows on the
+first line of each entry, shows the cumulative percentage of all the entries up
+to and including this one. The entries shown here account for 96.8% of the
+instructions executed by the program.
 </para>
 
-<para>Then comes function-by-function statistics:</para>
+<para>
+The name <computeroutput>???</computeroutput> is used if the file name and/or
+function name could not be determined from debugging information. If
+<filename>???</filename> filenames dominate, the program probably wasn't
+compiled with <option>-g</option>. If <function>???</function> function names
+dominate, the program may have had symbols stripped.
+</para>
+
+<para>
+After that come the function:file counts. Here is the first part of that section:
+</para>
 
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
-Ir        I1mr ILmr Dr        D1mr  DLmr  Dw        D1mw   DLmw    file:function
+-- Function:file summary
 --------------------------------------------------------------------------------
-8,821,482    5    5 2,242,702 1,621    73 1,794,230      0      0  getc.c:_IO_getc
-5,222,023    4    4 2,276,334    16    12   875,959      1      1  concord.c:get_word
-2,649,248    2    2 1,344,810 7,326 1,385         .      .      .  vg_main.c:strcmp
-2,521,927    2    2   591,215     0     0   179,398      0      0  concord.c:hash
-2,242,740    2    2 1,046,612   568    22   448,548      0      0  ctype.c:tolower
-1,496,937    4    4   630,874 9,000 1,400   279,388      0      0  concord.c:insert
-  897,991   51   51   897,831    95    30        62      1      1  ???:???
-  598,068    1    1   299,034     0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__flockfile
-  598,068    0    0   299,034     0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__funlockfile
-  598,024    4    4   213,580    35    16   149,506      0      0  vg_clientmalloc.c:malloc
-  446,587    1    1   215,973 2,167   430   129,948 14,057 13,957  concord.c:add_existing
-  341,760    2    2   128,160     0     0   128,160      0      0  vg_clientmalloc.c:vg_trap_here_WRAPPER
-  320,782    4    4   150,711   276     0    56,027     53     53  concord.c:init_hash_table
-  298,998    1    1   106,785     0     0    64,071      1      1  concord.c:create
-  149,518    0    0   149,516     0     0         1      0      0  ???:tolower@@GLIBC_2.0
-  149,518    0    0   149,516     0     0         1      0      0  ???:fgetc@@GLIBC_2.0
-   95,983    4    4    38,031     0     0    34,409  3,152  3,150  concord.c:new_word_node
-   85,440    0    0    42,720     0     0    21,360      0      0  vg_clientmalloc.c:vg_bogus_epilogue]]></programlisting>
-
-<para>Each function
-is identified by a
-<computeroutput>file_name:function_name</computeroutput> pair. If
-a column contains only a dot it means the function never performs
-that event (e.g. the third row shows that
-<computeroutput>strcmp()</computeroutput> contains no
-instructions that write to memory). The name
-<computeroutput>???</computeroutput> is used if the file name
-and/or function name could not be determined from debugging
-information. If most of the entries have the form
-<computeroutput>???:???</computeroutput> the program probably
-wasn't compiled with <option>-g</option>.</para>
-
-<para>It is worth noting that functions will come both from
-the profiled program (e.g. <filename>concord.c</filename>)
-and from libraries (e.g. <filename>getc.c</filename>)</para>
+  Ir______________________  function:file
+
+> 2,086,303 (25.5%, 25.5%)  get_word:
+  1,630,232 (19.9%)           /home/njn/grind/ws1/cachegrind/concord.c
+    456,071  (5.6%)           /usr/include/ctype.h
+
+> 1,285,938 (15.7%, 41.1%)  _int_malloc:./malloc/./malloc/malloc.c
+
+> 1,107,550 (13.5%, 54.7%)  getc:./libio/./libio/getc.c
+
+>   630,918  (7.7%, 62.4%)  hash:/home/njn/grind/ws1/cachegrind/concord.c
+
+>   551,071  (6.7%, 69.1%)  __strcmp_avx2:./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+
+>   480,248  (5.9%, 74.9%)  malloc:
+    458,225  (5.6%)           ./malloc/./malloc/malloc.c
+     22,023  (0.3%)           ./malloc/./malloc/arena.c
+
+>   468,151  (5.7%, 80.7%)  ???:???
+
+>   461,095  (5.6%, 86.3%)  insert:/home/njn/grind/ws1/cachegrind/concord.c
+]]></programlisting>
+
+<para>
+This is similar to the previous section, but is grouped by functions first and
+files second. Also, the entry markers are <computeroutput>&gt;</computeroutput>
+instead of <computeroutput>&lt;</computeroutput>.
+</para>
+
+<para>
+You might wonder why this section is needed, and how it differs from the
+previous section. The answer is inlining. In this example there are two entries
+demonstrating a function whose code is effectively spread across more than one
+file: <function>get_word</function> and <function>malloc</function>. Here is an
+example from profiling the Rust compiler, a much larger program that uses
+inlining more:
+</para>
+
+<programlisting><![CDATA[
+>  30,469,230 (1.3%, 11.1%)  <rustc_middle::ty::context::CtxtInterners>::intern_ty:
+   10,269,220 (0.5%)           /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.3/src/raw/mod.rs
+    7,696,827 (0.3%)           /home/njn/dev/rust0/compiler/rustc_middle/src/ty/context.rs
+    3,858,099 (0.2%)           /home/njn/dev/rust0/library/core/src/cell.rs
+]]></programlisting>
+
+<para>
+In this case the compiled function <function>intern_ty</function> includes code
+from three different source files, due to inlining. These should be examined
+together. Older versions of cg_annotate presented this entry as three separate
+file:function entries, which would typically be intermixed with all the other
+entries, making it hard to see that they are all really part of the same
+function.
+</para>
 
 </sect2>
 
 
-<sect2 id="cg-manual.line-by-line" xreflabel="Line-by-line Counts">
-<title>Line-by-line Counts</title>
+<sect2 id="cg-manual.line-by-line" xreflabel="Per-line Counts">
+<title>Per-line Counts</title>
+
+<para>
+By default, a source file is annotated if it contains at least one function
+that meets the significance threshold. This can be disabled with the
+<option>--annotate</option> option.
+</para>
 
-<para>By default, all source code annotation is also shown. (Filenames to be
-annotated can also by specified manually as arguments to cg_annotate, but this
-is rarely needed.) For example, the output from running <filename>cg_annotate
-&lt;filename&gt; </filename> for our example produces the same output as above
-followed by an annotated version of <filename>concord.c</filename>, a section
-of which looks like:</para>
+<para>
+To continue the previous example, here is part of the annotation of the file
+<filename>concord.c</filename>:
+</para>
 
 <programlisting><![CDATA[
 --------------------------------------------------------------------------------
--- Auto-annotated source: concord.c
+-- Annotated source file: /home/njn/grind/ws1/cachegrind/docs/concord.c
 --------------------------------------------------------------------------------
-Ir        I1mr ILmr Dr      D1mr  DLmr  Dw      D1mw   DLmw
-
-        .    .    .       .     .     .       .      .      .  void init_hash_table(char *file_name, Word_Node *table[])
-        3    1    1       .     .     .       1      0      0  {
-        .    .    .       .     .     .       .      .      .      FILE *file_ptr;
-        .    .    .       .     .     .       .      .      .      Word_Info *data;
-        1    0    0       .     .     .       1      1      1      int line = 1, i;
-        .    .    .       .     .     .       .      .      .
-        5    0    0       .     .     .       3      0      0      data = (Word_Info *) create(sizeof(Word_Info));
-        .    .    .       .     .     .       .      .      .
-    4,991    0    0   1,995     0     0     998      0      0      for (i = 0; i < TABLE_SIZE; i++)
-    3,988    1    1   1,994     0     0     997     53     52          table[i] = NULL;
-        .    .    .       .     .     .       .      .      .
-        .    .    .       .     .     .       .      .      .      /* Open file, check it. */
-        6    0    0       1     0     0       4      0      0      file_ptr = fopen(file_name, "r");
-        2    0    0       1     0     0       .      .      .      if (!(file_ptr)) {
-        .    .    .       .     .     .       .      .      .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
-        1    1    1       .     .     .       .      .      .          exit(EXIT_FAILURE);
-        .    .    .       .     .     .       .      .      .      }
-        .    .    .       .     .     .       .      .      .
-  165,062    1    1  73,360     0     0  91,700      0      0      while ((line = get_word(data, line, file_ptr)) != EOF)
-  146,712    0    0  73,356     0     0  73,356      0      0          insert(data->;word, data->line, table);
-        .    .    .       .     .     .       .      .      .
-        4    0    0       1     0     0       2      0      0      free(data);
-        4    0    0       1     0     0       2      0      0      fclose(file_ptr);
-        3    0    0       2     0     0       .      .      .  }]]></programlisting>
-
-<para>(Although column widths are automatically minimised, a wide
-terminal is clearly useful.)</para>
-  
-<para>Each source file is clearly marked
-(<computeroutput>User-annotated source</computeroutput>) as
-having been chosen manually for annotation.  If the file was
-found in one of the directories specified with the
-<option>-I</option>/<option>--include</option> option, the directory
-and file are both given.</para>
-
-<para>Each line is annotated with its event counts.  Events not
-applicable for a line are represented by a dot.  This is useful
-for distinguishing between an event which cannot happen, and one
-which can but did not.</para>
-
-<para>Sometimes only a small section of a source file is
-executed.  To minimise uninteresting output, Cachegrind only shows
-annotated lines and lines within a small distance of annotated
-lines.  Gaps are marked with the line numbers so you know which
-part of a file the shown code comes from, eg:</para>
+Ir____________
+
+      .         /* Function builds the hash table from the given file. */  
+      .         void init_hash_table(char *file_name, Word_Node *table[])  
+      8 (0.0%)  {                                                          
+      .             FILE *file_ptr;                                        
+      .             Word_Info *data;                                       
+      2 (0.0%)      int line = 1, i;                                       
+      .                                                                    
+      .             /* Structure used when reading in words and line numbers. */
+      3 (0.0%)      data = (Word_Info *) create(sizeof(Word_Info));        
+      .                                                                    
+      .             /* Initialise entire table to NULL. */                 
+  2,993 (0.0%)      for (i = 0; i < TABLE_SIZE; i++)                       
+    997 (0.0%)          table[i] = NULL;                                   
+      .                                                                    
+      .             /* Open file, check it. */                             
+      4 (0.0%)      file_ptr = fopen(file_name, "r");                      
+      2 (0.0%)      if (!(file_ptr)) {                                     
+      .                 fprintf(stderr, "Couldn't open '%s'.\n", file_name);
+      .                 exit(EXIT_FAILURE);                                
+      .             }                                                      
+      .                                                                    
+      .             /*  'Get' the words and lines one at a time from the file, and insert them
+      .             ** into the table one at a time. */                    
+ 55,363 (0.7%)      while ((line = get_word(data, line, file_ptr)) != EOF) 
+ 31,632 (0.4%)          insert(data->word, data->line, table);             
+      .                                                                    
+      2 (0.0%)      free(data);                                            
+      2 (0.0%)      fclose(file_ptr);                                      
+      6 (0.0%)  }  
+]]></programlisting>
+
+<para>
+Each executed line is annotated with its event counts. Other lines are
+annotated with a dot. This may be because they contain no executable code, or
+they contain executable code but were never executed.
+</para>
+
+<para>
+You can easily tell if a function is inlined from this output. If it is not
+inlined, it will have event counts on the lines containing the opening and
+closing braces. If it is inlined, it will not have event counts on those lines.
+In the example above, <function>init_hash_table</function> does have counts,
+so you can tell it is not inlined.
+</para>
+
+<para>
+Note again that inlining can lead to surprising results. If a function
+<function>f</function> is always inlined, then in the file:function and
+function:file sections its counts will be attributed to the functions it is
+inlined into, rather than to itself. However, if you look at the line-by-line
+annotations for <function>f</function> you'll see the counts that belong to
+<function>f</function>. So it's worth looking for large counts/percentages in the
+line-by-line annotations.
+</para>
+
+<para>
+Sometimes only a small section of a source file is executed. To minimise
+uninteresting output, Cachegrind only shows annotated lines and lines within a
+small distance of annotated lines. Gaps are marked with line numbers, for
+example:
+</para>
 
 <programlisting><![CDATA[
-(figures and code for line 704)
--- line 704 ----------------------------------------
--- line 878 ----------------------------------------
-(figures and code for line 878)]]></programlisting>
-
-<para>The amount of context to show around annotated lines is
-controlled by the <option>--context</option>
-option.</para>
-
-<para>Automatic annotation is enabled by default.
-cg_annotate will automatically annotate every source file it can
-find that is mentioned in the function-by-function summary.
-Therefore, the files chosen for auto-annotation are affected by
-the <option>--sort</option> and
-<option>--threshold</option> options.  Each
-source file is clearly marked (<computeroutput>Auto-annotated
-source</computeroutput>) as being chosen automatically.  Any
-files that could not be found are mentioned at the end of the
-output, eg:</para>
+(counts and code for line 704)
+-- line 704 ----------------------------------------
+-- line 878 ----------------------------------------
+(counts and code for line 878)
+]]></programlisting>
+
+<para>
+The number of lines of context shown around annotated lines is controlled by
+the <option>--context</option> option.
+</para>
+
+<para>
+Any significant source files that could not be found are shown like this:
+</para>
 
 <programlisting><![CDATA[
-------------------------------------------------------------------
-The following files chosen for auto-annotation could not be found:
-------------------------------------------------------------------
-  getc.c
-  ctype.c
-  ../sysdeps/generic/lockfile.c]]></programlisting>
-
-<para>This is quite common for library files, since libraries are
-usually compiled with debugging information, but the source files
-are often not present on a system.  If a file is chosen for
-annotation both manually and automatically, it
-is marked as <computeroutput>User-annotated
-source</computeroutput>. Use the
-<option>-I</option>/<option>--include</option> option to tell Valgrind where
-to look for source files if the filenames found from the debugging
-information aren't specific enough.</para>
-
-<para> Beware that auto-annotation can produce a lot of output if your program
-is large.</para>
+--------------------------------------------------------------------------------
+-- Annotated source file: ./malloc/./malloc/malloc.c                       
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:    
+- ./malloc/./malloc/malloc.c 
+]]></programlisting>
 
-</sect2>
+<para>
+This is common for library files, because libraries are usually compiled with
+debugging information but the source files are rarely present on a system.
+</para>
+
+<para>
+Cachegrind relies heavily on accurate debug info. Sometimes compilers map a
+particular compiled instruction to line number 0, where the 0 represents
+"unknown" or "none". This is annoying but does happen in practice. cg_annotate
+prints these in the following way:
+</para>
 
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- Annotated source file: /home/njn/dev/rust0/compiler/rustc_borrowck/src/lib.rs
+--------------------------------------------------------------------------------
+Ir______________
 
-<sect2 id="cg-manual.assembler" xreflabel="Annotating Assembly Code Programs">
-<title>Annotating Assembly Code Programs</title>
+1,046,746 (0.0%)  <unknown (line 0)>
+]]></programlisting>
 
-<para>Valgrind can annotate assembly code programs too, or annotate
-the assembly code generated for your C program.  Sometimes this is
-useful for understanding what is really happening when an
-interesting line of C code is translated into multiple
-instructions.</para>
+<para>
+Finally, when annotation is performed, the output ends with a summary of how
+many counts were annotated and unannotated, and why. For example:
+</para>
 
-<para>To do this, you just need to assemble your
-<computeroutput>.s</computeroutput> files with assembly-level debug
-information.  You can use compile with the <option>-S</option> to compile C/C++
-programs to assembly code, and then assemble the assembly code files with
-<option>-g</option> to achieve this.  You can then profile and annotate the
-assembly code source files in the same way as C/C++ source files.</para>
+<programlisting><![CDATA[
+--------------------------------------------------------------------------------
+-- Annotation summary
+--------------------------------------------------------------------------------
+Ir_______________ 
+
+3,534,817 (43.1%)    annotated: files known & above threshold & readable, line numbers known
+        0            annotated: files known & above threshold & readable, line numbers unknown
+        0          unannotated: files known & above threshold & two or more non-identical
+4,132,126 (50.4%)  unannotated: files known & above threshold & unreadable 
+   59,950  (0.7%)  unannotated: files known & below threshold
+  468,163  (5.7%)  unannotated: files unknown
+]]></programlisting>
 
 </sect2>
 
+
 <sect2 id="cg-manual.forkingprograms" xreflabel="Forking Programs">
 <title>Forking Programs</title>
-<para>If your program forks, the child will inherit all the profiling data that
-has been gathered for the parent.</para>
-
-<para>If the output file format string (controlled by
-<option>--cachegrind-out-file</option>) does not contain <option>%p</option>,
-then the outputs from the parent and child will be intermingled in a single
-output file, which will almost certainly make it unreadable by
-cg_annotate.</para>
+
+<para>
+If your program forks, the child will inherit all the profiling data that
+has been gathered for the parent.
+</para>
+
+<para>
+If the output file name (controlled by <option>--cachegrind-out-file</option>)
+does not contain <option>%p</option>, then the outputs from the parent and
+child will be intermingled in a single output file, which will almost certainly
+make it unreadable by cg_annotate.
+</para>
+
 </sect2>
 
 
 <sect2 id="cg-manual.annopts.warnings" xreflabel="cg_annotate Warnings">
 <title>cg_annotate Warnings</title>
 
-<para>There are a couple of situations in which
-cg_annotate issues warnings.</para>
+<para>
+There are two situations in which cg_annotate prints warnings.
+</para>
 
 <itemizedlist>
   <listitem>
-    <para>If a source file is more recent than the
-    <filename>cachegrind.out.&lt;pid&gt;</filename> file.
-    This is because the information in
-    <filename>cachegrind.out.&lt;pid&gt;</filename> is only
-    recorded with line numbers, so if the line numbers change at
-    all in the source (e.g.  lines added, deleted, swapped), any
-    annotations will be incorrect.</para>
+    <para>
+    If a source file is more recent than the Cachegrind output file. This is
+    because the information in the Cachegrind output file is only recorded with
+    line numbers, so if the line numbers change at all in the source (e.g.
+    lines added, deleted, swapped), any annotations will be incorrect.
+    </para>
   </listitem>
   <listitem>
-    <para>If information is recorded about line numbers past the
-    end of a file.  This can be caused by the above problem,
-    i.e. shortening the source file while using an old
-    <filename>cachegrind.out.&lt;pid&gt;</filename> file.  If
-    this happens, the figures for the bogus lines are printed
-    anyway (clearly marked as bogus) in case they are
-    important.</para>
+    <para>
+    If information is recorded about line numbers past the end of a file. This
+    can be caused by the above problem, e.g. shortening the source file while
+    using an old Cachegrind output file. If this happens, the figures for the
+    bogus lines are printed anyway (and clearly marked as bogus) in case they
+    are important.
+    </para>
   </listitem>
 </itemizedlist>
 
 </sect2>
 
 
+<sect2 id="cg-manual.cg_merge" xreflabel="cg_merge">
+<title>Merging Cachegrind Output Files</title>
 
-<sect2 id="cg-manual.annopts.things-to-watch-out-for"
-       xreflabel="Unusual Annotation Cases">
-<title>Unusual Annotation Cases</title>
+<para>
+cg_annotate can merge data from multiple Cachegrind output files in a single
+run. (There is also a program called cg_merge that can merge multiple
+Cachegrind output files into a single Cachegrind output file, but it is now
+deprecated because cg_annotate's merging does a better job.)
+</para>
 
-<para>Some odd things that can occur during annotation:</para>
+<para>
+Use it as follows:
+</para>
 
-<itemizedlist>
-  <listitem>
-    <para>If annotating at the assembler level, you might see
-    something like this:</para>
 <programlisting><![CDATA[
-      1    0    0  .    .    .  .    .    .          leal -12(%ebp),%eax
-      1    0    0  .    .    .  1    0    0          movl %eax,84(%ebx)
-      2    0    0  0    0    0  1    0    0          movl $1,-20(%ebp)
-      .    .    .  .    .    .  .    .    .          .align 4,0x90
-      1    0    0  .    .    .  .    .    .          movl $.LnrB,%eax
-      1    0    0  .    .    .  1    0    0          movl %eax,-16(%ebp)]]></programlisting>
-
-    <para>How can the third instruction be executed twice when
-    the others are executed only once?  As it turns out, it
-    isn't.  Here's a dump of the executable, using
-    <computeroutput>objdump -d</computeroutput>:</para>
-<programlisting><![CDATA[
-      8048f25:       8d 45 f4                lea    0xfffffff4(%ebp),%eax
-      8048f28:       89 43 54                mov    %eax,0x54(%ebx)
-      8048f2b:       c7 45 ec 01 00 00 00    movl   $0x1,0xffffffec(%ebp)
-      8048f32:       89 f6                   mov    %esi,%esi
-      8048f34:       b8 08 8b 07 08          mov    $0x8078b08,%eax
-      8048f39:       89 45 f0                mov    %eax,0xfffffff0(%ebp)]]></programlisting>
-
-    <para>Notice the extra <computeroutput>mov
-    %esi,%esi</computeroutput> instruction.  Where did this come
-    from?  The GNU assembler inserted it to serve as the two
-    bytes of padding needed to align the <computeroutput>movl
-    $.LnrB,%eax</computeroutput> instruction on a four-byte
-    boundary, but pretended it didn't exist when adding debug
-    information.  Thus when Valgrind reads the debug info it
-    thinks that the <computeroutput>movl
-    $0x1,0xffffffec(%ebp)</computeroutput> instruction covers the
-    address range 0x8048f2b--0x804833 by itself, and attributes
-    the counts for the <computeroutput>mov
-    %esi,%esi</computeroutput> to it.</para>
-  </listitem>
-
-  <!--
-  I think this isn't true any more, not since cost centres were moved from
-  being associated with instruction addresses to being associated with
-  source line numbers.
-  <listitem>
-    <para>Inlined functions can cause strange results in the
-    function-by-function summary.  If a function
-    <computeroutput>inline_me()</computeroutput> is defined in
-    <filename>foo.h</filename> and inlined in the functions
-    <computeroutput>f1()</computeroutput>,
-    <computeroutput>f2()</computeroutput> and
-    <computeroutput>f3()</computeroutput> in
-    <filename>bar.c</filename>, there will not be a
-    <computeroutput>foo.h:inline_me()</computeroutput> function
-    entry.  Instead, there will be separate function entries for
-    each inlining site, i.e.
-    <computeroutput>foo.h:f1()</computeroutput>,
-    <computeroutput>foo.h:f2()</computeroutput> and
-    <computeroutput>foo.h:f3()</computeroutput>.  To find the
-    total counts for
-    <computeroutput>foo.h:inline_me()</computeroutput>, add up
-    the counts from each entry.</para>
-
-    <para>The reason for this is that although the debug info
-    output by GCC indicates the switch from
-    <filename>bar.c</filename> to <filename>foo.h</filename>, it
-    doesn't indicate the name of the function in
-    <filename>foo.h</filename>, so Valgrind keeps using the old
-    one.</para>
-  </listitem>
-  -->
-
-  <listitem>
-    <para>Sometimes, the same filename might be represented with
-    a relative name and with an absolute name in different parts
-    of the debug info, eg:
-    <filename>/home/user/proj/proj.h</filename> and
-    <filename>../proj.h</filename>.  In this case, if you use
-    auto-annotation, the file will be annotated twice with the
-    counts split between the two.</para>
-  </listitem>
-
-  <listitem>
-    <para>If you compile some files with
-    <option>-g</option> and some without, some
-    events that take place in a file without debug info could be
-    attributed to the last line of a file with debug info
-    (whichever one gets placed before the non-debug-info file in
-    the executable).</para>
-  </listitem>
+cg_annotate file1 file2 file3 ...
+]]></programlisting>
 
-</itemizedlist>
+<para>
+cg_annotate computes the sum of these files (effectively
+<filename>file1</filename> + <filename>file2</filename> +
+<filename>file3</filename>), and then produces output as usual that shows the
+summed counts.
+</para>
 
-<para>These cases should be rare.</para>
+<para>
+The most common merging scenario is if you want to aggregate costs over
+multiple runs of the same program, possibly on different inputs.
+</para>
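The summing described above can be pictured with a minimal Python sketch. It is not cg_annotate's implementation: the in-memory dictionaries, the `merge_counts` helper and the second run's count are hypothetical and stand in for parsed Cachegrind output files.

```python
from collections import Counter

def merge_counts(*profiles):
    """Sum per-(file, function) event counts across several profiles."""
    total = Counter()
    for profile in profiles:
        total.update(profile)  # Counter.update() adds counts key by key
    return dict(total)

# Hypothetical Ir counts from two runs of the same program.
run1 = {("concord.c", "insert"): 31_632, ("concord.c", "get_word"): 55_363}
run2 = {("concord.c", "insert"): 30_000}

merged = merge_counts(run1, run2)
print(merged)
```

A key that appears in only one run keeps its count unchanged, which is why runs on different inputs can still be merged.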
 
 </sect2>
 
 
-<sect2 id="cg-manual.cg_merge" xreflabel="cg_merge">
-<title>Merging Profiles with cg_merge</title>
+<sect2 id="cg-manual.cg_diff" xreflabel="cg_diff">
+<title>Differencing Cachegrind Output Files</title>
 
 <para>
-cg_merge is a simple program which
-reads multiple profile files, as created by Cachegrind, merges them
-together, and writes the results into another file in the same format.
-You can then examine the merged results using
-<computeroutput>cg_annotate &lt;filename&gt;</computeroutput>, as
-described above.  The merging functionality might be useful if you
-want to aggregate costs over multiple runs of the same program, or
-from a single parallel run with multiple instances of the same
-program.</para>
+cg_annotate can diff data from two Cachegrind output files in a single run.
+(There is also a program called cg_diff that can diff two Cachegrind output
+files into a single Cachegrind output file, but it is now deprecated because
+cg_annotate's differencing does a better job.)
+</para>
 
 <para>
-cg_merge is invoked as follows:
+Use it as follows:
 </para>
 
 <programlisting><![CDATA[
-cg_merge -o outputfile file1 file2 file3 ...]]></programlisting>
+cg_annotate --diff file1 file2
+]]></programlisting>
 
 <para>
-It reads and checks <computeroutput>file1</computeroutput>, then read
-and checks <computeroutput>file2</computeroutput> and merges it into
-the running totals, then the same with
-<computeroutput>file3</computeroutput>, etc.  The final results are
-written to <computeroutput>outputfile</computeroutput>, or to standard
-out if no output file is specified.</para>
+cg_annotate computes the difference between these two files (effectively
+<filename>file2</filename> - <filename>file1</filename>), and then
+produces output as usual that shows the count differences. Note that many of
+the counts may be negative; this indicates that the counts for the relevant
+file/function/line are smaller in the second version than those in the first
+version.
+</para>
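As a rough illustration of the subtraction described above, the following Python sketch computes second minus first for every key. The dictionaries and the `diff_counts` helper are hypothetical; they are not part of cg_annotate.

```python
def diff_counts(first, second):
    """Return second - first for every key seen in either profile."""
    keys = first.keys() | second.keys()
    return {k: second.get(k, 0) - first.get(k, 0) for k in keys}

# Hypothetical Ir counts for two runs.
first = {"prog.c:f": 1_000, "prog.c:g": 500}
second = {"prog.c:f": 800, "prog.c:h": 50}

delta = diff_counts(first, second)
print(delta)
```

Here `prog.c:f` comes out negative because its count is smaller in the second file, and `prog.c:g` is negative because it is absent from the second file.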
 
 <para>
-Costs are summed on a per-function, per-line and per-instruction
-basis.  Because of this, the order in which the input files does not
-matter, although you should take care to only mention each file once,
-since any file mentioned twice will be added in twice.</para>
+The simplest common scenario is comparing two Cachegrind output files that came
+from the same program, but on different inputs. cg_annotate will do a good job
+on this without assistance.
+</para>
 
 <para>
-cg_merge does not attempt to check
-that the input files come from runs of the same executable.  It will
-happily merge together profile files from completely unrelated
-programs.  It does however check that the
-<computeroutput>Events:</computeroutput> lines of all the inputs are
-identical, so as to ensure that the addition of costs makes sense.
-For example, it would be nonsensical for it to add a number indicating
-D1 read references to a number from a different file indicating LL
-write misses.</para>
+A more complex scenario is if you want to compare Cachegrind output files from
+two slightly different versions of a program that you have sitting
+side-by-side, running on the same input. For example, you might have
+<filename>version1/prog.c</filename> and <filename>version2/prog.c</filename>.
+A straight comparison of the two would not be useful. Because functions are
+always paired with filenames, a function <function>f</function> would be listed
+as <filename>version1/prog.c:f</filename> for the first version but
+<filename>version2/prog.c:f</filename> for the second version.
+</para>
 
 <para>
-A number of other syntax and sanity checks are done whilst reading the
-inputs.  cg_merge will stop and
-attempt to print a helpful error message if any of the input files
-fail these checks.</para>
-
-</sect2>
-
-
-<sect2 id="cg-manual.cg_diff" xreflabel="cg_diff">
-<title>Differencing Profiles with cg_diff</title>
+In this case, use the <option>--mod-filename</option> option. Its argument is a
+search-and-replace expression that will be applied to all the filenames in both
+Cachegrind output files.  It can be used to remove minor differences in
+filenames. For example, the option
+<option>--mod-filename='s/version[0-9]/versionN/'</option> will suffice for the
+above example.
+</para>
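The effect of such a substitution on the function keys can be sketched in Python. This only mimics the result for the example above; the `mod_filename` helper is hypothetical, and how cg_annotate parses the `s/old/new/` expression is not shown.

```python
import re

# Hypothetical stand-in for the effect of
# --mod-filename='s/version[0-9]/versionN/' on one filename.
def mod_filename(name: str) -> str:
    # count=1 mirrors a substitution without a "g" flag (an assumption here).
    return re.sub(r"version[0-9]", "versionN", name, count=1)

old_key = mod_filename("version1/prog.c") + ":f"
new_key = mod_filename("version2/prog.c") + ":f"
print(old_key, new_key)
```

After the rewrite both versions produce the same key for `f`, so its counts can be paired up and subtracted.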
 
 <para>
-cg_diff is a simple program which
-reads two profile files, as created by Cachegrind, finds the difference
-between them, and writes the results into another file in the same format.
-You can then examine the merged results using
-<computeroutput>cg_annotate &lt;filename&gt;</computeroutput>, as
-described above.  This is very useful if you want to measure how a change to
-a program affected its performance.
+Similarly, sometimes compilers auto-generate certain functions and give them
+randomized names like <function>T.1234</function> where the suffixes vary from
+build to build. You can use the <option>--mod-funcname</option> option to
+remove small differences like these; it works in the same way as
+<option>--mod-filename</option>.
 </para>
 
 <para>
-cg_diff is invoked as follows:
+When <option>--mod-filename</option> is used to compare two different versions
+of the same program, cg_annotate will not annotate any file that is different
+between the two versions, because the per-line counts are not reliable in such
+a case. For example, imagine if <filename>version2/prog.c</filename> is the
+same as <filename>version1/prog.c</filename> except with an extra blank line at
+the top of the file. Every single per-line count will have changed. In
+comparison, the per-file and per-function counts have not changed, and are
+still very useful for determining differences between programs. You might think
+that this means every interesting file will be left unannotated, but again
+inlining means that files that are identical in the two versions can have
+different counts on many lines.
 </para>
 
-<programlisting><![CDATA[
-cg_diff file1 file2]]></programlisting>
 
-<para>
-It reads and checks <computeroutput>file1</computeroutput>, then read
-and checks <computeroutput>file2</computeroutput>, then computes the
-difference (effectively <computeroutput>file1</computeroutput> -
-<computeroutput>file2</computeroutput>).  The final results are written to
-standard output.</para>
+</sect2>
 
-<para>
-Costs are summed on a per-function basis.  Per-line costs are not summed,
-because doing so is too difficult.  For example, consider differencing two
-profiles, one from a single-file program A, and one from the same program A
-where a single blank line was inserted at the top of the file.  Every single
-per-line count has changed.  In comparison, the per-function counts have not
-changed.  The per-function count differences are still very useful for
-determining differences between programs.  Note that because the result is
-the difference of two profiles, many of the counts will be negative;  this
-indicates that the counts for the relevant function are fewer in the second
-version than those in the first version.</para>
+<sect2 id="cg-manual.cache-branch-sim" xreflabel="cache-branch-sim">
+<title>Cache and Branch Simulation</title>
 
 <para>
-cg_diff does not attempt to check
-that the input files come from runs of the same executable.  It will
-happily merge together profile files from completely unrelated
-programs.  It does however check that the
-<computeroutput>Events:</computeroutput> lines of all the inputs are
-identical, so as to ensure that the addition of costs makes sense.
-For example, it would be nonsensical for it to add a number indicating
-D1 read references to a number from a different file indicating LL
-write misses.</para>
+Cachegrind can simulate how your program interacts with a machine's cache
+hierarchy and/or branch predictor.
+
+The cache simulation models a machine with independent first-level instruction
+and data caches (I1 and D1), backed by a unified second-level cache (L2).
+However, some modern machines have three or four levels of cache. For these
+machines (in the cases where Cachegrind can auto-detect the cache
+configuration) Cachegrind simulates the first-level and last-level caches.
+Therefore, Cachegrind always refers to the I1, D1 and LL (last-level) caches.
+</para>
 
 <para>
-A number of other syntax and sanity checks are done whilst reading the
-inputs.  cg_diff will stop and
-attempt to print a helpful error message if any of the input files
-fail these checks.</para>
+When simulating the cache, with <option>--cache-sim=yes</option>, Cachegrind
+gathers the following statistics:
+</para>
+
+<itemizedlist>
+  <listitem>
+    <para>
+    I cache reads (<computeroutput>Ir</computeroutput>, which equals the number
+    of instructions executed), I1 cache read misses
+    (<computeroutput>I1mr</computeroutput>) and LL cache instruction read
+    misses (<computeroutput>ILmr</computeroutput>).
+    </para>
+  </listitem>
+  <listitem>
+    <para>
+    D cache reads (<computeroutput>Dr</computeroutput>, which equals the number
+    of memory reads), D1 cache read misses
+    (<computeroutput>D1mr</computeroutput>), and LL cache data read misses
+    (<computeroutput>DLmr</computeroutput>).
+    </para>
+  </listitem>
+  <listitem>
+    <para>
+    D cache writes (<computeroutput>Dw</computeroutput>, which equals the
+    number of memory writes), D1 cache write misses
+    (<computeroutput>D1mw</computeroutput>), and LL cache data write misses
+    (<computeroutput>DLmw</computeroutput>).
+    </para>
+  </listitem>
+</itemizedlist>
 
 <para>
-Sometimes you will want to compare Cachegrind profiles of two versions of a
-program that you have sitting side-by-side.  For example, you might have
-<computeroutput>version1/prog.c</computeroutput> and
-<computeroutput>version2/prog.c</computeroutput>, where the second is
-slightly different to the first.  A straight comparison of the two will not
-be useful -- because functions are qualified with filenames, a function
-<function>f</function> will be listed as
-<computeroutput>version1/prog.c:f</computeroutput> for the first version but
-<computeroutput>version2/prog.c:f</computeroutput> for the second
-version.</para>
+Note that D1 total misses is given by <computeroutput>D1mr</computeroutput> +
+<computeroutput>D1mw</computeroutput>, and that LL total misses is given by
+<computeroutput>ILmr</computeroutput> + <computeroutput>DLmr</computeroutput> +
+<computeroutput>DLmw</computeroutput>.
+</para>
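The miss counts listed above can be combined as in the following Python sketch. The numbers are made up for illustration, and the derived D1 miss rate (misses divided by Dr + Dw) is an assumption about how one might summarise them, not something cg_annotate prints.

```python
# Hypothetical event counts, keyed by the event names listed above.
counts = {
    "Ir": 1000, "I1mr": 10, "ILmr": 5,
    "Dr": 400, "D1mr": 20, "DLmr": 8,
    "Dw": 100, "D1mw": 5, "DLmw": 2,
}

# Misses in the first-level data cache: read misses plus write misses.
d1_misses = counts["D1mr"] + counts["D1mw"]

# Misses in the last-level cache: instruction, data read and data write misses.
ll_misses = counts["ILmr"] + counts["DLmr"] + counts["DLmw"]

# Fraction of data accesses (reads + writes) that missed in D1.
d1_miss_rate = d1_misses / (counts["Dr"] + counts["Dw"])

print(d1_misses, ll_misses, d1_miss_rate)
```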
 
 <para>
-When this happens, you can use the <option>--mod-filename</option> option.
-Its argument is a Perl search-and-replace expression that will be applied
-to all the filenames in both Cachegrind output files.  It can be used to
-remove minor differences in filenames.  For example, the option
-<option>--mod-filename='s/version[0-9]/versionN/'</option> will suffice for
-this case.</para>
+When simulating the branch predictor, with <option>--branch-sim=yes</option>,
+Cachegrind gathers the following statistics:
+</para>
+
+<itemizedlist>
+  <listitem>
+    <para>
+    Conditional branches executed (<computeroutput>Bc</computeroutput>) and
+    conditional branches mispredicted (<computeroutput>Bcm</computeroutput>).
+    </para>
+  </listitem>
+  <listitem>
+    <para>
+    Indirect branches executed (<computeroutput>Bi</computeroutput>) and
+    indirect branches mispredicted (<computeroutput>Bim</computeroutput>).
+    </para>
+  </listitem>
+</itemizedlist>
 
 <para>
-Similarly, sometimes compilers auto-generate certain functions and give them
-randomized names.  For example, GCC sometimes auto-generates functions with
-names like <function>T.1234</function>, and the suffixes vary from build to
-build.  You can use the <option>--mod-funcname</option> option to remove
-small differences like these;  it works in the same way as
-<option>--mod-filename</option>.</para>
+When cache and/or branch simulation is enabled, cg_annotate will print multiple
+counts per line of output. For example:
+</para>
 
-</sect2>
+<programlisting><![CDATA[
+  Ir______________________ Bc____________________ Bcm__________________ Bi____________________ Bim______________  function:file
 
+>     8,547  (0.1%, 99.4%)     936  (0.1%, 99.1%)    177  (0.3%, 96.7%)      59  (0.0%, 99.9%) 38 (19.4%, 66.3%)  strcmp:
+      8,503  (0.1%)            928  (0.1%)           175  (0.3%)             59  (0.0%)        38 (19.4%)           ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+]]></programlisting>
 
-</sect1>
+</sect2>
 
+</sect1>
 
 
 <sect1 id="cg-manual.cgopts" xreflabel="Cachegrind Command-line Options">
 <title>Cachegrind Command-line Options</title>
 
 <!-- start of xi:include in the manpage -->
-<para>Cachegrind-specific options are:</para>
+<para>
+Cachegrind-specific options are:
+</para>
 
 <variablelist id="cg.opts.list">
 
-  <varlistentry id="cg.opt.I1" xreflabel="--I1">
+  <varlistentry id="opt.cachegrind-out-file" xreflabel="--cachegrind-out-file">
     <term>
-      <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
+      <option><![CDATA[--cachegrind-out-file=<file> ]]></option>
     </term>
     <listitem>
-      <para>Specify the size, associativity and line size of the level 1
-      instruction cache.  </para>
+      <para>
+      Write the Cachegrind output file to <filename>file</filename> rather than
+      to the default output file,
+      <filename>cachegrind.out.&lt;pid&gt;</filename>. The <option>%p</option>
+      and <option>%q</option> format specifiers can be used to embed the
+      process ID and/or the contents of an environment variable in the name, as
+      is the case for the core option
+      <option><link linkend="opt.log-file">--log-file</link></option>.
+      </para>
     </listitem>
   </varlistentry>
 
-  <varlistentry id="cg.opt.D1" xreflabel="--D1">
+  <varlistentry id="opt.cache-sim" xreflabel="--cache-sim">
     <term>
-      <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
+      <option><![CDATA[--cache-sim=no|yes [no] ]]></option>
     </term>
     <listitem>
-      <para>Specify the size, associativity and line size of the level 1
-      data cache.</para>
+      <para>
+      Enables or disables collection of cache access and miss counts.
+      </para>
     </listitem>
   </varlistentry>
 
-  <varlistentry id="cg.opt.LL" xreflabel="--LL">
+  <varlistentry id="opt.branch-sim" xreflabel="--branch-sim">
     <term>
-      <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
+      <option><![CDATA[--branch-sim=no|yes [no] ]]></option>
     </term>
     <listitem>
-      <para>Specify the size, associativity and line size of the last-level
-      cache.</para>
+      <para>
+      Enables or disables collection of branch instruction and
+      misprediction counts.
+      </para>
     </listitem>
   </varlistentry>
 
-  <varlistentry id="opt.cache-sim" xreflabel="--cache-sim">
+  <varlistentry id="cg.opt.I1" xreflabel="--I1">
     <term>
-      <option><![CDATA[--cache-sim=no|yes [yes] ]]></option>
+      <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
     </term>
     <listitem>
-      <para>Enables or disables collection of cache access and miss
-            counts.</para>
+      <para>
+      Specify the size, associativity and line size of the level 1 instruction
+      cache. Only useful with <option>--cache-sim=yes</option>.
+      </para>
     </listitem>
   </varlistentry>
 
-  <varlistentry id="opt.branch-sim" xreflabel="--branch-sim">
+  <varlistentry id="cg.opt.D1" xreflabel="--D1">
     <term>
-      <option><![CDATA[--branch-sim=no|yes [no] ]]></option>
+      <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
     </term>
     <listitem>
-      <para>Enables or disables collection of branch instruction and
-            misprediction counts.  By default this is disabled as it
-            slows Cachegrind down by approximately 25%.  Note that you
-            cannot specify <option>--cache-sim=no</option>
-            and <option>--branch-sim=no</option>
-            together, as that would leave Cachegrind with no
-            information to collect.</para>
+      <para>
+      Specify the size, associativity and line size of the level 1 data cache.
+      Only useful with <option>--cache-sim=yes</option>.
+      </para>
     </listitem>
   </varlistentry>
 
-  <varlistentry id="opt.cachegrind-out-file" xreflabel="--cachegrind-out-file">
+  <varlistentry id="cg.opt.LL" xreflabel="--LL">
     <term>
-      <option><![CDATA[--cachegrind-out-file=<file> ]]></option>
+      <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
     </term>
     <listitem>
-      <para>Write the profile data to 
-            <computeroutput>file</computeroutput> rather than to the default
-            output file,
-            <filename>cachegrind.out.&lt;pid&gt;</filename>.  The
-            <option>%p</option> and <option>%q</option> format specifiers
-            can be used to embed the process ID and/or the contents of an
-            environment variable in the name, as is the case for the core
-            option <option><link linkend="opt.log-file">--log-file</link></option>.
+      <para>
+      Specify the size, associativity and line size of the last-level cache.
+      Only useful with <option>--cache-sim=yes</option>.
       </para>
     </listitem>
   </varlistentry>
@@ -895,94 +945,114 @@ small differences like these;  it works in the same way as
 
   <varlistentry>
     <term>
-      <option><![CDATA[--show=A,B,C [default: all, using order in
-      cachegrind.out.<pid>] ]]></option>
+      <option><![CDATA[--diff ]]></option>
     </term>
     <listitem>
-      <para>Specifies which events to show (and the column
-      order). Default is to use all present in the
-      <filename>cachegrind.out.&lt;pid&gt;</filename> file (and
-      use the order in the file).  Useful if you want to concentrate on, for
-      example, I cache misses (<option>--show=I1mr,ILmr</option>), or data
-      read misses (<option>--show=D1mr,DLmr</option>), or LL data misses
-      (<option>--show=DLmr,DLmw</option>).  Best used in conjunction with
-      <option>--sort</option>.</para>
+      <para>Diff two Cachegrind output files.</para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--sort=A,B,C [default: order in
-      cachegrind.out.<pid>] ]]></option>
+      <option><![CDATA[--mod-filename <regex> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Specifies the events upon which the sorting of the
-      function-by-function entries will be based.</para>
+      <para>
+      Specifies an <option>s/old/new/</option> search-and-replace expression
+      that is applied to all filenames. Useful when differencing, for removing
+      minor differences in paths between two different versions of a program
+      that are sitting in different directories. An <option>i</option> suffix
+      makes the regex case-insensitive, and a <option>g</option> suffix makes
+      it match multiple times.
+      </para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--threshold=X [default: 0.1%] ]]></option>
+      <option><![CDATA[--mod-funcname <regex> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Sets the threshold for the function-by-function
-      summary.  A function is shown if it accounts for more than X%
-      of the counts for the primary sort event.  If auto-annotating, also
-      affects which files are annotated.</para>
-        
-      <para>Note: thresholds can be set for more than one of the
-      events by appending any events for the
-      <option>--sort</option> option with a colon
-      and a number (no spaces, though).  E.g. if you want to see
-      each function that covers more than 1% of LL read misses or 1% of LL
-      write misses, use this option:</para>
-      <para><option>--sort=DLmr:1,DLmw:1</option></para>
+      <para>
+      Like <option>--mod-filename</option>, but for function names. Useful for
+      removing minor differences in the randomized names of functions that
+      some compilers auto-generate.
+      </para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--show-percs, --no-show-percs, --show-percs=<no|yes> [default: yes] ]]></option>
+      <option><![CDATA[--show=A,B,C [default: all, using order in
+      the Cachegrind output file] ]]></option>
     </term>
     <listitem>
-      <para>When enabled, a percentage is printed next to all event counts.
-      This helps gauge the relative importance of each function and line.
+      <para>
+      Specifies which events to show (and the column order). Default is to use
+      all present in the Cachegrind output file (and use the order in the
+      file). Best used in conjunction with <option>--sort</option>.
       </para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--auto, --no-auto, --auto=<no|yes> [default: yes] ]]></option>
+      <option><![CDATA[--sort=A,B,C [default: order in the Cachegrind output file] ]]></option>
     </term>
     <listitem>
-      <para>When enabled, automatically annotates every file that
-      is mentioned in the function-by-function summary that can be
-      found.  Also gives a list of those that couldn't be found.</para>
+      <para>
+      Specifies the events upon which the sorting of the file:function and
+      function:file entries will be based.
+      </para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[--context=N [default: 8] ]]></option>
+      <option><![CDATA[--threshold=X [default: 0.1%] ]]></option>
+    </term>
+    <listitem>
+      <para>
+      Sets the significance threshold for the file:function and function:file
+      sections. A file or function is shown if it accounts for more than X% of
+      the counts for the primary sort event.  If annotating source files, this
+      also affects which files are annotated.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--show-percs, --no-show-percs, --show-percs=<no|yes> [default: yes] ]]></option>
+    </term>
+    <listitem>
+      <para>
+      When enabled, a percentage is printed next to all event counts. This
+      helps gauge the relative importance of each function and line.
+      </para>
+    </listitem>
+  </varlistentry>
+
+  <varlistentry>
+    <term>
+      <option><![CDATA[--annotate, --no-annotate, --annotate=<no|yes> [default: yes] ]]></option>
     </term>
     <listitem>
-      <para>Print N lines of context before and after each
-      annotated line.  Avoids printing large sections of source
-      files that were not executed.  Use a large number
-      (e.g. 100000) to show all source lines.</para>
+      <para>
+      Enables or disables source file annotation.
+      </para>
     </listitem>
   </varlistentry>
 
   <varlistentry>
     <term>
-      <option><![CDATA[-I<dir> --include=<dir> [default: none] ]]></option>
+      <option><![CDATA[--context=N [default: 8] ]]></option>
     </term>
     <listitem>
-      <para>Adds a directory to the list in which to search for
-      files.  Multiple <option>-I</option>/<option>--include</option>
-      options can be given to add multiple directories.</para>
+      <para>
+      The number of lines of context to show before and after each annotated
+      line. Use a large number (e.g. 100000) to show all source lines.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -995,6 +1065,8 @@ small differences like these;  it works in the same way as
 <sect1 id="cg-manual.mergeopts" xreflabel="cg_merge Command-line Options">
 <title>cg_merge Command-line Options</title>
 
+<para>Although cg_merge is deprecated, its options are listed here for completeness.</para>
+
 <!-- start of xi:include in the manpage -->
 <variablelist id="cg_merge.opts.list">
 
@@ -1003,8 +1075,9 @@ small differences like these;  it works in the same way as
       <option><![CDATA[-o outfile]]></option>
     </term>
     <listitem>
-      <para>Write the profile data to <computeroutput>outfile</computeroutput>
-            rather than to standard output.
+      <para>
+      Write the output to <computeroutput>outfile</computeroutput>
+      instead of standard output.
       </para>
     </listitem>
   </varlistentry>
@@ -1018,6 +1091,8 @@ small differences like these;  it works in the same way as
 <sect1 id="cg-manual.diffopts" xreflabel="cg_diff Command-line Options">
 <title>cg_diff Command-line Options</title>
 
+<para>Although cg_diff is deprecated, its options are listed here for completeness.</para>
+
 <!-- start of xi:include in the manpage -->
 <variablelist id="cg_diff.opts.list">
 
@@ -1044,10 +1119,10 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--mod-filename=<expr> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Specifies a Perl search-and-replace expression that is applied
-      to all filenames.  Useful for removing minor differences in paths
-      between two different versions of a program that are sitting in
-      different directories.</para>
+      <para>
+      Specifies an <option>s/old/new/</option> search-and-replace expression
+      that is applied to all filenames.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -1056,9 +1131,9 @@ small differences like these;  it works in the same way as
       <option><![CDATA[--mod-funcname=<expr> [default: none]]]></option>
     </term>
     <listitem>
-      <para>Like <option>--mod-filename</option>, but for filenames.
-      Useful for removing minor differences in randomized names of
-      auto-generated functions generated by some compilers.</para>
+      <para>
+      Like <option>--mod-filename</option>, but for function names.
+      </para>
     </listitem>
   </varlistentry>
 
@@ -1068,99 +1143,6 @@ small differences like these;  it works in the same way as
 </sect1>
 
 
-
-
-<sect1 id="cg-manual.acting-on"
-       xreflabel="Acting on Cachegrind's Information">
-<title>Acting on Cachegrind's Information</title>
-<para>
-Cachegrind gives you lots of information, but acting on that information
-isn't always easy.  Here are some rules of thumb that we have found to be
-useful.</para>
-
-<para>
-First of all, the global hit/miss counts and miss rates are not that useful.
-If you have multiple programs or multiple runs of a program, comparing the
-numbers might identify if any are outliers and worthy of closer
-investigation.  Otherwise, they're not enough to act on.</para>
-
-<para>
-The function-by-function counts are more useful to look at, as they pinpoint
-which functions are causing large numbers of counts.  However, beware that
-inlining can make these counts misleading.  If a function
-<function>f</function> is always inlined, counts will be attributed to the
-functions it is inlined into, rather than itself.  However, if you look at
-the line-by-line annotations for <function>f</function> you'll see the
-counts that belong to <function>f</function>.  (This is hard to avoid, it's
-how the debug info is structured.)  So it's worth looking for large numbers
-in the line-by-line annotations.</para>
-
-<para>
-The line-by-line source code annotations are much more useful.  In our
-experience, the best place to start is by looking at the
-<computeroutput>Ir</computeroutput> numbers.  They simply measure how many
-instructions were executed for each line, and don't include any cache
-information, but they can still be very useful for identifying
-bottlenecks.</para>
-
-<para>
-After that, we have found that LL misses are typically a much bigger source
-of slow-downs than L1 misses.  So it's worth looking for any snippets of
-code with high <computeroutput>DLmr</computeroutput> or
-<computeroutput>DLmw</computeroutput> counts.  (You can use
-<option>--show=DLmr
---sort=DLmr</option> with cg_annotate to focus just on
-<literal>DLmr</literal> counts, for example.) If you find any, it's still
-not always easy to work out how to improve things.  You need to have a
-reasonable understanding of how caches work, the principles of locality, and
-your program's data access patterns.  Improving things may require
-redesigning a data structure, for example.</para>
-
-<para>
-Looking at the <computeroutput>Bcm</computeroutput> and
-<computeroutput>Bim</computeroutput> misses can also be helpful.
-In particular, <computeroutput>Bim</computeroutput> misses are often caused
-by <literal>switch</literal> statements, and in some cases these
-<literal>switch</literal> statements can be replaced with table-driven code.
-For example, you might replace code like this:</para>
-
-<programlisting><![CDATA[
-enum E { A, B, C };
-enum E e;
-int i;
-...
-switch (e)
-{
-    case A: i += 1; break;
-    case B: i += 2; break;
-    case C: i += 3; break;
-}
-]]></programlisting>
-
-<para>with code like this:</para>
-
-<programlisting><![CDATA[
-enum E { A, B, C };
-enum E e;
-int table[] = { 1, 2, 3 };
-int i;
-...
-i += table[e];
-]]></programlisting>
-
-<para>
-This is obviously a contrived example, but the basic principle applies in a
-wide variety of situations.</para>
-
-<para>
-In short, Cachegrind can tell you where some of the bottlenecks in your code
-are, but it can't tell you how to fix them.  You have to work that out for
-yourself.  But at least you have the information!
-</para>
-
-</sect1>
-
-
 <sect1 id="cg-manual.sim-details"
        xreflabel="Simulation Details">
 <title>Simulation Details</title>
@@ -1172,8 +1154,9 @@ use Cachegrind, but may be of interest to some people.
 <sect2 id="cache-sim" xreflabel="Cache Simulation Specifics">
 <title>Cache Simulation Specifics</title>
 
-<para>Specific characteristics of the cache simulation are as
-follows:</para>
+<para>
+The cache simulation approximates the hardware of an AMD Athlon CPU circa 2002.
+Its specific characteristics are as follows:</para>
 
 <itemizedlist>
 
@@ -1271,11 +1254,11 @@ need to specify it with the
 
 </itemizedlist>
 
-<para>If you are interested in simulating a cache with different
-properties, it is not particularly hard to write your own cache
-simulator, or to modify the existing ones in
-<computeroutput>cg_sim.c</computeroutput>. We'd be
-interested to hear from anyone who does.</para>
+<para>
+If you are interested in simulating a cache with different properties, it is
+not particularly hard to write your own cache simulator, or to modify the
+existing ones in <computeroutput>cg_sim.c</computeroutput>.
+</para>
 
 </sect2>
 
@@ -1324,19 +1307,38 @@ Architecture: A Quantitative Approach", 4th edition (2007), Section
 <sect2 id="cg-manual.annopts.accuracy" xreflabel="Accuracy">
 <title>Accuracy</title>
 
-<para>Valgrind's cache profiling has a number of
-shortcomings:</para>
+<para>
+Cachegrind's instruction counting has one shortcoming on x86/amd64:
+</para>
+
+<itemizedlist>
+  <listitem>
+    <para>
+    When a <function>REP</function>-prefixed instruction executes, each
+    iteration is counted separately. In contrast, hardware counters count each
+    such instruction just once, no matter how many times it iterates. It is
+    arguable that Cachegrind's behaviour is more useful.
+    </para>
+  </listitem>
+</itemizedlist>
+
+<para>
+Cachegrind's cache profiling has a number of shortcomings:
+</para>
 
 <itemizedlist>
   <listitem>
-    <para>It doesn't account for kernel activity -- the effect of system
-    calls on the cache and branch predictor contents is ignored.</para>
+    <para>
+    It doesn't account for kernel activity. The effect of system calls on the
+    cache and branch predictor contents is ignored.
+    </para>
   </listitem>
 
   <listitem>
-    <para>It doesn't account for other process activity.
-    This is probably desirable when considering a single
-    program.</para>
+    <para>
+    It doesn't account for other process activity. This is arguably desirable
+    when considering a single program.
+    </para>
   </listitem>
 
   <listitem>
@@ -1360,15 +1362,15 @@ shortcomings:</para>
   </listitem>
 
   <listitem>
-    <para>The x86/amd64 instructions <computeroutput>bts</computeroutput>,
+    <para>
+    The x86/amd64 instructions <computeroutput>bts</computeroutput>,
     <computeroutput>btr</computeroutput> and
-    <computeroutput>btc</computeroutput> will incorrectly be
-    counted as doing a data read if both the arguments are
-    registers, eg:</para>
-<programlisting><![CDATA[
+    <computeroutput>btc</computeroutput> will incorrectly be counted as doing a
+    data read if both the arguments are registers, e.g.:
+    <programlisting><![CDATA[
     btsl %eax, %edx]]></programlisting>
-
-    <para>This should only happen rarely.</para>
+    This should only happen rarely.
+    </para>
   </listitem>
 
   <listitem>
@@ -1387,13 +1389,12 @@ file names, can perturb the results.  Variations will be small, but
 don't expect perfectly repeatable results if your program changes at
 all.</para>
 
-<para>More recent GNU/Linux distributions do address space
-randomisation, in which identical runs of the same program have their
-shared libraries loaded at different locations, as a security measure.
-This also perturbs the results.</para>
-
-<para>While these factors mean you shouldn't trust the results to
-be super-accurate, they should be close enough to be useful.</para>
+<para>
+Many Linux distributions perform address space layout randomisation (ASLR), in
+which identical runs of the same program have their shared libraries loaded at
+different locations, as a security measure. This also perturbs the
+results.
+</para>
 
 </sect2>
 
diff --git a/cachegrind/docs/cg_annotate-manpage.xml b/cachegrind/docs/cg_annotate-manpage.xml
index 5790eb0609..e4239c8a2b 100644
--- a/cachegrind/docs/cg_annotate-manpage.xml
+++ b/cachegrind/docs/cg_annotate-manpage.xml
@@ -30,8 +30,9 @@
 <refsect1 id="cg_annotate-description">
 <title>Description</title>
 
-<para><command>cg_annotate</command> takes an output file produced by the
-Valgrind tool Cachegrind and prints the information in an easy-to-read form.
+<para>
+<command>cg_annotate</command> takes one or more Cachegrind output files and
+prints data about the profiled program in an easy-to-read form.
 </para>
 
 </refsect1>
diff --git a/cachegrind/docs/cg_diff-manpage.xml b/cachegrind/docs/cg_diff-manpage.xml
index daffdfbbb0..fe14d14c60 100644
--- a/cachegrind/docs/cg_diff-manpage.xml
+++ b/cachegrind/docs/cg_diff-manpage.xml
@@ -14,7 +14,7 @@
 
 <refnamediv>
   <refname>cg_diff</refname>
-  <refpurpose>compares two Cachegrind output files</refpurpose>
+  <refpurpose>(deprecated) diffs two Cachegrind output files</refpurpose>
 </refnamediv>
 
 <refsynopsisdiv>
@@ -30,9 +30,10 @@
 <refsect1 id="cg_diff-description">
 <title>Description</title>
 
-<para><command>cg_diff</command> takes two output files produced by the
-Valgrind tool Cachegrind, computes the difference and prints the result
-in the same format that Cachegrinds outputs.
+<para>
+<command>cg_diff</command> diffs two Cachegrind output files into a single
+Cachegrind output file. It is deprecated because <command>cg_annotate</command>
+can now do much the same thing, but better.
 </para>
 
 </refsect1>
diff --git a/cachegrind/docs/cg_merge-manpage.xml b/cachegrind/docs/cg_merge-manpage.xml
index e4e97310e5..48aef4d775 100644
--- a/cachegrind/docs/cg_merge-manpage.xml
+++ b/cachegrind/docs/cg_merge-manpage.xml
@@ -14,7 +14,7 @@
 
 <refnamediv>
   <refname>cg_merge</refname>
-  <refpurpose>merges multiple Cachegrind output files into one</refpurpose>
+  <refpurpose>(deprecated) merges multiple Cachegrind output files into one</refpurpose>
 </refnamediv>
 
 <refsynopsisdiv>
@@ -29,8 +29,10 @@
 <refsect1 id="cg_merge-description">
 <title>Description</title>
 
-<para><command>cg_merge</command> sums together the outputs of multiple
-Cachegrind runs into a single output file.
+<para>
+<command>cg_merge</command> sums together multiple Cachegrind output files into
+a single Cachegrind output file. It is deprecated because
+<command>cg_annotate</command> can now do much the same thing, but better.
 </para>
 
 </refsect1>
diff --git a/cachegrind/docs/concord.c b/cachegrind/docs/concord.c
new file mode 100644
index 0000000000..7ebdbea654
--- /dev/null
+++ b/cachegrind/docs/concord.c
@@ -0,0 +1,532 @@
+/********************************************************************************
+**  Program: concord.c
+**  By: Nick Nethercote, 36448.  Any code taken from elsewhere as noted.
+**  For: 433-253 assignment 3.
+**  
+**  Program description:  This program is a tool for finding specific 
+**  occurrences of words in a text;  it can count the number of times a single
+**  word appears, or list the lines that a word, or multiple words, all appear
+**  on.  See the project specification for more detail.
+**  	The primary data structure used is a static hash table, of fixed size.
+**  Any collisions of words hashing to the same position in the table are
+**  dealt with via separate chaining.  Also, for each word, there is a 
+**  subsidiary linked list containing the line numbers that the word appears on.
+**  Thus there are linked lists within linked lists.
+**  	I have implemented the entire program within one file, partly because
+**  there isn't a great deal of code, and partly because I haven't yet done
+**  433-252, and thus don't know a great deal about .h files, makefiles, etc.
+*/
+
+#include <stdio.h>
+#include <ctype.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define TRUE 1
+#define FALSE 0
+#define MAX_WORD_LENGTH 100 
+#define DUMMY_WORD_LENGTH 2
+#define TABLE_SIZE 997 
+#define BEFORE_WORD 1
+#define IN_WORD 2
+#define AFTER_WORD 3
+#define HASH_CONSTANT 256
+#define ARGS_NUMBER 1
+
+typedef struct word_node Word_Node;
+typedef struct line_node Line_Node;
+typedef struct word_info Word_Info;
+typedef struct arg_node Arg_Node;
+
+/*  Linked list node for storing each word */
+struct word_node {
+    char *word;		    /* The actual word */
+    int number;		    /* The number of occurrences */	
+    Line_Node *line_list;   /* Points to the linked list of line numbers */
+    Line_Node *last_line;   /* Points to the last line node, for easy append */
+    Word_Node *next_word;   /* Next node in list */
+};
+
+/*  Subsidiary linked list node for storing line numbers */
+struct line_node {
+    int line;
+    Line_Node *next_line;
+};
+
+/*  Structure used when reading each word, and its line number, from file. */
+struct word_info {
+    char word[MAX_WORD_LENGTH];
+    int line;
+};
+
+/*  Linked list node used for holding multiple arguments from the program's
+**  internal command line.  Also, can point to a list of line numbers;  this
+**  is used when displaying line numbers.  
+*/ 
+struct arg_node {
+    char *word;
+    Line_Node *line_list;
+    Arg_Node *next_arg;
+};
+
+int        hash(char *word);
+void      *create(int mem_size);
+void       init_hash_table(char *file_name, Word_Node *table[]);
+int        get_word(Word_Info *data, int line, FILE *file_ptr);
+void       insert(char *inword, int in_line, Word_Node *table[]);
+Word_Node *new_word_node(char *inword, int in_line);
+Line_Node *add_existing(Line_Node *curr, int in_line);
+void       interact(Word_Node *table[]);
+Arg_Node  *place_args_in_list(char command[]);
+Arg_Node  *append(char *word, Arg_Node *head);
+void       count(Arg_Node *head, Word_Node *table[]);
+void       list_lines(Arg_Node *head, Word_Node *table[]);
+void       intersection(Arg_Node *head);
+void       intersect_array(int master[], int size, Arg_Node *arg_head);
+void       kill_arg_list(Arg_Node *head);
+
+int main(int argc, char *argv[])
+{
+    /* The actual hash table, a fixed-size array of pointers to word nodes */
+    Word_Node *table[TABLE_SIZE];
+
+    /* Checking command line input for one file name */
+    if (argc != ARGS_NUMBER + 1) {
+	fprintf(stderr, "%s requires %d argument\n", argv[0], ARGS_NUMBER); 
+	exit(EXIT_FAILURE);
+    }
+
+    init_hash_table(argv[1], table);
+    interact(table);
+
+    /* Nb:  I am not freeing the dynamic memory in the hash table, having been
+    ** told this is not necessary. */
+    return 0;
+}
+
+/* General dynamic allocation function that allocates and then checks. */
+void *create(int mem_size)
+{
+    void *dyn_block;
+
+    dyn_block = malloc(mem_size);
+    if (!(dyn_block)) {
+        fprintf(stderr, "Couldn't allocate enough memory to continue.\n");
+        exit(EXIT_FAILURE);
+    }
+
+    return dyn_block;
+}
+
+/* Function returns a hash value on a word.  Almost identical to the hash
+** function presented in Sedgewick.
+*/
+int hash(char *word)
+{
+    int hash_value = 0;
+
+    for ( ; *word; word++)
+        hash_value = (HASH_CONSTANT * hash_value + *word) % TABLE_SIZE;
+
+    return hash_value;
+}
+
+/* Function builds the hash table from the given file. */
+void init_hash_table(char *file_name, Word_Node *table[])
+{
+    FILE *file_ptr;
+    Word_Info *data;
+    int line = 1, i;
+
+    /* Structure used when reading in words and line numbers. */
+    data = (Word_Info *) create(sizeof(Word_Info));
+
+    /* Initialise entire table to NULL. */
+    for (i = 0; i < TABLE_SIZE; i++)
+        table[i] = NULL;
+
+    /* Open file, check it. */
+    file_ptr = fopen(file_name, "r");
+    if (!(file_ptr)) {
+        fprintf(stderr, "Couldn't open '%s'.\n", file_name);
+        exit(EXIT_FAILURE);
+    }
+
+    /*  'Get' the words and lines one at a time from the file, and insert them
+    ** into the table one at a time. */
+    while ((line = get_word(data, line, file_ptr)) != EOF)
+        insert(data->word, data->line, table);
+
+    free(data);
+    fclose(file_ptr);
+}
+
+/* Function reads the next word, and its line number, and places them in the 
+** structure 'data', via a pointer.
+*/
+int get_word(Word_Info *data, int line, FILE *file_ptr)
+{
+    int index = 0, pos = BEFORE_WORD;
+
+    /* Only alphabetic characters are read, apostrophes are ignored, and other
+    ** characters are considered separators.  'pos' helps keep track whether
+    ** the current file position is inside a word or between words.
+    */
+    while ((data->word[index] = tolower(fgetc(file_ptr))) != EOF) {
+        if (data->word[index] == '\n')
+            line++;
+        if (islower(data->word[index])) {
+            if (pos == BEFORE_WORD) {
+                pos = IN_WORD;
+                data->line = line;
+            }
+            index++;
+        }
+        else if ((pos == IN_WORD) && (data->word[index] != '\'')) {
+            break;
+        }
+    }
+    /* Signals end of file has been reached. */
+    if (data->word[index] == EOF)
+        line = EOF;
+
+    /* Adding the null character. */
+    data->word[index] = '\0';
+
+    return line;
+}
+
+/* Function inserts a word and its line number into the hash table. */
+void insert(char *inword, int in_line, Word_Node *table[])
+{
+    int position = hash(inword);
+    Word_Node *curr, *prev = NULL;
+    char dummy_word[DUMMY_WORD_LENGTH] = "A";
+
+    /* The case where that hash position hasn't been used before; a new word
+    ** node is created. 
+    */
+    if (table[position] == NULL)
+        table[position] = new_word_node(dummy_word, 0);
+    curr = table[position];
+
+    /* Traverses that position's list of words until the current word is found
+    ** (i.e. it's come up before) or the list end is reached (i.e. it's the
+    ** first occurrence of the word).
+    */
+    while ((curr != NULL) && (strcmp(inword, curr->word) > 0)) {
+        prev = curr;
+        curr = curr->next_word;
+    }
+
+    /* If the word hasn't appeared before, it's inserted alphabetically into
+    ** the list.
+    */
+    if ((curr == NULL) || (strcmp(curr->word, inword) != 0)) {
+        prev->next_word = new_word_node(inword, in_line);
+        prev->next_word->next_word = curr;
+    }
+    /* Otherwise, the word count is incremented, and the line number is added
+    ** to the existing list.
+    */
+    else {
+        (curr->number)++;
+        curr->last_line = add_existing(curr->last_line, in_line);
+    }
+}
+
+/* Function creates a new node for when a word is inserted for the first time.
+*/
+Word_Node *new_word_node(char *inword, int in_line)
+{
+    Word_Node *new;
+
+    new = (Word_Node *) create(sizeof(Word_Node));
+    new->word = (char *) create(sizeof(char) * (strlen(inword) + 1));
+    new->word = strcpy(new->word, inword);
+    /* The word count is set to 1, as this is the first occurrence! */
+    new->number = 1;
+    new->next_word = NULL;
+    /* One line number node is added. */
+    new->line_list = (Line_Node *) create(sizeof(Line_Node));
+    new->line_list->line = in_line;
+    new->line_list->next_line = NULL;
+    new->last_line = new->line_list;
+
+    return new;
+}
+
+/* Function adds a line number to the line number list of a word that has
+** already been inserted at least once.  The pointer 'last_line', part of
+** the word node structure, allows easy appending to the list.
+*/
+Line_Node *add_existing(Line_Node *last_line, int in_line)
+{
+    /* Check to see if that line has already occurred - multiple occurrences on
+    ** the one line are only recorded once.  (Nb:  They are counted twice, but
+    ** only listed once.)
+    */
+    if (last_line->line != in_line) {
+        last_line->next_line = (Line_Node *) create(sizeof(Line_Node));
+        last_line = last_line->next_line;
+        last_line->line = in_line;
+        last_line->next_line = NULL;
+    }
+
+    return last_line;
+}
+
+/*  Function controls the interactive command line part of the program. */
+void interact(Word_Node *table[])
+{
+    char args[MAX_WORD_LENGTH];     /* Array to hold command line */
+    Arg_Node *arg_list = NULL;      /* List that holds processed arguments */ 
+    int not_quitted = TRUE;         /* Quit flag */
+
+    /* The prompt (?) is displayed.  Commands are read into an array, and then
+    ** individual arguments are placed into a linked list for easy use. 
+    ** The first argument (actually the command) is looked at to determine
+    ** what action should be performed.  'arg_list->next_arg' is passed to
+    ** count() and list_lines(), because the actual 'c' or 'l' is not needed
+    ** by them.  Lastly, the argument linked list is freed, by 'kill_arg_list'.  
+    */ 
+    do {
+        printf("?");		     
+        fgets(args, MAX_WORD_LENGTH - 1, stdin);
+        arg_list = place_args_in_list(args);
+        if (arg_list) {
+            if (strcmp(arg_list->word, "c") == 0)
+		count(arg_list->next_arg, table);
+            else if (strcmp(arg_list->word, "l") == 0)
+               	list_lines(arg_list->next_arg, table); 
+            else if (strcmp(arg_list->word, "q") == 0) {
+               	printf("Quitting concord\n");
+		not_quitted = FALSE;
+	    }
+            else
+               	printf("Not a valid command.\n");
+	    kill_arg_list(arg_list);
+        }
+    } while (not_quitted);	/* Quits on flag */
+}
+
+/* Function takes an array containing a command line, and parses it, placing
+** actual word into a linked list.
+*/
+Arg_Node *place_args_in_list(char command[])
+{
+    int index1 = 0, index2 = 0, pos = BEFORE_WORD;
+    char token[MAX_WORD_LENGTH], c;
+    Arg_Node *head = NULL;
+
+    /* Non alphabetic characters are discarded.  Alphabetic characters are
+    ** copied into the array 'token'.  Once the current word has been copied
+    ** into 'token', 'append' is called, copying 'token' to a new node in the
+    ** linked list.
+    */
+    while (command[index1] != '\0') {
+        c = tolower(command[index1++]);
+        if (islower(c)) {
+            token[index2++] = c;
+            pos = IN_WORD;
+        }
+        else if (c == '\'')
+            token[index2] = c;
+        else if (pos == IN_WORD) {
+            pos = BEFORE_WORD;
+            token[index2] = '\0';
+            head = append(token, head);
+            index2 = 0;
+        }
+    }
+
+    return head;
+}
+
+/* Function takes a word, and appends a new node containing that word to the
+** list.
+*/
+Arg_Node *append(char *word, Arg_Node *head)
+{
+    Arg_Node *curr = head,
+             *new = (Arg_Node *) create(sizeof(Arg_Node));
+
+    new->word = (char *) create(sizeof(char) * (strlen(word) + 1));
+    strcpy(new->word, word);
+    new->line_list = NULL;
+    new->next_arg = NULL;
+
+    if (head == NULL)
+        return new;
+
+    while (curr->next_arg != NULL)
+        curr = curr->next_arg;
+    curr->next_arg = new;
+
+    return head;
+}
+
+
+/* Function displays the number of times a word has occurred. */
+void count(Arg_Node *arg_list, Word_Node *table[])
+{
+    int hash_pos = 0;		/* Only initialised to avoid gnuc warnings */
+    Word_Node *curr_word = NULL;  
+
+    /* Checking for the right number of arguments (one). */
+    if (arg_list) {
+        if (arg_list->next_arg != NULL) {
+	    printf("c requires only one argument\n");
+	    return;
+	}
+        hash_pos = hash(arg_list->word);
+    }
+    else
+	return;    
+
+    /* Finds if the supplied word is in table, firstly by hashing to its
+    ** would-be position, and then traversing the list of words.  If present,
+    ** its number is displayed, otherwise '0' is printed.
+    */
+    if (table[hash_pos]) {
+	curr_word = table[hash_pos]->next_word;
+	while ((curr_word != NULL) &&
+	       (strcmp(arg_list->word, curr_word->word) != 0)) 
+	    curr_word = curr_word->next_word; 	
+        if (curr_word) 
+  	    printf("%d\n", curr_word->number);
+        else
+	    printf("0\n");
+    }
+    else
+	printf("0\n");
+}
+
+/* Function that takes each node in the argument list, and directs a pointer
+** to that word's list of lines, which are present in the hash table. 
+*/
+void list_lines(Arg_Node *arg_head, Word_Node *table[])
+{
+    int hash_pos = 0;		/* Only initialised to avoid gnuc warnings */
+    Word_Node *curr_word;
+    Arg_Node *curr_arg = arg_head;
+
+    /* For each word in the list of arguments, the word is looked for in the 
+    ** hash table.  Each argument node has a pointer, and if the word is there,
+    ** that pointer is set to point at that word's list of line numbers. 
+    */ 
+    while (curr_arg != NULL) {
+        hash_pos = hash(curr_arg->word);
+        if (table[hash_pos]) {
+            curr_word = table[hash_pos]->next_word;   /* Gets past dummy node */
+            while (curr_word != NULL && 
+		   strcmp(curr_arg->word, curr_word->word) != 0) 
+	        curr_word = curr_word->next_word;
+            if (curr_word) 
+	        curr_arg->line_list = curr_word->line_list;
+        }
+        curr_arg = curr_arg->next_arg;
+    }
+    /* An intersection is then performed, to determine which lines, if any, 
+    ** all the arguments appear on.
+    */
+    if (arg_head)
+        intersection(arg_head); 
+}
+
+/*  Function takes a list of line lists, and finds the lines that are common
+**  to each line list, by using a comparison array.
+*/
+void intersection(Arg_Node *arg_head)
+{
+    Line_Node *curr_line;
+    int *master, n = 0, index = 0, output = FALSE;
+ 
+    /* Find size of first list, for creating master array */
+    curr_line = arg_head->line_list;
+    while (curr_line) {
+        n++;
+        curr_line = curr_line->next_line;
+    }
+
+    /* The master comparison array is created. */ 
+    master = (int *) create(sizeof(int) * n);
+    curr_line = arg_head->line_list;
+ 
+    /*  Copy first list into master array */
+    while (curr_line) {
+        *(master + index++) = curr_line->line; 
+	curr_line = curr_line->next_line;
+    }
+
+    /* Perform the actual intersection. */
+    intersect_array(master, n, arg_head->next_arg);
+
+    /* Print the line numbers left in the processed array; the lines that
+    ** remain contain all the words specified in the command.
+    */
+    for (index = 0; index < n; index++)
+	if (*(master + index) != 0) { 
+	    printf("%d ", *(master + index));
+	    output = TRUE; 
+	}
+    /* 'Output' merely prevents an unnecessary newline when 'l' returns no 
+    ** answer. 
+    */
+    if (output)
+        printf("\n");
+
+    /* Deallocate dynamic memory for master array */
+    free(master);
+}
+
+/* Function takes the master array of line numbers, which 'intersection' fills
+** from the first argument's line list.  It then moves through the remaining
+** arguments.  For each word, each line number in master is compared to the
+** line numbers in that word's line list.  If there is no match, then that
+** position in the array is set to 0, because that line is no longer in
+** contention as an answer.
+*/
+void intersect_array(int master[], int size, Arg_Node *arg_head)
+{
+    int index = 0;
+    Line_Node *curr_line;
+
+    while (arg_head) {
+        index = 0;
+        curr_line = arg_head->line_list;
+    /* For each line in the list, any array entry less than that line number
+    ** is set to zero.  Any entry equal to that line number remains.
+    ** This loop depends on the fact that both the line list, and the master
+    ** array, are sorted. */
+        while (curr_line) {
+            while (index < size && *(master + index) < curr_line->line)
+                *(master + index++) = 0;
+            while (index < size && *(master + index) <= curr_line->line)
+                index++;
+            curr_line = curr_line->next_line;
+        }
+    /* Once the list of lines has been traversed, any array positions that 
+    ** haven't been examined are set to zero, as they are no longer in 
+    ** contention. 
+    */
+        for ( ; index < size; index++)
+            *(master + index) = 0;
+
+        arg_head = arg_head->next_arg;
+    }
+}
+
+/*  Function to free dynamic memory used by the arguments linked list. */
+void kill_arg_list(Arg_Node *head)
+{
+    Arg_Node *temp;
+
+    while (head != NULL) {
+        temp = head;
+        head = head->next_arg;
+        free(temp->word);
+        free(temp);
+    }
+}
+
diff --git a/cachegrind/docs/concord.cgann b/cachegrind/docs/concord.cgann
new file mode 100644
index 0000000000..930e4dc7bc
--- /dev/null
+++ b/cachegrind/docs/concord.cgann
@@ -0,0 +1,560 @@
+--------------------------------------------------------------------------------
+-- Metadata
+--------------------------------------------------------------------------------
+Invocation:       ../cg_annotate concord.cgout
+Command:          ./concord ../cg_main.c
+Events recorded:  Ir
+Events shown:     Ir
+Event sort order: Ir
+Threshold:        0.1%
+Annotation:       on
+
+--------------------------------------------------------------------------------
+-- Summary
+--------------------------------------------------------------------------------
+Ir________________ 
+
+8,195,056 (100.0%)  PROGRAM TOTALS
+
+--------------------------------------------------------------------------------
+-- File:function summary
+--------------------------------------------------------------------------------
+  Ir______________________  file:function
+
+< 3,078,746 (37.6%, 37.6%)  /home/njn/grind/ws1/cachegrind/docs/concord.c:
+  1,630,232 (19.9%)           get_word
+    630,918  (7.7%)           hash
+    461,095  (5.6%)           insert
+    130,560  (1.6%)           add_existing
+     91,014  (1.1%)           init_hash_table
+     88,056  (1.1%)           create
+     46,676  (0.6%)           new_word_node
+
+< 1,746,038 (21.3%, 58.9%)  ./malloc/./malloc/malloc.c:
+  1,285,938 (15.7%)           _int_malloc
+    458,225  (5.6%)           malloc
+
+< 1,107,550 (13.5%, 72.4%)  ./libio/./libio/getc.c:getc
+
+<   551,071  (6.7%, 79.1%)  ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2
+
+<   521,228  (6.4%, 85.5%)  ./ctype/../include/ctype.h:
+    260,616  (3.2%)           __ctype_tolower_loc
+    260,612  (3.2%)           __ctype_b_loc
+
+<   468,163  (5.7%, 91.2%)  ???:
+    468,151  (5.7%)           ???
+
+<   456,071  (5.6%, 96.8%)  /usr/include/ctype.h:get_word
+
+<    48,344  (0.6%, 97.3%)  ./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S:__strcpy_avx2
+
+<    40,776  (0.5%, 97.8%)  ./elf/./elf/dl-lookup.c:
+     25,623  (0.3%)           do_lookup_x
+      9,515  (0.1%)           _dl_lookup_symbol_x
+
+<    37,412  (0.5%, 98.3%)  ./elf/./elf/dl-tunables.c:
+     36,500  (0.4%)           __GI___tunables_init
+
+<    23,366  (0.3%, 98.6%)  ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S:__strlen_avx2
+
+<    22,107  (0.3%, 98.9%)  ./malloc/./malloc/arena.c:
+     22,023  (0.3%)           malloc
+
+<    16,539  (0.2%, 99.1%)  ./elf/./elf/dl-reloc.c:_dl_relocate_object
+
+<     9,160  (0.1%, 99.2%)  ./elf/../sysdeps/generic/dl-new-hash.h:_dl_lookup_symbol_x
+
+<     8,535  (0.1%, 99.3%)  ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S:
+      8,503  (0.1%)           strcmp
+
+--------------------------------------------------------------------------------
+-- Function:file summary
+--------------------------------------------------------------------------------
+  Ir______________________  function:file
+
+> 2,086,303 (25.5%, 25.5%)  get_word:
+  1,630,232 (19.9%)           /home/njn/grind/ws1/cachegrind/docs/concord.c
+    456,071  (5.6%)           /usr/include/ctype.h
+
+> 1,285,938 (15.7%, 41.1%)  _int_malloc:./malloc/./malloc/malloc.c
+
+> 1,107,550 (13.5%, 54.7%)  getc:./libio/./libio/getc.c
+
+>   630,918  (7.7%, 62.4%)  hash:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>   551,071  (6.7%, 69.1%)  __strcmp_avx2:./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+
+>   480,248  (5.9%, 74.9%)  malloc:
+    458,225  (5.6%)           ./malloc/./malloc/malloc.c
+     22,023  (0.3%)           ./malloc/./malloc/arena.c
+
+>   468,151  (5.7%, 80.7%)  ???:???
+
+>   461,095  (5.6%, 86.3%)  insert:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>   260,616  (3.2%, 89.5%)  __ctype_tolower_loc:./ctype/../include/ctype.h
+
+>   260,612  (3.2%, 92.6%)  __ctype_b_loc:./ctype/../include/ctype.h
+
+>   130,560  (1.6%, 94.2%)  add_existing:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    91,014  (1.1%, 95.4%)  init_hash_table:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    88,056  (1.1%, 96.4%)  create:/home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    50,010  (0.6%, 97.0%)  new_word_node:
+     46,676  (0.6%)           /home/njn/grind/ws1/cachegrind/docs/concord.c
+
+>    48,344  (0.6%, 97.6%)  __strcpy_avx2:./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+
+>    42,906  (0.5%, 98.1%)  __GI___tunables_init:
+     36,500  (0.4%)           ./elf/./elf/dl-tunables.c
+
+>    26,514  (0.3%, 98.5%)  do_lookup_x:
+     25,623  (0.3%)           ./elf/./elf/dl-lookup.c
+
+>    25,642  (0.3%, 98.8%)  _dl_relocate_object:
+     16,539  (0.2%)           ./elf/./elf/dl-reloc.c
+
+>    23,366  (0.3%, 99.1%)  __strlen_avx2:./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+
+>    18,675  (0.2%, 99.3%)  _dl_lookup_symbol_x:
+      9,515  (0.1%)           ./elf/./elf/dl-lookup.c
+      9,160  (0.1%)           ./elf/../sysdeps/generic/dl-new-hash.h
+
+>     8,547  (0.1%, 99.4%)  strcmp:
+      8,503  (0.1%)           ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./ctype/../include/ctype.h
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./ctype/../include/ctype.h
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/../sysdeps/generic/dl-new-hash.h
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/../sysdeps/generic/dl-new-hash.h
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/./elf/dl-lookup.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/./elf/dl-lookup.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/./elf/dl-reloc.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/./elf/dl-reloc.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./elf/./elf/dl-tunables.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./elf/./elf/dl-tunables.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./libio/./libio/getc.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./libio/./libio/getc.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./malloc/./malloc/arena.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./malloc/./malloc/arena.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./malloc/./malloc/malloc.c
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./malloc/./malloc/malloc.c
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+--------------------------------------------------------------------------------
+Unannotated because one or more of these original files are unreadable:
+- ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+
+--------------------------------------------------------------------------------
+-- Annotated source file: /home/njn/grind/ws1/cachegrind/docs/concord.c
+--------------------------------------------------------------------------------
+Ir____________ 
+
+-- line 81 ----------------------------------------
+      .         Arg_Node  *append(char *word, Arg_Node *head);
+      .         void       count(Arg_Node *head, Word_Node *table[]);
+      .         void       list_lines(Arg_Node *head, Word_Node *table[]);
+      .         void       intersection(Arg_Node *head);
+      .         void       intersect_array(int master[], int size, Arg_Node *arg_head);
+      .         void       kill_arg_list(Arg_Node *head);
+      .         
+      .         int main(int argc, char *argv[])
+      8 (0.0%)  {
+      .             /* The actual hash table, a fixed-size array of pointers to word nodes */
+      .             Word_Node *table[TABLE_SIZE];
+      .         
+      .             /* Checking command line input for one file name */
+      2 (0.0%)      if (argc != ARGS_NUMBER + 1) {
+      .         	fprintf(stderr, "%s requires %d argument\n", argv[0], ARGS_NUMBER); 
+      .         	exit(EXIT_FAILURE);
+      .             }
+      .         
+      4 (0.0%)      init_hash_table(argv[1], table);
+      2 (0.0%)      interact(table);
+      .         
+      .             /* Nb:  I am not freeing the dynamic memory in the hash table, having been
+      .             ** told this is not necessary. */
+      .             return 0;
+      7 (0.0%)  }
+      .         
+      .         /* General dynamic allocation function that allocates and then checks. */
+      .         void *create(int mem_size)
+ 22,014 (0.3%)  {
+      .             void *dyn_block;
+      .         
+ 22,014 (0.3%)      dyn_block = malloc(mem_size);
+ 22,014 (0.3%)      if (!(dyn_block)) {
+      .                 fprintf(stderr, "Couldn't allocate enough memory to continue.\n");
+      .                 exit(EXIT_FAILURE);
+      .             }
+      .         
+      .             return dyn_block;
+ 22,014 (0.3%)  }
+      .         
+      .         /* Function returns a hash value on a word.  Almost identical to the hash
+      .         ** function presented in Sedgewick.
+      .         */
+      .         int hash(char *word)
+  7,908 (0.1%)  {
+  7,908 (0.1%)      int hash_value = 0;
+      .         
+161,292 (2.0%)      for ( ; *word; word++)
+453,810 (5.5%)          hash_value = (HASH_CONSTANT * hash_value + *word) % TABLE_SIZE;
+      .         
+      .             return hash_value;
+      .         }
+      .         
+      .         /* Function builds the hash table from the given file. */
+      .         void init_hash_table(char *file_name, Word_Node *table[])
+      8 (0.0%)  {
+      .             FILE *file_ptr;
+      .             Word_Info *data;
+      2 (0.0%)      int line = 1, i;
+      .         
+      .             /* Structure used when reading in words and line numbers. */
+      3 (0.0%)      data = (Word_Info *) create(sizeof(Word_Info));
+      .         
+      .             /* Initialise entire table to NULL. */
+  2,993 (0.0%)      for (i = 0; i < TABLE_SIZE; i++)
+    997 (0.0%)          table[i] = NULL;
+      .         
+      .             /* Open file, check it. */
+      4 (0.0%)      file_ptr = fopen(file_name, "r");
+      2 (0.0%)      if (!(file_ptr)) {
+      .                 fprintf(stderr, "Couldn't open '%s'.\n", file_name);
+      .                 exit(EXIT_FAILURE);
+      .             }
+      .         
+      .             /*  'Get' the words and lines one at a time from the file, and insert them
+      .             ** into the table one at a time. */
+ 55,363 (0.7%)      while ((line = get_word(data, line, file_ptr)) != EOF)
+ 31,632 (0.4%)          insert(data->word, data->line, table);
+      .         
+      2 (0.0%)      free(data);
+      2 (0.0%)      fclose(file_ptr);
+      6 (0.0%)  }
+      .         
+      .         /* Function reads the next word, and it's line number, and places them in the 
+      .         ** structure 'data', via a pointer.
+      .         */
+      .         int get_word(Word_Info *data, int line, FILE *file_ptr)
+ 86,999 (1.1%)  {
+ 15,818 (0.2%)      int index = 0, pos = BEFORE_WORD;
+      .         
+      .             /* Only alphabetic characters are read, apostrophes are ignored, and other
+      .             ** characters are considered separators.  'pos' helps keep track whether
+      .             ** the current file position is inside a word or between words.
+      .             */
+529,133 (6.5%)      while ((data->word[index] = tolower(fgetc(file_ptr))) != EOF) {
+      .                 if (data->word[index] == '\n')
+260,608 (3.2%)              line++;
+390,912 (4.8%)          if (islower(data->word[index])) {
+ 64,830 (0.8%)              if (pos == BEFORE_WORD) {
+ 15,816 (0.2%)                  pos = IN_WORD;
+  7,908 (0.1%)                  data->line = line;
+      .                     }
+ 32,415 (0.4%)              index++;
+      .                 }
+146,702 (1.8%)          else if ((pos == IN_WORD) && (data->word[index] != '\'')) {
+      .                     break;
+      .                 }
+      .             }
+      .             /* Signals end of file has been reached. */
+      .             if (data->word[index] == EOF)
+      1 (0.0%)          line = EOF;
+      .         
+      .             /* Adding the null character. */
+ 15,818 (0.2%)      data->word[index] = '\0';
+      .         
+      .             return line;
+ 63,272 (0.8%)  }
+      .         
+      .         /* Function inserts a word and it's line number into the hash table. */
+      .         void insert(char *inword, int in_line, Word_Node *table[])
+102,804 (1.3%)  {
+  7,908 (0.1%)      int position = hash(inword);
+      .             Word_Node *curr, *prev = NULL;
+  7,908 (0.1%)      char dummy_word[DUMMY_WORD_LENGTH] = "A";
+      .         
+      .             /* The case where that hash position hasn't been used before; a new word
+      .             ** node is created. 
+      .             */
+ 31,632 (0.4%)      if (table[position] == NULL)
+  3,185 (0.0%)          table[position] = new_word_node(dummy_word, 0);
+  7,908 (0.1%)      curr = table[position];
+      .         
+      .             /* Traverses that position's list of words until the current word is found
+      .             ** (i.e. it's come up before) or the list end is reached (i.e. it's the
+      .             ** first occurrence of the word).
+      .             */
+118,384 (1.4%)      while ((curr != NULL) && (strcmp(inword, curr->word) > 0)) {
+      .                 prev = curr;
+ 28,366 (0.3%)          curr = curr->next_word;
+      .             }
+      .         
+      .             /* If the word hasn't appeared before, it's inserted alphabetically into
+      .             ** the list.
+      .             */
+ 35,410 (0.4%)      if ((curr == NULL) || (strcmp(curr->word, inword) != 0)) {
+  4,120 (0.1%)          prev->next_word = new_word_node(inword, in_line);
+  1,030 (0.0%)          prev->next_word->next_word = curr;
+      .             }
+      .             /* Otherwise, the word count is incremented, and the line number is added
+      .             ** to the existing list.
+      .             */
+      .             else {
+  6,878 (0.1%)          (curr->number)++;
+ 27,512 (0.3%)          curr->last_line = add_existing(curr->last_line, in_line);
+      .             }
+ 78,050 (1.0%)  }
+      .         
+      .         /* Function creates a new node for when a word is inserted for the first time.
+      .         */
+      .         Word_Node *new_word_node(char *inword, int in_line)
+ 10,002 (0.1%)  {
+      .             Word_Node *new;
+      .         
+  5,001 (0.1%)      new = (Word_Node *) create(sizeof(Word_Node));
+  8,335 (0.1%)      new->word = (char *) create(sizeof(char) * (strlen(inword) + 1));
+  1,667 (0.0%)      new->word = strcpy(new->word, inword);
+      .             /* The word count is set to 1, as this is the first occurrence! */
+  1,667 (0.0%)      new->number = 1;
+  1,667 (0.0%)      new->next_word = NULL;
+      .             /* One line number node is added. */
+  5,001 (0.1%)      new->line_list = (Line_Node *) create(sizeof(Line_Node));
+  1,667 (0.0%)      new->line_list->line = in_line;
+  1,667 (0.0%)      new->line_list->next_line = NULL;
+  1,667 (0.0%)      new->last_line = new->line_list;
+      .         
+      .             return new;
+  8,335 (0.1%)  }
+      .         
+      .         /* Function adds a line number to the line number list of a word that has
+      .         ** already been inserted at least once.  The pointer 'last_line', part of
+      .         ** the word node structure, allows easy appending to the list.
+      .         */
+      .         Line_Node *add_existing(Line_Node *last_line, int in_line)
+ 34,390 (0.4%)  {
+      .             /* Check to see if that line has already occurred - multiple occurrences on
+      .             ** the one line are only recorded once.  (Nb:  They are counted twice, but
+      .             ** only listed once.)
+      .             */
+ 13,756 (0.2%)      if (last_line->line != in_line) {
+ 18,009 (0.2%)          last_line->next_line = (Line_Node *) create(sizeof(Line_Node));
+ 12,006 (0.1%)          last_line = last_line->next_line;
+  6,003 (0.1%)          last_line->line = in_line;
+  6,003 (0.1%)          last_line->next_line = NULL;
+      .             }
+      .         
+      .             return last_line;
+ 40,393 (0.5%)  }
+      .         
+      .         /*  Function controls the interactive command line part of the program. */
+      .         void interact(Word_Node *table[])
+     12 (0.0%)  {
+      .             char args[MAX_WORD_LENGTH];     /* Array to hold command line */
+      .             Arg_Node *arg_list = NULL;      /* List that holds processed arguments */ 
+      .             int not_quitted = TRUE;         /* Quit flag */
+      .         
+      .             /* The prompt (?) is displayed.  Commands are read into an array, and then
+      .             ** individual arguments are placed into a linked list for easy use. 
+      .             ** The first argument (actually the command) is looked at to determine
+      .             ** what action should be performed.  'arg_list->next_arg' is passed to
+      .             ** count() and list_lines(), because the actual 'c' or 'l' is not needed
+      .             ** by them.  Lastly, the argument linked list is freed, by 'kill_arg_list'.  
+      .             */ 
+      .             do {
+      .                 printf("?");		     
+      .                 fgets(args, MAX_WORD_LENGTH - 1, stdin);
+      3 (0.0%)          arg_list = place_args_in_list(args);
+      2 (0.0%)          if (arg_list) {
+      7 (0.0%)              if (strcmp(arg_list->word, "c") == 0)
+      .         		count(arg_list->next_arg, table);
+      6 (0.0%)              else if (strcmp(arg_list->word, "l") == 0)
+      .                        	list_lines(arg_list->next_arg, table); 
+      8 (0.0%)              else if (strcmp(arg_list->word, "q") == 0) {
+      .                        	printf("Quitting concord\n");
+      1 (0.0%)  		not_quitted = FALSE;
+      .         	    }
+      .                     else
+      .                        	printf("Not a valid command.\n");
+      2 (0.0%)  	    kill_arg_list(arg_list);
+      .                 }
+      2 (0.0%)      } while (not_quitted);	/* Quits on flag */
+     11 (0.0%)  }
+      .         
+      .         /* Function takes an array containing a command line, and parses it, placing
+      .         ** actual word into a linked list.
+      .         */
+      .         Arg_Node *place_args_in_list(char command[])
+     10 (0.0%)  {
+      2 (0.0%)      int index1 = 0, index2 = 0, pos = BEFORE_WORD;
+      .             char token[MAX_WORD_LENGTH], c;
+      1 (0.0%)      Arg_Node *head = NULL;
+      .         
+      .             /* Non alphabetic characters are discarded.  Alphabetic characters are
+      .             ** copied into the array 'token'.  Once the current word has been copied
+      .             ** into 'token', 'append' is called, copying 'token' to a new node in the
+      .             ** linked list.
+      .             */
+     12 (0.0%)      while (command[index1] != '\0') {
+      8 (0.0%)          c = tolower(command[index1++]);
+     11 (0.0%)          if (islower(c)) {
+      3 (0.0%)              token[index2++] = c;
+      4 (0.0%)              pos = IN_WORD;
+      .                 }
+      2 (0.0%)          else if (c == '\'')
+      .                     token[index2] = c;
+      2 (0.0%)          else if (pos == IN_WORD) {
+      1 (0.0%)              pos = BEFORE_WORD;
+      2 (0.0%)              token[index2] = '\0';
+      4 (0.0%)              head = append(token, head);
+      2 (0.0%)              index2 = 0;
+      .                 }
+      .             }
+      .         
+      .             return head;
+     11 (0.0%)  }
+      .         
+      .         /* Function takes a word, and appends a new node containing that word to the
+      .         ** list.
+      .         */
+      .         Arg_Node *append(char *word, Arg_Node *head)
+      6 (0.0%)  {
+      .             Arg_Node *curr = head,
+      3 (0.0%)               *new = (Arg_Node *) create(sizeof(Arg_Node));
+      .         
+      6 (0.0%)      new->word = (char *) create(sizeof(char) * (strlen(word) + 1));
+      .             strcpy(new->word, word);
+      1 (0.0%)      new->line_list = NULL;
+      1 (0.0%)      new->next_arg = NULL;
+      .         
+      2 (0.0%)      if (head == NULL)
+      .                 return new;
+      .         
+      .             while (curr->next_arg != NULL)
+      .                 curr = curr->next_arg;
+      .             curr->next_arg = new;
+      .         
+      .             return head;
+      5 (0.0%)  }
+      .         
+      .         
+      .         /* Function displays the number of times a word has occurred. */
+      .         void count(Arg_Node *arg_list, Word_Node *table[])
+      .         {
+      .             int hash_pos = 0;		/* Only initialised to avoid gnuc warnings */
+      .             Word_Node *curr_word = NULL;  
+      .         
+-- line 375 ----------------------------------------
+-- line 514 ----------------------------------------
+      .                     *(master + index) = 0;
+      .         
+      .                 arg_head = arg_head->next_arg;
+      .             }
+      .         }
+      .         
+      .         /*  Function to free dynamic memory used by the arguments linked list. */
+      .         void kill_arg_list(Arg_Node *head)
+      5 (0.0%)  {
+      .             Arg_Node *temp;
+      .         
+      4 (0.0%)      while (head != NULL) {
+      .                 temp = head;
+      2 (0.0%)          head = head->next_arg;
+      2 (0.0%)          free(temp->word);
+      2 (0.0%)          free(temp);
+      .             }
+      4 (0.0%)  }
+      .         
+
+--------------------------------------------------------------------------------
+-- Annotated source file: /usr/include/ctype.h
+--------------------------------------------------------------------------------
+Ir____________ 
+
+-- line 201 ----------------------------------------
+      .         #   define isblank(c)	__isctype((c), _ISblank)
+      .         #  endif
+      .         # endif
+      .         
+      .         # ifdef __USE_EXTERN_INLINES
+      .         __extern_inline int
+      .         __NTH (tolower (int __c))
+      .         {
+456,071 (5.6%)    return __c >= -128 && __c < 256 ? (*__ctype_tolower_loc ())[__c] : __c;
+      .         }
+      .         
+      .         __extern_inline int
+      .         __NTH (toupper (int __c))
+      .         {
+      .           return __c >= -128 && __c < 256 ? (*__ctype_toupper_loc ())[__c] : __c;
+      .         }
+      .         # endif
+-- line 217 ----------------------------------------
+
+--------------------------------------------------------------------------------
+-- Annotation summary
+--------------------------------------------------------------------------------
+Ir_______________ 
+
+3,534,817 (43.1%)    annotated: files known & above threshold & readable, line numbers known
+        0            annotated: files known & above threshold & readable, line numbers unknown
+        0          unannotated: files known & above threshold & two or more non-identical
+4,132,126 (50.4%)  unannotated: files known & above threshold & unreadable 
+   59,950  (0.7%)  unannotated: files known & below threshold
+  468,163  (5.7%)  unannotated: files unknown
+
diff --git a/cachegrind/docs/concord.cgout b/cachegrind/docs/concord.cgout
new file mode 100644
index 0000000000..e14054df57
--- /dev/null
+++ b/cachegrind/docs/concord.cgout
@@ -0,0 +1,5573 @@
+cmd: ./concord ../cg_main.c
+events: Ir
+fl=./csu/../csu/libc-start.c
+fn=__libc_start_main@@GLIBC_2.34
+128 2
+134 3
+135 6
+138 1
+139 2
+142 3
+143 2
+144 11
+145 4
+242 12
+332 3
+333 3
+358 6
+361 2
+364 2
+371 2
+373 2
+381 4
+fl=./csu/../sysdeps/nptl/libc_start_call_main.h
+fn=(below main)
+29 9
+44 3
+46 2
+51 2
+52 2
+55 2
+58 6
+74 2
+fl=./csu/./csu/init-first.c
+fn=_init_first
+46 4
+51 2
+55 5
+61 1
+62 1
+63 2
+71 2
+72 2
+fl=./ctype/../include/ctype.h
+fn=__ctype_b_loc
+40 65242
+41 130484
+42 65242
+fn=__ctype_tolower_loc
+52 65243
+53 130486
+54 65243
+fl=./ctype/./ctype/ctype-info.c
+fn=__ctype_init
+29 1
+31 7
+33 4
+35 4
+36 1
+fl=./elf/../bits/stdlib-bsearch.h
+fn=intel_check_word.constprop.0
+27 102
+28 51
+29 792
+31 972
+32 370
+37 126
+fl=./elf/../elf/dl-tls.c
+fn=_dl_add_to_slotinfo
+1015 8
+1021 1
+1024 1
+1025 1
+1029 3
+1063 2
+1066 3
+1067 3
+1070 6
+fn=_dl_allocate_tls_init
+528 10
+529 2
+533 2
+535 1
+536 1
+539 2
+542 3
+554 1
+559 18
+565 5
+568 3
+569 2
+575 3
+576 2
+578 9
+581 2
+582 2
+585 2
+586 4
+588 2
+589 2
+598 1
+606 2
+608 8
+614 1
+620 2
+623 2
+626 9
+fn=_dl_allocate_tls_storage
+370 1
+374 1
+375 1
+376 2
+379 1
+385 2
+422 4
+424 1
+435 1
+436 1
+437 2
+445 6
+446 1
+450 305
+469 1
+475 5
+fn=_dl_assign_tls_modid
+131 2
+134 2
+147 1
+186 1
+188 1
+191 1
+192 2
+fn=_dl_count_modids
+197 1
+199 2
+200 1
+215 2
+fn=_dl_determine_tlsoffset
+221 8
+222 1
+223 1
+224 1
+227 3
+230 2
+264 1
+266 10
+268 5
+270 3
+271 2
+273 2
+275 5
+286 1
+291 9
+293 2
+294 1
+306 1
+307 7
+309 2
+359 1
+360 8
+fn=_dl_tls_static_surplus_init
+83 3
+97 6
+101 6
+102 5
+108 3
+110 4
+113 1
+115 1
+117 3
+118 5
+fl=./elf/../include/list.h
+fn=__tls_init_tp
+43 2
+44 2
+45 1
+47 1
+fl=./elf/../include/rtld-malloc.h
+fn=__minimal_calloc
+56 8
+fn=_dl_allocate_tls_storage
+44 2
+56 1
+fn=_dl_check_map_versions
+44 9
+fn=_dl_find_object_init
+56 2
+fn=_dl_important_hwcaps
+56 3
+fn=_dl_init_paths
+56 5
+fn=_dl_map_object_deps
+56 6
+fn=_dl_new_object
+44 9
+56 6
+fn=fillin_rpath.isra.0
+50 4
+51 2
+56 4
+fn=init_tls
+44 2
+fl=./elf/../include/scratch_buffer.h
+fn=_dl_map_object_deps
+77 2
+78 1
+85 3
+131 8
+fl=./elf/../misc/sbrk.c
+fn=sbrk
+37 5
+40 1
+58 2
+62 2
+78 5
+fl=./elf/../nptl/nptl-stack.h
+fn=__libc_early_init
+58 8
+fl=./elf/../sysdeps/generic/_itoa.h
+fn=_dl_check_map_versions
+76 6
+fl=./elf/../sysdeps/generic/dl-cache.h
+fn=_dl_load_cache_lookup
+125 5
+194 1
+195 5
+fl=./elf/../sysdeps/generic/dl-debug.h
+fn=dl_main
+29 4
+30 1
+fl=./elf/../sysdeps/generic/dl-hash.h
+fn=__rtld_malloc_init_real
+43 1
+44 1
+45 19
+48 18
+62 24
+67 1
+72 1
+fl=./elf/../sysdeps/generic/dl-new-hash.h
+fn=_dl_lookup_symbol_x
+69 106
+70 212
+74 783
+77 1566
+80 727
+81 1454
+87 50
+88 50
+89 100
+90 50
+99 677
+101 677
+102 1354
+104 677
+105 677
+fl=./elf/../sysdeps/generic/dl-protected.h
+fn=do_lookup_x
+29 396
+fl=./elf/../sysdeps/generic/ldsodefs.h
+fn=_dl_relocate_object
+80 27
+138 321
+fn=_dl_start
+80 6
+fn=do_lookup_x
+137 198
+138 297
+fl=./elf/../sysdeps/nptl/dl-mutex.c
+fn=__rtld_mutex_init
+30 4
+37 1
+40 7
+44 2
+45 9
+47 6
+51 2
+52 7
+53 4
+fl=./elf/../sysdeps/nptl/dl-tls_init_tp.c
+fn=__tls_init_tp
+68 3
+69 1
+72 1
+75 4
+76 2
+77 1
+82 2
+87 4
+90 1
+93 3
+106 4
+123 3
+130 2
+131 3
+fn=__tls_pre_init_tp
+53 3
+56 10
+61 2
+62 1
+64 1
+fn=rtld_mutex_dummy
+42 8
+44 16
+fl=./elf/../sysdeps/nptl/pthread_early_init.h
+fn=__libc_early_init
+33 6
+34 1
+38 5
+45 2
+46 1
+52 7
+53 1
+54 1
+57 1
+fl=./elf/../sysdeps/posix/dl-fileid.h
+fn=_dl_map_object_from_fd
+37 10
+40 8
+49 16
+fl=./elf/../sysdeps/unix/sysv/linux/brk.c
+fn=brk
+36 1
+37 1
+38 2
+44 1
+45 1
+fl=./elf/../sysdeps/unix/sysv/linux/brk_call.h
+fn=brk
+24 2
+fl=./elf/../sysdeps/unix/sysv/linux/dl-osinfo.h
+fn=dl_main
+39 2
+52 1
+fl=./elf/../sysdeps/unix/sysv/linux/dl-parse_auxv.h
+fn=_dl_sysdep_parse_arguments
+32 2
+33 1
+34 1
+41 85
+42 40
+43 40
+45 1
+46 2
+47 3
+48 2
+49 2
+50 2
+51 2
+52 2
+54 2
+fl=./elf/../sysdeps/unix/sysv/linux/dl-sysdep.c
+fn=_dl_sysdep_parse_arguments
+78 5
+79 2
+80 2
+81 3
+82 82
+83 167
+86 2
+90 56
+93 2
+94 2
+95 2
+96 5
+fn=_dl_sysdep_start
+102 4
+103 1
+106 2
+110 2
+113 1
+115 2
+122 3
+123 2
+125 5
+137 3
+140 5
+143 4
+fn=_dl_sysdep_start_cleanup
+147 1
+148 1
+fl=./elf/../sysdeps/unix/sysv/linux/dl-vdso-setup.h
+fn=dl_main
+30 1
+33 1
+36 1
+39 1
+45 1
+fl=./elf/../sysdeps/unix/sysv/linux/dl-vdso.h
+fn=dl_main
+40 2
+41 5
+55 2
+fl=./elf/../sysdeps/unix/sysv/linux/rseq-internal.h
+fn=__tls_init_tp
+32 3
+34 7
+37 2
+40 1
+fl=./elf/../sysdeps/unix/sysv/linux/x86/cpu-features.c
+fn=init_cpu_features.constprop.0
+27 6
+fl=./elf/../sysdeps/unix/sysv/linux/x86/dl-minsigstacksize.h
+fn=get_common_indices.constprop.0
+24 2
+27 2
+28 2
+65 3
+72 1
+74 1
+fl=./elf/../sysdeps/x86/cpu-features.c
+fn=get_common_indices.constprop.0
+325 1
+329 4
+332 5
+336 1
+337 4
+338 4
+339 4
+340 3
+341 2
+348 2
+350 9
+355 8
+362 2
+363 7
+369 2
+384 2
+fn=init_cpu_features.constprop.0
+303 3
+304 2
+305 6
+310 2
+311 6
+316 2
+317 7
+399 7
+402 1
+403 1
+404 1
+415 3
+418 6
+419 1
+422 2
+424 6
+429 1
+431 3
+433 5
+434 15
+489 3
+503 1
+513 9
+555 1
+556 1
+564 2
+571 2
+573 1
+577 2
+583 2
+585 1
+687 3
+688 1
+691 2
+692 1
+698 1
+699 1
+700 1
+701 2
+706 4
+710 3
+723 2
+724 4
+770 1
+771 2
+775 3
+792 2
+793 2
+795 3
+796 2
+798 3
+799 2
+802 2
+817 4
+819 4
+829 2
+874 8
+fn=update_active.constprop.0
+43 7
+47 1
+55 3
+56 3
+57 1
+65 4
+66 1
+74 4
+75 1
+76 3
+80 4
+83 3
+84 1
+87 3
+88 2
+92 4
+93 3
+94 3
+95 2
+97 4
+98 3
+100 2
+101 4
+104 4
+105 4
+109 2
+113 2
+115 4
+119 2
+121 1
+124 2
+126 2
+130 1
+131 1
+134 4
+140 4
+142 4
+144 3
+149 4
+198 3
+210 2
+211 1
+214 4
+218 2
+222 3
+223 2
+225 1
+226 1
+229 2
+231 1
+234 2
+285 3
+289 3
+296 1
+297 8
+fl=./elf/../sysdeps/x86/dl-cacheinfo.h
+fn=get_common_cache_info.constprop.0
+481 9
+496 1
+497 1
+498 2
+499 1
+505 1
+507 6
+510 4
+521 1
+526 3
+530 2
+535 7
+538 32
+545 8
+548 26
+553 2
+557 3
+558 2
+562 6
+566 3
+569 4
+570 1
+575 5
+581 4
+589 5
+591 2
+593 4
+595 13
+597 1
+598 1
+599 9
+601 4
+604 2
+609 1
+611 5
+612 3
+613 1
+616 2
+619 2
+626 1
+628 4
+629 1
+634 2
+639 3
+640 1
+641 2
+642 3
+681 4
+682 5
+686 2
+693 1
+694 1
+695 7
+fn=handle_intel.constprop.0
+250 96
+255 24
+263 12
+264 12
+266 4
+272 48
+279 24
+280 60
+284 60
+286 24
+289 60
+291 24
+294 5
+296 2
+299 5
+301 2
+305 3
+310 96
+fn=init_cpu_features.constprop.0
+706 1
+708 1
+722 2
+724 4
+725 3
+726 4
+729 3
+731 3
+732 1
+734 3
+736 3
+737 1
+739 3
+741 3
+744 3
+746 3
+748 3
+750 4
+840 1
+841 2
+842 2
+843 2
+844 2
+845 2
+846 1
+847 1
+848 1
+849 2
+850 1
+851 1
+863 10
+873 2
+874 1
+881 12
+898 2
+907 7
+909 2
+912 5
+914 2
+917 5
+919 2
+922 5
+923 3
+929 6
+932 10
+933 8
+940 9
+942 10
+944 9
+958 3
+960 2
+961 1
+962 1
+963 1
+964 2
+965 1
+fn=intel_check_word.constprop.0
+110 690
+113 345
+119 104
+123 78
+125 15
+129 156
+131 154
+133 63
+135 126
+143 126
+152 12
+155 138
+158 99
+162 64
+164 186
+167 120
+168 72
+169 60
+170 48
+172 11
+174 22
+176 12
+177 16
+178 12
+179 20
+180 14
+181 12
+183 8
+184 16
+187 21
+194 204
+202 175
+216 51
+246 104
+fl=./elf/../sysdeps/x86/dl-cet.c
+fn=_dl_cet_check
+41 2
+42 1
+44 3
+45 1
+48 4
+49 2
+51 1
+62 2
+252 8
+254 8
+fl=./elf/../sysdeps/x86/dl-get-cpu-features.c
+fn=__x86_cpu_features
+39 3
+71 3
+fn=_dl_x86_init_cpu_features
+37 1
+39 3
+41 1
+fl=./elf/../sysdeps/x86/dl-hwcap.h
+fn=_dl_important_hwcaps
+57 4
+fl=./elf/../sysdeps/x86/dl-procinfo.h
+fn=_dl_load_cache_lookup
+39 2
+42 7
+fl=./elf/../sysdeps/x86/dl-prop.h
+fn=_dl_map_object_from_fd
+95 16
+100 4
+101 2
+102 2
+103 2
+105 18
+108 6
+109 4
+110 4
+114 6
+118 6
+121 6
+122 4
+126 2
+127 10
+131 6
+132 3
+135 6
+138 3
+139 12
+144 12
+151 6
+164 6
+167 11
+185 6
+187 8
+192 2
+193 8
+198 16
+200 8
+201 2
+202 2
+203 4
+212 6
+213 2
+fn=dl_main
+36 2
+37 15
+40 11
+49 22
+53 17
+71 4
+95 8
+100 2
+101 1
+102 1
+103 1
+105 10
+108 3
+109 2
+110 2
+114 3
+118 3
+121 3
+122 2
+126 1
+127 5
+131 4
+132 2
+135 4
+138 2
+139 8
+144 8
+151 4
+164 4
+165 2
+167 2
+170 4
+185 3
+187 4
+192 1
+193 4
+198 9
+200 4
+201 1
+202 1
+203 2
+212 5
+213 2
+fl=./elf/../sysdeps/x86/get-isa-level.h
+fn=_dl_hwcaps_subdirs_active
+30 2
+31 3
+32 3
+36 3
+39 4
+40 4
+45 4
+48 2
+51 5
+53 4
+54 2
+55 2
+58 2
+61 2
+fn=update_active.constprop.0
+28 1
+30 2
+31 2
+32 3
+36 3
+38 1
+39 2
+40 4
+45 4
+47 1
+48 2
+51 5
+53 4
+54 2
+55 2
+57 1
+58 2
+61 2
+fl=./elf/../sysdeps/x86_64/dl-hwcaps-subdirs.c
+fn=_dl_hwcaps_subdirs_active
+29 1
+38 3
+43 1
+48 1
+52 1
+fl=./elf/../sysdeps/x86_64/dl-machine.h
+fn=_dl_fixup
+220 2
+fn=_dl_relocate_object
+72 12
+78 4
+82 3
+88 1
+98 2
+118 2
+121 2
+123 4
+253 246
+260 246
+264 246
+273 947
+276 450
+277 240
+278 241
+281 6
+282 9
+293 6
+300 12
+303 738
+384 51
+386 68
+390 51
+399 34
+407 210
+444 6
+448 8
+449 8
+450 10
+451 4
+460 4
+461 6
+462 2
+463 2
+482 18
+483 6
+487 12
+488 12
+498 106
+502 28
+505 42
+506 14
+532 78
+534 39
+535 39
+fn=_dl_start
+46 1
+72 1
+264 21
+273 59
+276 14
+277 28
+303 21
+307 6
+fn=_dl_sysdep_start
+206 1
+fl=./elf/../sysdeps/x86_64/dl-runtime.h
+fn=_dl_fixup
+28 2
+fl=./elf/../sysdeps/x86_64/dl-trampoline.h
+fn=_dl_runtime_resolve_xsave
+71 2
+76 2
+79 2
+81 2
+91 2
+97 2
+98 2
+99 2
+100 2
+101 2
+102 2
+103 2
+107 2
+108 2
+111 2
+112 2
+114 2
+115 2
+116 2
+117 2
+118 2
+119 2
+121 2
+128 2
+129 2
+130 2
+131 2
+136 2
+137 2
+138 2
+140 2
+141 2
+142 2
+143 2
+144 2
+145 2
+146 2
+148 2
+150 2
+154 2
+156 2
+fl=./elf/./dl-find_object.h
+fn=_dl_find_object_from_map
+95 8
+96 8
+97 4
+103 20
+104 119
+105 82
+107 12
+112 4
+fl=./elf/./dl-hwcaps.h
+fn=_dl_important_hwcaps
+54 15
+56 6
+57 1
+88 10
+89 9
+90 5
+fl=./elf/./dl-load.h
+fn=_dl_map_object_from_fd
+91 16
+92 6
+94 16
+95 4
+96 6
+97 4
+99 4
+fl=./elf/./dl-main.h
+fn=dl_main
+112 5
+fl=./elf/./dl-map-segments.h
+fn=_dl_map_object_from_fd
+28 6
+29 20
+101 2
+102 6
+105 6
+106 4
+108 4
+125 2
+127 2
+135 48
+137 24
+139 54
+140 12
+149 32
+155 4
+156 2
+157 10
+158 4
+165 4
+168 4
+176 14
+182 4
+186 8
+189 2
+194 8
+fl=./elf/./dl-sym-post.h
+fn=lookup_malloc_symbol
+41 8
+53 12
+fl=./elf/./dl-tunable-types.h
+fn=__GI___tunable_set_val
+90 10
+fl=./elf/./elf/dl-audit.c
+fn=_dl_audit_activity_map
+29 6
+30 1
+31 3
+37 6
+fn=_dl_audit_activity_nsid
+41 18
+45 12
+46 3
+47 12
+51 18
+fn=_dl_audit_objclose
+97 4
+98 16
+fn=_dl_audit_objopen
+77 2
+78 8
+fn=_dl_audit_preinit
+118 1
+119 4
+fl=./elf/./elf/dl-cache.c
+fn=_dl_cache_libcmp
+367 30
+368 122
+370 162
+372 58
+378 6
+379 6
+380 52
+382 52
+384 4
+390 104
+392 104
+393 24
+396 44
+397 44
+400 16
+fn=_dl_load_cache_lookup
+194 16
+208 1
+212 7
+218 10
+219 2
+221 8
+228 26
+230 40
+231 16
+235 24
+239 24
+240 16
+248 4
+250 2
+253 2
+255 6
+257 1
+267 4
+271 3
+272 6
+277 4
+278 3
+294 2
+303 2
+306 2
+311 11
+312 6
+340 2
+352 4
+356 7
+357 4
+359 3
+413 10
+415 2
+418 3
+421 5
+429 5
+430 8
+433 6
+434 1
+441 1
+442 2
+492 3
+495 3
+510 2
+522 17
+523 4
+524 1
+525 8
+fn=_dl_unload_cache
+534 2
+535 4
+537 2
+538 1
+542 1
+544 2
+fl=./elf/./elf/dl-call-libc-early-init.c
+fn=_dl_call_libc_early_init
+27 4
+29 2
+33 7
+37 2
+39 4
+40 2
+41 3
+fl=./elf/./elf/dl-debug.c
+fn=_dl_debug_initialize
+56 2
+57 3
+59 2
+63 3
+64 1
+85 3
+90 3
+91 2
+92 1
+95 3
+96 6
+99 2
+107 2
+fn=_dl_debug_state
+116 2
+117 2
+fn=_dl_debug_update
+38 3
+40 6
+41 3
+44 6
+48 3
+fl=./elf/./elf/dl-deps.c
+fn=_dl_map_object_deps
+128 6
+129 5
+130 6
+132 3
+136 8
+143 13
+144 20
+160 8
+161 1
+164 1
+182 2
+183 6
+184 1
+188 7
+190 12
+193 4
+197 12
+198 25
+201 8
+205 4
+208 16
+210 8
+215 2
+216 2
+217 4
+218 4
+221 224
+222 104
+228 24
+230 2
+232 8
+233 4
+242 2
+244 4
+249 14
+252 2
+253 2
+254 2
+255 4
+257 2
+259 10
+263 6
+264 3
+266 150
+417 8
+419 8
+422 6
+423 8
+429 2
+430 10
+431 10
+434 2
+435 4
+439 12
+441 4
+442 17
+448 6
+449 1
+452 4
+463 4
+465 3
+470 4
+471 1
+472 5
+474 16
+478 4
+483 8
+485 9
+490 4
+494 2
+495 3
+532 2
+533 2
+547 7
+552 9
+553 4
+557 1
+559 2
+560 1
+561 2
+568 3
+571 2
+574 8
+fn=openaux
+61 6
+64 22
+65 2
+68 2
+69 4
+fl=./elf/./elf/dl-environ.c
+fn=_dl_next_ld_env_entry
+28 3
+29 3
+32 250
+34 164
+35 20
+37 2
+40 4
+42 2
+45 80
+49 1
+fl=./elf/./elf/dl-error-skeleton.c
+fn=_dl_catch_error
+225 10
+227 2
+228 2
+229 2
+230 2
+232 5
+fn=_dl_catch_exception
+175 15
+178 6
+180 6
+185 3
+199 3
+200 6
+203 6
+206 18
+208 9
+209 6
+210 12
+219 6
+fn=_dl_receive_error
+238 6
+239 1
+240 1
+243 1
+244 1
+246 1
+248 1
+249 1
+250 4
+fl=./elf/./elf/dl-find_object.c
+fn=_dl_find_object_init
+561 4
+564 1
+566 2
+567 3
+579 2
+580 3
+582 2
+585 2
+590 1
+591 1
+594 3
+596 2
+599 6
+601 1
+604 4
+fn=_dlfo_process_initial
+474 4
+475 2
+477 2
+478 4
+504 24
+505 22
+506 8
+508 28
+511 24
+513 18
+515 6
+516 6
+517 12
+529 2
+531 8
+fn=_dlfo_sort_mappings
+536 1
+537 2
+540 12
+544 4
+545 19
+546 19
+553 8
+554 10
+555 4
+557 1
+fl=./elf/./elf/dl-fini.c
+fn=_dl_fini
+31 11
+48 1
+51 13
+54 2
+56 1
+59 2
+61 6
+66 1
+68 3
+73 17
+78 16
+80 8
+82 8
+84 8
+85 4
+86 4
+90 4
+92 6
+93 6
+99 5
+108 2
+113 18
+115 4
+117 12
+120 8
+123 12
+124 6
+127 4
+138 8
+139 2
+140 4
+141 20
+142 4
+146 6
+147 6
+153 8
+158 4
+162 5
+168 6
+170 1
+174 2
+181 8
+fl=./elf/./elf/dl-hwcaps.c
+fn=_dl_important_hwcaps
+55 16
+57 2
+58 6
+59 2
+80 16
+82 2
+83 16
+87 2
+89 6
+90 2
+91 2
+103 4
+105 3
+108 1
+111 2
+115 4
+128 12
+130 10
+131 4
+132 4
+133 2
+136 5
+145 20
+146 10
+148 1
+154 2
+158 7
+159 3
+160 4
+164 1
+165 3
+166 2
+175 14
+176 6
+178 2
+179 4
+204 2
+205 4
+215 2
+216 4
+220 18
+221 6
+222 4
+225 3
+228 20
+229 1
+230 6
+231 4
+233 5
+234 2
+235 8
+236 1
+238 3
+240 7
+241 2
+242 1
+245 6
+246 1
+249 2
+253 2
+254 3
+257 6
+258 3
+260 1
+261 11
+262 4
+263 3
+273 4
+277 9
+284 4
+285 2
+292 2
+293 3
+306 4
+317 2
+330 2
+333 2
+340 7
+343 2
+346 20
+349 14
+350 23
+351 5
+354 10
+356 15
+361 34
+362 8
+366 24
+369 32
+370 64
+371 36
+373 8
+376 7
+377 2
+378 10
+380 6
+381 2
+383 1
+384 7
+391 18
+392 4
+394 8
+397 4
+403 9
+fl=./elf/./elf/dl-hwcaps_split.c
+fn=_dl_hwcaps_split
+25 4
+26 3
+27 1
+47 4
+fn=_dl_hwcaps_split_masked
+26 44
+30 24
+34 78
+35 12
+36 24
+41 36
+42 18
+43 12
+45 6
+51 77
+55 5
+56 18
+57 27
+58 24
+60 6
+62 66
+67 12
+fl=./elf/./elf/dl-init.c
+fn=_dl_init
+31 30
+33 26
+77 12
+78 1
+79 1
+82 3
+85 2
+90 2
+115 1
+116 17
+117 8
+123 8
+fn=call_init.part.0
+26 36
+39 12
+42 12
+43 2
+47 6
+55 9
+56 6
+59 3
+60 6
+66 6
+68 4
+69 22
+70 12
+72 24
+fl=./elf/./elf/dl-load.c
+fn=_dl_dst_count
+238 12
+241 4
+244 4
+245 2
+264 14
+fn=_dl_init_paths
+706 13
+720 4
+725 1
+727 3
+734 3
+735 2
+739 4
+741 4
+747 1
+748 1
+756 9
+758 8
+759 4
+761 5
+762 7
+763 7
+766 9
+767 9
+768 18
+770 29
+776 1
+777 1
+782 1
+785 3
+787 4
+789 3
+806 1
+808 3
+822 2
+825 5
+827 23
+831 2
+832 84
+833 20
+834 40
+836 1
+837 1
+838 2
+844 6
+847 3
+853 1
+857 8
+fn=_dl_map_object
+682 10
+692 5
+696 7
+1971 33
+1973 2
+1979 6
+1980 12
+1983 42
+1988 49
+1990 35
+1994 12
+1995 18
+1998 10
+1999 6
+2000 8
+2017 14
+2018 6
+2026 6
+2041 2
+2043 10
+2047 4
+2049 2
+2056 4
+2060 1
+2061 1
+2065 3
+2066 1
+2080 2
+2081 7
+2082 1
+2091 2
+2106 2
+2107 16
+2113 5
+2114 1
+2122 1
+2136 2
+2138 3
+2142 3
+2144 2
+2148 2
+2156 3
+2179 8
+2183 3
+2184 1
+2201 2
+2207 1
+2208 4
+2209 2
+2210 3
+2214 8
+2217 3
+2229 8
+2274 4
+2275 34
+2277 27
+fn=_dl_map_object_from_fd
+944 28
+954 6
+961 8
+999 29
+1000 10
+1018 4
+1046 8
+1056 4
+1063 16
+1064 4
+1075 4
+1076 6
+1077 4
+1079 6
+1080 8
+1081 2
+1084 17
+1085 6
+1096 2
+1098 2
+1100 44
+1101 2
+1102 2
+1103 2
+1104 2
+1110 87
+1111 213
+1117 4
+1124 5
+1125 6
+1126 20
+1131 2
+1132 1
+1137 62
+1145 32
+1146 52
+1147 40
+1148 16
+1149 16
+1151 48
+1153 16
+1166 16
+1167 30
+1172 8
+1173 82
+1183 8
+1186 3
+1190 4
+1191 1
+1195 5
+1196 2
+1199 1
+1204 4
+1219 4
+1220 2
+1223 8
+1228 14
+1238 38
+1239 16
+1245 4
+1255 8
+1262 14
+1278 8
+1279 6
+1285 6
+1286 4
+1288 4
+1298 6
+1317 6
+1319 12
+1371 6
+1372 2
+1378 70
+1379 117
+1385 33
+1390 8
+1403 4
+1405 8
+1407 4
+1423 4
+1427 4
+1428 4
+1445 4
+1446 1
+1449 4
+1454 4
+1464 4
+1471 16
+1472 6
+1473 9
+1474 3
+1475 5
+1481 4
+1482 4
+1487 2
+1494 6
+1497 8
+1516 4
+1520 14
+1521 8
+1525 18
+fn=_dl_process_pt_gnu_property
+868 6
+869 6
+870 3
+876 6
+882 15
+885 9
+886 6
+887 6
+930 3
+fn=expand_dynamic_string_token
+241 6
+244 4
+385 18
+399 4
+410 14
+fn=fillin_rpath.isra.0
+468 16
+472 1
+474 18
+477 1
+478 1
+481 4
+483 4
+487 2
+492 3
+493 2
+500 5
+505 4
+509 33
+510 18
+528 8
+532 8
+534 4
+538 2
+539 4
+540 8
+541 2
+543 4
+549 8
+550 56
+551 18
+553 4
+554 6
+555 2
+561 4
+562 2
+565 6
+571 1
+574 12
+fn=open_path
+1819 11
+1820 2
+1823 1
+1824 2
+1826 2
+1829 2
+1831 22
+1834 6
+1838 4
+1843 12
+1850 10
+1851 78
+1854 60
+1859 220
+1862 60
+1868 60
+1869 4
+1871 160
+1873 80
+1875 20
+1880 20
+1881 70
+1887 40
+1889 62
+1890 2
+1892 37
+1899 40
+1901 20
+1940 14
+1945 2
+1947 8
+1950 4
+1964 9
+fn=open_verify.constprop.0
+1578 264
+1610 66
+1615 40
+1629 110
+1631 44
+1639 2
+1640 2
+1645 16
+1647 4
+1649 4
+1651 4
+1657 4
+1674 29
+1755 4
+1760 4
+1766 8
+1772 4
+1778 8
+1779 8
+1783 16
+1784 5
+1805 198
+fl=./elf/./elf/dl-lookup-direct.c
+fn=_dl_lookup_direct
+32 15
+33 9
+35 6
+36 6
+48 9
+51 6
+53 18
+57 6
+58 21
+59 15
+74 36
+76 6
+78 21
+79 6
+81 12
+84 25
+86 12
+93 6
+115 3
+116 27
+fl=./elf/./elf/dl-lookup.c
+fn=_dl_lookup_symbol_x
+756 1272
+758 212
+759 424
+762 106
+766 406
+768 1166
+769 212
+771 212
+775 247
+776 1802
+781 530
+783 42
+800 106
+801 21
+804 212
+805 578
+812 99
+840 417
+854 297
+855 1
+857 198
+858 1
+859 954
+fn=check_match
+71 1313
+74 707
+79 103
+87 495
+90 444
+94 99
+95 198
+97 194
+116 194
+117 776
+118 485
+145 4
+147 12
+148 8
+163 606
+fn=do_lookup_x
+172 40
+348 1484
+349 106
+354 106
+355 106
+359 660
+362 660
+366 884
+370 656
+374 656
+380 984
+384 328
+385 328
+388 1640
+389 656
+392 328
+393 656
+395 328
+396 212
+397 656
+399 742
+400 1312
+403 2296
+406 208
+407 312
+408 208
+410 208
+413 1165
+415 303
+416 2558
+417 202
+420 303
+423 406
+437 891
+454 10
+460 198
+467 760
+471 24
+483 198
+484 99
+485 198
+502 693
+505 7
+506 848
+fl=./elf/./elf/dl-minimal-malloc.c
+fn=__minimal_calloc
+78 40
+82 8
+85 8
+86 24
+fn=__minimal_free
+95 2
+97 8
+fn=__minimal_malloc
+35 125
+36 75
+41 3
+42 2
+43 2
+47 99
+50 167
+55 8
+56 8
+58 2
+59 16
+61 4
+63 4
+65 6
+68 25
+69 27
+71 100
+fl=./elf/./elf/dl-minimal.c
+fn=__rtld_malloc_init_real
+76 8
+86 2
+87 1
+89 1
+91 6
+92 5
+93 5
+94 4
+99 1
+100 1
+101 1
+102 1
+103 7
+fn=__rtld_malloc_init_stubs
+42 1
+43 2
+44 2
+45 2
+46 2
+47 1
+fn=lookup_malloc_symbol
+61 28
+63 4
+64 28
+68 28
+69 28
+71 4
+72 20
+fn=strsep
+239 3
+242 9
+244 3
+245 6
+249 126
+254 78
+256 152
+260 21
+267 2
+271 6
+fl=./elf/./elf/dl-misc.c
+fn=_dl_name_match_p
+68 84
+69 56
+70 3
+72 14
+74 58
+75 90
+80 15
+82 11
+83 56
+fn=_dl_sysdep_read_whole_file
+36 9
+39 3
+40 2
+42 6
+44 2
+47 2
+49 8
+60 3
+63 7
+fl=./elf/./elf/dl-object.c
+fn=_dl_add_to_namespace_list
+31 18
+33 9
+35 26
+38 20
+40 2
+42 2
+45 2
+46 9
+47 6
+48 6
+50 6
+51 9
+fn=_dl_new_object
+59 42
+62 6
+64 2
+66 3
+67 2
+71 6
+77 2
+80 4
+83 11
+87 4
+92 2
+95 6
+98 3
+99 3
+100 6
+103 6
+104 21
+106 3
+119 10
+123 2
+125 1
+127 21
+130 6
+131 6
+132 3
+136 3
+139 58
+141 19
+149 6
+150 3
+153 1
+155 18
+157 6
+160 8
+164 8
+168 12
+176 1
+179 6
+189 9
+191 10
+195 6
+200 8
+208 2
+212 2
+245 8
+250 84
+251 84
+253 4
+256 2
+259 2
+263 27
+fl=./elf/./elf/dl-reloc.c
+fn=_dl_relocate_object
+170 492
+171 214
+174 621
+175 49
+177 7
+178 21
+182 510
+183 100
+184 100
+186 826
+187 94
+188 564
+190 706
+193 100
+194 700
+196 7
+207 52
+217 6
+218 12
+221 4
+223 8
+225 4
+226 12
+245 8
+251 16
+252 8
+253 1
+255 8
+262 16
+301 11129
+304 24
+328 4
+331 8
+348 12
+350 32
+356 24
+359 8
+363 8
+364 24
+fl=./elf/./elf/dl-runtime.c
+fn=_dl_fixup
+46 16
+48 10
+49 6
+54 2
+55 8
+56 14
+58 4
+63 4
+67 4
+71 6
+75 6
+76 8
+77 6
+84 2
+85 6
+95 22
+99 10
+109 20
+124 8
+133 6
+159 6
+163 12
+fl=./elf/./elf/dl-setup_hash.c
+fn=_dl_setup_hash
+25 8
+28 12
+31 12
+32 8
+33 4
+34 4
+36 12
+37 4
+38 12
+40 4
+41 8
+43 4
+45 12
+50 4
+fl=./elf/./elf/dl-sort-maps.c
+fn=_dl_sort_maps
+145 40
+186 2
+187 4
+188 34
+189 16
+216 34
+221 4
+223 2
+224 10
+226 26
+228 8
+231 24
+249 8
+262 2
+269 6
+282 8
+304 26
+312 6
+316 18
+fn=_dl_sort_maps_init
+295 2
+296 4
+297 1
+298 3
+299 2
+fn=dfs_traversal.part.0
+140 64
+145 20
+148 8
+150 24
+152 60
+161 28
+176 24
+177 8
+178 48
+fl=./elf/./elf/dl-tunables.c
+fn=__GI___tunable_set_val
+102 45
+111 5
+112 20
+113 25
+116 10
+119 10
+123 25
+130 10
+131 10
+134 30
+135 5
+136 5
+137 5
+157 25
+161 5
+fn=__GI___tunables_init
+71 415
+74 82
+77 6594
+81 164
+86 492
+270 3
+279 9
+298 164
+302 11480
+304 5
+308 15826
+320 328
+354 8
+fn=__tunable_get_val
+402 33
+405 357
+414 43
+424 23
+425 23
+431 165
+433 33
+fl=./elf/./elf/dl-tunables.h
+fn=__GI___tunable_set_val
+121 10
+122 5
+131 5
+fn=__GI___tunables_init
+140 2744
+141 3524
+fl=./elf/./elf/dl-version.c
+fn=_dl_check_all_versions
+392 6
+394 2
+396 17
+397 4
+398 36
+401 7
+fn=_dl_check_map_versions
+36 24
+37 5
+38 35
+56 32
+64 24
+70 24
+86 8
+87 16
+89 8
+94 192
+108 136
+110 8
+113 56
+120 180
+124 60
+155 36
+156 4
+164 6
+170 12
+172 16
+174 4
+175 4
+177 8
+180 6
+184 12
+200 6
+204 4
+208 6
+213 6
+215 6
+217 24
+218 73
+221 16
+224 32
+228 4
+231 32
+234 24
+239 6
+243 6
+257 15
+260 4
+263 190
+266 140
+270 44
+274 9
+279 3
+280 12
+281 9
+291 3
+294 17
+296 6
+299 8
+303 6
+306 24
+308 32
+310 40
+311 16
+312 24
+313 24
+316 24
+321 6
+324 6
+334 6
+337 4
+341 44
+343 92
+347 44
+348 220
+349 132
+350 44
+353 138
+357 44
+365 4
+366 4
+367 4
+368 2
+370 4
+372 108
+373 52
+375 2
+376 7
+387 36
+fl=./elf/./elf/do-rel.h
+fn=_dl_relocate_object
+49 24
+50 24
+51 4
+53 48
+72 4
+73 6
+80 24
+83 166
+85 212
+87 117
+96 5
+97 128
+98 78
+114 21
+123 34
+124 12
+126 21
+129 20
+131 491
+133 595
+134 476
+135 357
+136 476
+138 357
+140 10
+143 2
+147 117
+150 368
+151 66
+163 18
+164 28
+165 6
+168 6
+170 16
+171 2
+172 4
+179 35
+181 42
+182 12
+184 18
+192 6
+195 18
+209 6
+fn=_dl_start
+49 6
+50 4
+51 1
+53 10
+56 4
+61 25
+63 21
+64 14
+65 7
+fl=./elf/./elf/get-dynamic-info.h
+fn=_dl_map_object_from_fd
+39 4
+43 2
+45 178
+49 86
+54 38
+55 25
+56 10
+58 2
+59 10
+62 8
+64 8
+68 43
+72 8
+84 7
+85 8
+86 8
+87 8
+88 7
+89 7
+90 7
+91 8
+103 16
+110 7
+115 2
+122 2
+123 6
+129 6
+130 3
+147 6
+152 2
+154 2
+156 2
+158 2
+162 6
+164 3
+165 2
+174 2
+175 3
+180 2
+184 4
+fl=./elf/./elf/libc_early_init.c
+fn=__libc_early_init
+33 7
+35 1
+38 2
+41 2
+47 4
+49 2
+fl=./elf/./elf/rtld.c
+fn=_dl_start
+84 8
+460 2
+478 3
+479 1
+480 2
+482 2
+491 1
+497 4
+499 2
+520 16
+546 1
+549 2
+550 1
+566 142
+568 1
+581 1
+588 9
+fn=dl_main
+84 17
+91 12
+92 2
+100 9
+196 2
+198 1
+223 3
+224 5
+225 4
+295 3
+301 1
+302 1
+843 1
+845 1
+854 1
+856 1
+861 1
+1121 1
+1122 1
+1124 4
+1127 2
+1129 2
+1131 1
+1132 1
+1147 1
+1150 46
+1151 117
+1155 3
+1160 3
+1161 9
+1170 2
+1171 2
+1173 1
+1180 3
+1198 2
+1206 8
+1207 16
+1208 8
+1209 1
+1211 12
+1212 14
+1216 8
+1217 8
+1218 4
+1219 10
+1220 1
+1224 12
+1252 2
+1256 4
+1258 1
+1262 26
+1263 61
+1269 17
+1275 4
+1278 3
+1280 3
+1282 3
+1357 13
+1362 1
+1368 1
+1384 3
+1385 4
+1654 10
+1656 2
+1657 3
+1658 2
+1659 3
+1663 2
+1664 2
+1689 3
+1690 5
+1691 3
+1692 3
+1695 1
+1697 1
+1698 1
+1700 2
+1701 2
+1705 2
+1707 2
+1715 4
+1722 2
+1725 2
+1750 4
+1752 1
+1757 2
+1760 4
+1761 4
+1762 3
+1763 1
+1764 1
+1765 1
+1776 2
+1777 2
+1779 2
+1781 1
+1782 3
+1787 5
+1788 5
+1790 4
+1796 2
+1805 2
+1809 1
+1810 2
+1834 2
+1840 1
+1841 1
+1842 1
+1846 3
+1851 3
+1853 1
+1855 2
+1859 5
+1864 2
+1880 6
+1956 4
+1960 19
+1961 1
+1964 3
+1965 1
+1966 2
+1967 2
+1972 3
+1982 9
+1988 19
+1989 12
+1992 3
+1993 2
+1994 1
+1996 14
+1997 6
+2001 2
+2007 1
+2009 3
+2010 2
+2012 2
+2014 3
+2016 3
+2017 1
+2018 2
+2030 2
+2031 2
+2032 3
+2043 5
+2044 4
+2045 4
+2055 2
+2056 2
+2057 4
+2059 2
+2064 3
+2263 3
+2267 3
+2273 2
+2276 2
+2284 2
+2295 3
+2298 1
+2303 1
+2304 21
+2306 8
+2311 8
+2313 10
+2315 1
+2316 1
+2319 4
+2321 8
+2322 24
+2326 10
+2327 3
+2336 2
+2340 4
+2342 1
+2349 4
+2352 2
+2362 3
+2364 2
+2374 1
+2379 3
+2382 1
+2388 1
+2389 5
+2397 3
+2404 1
+2408 3
+2413 3
+2414 1
+2415 1
+2416 1
+2420 1
+2425 8
+2558 2
+2560 1
+2564 6
+2566 18
+2568 2
+2570 107
+2571 19
+2579 13
+2600 2
+2607 4
+2609 2
+2610 1
+2655 3
+2656 5
+2658 2
+2659 2
+2660 1
+2697 4
+2722 2
+fn=handle_preload_list
+182 3
+812 1
+816 2
+817 1
+818 1
+820 1
+822 6
+823 3
+831 3
+875 11
+876 2
+880 7
+883 4
+884 3
+886 4
+887 2
+893 1
+895 2
+897 2
+898 1
+901 9
+fn=init_tls
+737 3
+739 2
+743 3
+749 2
+752 1
+753 1
+754 1
+758 1
+760 1
+764 2
+765 2
+766 12
+767 4
+768 8
+772 3
+774 1
+776 3
+779 2
+782 1
+789 2
+790 2
+796 2
+799 8
+802 1
+803 1
+806 5
+fn=map_doit
+644 3
+646 4
+647 6
+649 2
+fn=version_check_doit
+677 3
+679 6
+683 2
+fl=./elf/./get-dynamic-info.h
+fn=_dl_start
+45 83
+49 40
+54 17
+55 11
+56 5
+58 1
+59 5
+62 4
+64 4
+68 21
+72 3
+84 4
+85 4
+86 4
+87 4
+88 5
+89 4
+90 4
+91 4
+103 10
+110 3
+115 2
+122 1
+123 3
+129 3
+130 3
+133 2
+134 3
+139 3
+142 3
+fn=dl_main
+39 4
+43 1
+45 107
+49 52
+54 25
+55 17
+56 5
+58 1
+59 5
+62 4
+64 4
+68 26
+72 7
+84 3
+85 5
+86 5
+87 5
+88 4
+89 5
+90 5
+91 5
+103 9
+110 3
+115 2
+122 3
+123 4
+129 3
+147 4
+152 2
+154 2
+156 2
+158 2
+159 2
+162 4
+164 3
+165 2
+174 2
+175 3
+180 2
+181 3
+184 3
+fl=./elf/./setup-vdso.h
+fn=dl_main
+24 2
+fl=./io/../sysdeps/unix/sysv/linux/access.c
+fn=access
+25 1
+27 7
+31 1
+fl=./io/../sysdeps/unix/sysv/linux/close_nocancel.c
+fn=__GI___close_nocancel
+25 3
+26 12
+27 3
+fn=__close_nocancel
+25 1
+26 4
+27 1
+fl=./io/../sysdeps/unix/sysv/linux/fstat64.c
+fn=fstat
+29 12
+30 12
+35 18
+fl=./io/../sysdeps/unix/sysv/linux/fstatat64.c
+fn=fstatat
+99 32
+154 32
+168 18
+169 57
+170 7
+fl=./io/../sysdeps/unix/sysv/linux/open64.c
+fn=open
+30 10
+33 7
+41 9
+43 7
+fl=./io/../sysdeps/unix/sysv/linux/open64_nocancel.c
+fn=__open_nocancel
+28 46
+31 161
+39 221
+41 23
+fl=./io/../sysdeps/unix/sysv/linux/pread64_nocancel.c
+fn=__pread64_nocancel
+25 4
+26 8
+27 2
+fl=./io/../sysdeps/unix/sysv/linux/read.c
+fn=read
+25 18
+26 108
+27 18
+fl=./io/../sysdeps/unix/sysv/linux/read_nocancel.c
+fn=__read_nocancel
+25 2
+26 8
+27 2
+fl=./io/../sysdeps/unix/sysv/linux/stat64.c
+fn=stat
+28 20
+29 40
+fl=./io/../sysdeps/unix/sysv/linux/write.c
+fn=write
+25 2
+26 12
+27 2
+fl=./libio/../include/sys/sysmacros.h
+fn=_IO_file_doallocate
+47 12
+fl=./libio/./libio/filedoalloc.c
+fn=_IO_file_doallocate
+78 27
+84 27
+86 12
+89 4
+91 4
+94 4
+97 12
+101 9
+102 6
+104 12
+105 3
+106 24
+fl=./libio/./libio/fileops.c
+fn=_IO_do_write@@GLIBC_2.2.5
+423 20
+425 14
+426 16
+433 4
+440 8
+443 2
+448 10
+449 10
+451 6
+452 4
+453 2
+454 4
+455 8
+fn=_IO_file_close
+1164 1
+1167 2
+fn=_IO_file_close_it@@GLIBC_2.2.5
+128 4
+130 3
+133 2
+134 3
+137 1
+139 2
+142 7
+145 3
+153 5
+154 4
+157 2
+158 1
+159 1
+160 1
+162 2
+163 5
+fn=_IO_file_finish@@GLIBC_2.2.5
+168 5
+169 2
+175 3
+176 3
+fn=_IO_file_fopen@@GLIBC_2.2.5
+213 11
+214 2
+222 2
+224 6
+227 1
+228 1
+247 8
+280 5
+283 2
+286 4
+287 2
+356 12
+fn=_IO_file_open
+182 9
+184 2
+185 1
+188 2
+189 2
+191 1
+192 6
+195 3
+205 2
+206 1
+207 4
+fn=_IO_file_overflow@@GLIBC_2.2.5
+731 15
+732 12
+739 13
+742 2
+744 2
+745 6
+754 3
+763 3
+764 1
+765 1
+766 1
+767 1
+768 3
+770 3
+771 5
+772 2
+774 6
+775 2
+776 3
+777 4
+780 6
+781 6
+782 8
+783 4
+784 3
+786 2
+787 11
+fn=_IO_file_read
+1130 18
+1131 18
+1132 18
+1133 54
+fn=_IO_file_setbuf@@GLIBC_2.2.5
+381 6
+382 6
+385 2
+387 6
+389 4
+390 4
+fn=_IO_file_stat
+1146 3
+1147 6
+fn=_IO_file_sync@@GLIBC_2.2.5
+792 10
+797 8
+799 2
+800 4
+811 4
+815 8
+fn=_IO_file_underflow@@GLIBC_2.2.5
+461 162
+465 54
+466 2
+468 36
+474 54
+477 36
+480 6
+485 4
+489 39
+496 16
+498 40
+500 7
+505 36
+511 108
+516 90
+518 36
+521 7
+525 17
+531 1
+534 51
+536 34
+537 144
+fn=_IO_file_write@@GLIBC_2.2.5
+1173 14
+1174 4
+1175 12
+1179 6
+1180 6
+1181 4
+1186 2
+1187 2
+1189 6
+1190 6
+1193 12
+fn=_IO_file_xsputn@@GLIBC_2.2.5
+1197 8
+1203 4
+1204 1
+1210 4
+1212 4
+1213 2
+1216 35
+1218 48
+1228 1
+1233 1
+1235 5
+1236 1
+1237 2
+1239 2
+1267 9
+fn=_IO_new_file_init_internal
+106 3
+110 1
+111 1
+113 1
+114 1
+115 2
+fl=./libio/./libio/genops.c
+fn=_IO_cleanup
+786 6
+787 11
+790 12
+799 9
+801 6
+807 26
+815 10
+817 4
+819 8
+820 2
+824 10
+826 6
+830 10
+831 16
+838 3
+842 9
+843 2
+863 11
+866 3
+878 12
+fn=_IO_default_finish
+54 2
+601 3
+603 3
+609 3
+612 3
+624 2
+fn=_IO_default_setbuf
+330 10
+332 2
+333 2
+337 4
+452 18
+453 10
+455 8
+457 6
+458 4
+462 2
+466 8
+467 2
+468 12
+fn=_IO_default_uflow
+361 90
+362 54
+363 36
+365 68
+366 72
+fn=_IO_doallocbuf
+343 12
+344 6
+346 12
+347 15
+350 12
+fn=_IO_flush_all_lockp
+686 12
+687 1
+691 6
+692 12
+695 12
+697 3
+698 3
+701 21
+709 6
+711 3
+715 9
+716 2
+720 12
+fn=_IO_link_in
+87 20
+88 8
+90 2
+92 7
+93 12
+94 1
+95 13
+97 2
+98 1
+100 13
+101 1
+102 7
+103 2
+106 21
+fn=_IO_no_init
+563 10
+564 1
+565 1
+566 2
+568 1
+572 6
+579 1
+581 1
+587 1
+588 6
+fn=_IO_old_init
+531 1
+532 2
+534 6
+539 7
+544 1
+550 2
+555 3
+556 2
+558 1
+fn=_IO_setb
+329 40
+330 18
+331 1
+332 4
+333 4
+335 26
+338 24
+fn=_IO_switch_to_get_mode
+164 90
+165 54
+168 54
+172 36
+173 36
+176 18
+178 36
+180 36
+181 18
+182 72
+fn=_IO_un_link
+53 2
+54 4
+82 2
+fn=_IO_unsave_markers
+960 3
+962 2
+967 3
+969 2
+fn=__GI__IO_un_link.part.0
+52 9
+58 6
+59 12
+60 1
+61 14
+63 3
+65 2
+66 2
+74 3
+76 13
+77 1
+78 9
+79 2
+82 9
+fn=__overflow
+199 6
+201 6
+202 1
+203 6
+204 4
+fn=__uflow
+299 90
+300 89
+305 54
+308 72
+310 36
+316 36
+321 54
+323 72
+324 54
+fl=./libio/./libio/getc.c
+fn=getc
+34 326210
+37 130484
+38 391418
+43 260951
+fl=./libio/./libio/iofclose.c
+fn=fclose@@GLIBC_2.2.5
+34 5
+48 3
+49 1
+51 15
+52 3
+53 3
+57 4
+58 3
+71 2
+76 5
+fl=./libio/./libio/iofgets.c
+fn=fgets
+32 6
+37 4
+39 3
+47 14
+51 3
+52 2
+53 7
+56 5
+57 1
+60 1
+63 4
+66 7
+fl=./libio/./libio/iofopen.c
+fn=fopen@@GLIBC_2.2.5
+37 2
+65 2
+67 3
+70 2
+72 7
+73 2
+74 2
+75 7
+85 8
+87 7
+fl=./libio/./libio/iogetline.c
+fn=_IO_getline
+33 1
+34 2
+fn=_IO_getline_info
+49 14
+51 2
+53 3
+54 2
+55 6
+57 6
+58 4
+60 2
+61 2
+67 2
+77 2
+78 1
+83 1
+85 6
+86 2
+88 3
+89 2
+90 3
+92 1
+94 2
+96 5
+97 2
+98 1
+107 8
+fl=./libio/./libio/ioputs.c
+fn=puts
+33 8
+35 2
+36 15
+38 2
+39 4
+40 6
+41 8
+42 4
+46 7
+fl=./libio/./libio/libioP.h
+fn=_IO_cleanup
+940 4
+942 4
+943 4
+fn=_IO_default_setbuf
+940 6
+942 4
+943 4
+fn=_IO_default_uflow
+940 54
+942 36
+943 36
+fn=_IO_do_write@@GLIBC_2.2.5
+940 8
+942 4
+943 4
+fn=_IO_doallocbuf
+940 9
+942 6
+943 6
+fn=_IO_file_close_it@@GLIBC_2.2.5
+940 3
+942 2
+943 2
+fn=_IO_file_doallocate
+940 9
+942 6
+943 6
+fn=_IO_file_underflow@@GLIBC_2.2.5
+883 2
+884 9
+940 40
+942 74
+943 38
+fn=__overflow
+940 6
+942 4
+943 4
+fn=__uflow
+940 54
+942 36
+943 36
+fn=fclose@@GLIBC_2.2.5
+855 4
+856 4
+862 2
+883 5
+884 7
+940 3
+942 2
+943 2
+fn=fgets
+883 3
+884 9
+fn=putchar
+883 2
+884 10
+fn=puts
+883 2
+884 10
+940 3
+942 2
+943 2
+fl=./libio/./libio/putchar.c
+fn=putchar
+25 7
+27 15
+28 7
+31 6
+fl=./libio/./libio/vtables.c
+fn=check_stdfiles_vtables
+83 1
+84 4
+85 3
+86 3
+88 1
+fl=./malloc/./malloc/arena.c
+fn=free
+162 15
+fn=malloc
+162 22026
+315 1
+fn=ptmalloc_init.part.0
+313 6
+318 1
+343 2
+347 3
+352 4
+353 4
+354 4
+355 4
+356 4
+357 4
+358 4
+360 4
+361 4
+362 4
+365 4
+366 4
+367 2
+430 7
+fl=./malloc/./malloc/malloc.c
+fn=_int_free
+2006 3
+3175 8
+3177 16
+3178 4
+3179 8
+4417 55
+4427 15
+4433 25
+4434 10
+4438 20
+4445 10
+4446 35
+4449 4
+4455 12
+4475 20
+4478 4
+4489 2
+4565 2
+4571 5
+4574 1
+4578 3
+4581 4
+4583 3
+4586 3
+4589 2
+4590 2
+4591 2
+4597 2
+4606 2
+4611 2
+4615 1
+4623 1
+4624 4
+4625 2
+4627 1
+4629 2
+4631 2
+4634 1
+4635 1
+4637 3
+4638 2
+4668 2
+4688 3
+4698 55
+fn=_int_malloc
+1338 22028
+1357 44056
+1999 33042
+3766 99126
+3807 22028
+3834 26270
+3836 57570
+3839 12423
+3841 55051
+3897 10
+3899 6
+3900 22022
+3902 33033
+3959 20
+3960 9
+3978 11014
+3979 33042
+3980 55068
+3984 11014
+3989 4
+3990 44076
+3992 4
+3993 8
+3994 4
+3996 8
+3997 12
+3999 4
+4000 16
+4002 16
+4004 8
+4005 8
+4007 8
+4018 16
+4019 4
+4020 2
+4021 1
+4024 1
+4025 1
+4026 1
+4027 3
+4028 3
+4029 2
+4031 2
+4035 8
+4037 3
+4038 1
+4041 1
+4049 3
+4050 3
+4054 6
+4083 6
+4091 21
+4092 12
+4093 3
+4096 6
+4140 15
+4143 18
+4144 3
+4145 3
+4146 3
+4147 3
+4152 6
+4153 6
+4162 9
+4168 6
+4179 22026
+4181 9
+4184 9
+4252 11013
+4253 22026
+4254 22026
+4255 22026
+4256 33039
+4261 22026
+4265 88090
+4268 121112
+4270 12
+4271 6
+4275 42
+4277 18
+4279 36
+4283 3
+4286 6
+4295 6
+4298 6
+4300 9
+4303 6
+4306 9
+4316 3
+4321 12
+4322 6
+4325 3
+4326 3
+4327 3
+4330 6
+4331 1
+4332 6
+4334 6
+4337 24
+4339 9
+4340 3
+4343 3
+4365 11010
+4366 22020
+4368 22020
+4371 33033
+4373 11007
+4374 11007
+4375 11007
+4376 88056
+4378 22014
+4381 11007
+4388 9
+4403 12
+4404 6
+4409 99126
+fn=free
+3346 20
+3350 10
+3358 10
+3360 5
+3362 15
+3379 15
+3385 10
+3388 5
+3389 20
+fn=malloc
+1338 22026
+1357 53651
+3235 4
+3281 66078
+3288 22026
+3298 19198
+3300 44052
+3303 22026
+3304 4
+3305 33036
+3313 22026
+3315 44052
+3316 55065
+3341 55065
+fn=ptmalloc_init.part.0
+1960 385
+1963 381
+1971 1
+1972 1
+1974 1
+3156 7
+fn=sysmalloc
+2022 16
+2542 27
+2562 12
+2563 3
+2573 6
+2574 6
+2600 3
+2601 9
+2602 3
+2604 6
+2611 29
+2617 9
+2620 9
+2681 6
+2690 12
+2703 9
+2707 3
+2711 21
+2719 6
+2721 15
+2722 15
+2724 3
+2727 6
+2758 6
+2759 2
+2760 15
+2766 10
+2767 8
+2769 4
+2809 3
+2831 2
+2832 7
+2834 2
+2835 5
+2847 4
+2889 1
+2890 4
+2891 5
+2902 2
+2935 6
+2936 3
+2940 2
+2941 6
+2944 6
+2946 3
+2947 3
+2948 3
+2949 24
+2950 6
+2952 3
+2958 33
+fn=tcache_init.part.0
+3229 3
+3238 8
+3239 4
+3240 2
+3248 4
+3255 2
+3257 2
+3258 88
+3261 4
+fn=unlink_chunk.constprop.0
+1620 3
+1622 15
+1625 6
+1626 6
+1628 12
+1631 3
+1632 3
+1633 15
+1635 6
+1636 9
+1639 6
+1653 3
+1654 3
+1657 6
+fl=./malloc/./malloc/morecore.c
+fn=__glibc_morecore
+25 8
+26 8
+29 4
+30 8
+34 8
+fl=./malloc/./malloc/scratch_buffer_set_array_size.c
+fn=__libc_scratch_buffer_set_array_size
+30 20
+34 2
+35 4
+45 4
+46 2
+63 8
+fl=./misc/../sysdeps/unix/syscall-template.S
+fn=mprotect
+117 20
+122 4
+fn=munmap
+117 5
+122 1
+fl=./misc/../sysdeps/unix/sysv/linux/brk.c
+fn=brk
+36 4
+37 8
+38 8
+44 4
+45 4
+fl=./misc/../sysdeps/unix/sysv/linux/brk_call.h
+fn=brk
+24 8
+fl=./misc/../sysdeps/unix/sysv/linux/mmap64.c
+fn=mmap
+47 24
+50 24
+58 48
+60 12
+fl=./misc/./misc/init-misc.c
+fn=__init_misc
+30 5
+31 5
+33 3
+37 5
+38 3
+40 4
+fl=./misc/./misc/sbrk.c
+fn=sbrk
+37 20
+40 8
+43 8
+58 8
+59 4
+62 8
+63 1
+66 12
+74 9
+78 20
+fl=./nptl/../sysdeps/unix/sysv/linux/x86/elision-conf.c
+fn=__lll_elision_init
+96 6
+101 5
+103 4
+105 4
+107 4
+109 4
+113 3
+114 1
+115 6
+fl=./nptl/./nptl/libc-cleanup.c
+fn=__libc_cleanup_pop_restore
+54 4
+55 4
+57 8
+59 4
+60 12
+71 4
+fn=__libc_cleanup_push_defer
+24 4
+25 4
+27 8
+29 8
+32 8
+46 12
+48 4
+49 4
+fl=./nptl/./nptl/pthread_mutex_conf.c
+fn=__pthread_tunables_init
+50 6
+51 5
+53 4
+55 6
+fl=./nptl/./nptl/pthread_mutex_lock.c
+fn=pthread_mutex_lock@@GLIBC_2.2.5
+44 1
+45 7
+46 1
+77 3
+80 3
+82 1
+84 2
+88 2
+97 2
+108 4
+112 1
+115 2
+124 1
+130 3
+131 2
+179 1
+182 1
+184 1
+187 1
+190 3
+fl=./nptl/./nptl/pthread_mutex_unlock.c
+fn=pthread_mutex_unlock@@GLIBC_2.2.5
+39 1
+40 4
+41 2
+51 3
+52 2
+57 2
+62 1
+65 1
+70 1
+72 1
+74 2
+80 4
+84 3
+87 3
+367 2
+369 2
+fl=./posix/../malloc/dynarray-skeleton.c
+fn=__unregister_atfork
+243 2
+fl=./posix/../sysdeps/unix/sysv/linux/_exit.c
+fn=_Exit
+27 2
+30 3
+31 2
+fl=./posix/./posix/register-atfork.c
+fn=__unregister_atfork
+71 4
+82 6
+83 8
+109 8
+110 4
+fl=./resource/../sysdeps/unix/sysv/linux/getrlimit64.c
+fn=getrlimit
+38 2
+39 7
+40 1
+fl=./setjmp/../sysdeps/x86_64/bsd-_setjmp.S
+fn=_setjmp
+28 1
+30 1
+32 1
+fl=./setjmp/../sysdeps/x86_64/setjmp.S
+fn=__sigsetjmp
+30 4
+32 4
+41 4
+42 8
+43 4
+47 4
+48 4
+49 4
+50 4
+51 4
+53 8
+55 4
+56 4
+57 4
+59 8
+61 4
+66 1
+67 1
+69 3
+72 3
+73 3
+80 3
+81 3
+84 1
+fl=./setjmp/./setjmp/sigjmp.c
+fn=__sigjmp_save
+28 3
+29 1
+30 2
+34 3
+fl=./stdlib/../sysdeps/unix/sysv/linux/getrandom.c
+fn=getrandom
+28 1
+29 6
+30 1
+fl=./stdlib/./stdlib/cxa_atexit.c
+fn=__cxa_atexit
+41 2
+43 8
+44 2
+46 2
+53 3
+55 1
+56 1
+57 1
+58 1
+59 4
+60 1
+69 6
+71 6
+fn=__new_exitfn
+82 5
+83 1
+88 2
+93 9
+95 4
+103 1
+124 1
+125 1
+136 1
+138 1
+139 1
+143 5
+fl=./stdlib/./stdlib/cxa_finalize.c
+fn=__cxa_finalize
+30 18
+33 12
+36 12
+40 12
+94 14
+98 12
+105 4
+106 4
+107 8
+108 16
+fl=./stdlib/./stdlib/cxa_thread_atexit_impl.c
+fn=__call_tls_dtors
+149 4
+150 4
+168 4
+fl=./stdlib/./stdlib/exit.c
+fn=__run_exit_handlers
+40 11
+45 2
+46 2
+48 5
+56 1
+58 3
+62 1
+66 6
+68 2
+71 7
+98 1
+105 2
+106 1
+107 1
+109 2
+112 4
+113 3
+114 4
+124 2
+125 2
+131 4
+133 2
+134 9
+136 2
+fn=exit
+142 4
+143 4
+fl=./string/../include/rtld-malloc.h
+fn=strdup
+56 6
+fl=./string/../string/strcspn.c
+fn=strcspn
+32 4
+33 3
+34 2
+39 6
+40 4
+41 4
+42 4
+44 1
+47 9
+48 6
+51 4
+52 4
+53 4
+54 4
+56 3
+61 17
+62 34
+63 34
+64 34
+65 17
+67 102
+69 1
+70 4
+71 3
+fl=./string/../string/strstr.c
+fn=__GI_strstr
+77 11
+82 3
+84 5
+85 2
+129 2
+161 12
+fl=./string/../sysdeps/x86/cacheinfo.c
+fn=__x86_cacheinfo
+86 3
+fl=./string/../sysdeps/x86/cacheinfo.h
+fn=__x86_cacheinfo
+61 1
+64 3
+66 3
+67 1
+73 3
+75 3
+76 1
+80 2
+82 2
+83 2
+84 2
+86 2
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/memcmp-sse2.S
+fn=bcmp
+59 1
+71 1
+72 1
+94 1
+95 1
+140 1
+141 1
+143 1
+144 1
+146 1
+147 1
+149 1
+150 1
+197 1
+198 1
+199 1
+200 1
+201 1
+202 1
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/memset-vec-unaligned-erms.S
+fn=memset
+140 7
+141 35
+146 7
+147 7
+150 6
+151 6
+189 4
+190 4
+192 4
+252 6
+253 6
+265 6
+268 6
+269 6
+281 2
+282 2
+295 2
+296 2
+298 2
+301 1
+304 2
+307 29
+308 29
+309 29
+310 29
+311 29
+312 29
+313 29
+316 2
+317 2
+318 2
+319 2
+324 2
+358 1
+360 1
+361 1
+400 1
+401 1
+403 1
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/strchr-sse2.S
+fn=index
+32 15
+33 15
+34 15
+35 15
+36 15
+37 15
+38 15
+39 15
+40 15
+41 15
+42 15
+43 15
+44 15
+45 15
+46 15
+47 15
+48 15
+49 15
+50 12
+54 12
+55 12
+56 12
+57 12
+59 12
+63 3
+64 3
+65 3
+66 3
+67 3
+68 3
+69 3
+70 3
+71 3
+72 3
+73 3
+74 3
+75 3
+76 3
+77 3
+78 3
+79 3
+80 3
+81 3
+82 3
+83 3
+84 3
+85 3
+86 5
+91 1
+92 1
+95 1
+96 1
+97 1
+98 1
+99 1
+100 1
+101 1
+102 1
+103 1
+104 1
+105 1
+106 1
+107 1
+108 1
+109 1
+110 1
+111 1
+112 1
+114 1
+115 1
+117 1
+118 1
+119 1
+120 1
+121 1
+122 1
+123 1
+124 1
+126 1
+127 1
+128 1
+129 1
+130 1
+131 1
+132 1
+133 1
+134 1
+135 1
+138 3
+142 3
+143 3
+144 3
+145 3
+147 3
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/strcmp-sse2.S
+fn=strcmp
+98 189
+131 189
+132 189
+134 189
+135 189
+156 189
+157 189
+158 109
+159 109
+160 78
+161 78
+162 78
+163 78
+180 78
+181 78
+182 78
+183 78
+184 78
+185 78
+186 78
+191 8
+192 24
+201 119
+202 119
+203 119
+204 119
+205 119
+206 119
+207 119
+208 119
+209 59
+210 36
+211 36
+212 36
+214 59
+215 59
+216 59
+217 59
+218 59
+219 59
+229 60
+230 60
+231 60
+233 60
+239 60
+240 60
+241 60
+242 60
+243 60
+248 60
+250 58
+251 58
+252 116
+260 58
+261 58
+264 58
+265 58
+266 58
+267 58
+268 58
+269 58
+300 2
+301 2
+302 2
+303 2
+304 2
+306 2
+307 2
+308 2
+309 2
+310 2
+311 2
+312 2
+313 2
+316 2
+317 2
+318 2
+324 2
+325 2
+326 4
+330 2
+331 2
+334 2
+335 2
+336 2
+338 2
+339 2
+340 2
+344 2
+345 2
+346 2
+347 2
+348 2
+349 2
+424 1
+425 1
+426 1
+427 1
+428 1
+430 1
+431 1
+432 1
+433 1
+434 1
+435 1
+436 1
+437 1
+440 1
+441 1
+442 1
+448 1
+449 1
+450 2
+454 1
+455 1
+458 1
+459 1
+460 1
+462 1
+463 1
+464 1
+468 1
+469 1
+470 1
+471 1
+472 1
+473 1
+661 1
+662 1
+663 1
+664 1
+665 1
+667 1
+668 1
+669 1
+670 1
+671 1
+672 1
+673 1
+674 1
+678 1
+679 1
+680 1
+686 1
+687 1
+688 2
+692 1
+693 1
+696 1
+697 1
+698 1
+700 1
+701 1
+702 1
+706 1
+707 1
+708 1
+709 1
+710 1
+711 1
+780 2
+781 2
+782 2
+783 2
+784 2
+786 2
+787 2
+788 2
+789 2
+790 2
+791 2
+792 2
+793 2
+797 2
+798 2
+799 2
+805 2
+806 2
+807 4
+811 2
+812 2
+815 2
+816 2
+817 2
+819 2
+820 2
+821 2
+825 2
+826 2
+827 2
+828 2
+829 2
+830 2
+837 1
+838 1
+840 1
+841 1
+843 1
+844 1
+845 1
+847 1
+848 1
+849 1
+853 1
+854 1
+855 1
+856 1
+857 1
+858 1
+899 1
+900 1
+901 1
+902 1
+903 1
+905 1
+906 1
+907 1
+908 1
+909 1
+910 1
+911 1
+1137 4
+1138 4
+1139 4
+1140 4
+1141 4
+1143 4
+1144 4
+1145 4
+1146 4
+1147 4
+1148 4
+1149 4
+1256 4
+1257 4
+1258 4
+1259 4
+1260 4
+1262 4
+1263 4
+1264 4
+1265 4
+1266 4
+1267 4
+1268 4
+1269 2
+1273 2
+1274 2
+1275 2
+1281 2
+1282 2
+1283 4
+1287 2
+1288 2
+1291 2
+1292 2
+1293 2
+1295 2
+1296 2
+1297 2
+1301 2
+1302 2
+1303 2
+1304 2
+1305 2
+1306 2
+1375 2
+1376 2
+1377 2
+1378 2
+1379 2
+1381 2
+1382 2
+1383 2
+1384 2
+1385 2
+1386 2
+1387 2
+1388 2
+1392 2
+1393 2
+1394 2
+1400 2
+1401 2
+1402 4
+1406 2
+1407 2
+1410 2
+1411 2
+1412 2
+1414 2
+1415 2
+1416 2
+1420 2
+1421 2
+1422 2
+1423 2
+1424 2
+1425 2
+1432 1
+1433 1
+1435 1
+1436 1
+1438 1
+1439 1
+1440 1
+1442 1
+1443 1
+1444 1
+1448 1
+1449 1
+1450 1
+1451 1
+1452 1
+1453 1
+1494 3
+1495 3
+1496 3
+1497 3
+1498 3
+1500 3
+1501 3
+1502 3
+1503 3
+1504 3
+1505 3
+1506 3
+1507 1
+1511 1
+1512 1
+1513 1
+1519 1
+1520 1
+1521 2
+1525 1
+1526 1
+1529 1
+1530 1
+1531 1
+1533 1
+1534 1
+1535 1
+1539 1
+1540 1
+1541 1
+1542 1
+1543 1
+1544 1
+1613 23
+1614 23
+1615 23
+1616 23
+1617 23
+1619 23
+1620 23
+1621 23
+1622 23
+1623 23
+1624 23
+1625 23
+1626 21
+1630 21
+1631 21
+1632 21
+1638 21
+1639 21
+1640 42
+1644 21
+1645 21
+1648 21
+1649 21
+1650 21
+1652 21
+1653 21
+1654 21
+1658 21
+1659 21
+1660 21
+1661 21
+1662 21
+1663 21
+1732 8
+1733 8
+1734 8
+1735 8
+1736 8
+1738 8
+1739 8
+1740 8
+1741 8
+1742 8
+1743 8
+1744 8
+1745 8
+1749 8
+1750 8
+1751 8
+1757 8
+1758 8
+1759 16
+1763 8
+1764 8
+1767 8
+1768 8
+1769 8
+1771 8
+1772 8
+1773 8
+1777 8
+1778 8
+1779 8
+1780 8
+1781 8
+1782 8
+1851 5
+1852 5
+1853 5
+1854 5
+1855 5
+1857 5
+1858 5
+1859 5
+1860 5
+1861 5
+1862 5
+1863 5
+1864 4
+1868 4
+1869 4
+1870 4
+1876 4
+1877 4
+1878 8
+1882 4
+1883 4
+1886 4
+1887 4
+1888 4
+1890 4
+1891 4
+1892 4
+1896 4
+1897 4
+1898 4
+1899 4
+1900 4
+1901 4
+1970 3
+1971 3
+1972 3
+1973 3
+1974 3
+1976 3
+1977 3
+1978 3
+1979 3
+1980 3
+1981 3
+1982 3
+2093 102
+2095 119
+2096 119
+2097 119
+2098 119
+2099 72
+2104 189
+2110 189
+2111 189
+2119 189
+2120 189
+fn=strncmp
+98 1
+125 1
+126 1
+127 1
+128 1
+129 1
+131 1
+132 1
+134 1
+135 1
+156 1
+157 1
+158 1
+159 1
+160 1
+161 1
+162 1
+163 1
+180 1
+181 1
+182 1
+183 1
+184 1
+185 1
+186 1
+2104 1
+2107 1
+2108 1
+2110 1
+2111 1
+2119 1
+2120 1
+fl=./string/../sysdeps/x86_64/multiarch/../multiarch/strlen-sse2.S
+fn=strlen
+57 16
+101 16
+102 16
+103 16
+104 16
+105 16
+106 16
+107 16
+109 16
+111 16
+142 16
+143 16
+144 16
+145 16
+146 16
+147 7
+149 7
+153 9
+154 9
+155 9
+156 9
+157 9
+158 9
+159 9
+160 9
+161 9
+162 9
+163 9
+164 9
+169 66
+243 3
+244 3
+245 3
+246 3
+247 3
+248 3
+249 3
+250 3
+266 3
+268 3
+269 42
+271 3
+272 3
+273 3
+275 3
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-avx2.h
+fn=memrchr
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fn=rindex
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fn=strchrnul
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fn=strlen
+38 4
+40 10
+41 4
+42 6
+45 4
+46 4
+49 6
+52 12
+fn=strnlen
+38 4
+40 10
+41 4
+42 6
+45 4
+46 4
+49 6
+52 12
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-evex.h
+fn=memchr
+37 2
+38 5
+39 3
+42 2
+43 2
+45 2
+51 3
+54 4
+fn=rawmemchr
+37 2
+38 5
+39 3
+42 2
+43 2
+45 2
+51 3
+54 4
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-memcmp.h
+fn=bcmp
+34 5
+35 2
+36 2
+37 3
+40 2
+41 2
+44 3
+47 4
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-memmove.h
+fn=memcpy@@GLIBC_2.14
+56 2
+57 3
+60 3
+61 2
+74 2
+77 2
+85 2
+93 2
+96 5
+fn=memmove
+56 2
+57 3
+60 3
+61 2
+74 2
+77 2
+85 2
+93 2
+96 5
+fn=mempcpy
+56 2
+57 3
+60 3
+61 2
+74 2
+77 2
+85 2
+93 2
+96 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-memset.h
+fn=memset
+54 5
+57 3
+58 2
+64 2
+73 2
+75 2
+85 2
+93 2
+96 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-sse4_2.h
+fn=strcspn
+36 5
+fn=strpbrk
+36 5
+fn=strspn
+36 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
+fn=strcasecmp
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fn=strcasecmp_l
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fn=strncasecmp
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fn=strncasecmp_l
+36 2
+37 4
+38 3
+41 2
+42 2
+45 3
+48 5
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-strcpy.h
+fn=stpcpy
+38 4
+39 3
+42 2
+43 2
+46 3
+49 3
+fn=strcat
+38 4
+39 3
+42 2
+43 2
+46 3
+49 3
+fn=strcpy
+38 8
+39 6
+42 4
+43 4
+46 6
+49 6
+fl=./string/../sysdeps/x86_64/multiarch/ifunc-strncpy.h
+fn=stpncpy
+35 5
+36 3
+39 2
+40 2
+43 3
+46 4
+fn=strncpy
+35 5
+36 3
+39 2
+40 2
+43 3
+46 4
+fl=./string/../sysdeps/x86_64/multiarch/memchr-avx2.S
+fn=__memchr_avx2
+61 1
+68 1
+70 1
+73 1
+74 1
+76 1
+77 1
+78 1
+79 1
+82 1
+83 1
+86 1
+87 1
+101 1
+106 1
+113 1
+114 1
+115 1
+116 1
+fl=./string/../sysdeps/x86_64/multiarch/memchr.c
+fn=memchr
+29 2
+fl=./string/../sysdeps/x86_64/multiarch/memcmp.c
+fn=bcmp
+29 4
+fl=./string/../sysdeps/x86_64/multiarch/memcpy.c
+fn=memcpy@@GLIBC_2.14
+29 1
+fl=./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
+fn=__memcpy_avx_unaligned_erms
+263 1
+264 1
+270 2
+271 2
+313 2
+314 2
+316 1
+317 1
+319 1
+323 1
+324 1
+325 1
+326 1
+327 1
+331 1
+333 1
+350 1
+351 1
+352 1
+353 1
+355 1
+fn=__mempcpy_avx_unaligned_erms
+250 1
+251 1
+252 1
+253 1
+fn=memcpy
+218 18
+219 18
+225 72
+226 72
+228 23
+229 23
+230 23
+232 19
+233 19
+234 19
+236 19
+316 49
+317 49
+319 14
+323 14
+324 7
+325 7
+326 3
+327 3
+328 2
+329 2
+331 3
+333 7
+339 7
+340 7
+341 7
+342 7
+343 7
+372 35
+373 35
+374 35
+375 35
+376 35
+400 4
+401 4
+403 4
+404 4
+405 4
+407 4
+408 4
+409 4
+410 4
+411 4
+412 4
+413 4
+414 4
+415 4
+416 4
+417 4
+418 4
+419 4
+420 4
+421 4
+fn=mempcpy
+205 54
+206 54
+207 54
+208 54
+fl=./string/../sysdeps/x86_64/multiarch/memmove.c
+fn=memmove
+29 1
+fl=./string/../sysdeps/x86_64/multiarch/mempcpy.c
+fn=mempcpy
+33 1
+fl=./string/../sysdeps/x86_64/multiarch/memrchr.c
+fn=memrchr
+29 3
+fl=./string/../sysdeps/x86_64/multiarch/memset.c
+fn=memset
+29 1
+fl=./string/../sysdeps/x86_64/multiarch/rawmemchr.c
+fn=rawmemchr
+31 2
+fl=./string/../sysdeps/x86_64/multiarch/stpcpy.c
+fn=stpcpy
+33 3
+fl=./string/../sysdeps/x86_64/multiarch/stpncpy.c
+fn=stpncpy
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcasecmp.c
+fn=strcasecmp
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcasecmp_l.c
+fn=strcasecmp_l
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcat.c
+fn=strcat
+29 3
+fl=./string/../sysdeps/x86_64/multiarch/strchr-avx2.S
+fn=__strchr_avx2
+53 1
+55 1
+56 1
+57 1
+58 1
+59 1
+62 1
+63 1
+67 1
+68 1
+69 1
+70 1
+71 1
+72 1
+73 1
+74 1
+77 1
+85 1
+93 1
+94 2
+fl=./string/../sysdeps/x86_64/multiarch/strchr.c
+fn=index
+42 2
+43 4
+44 3
+47 2
+48 2
+51 3
+54 3
+65 3
+fl=./string/../sysdeps/x86_64/multiarch/strchrnul.c
+fn=strchrnul
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S
+fn=__strcmp_avx2
+206 24179
+249 24179
+274 24179
+275 24179
+276 24179
+278 24179
+279 24179
+283 23917
+287 23917
+289 23917
+291 23917
+294 23917
+303 23917
+304 23917
+308 23917
+318 23917
+319 23917
+322 23917
+326 47834
+953 24179
+954 24179
+957 1799
+958 1799
+959 1799
+960 1799
+962 1799
+965 1799
+966 1799
+970 525
+971 525
+980 38
+984 38
+985 70
+993 234
+994 234
+995 234
+996 234
+997 234
+998 234
+1000 234
+1035 262
+1036 262
+1047 262
+1048 262
+1051 262
+1052 262
+1053 262
+1056 524
+1076 1274
+1077 1274
+1080 224
+1081 224
+1082 224
+1083 224
+1089 224
+1091 224
+1094 224
+1095 224
+1100 28
+1101 28
+1103 28
+1104 28
+1105 28
+1106 28
+1107 28
+1108 28
+1109 28
+fl=./string/../sysdeps/x86_64/multiarch/strcmp.c
+fn=strcmp
+47 4
+48 8
+49 6
+52 4
+53 4
+56 6
+59 6
+79 6
+fl=./string/../sysdeps/x86_64/multiarch/strcpy-avx2.S
+fn=__strcpy_avx2
+56 1668
+62 1668
+64 1668
+69 1668
+71 1668
+72 1668
+73 1668
+75 637
+76 637
+78 637
+79 637
+80 637
+94 637
+95 637
+307 1031
+308 1031
+309 1031
+310 1031
+320 1031
+321 1031
+352 637
+354 1668
+356 1668
+357 1668
+358 1668
+359 1668
+360 1665
+361 1665
+362 1414
+363 1414
+364 828
+365 828
+366 718
+367 718
+368 656
+552 656
+553 656
+562 1312
+566 62
+567 62
+568 62
+577 124
+581 110
+582 110
+591 220
+595 586
+596 586
+597 586
+598 586
+608 1172
+612 251
+613 251
+614 251
+615 251
+625 502
+629 3
+630 3
+631 3
+632 3
+642 6
+fl=./string/../sysdeps/x86_64/multiarch/strcpy.c
+fn=strcpy
+29 6
+fl=./string/../sysdeps/x86_64/multiarch/strcspn.c
+fn=strcspn
+29 2
+fl=./string/../sysdeps/x86_64/multiarch/strlen-avx2.S
+fn=__strlen_avx2
+52 1669
+65 1669
+66 1669
+67 1669
+70 1669
+72 1669
+73 1669
+76 1669
+77 1669
+85 1669
+86 1669
+87 1669
+92 3338
+fl=./string/../sysdeps/x86_64/multiarch/strlen.c
+fn=strlen
+29 6
+fl=./string/../sysdeps/x86_64/multiarch/strncase.c
+fn=strncasecmp
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strncase_l.c
+fn=strncasecmp_l
+31 3
+fl=./string/../sysdeps/x86_64/multiarch/strncmp.c
+fn=strncmp
+43 2
+44 4
+45 3
+48 2
+49 2
+52 3
+55 5
+67 3
+fl=./string/../sysdeps/x86_64/multiarch/strncpy.c
+fn=strncpy
+29 3
+fl=./string/../sysdeps/x86_64/multiarch/strnlen.c
+fn=strnlen
+31 6
+fl=./string/../sysdeps/x86_64/multiarch/strpbrk.c
+fn=strpbrk
+29 2
+fl=./string/../sysdeps/x86_64/multiarch/strrchr-avx2.S
+fn=__strrchr_avx2
+53 1
+54 1
+55 1
+57 1
+58 1
+62 1
+63 1
+64 1
+67 1
+69 1
+70 1
+71 1
+72 1
+75 1
+76 1
+78 1
+79 1
+80 1
+81 1
+82 1
+91 2
+fl=./string/../sysdeps/x86_64/multiarch/strrchr.c
+fn=rindex
+28 3
+fl=./string/../sysdeps/x86_64/multiarch/strspn.c
+fn=strspn
+29 2
+fl=./string/./string/strdup.c
+fn=strdup
+40 15
+41 6
+44 6
+47 12
+48 9
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-avx2.h
+fn=wcschr
+38 4
+40 10
+41 4
+42 6
+45 4
+46 4
+49 6
+52 12
+fn=wcscmp
+38 2
+40 5
+41 2
+42 3
+45 2
+46 2
+49 3
+52 6
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-evex.h
+fn=wmemchr
+37 4
+38 10
+39 6
+42 4
+43 4
+45 4
+51 6
+54 8
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-memcmp.h
+fn=wmemcmp
+34 5
+35 2
+36 2
+37 3
+40 2
+41 2
+44 3
+47 4
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-wcslen.h
+fn=wcslen
+40 2
+41 4
+42 3
+45 2
+46 2
+49 3
+52 3
+fn=wcsnlen
+40 2
+41 4
+42 3
+45 2
+46 2
+49 3
+52 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/ifunc-wmemset.h
+fn=wmemset
+36 10
+37 6
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcschr.c
+fn=wcschr
+31 6
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcscmp.c
+fn=wcscmp
+30 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcscpy.c
+fn=wcscpy
+38 5
+44 2
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcslen.c
+fn=wcslen
+29 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wcsnlen.c
+fn=wcsnlen
+30 3
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wmemchr.c
+fn=wmemchr
+31 4
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wmemcmp.c
+fn=wmemcmp
+29 4
+fl=./wcsmbs/../sysdeps/x86_64/multiarch/wmemset.c
+fn=wmemset
+31 4
+fl=/home/njn/grind/ws1/cachegrind/docs/concord.c
+fn=add_existing
+264 34400
+269 13760
+270 18015
+271 12010
+272 6005
+273 6005
+277 40405
+fn=append
+350 6
+352 3
+354 6
+356 1
+357 1
+359 2
+367 5
+fn=create
+109 22018
+112 22018
+113 22018
+119 22018
+fn=get_word
+168 87021
+169 15822
+175 529847
+177 260964
+178 391446
+179 64844
+180 15820
+181 7910
+183 32422
+185 147034
+191 1
+194 15822
+197 63288
+fn=hash
+125 7910
+126 7910
+128 161328
+129 453908
+fn=init_hash_table
+136 8
+139 2
+142 3
+145 2993
+146 997
+149 4
+150 2
+157 55377
+158 31640
+160 2
+161 2
+162 6
+fn=insert
+201 102830
+202 7910
+204 7910
+209 31640
+210 3185
+211 7910
+217 118410
+219 28372
+225 35420
+226 4120
+227 1030
+233 6880
+234 27520
+236 78070
+fn=interact
+281 12
+296 3
+297 2
+298 7
+300 6
+302 8
+304 1
+308 2
+310 2
+311 11
+fn=kill_arg_list
+522 5
+525 4
+527 2
+528 2
+529 2
+531 4
+fn=main
+89 8
+94 2
+99 4
+100 2
+105 7
+fn=new_word_node
+241 10002
+244 5001
+245 8335
+246 1667
+248 1667
+249 1667
+251 5001
+252 1667
+253 1667
+254 1667
+257 8335
+fn=place_args_in_list
+317 10
+318 2
+320 1
+327 12
+328 8
+329 11
+330 3
+331 4
+333 2
+335 2
+336 1
+337 2
+338 4
+339 2
+344 11
+fl=/usr/include/ctype.h
+fn=get_word
+209 456694
+fl=/usr/include/x86_64-linux-gnu/bits/stdio2.h
+fn=interact
+86 4
+213 5
+fl=/usr/include/x86_64-linux-gnu/bits/string_fortified.h
+fn=append
+79 2
+fn=new_word_node
+79 3334
+fl=???
+fn=(below main)
+0 12
+fn=???
+0 468701
+summary: 8201333