<para>
So, you've managed to profile your program with Cachegrind. Now what?
What's the best way to actually act on the information it provides to speed
-up your program?</para>
+up your program? Here are some rules of thumb that we have found to be
+useful.</para>
<para>
First of all, the global hit/miss rate numbers are not that useful. If you
have multiple programs or multiple runs of a program, comparing the numbers
-might identify if any are outliers. Otherwise, they're not enough to act
-on.</para>
+might identify whether any are outliers worthy of closer investigation.
+Otherwise, they're not enough to act on.</para>
<para>
-The source code annotations are much more useful. In our experience, the
-best place to start is by looking at the <computeroutput>Ir</computeroutput>
-numbers. They simply measure how many instructions were executed for each
-line, and don't include any cache information, but they can still be very
-useful for identifying bottlenecks.</para>
+The line-by-line source code annotations are much more useful. In our
+experience, the best place to start is by looking at the
+<computeroutput>Ir</computeroutput> numbers. They simply measure how many
+instructions were executed for each line, and don't include any cache
+information, but they can still be very useful for identifying
+bottlenecks.</para>
<para>
After that, we have found that L2 misses are typically a much bigger source
of slow-downs than L1 misses. So it's worth looking for any snippets of
-code that cause a lot of L2 misses. If you find any, it's still not always
-easy to work out how to improve things. You need to have a reasonable
-understanding of how caches work, the principles of locality, and your
-program's data access patterns. </para>
+code that cause a high proportion of the L2 misses. If you find any, it's
+still not always easy to work out how to improve things. You need to have a
+reasonable understanding of how caches work, the principles of locality, and
+your program's data access patterns. For example, improving things may
+require redesigning a data structure.</para>
<para>
In short, Cachegrind can tell you where some of the bottlenecks in your code