From: Julian Seward <jseward@acm.org>
Date: Tue, 15 Nov 2005 19:51:04 +0000 (+0000)
Subject: Update manual for 3.1.0, sections <= manual-core.html.
X-Git-Tag: svn/VALGRIND_3_1_0~97
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=86c998a8def23ecdf279dd05baf3565845313b31;p=thirdparty%2Fvalgrind.git

Update manual for 3.1.0, sections <= manual-core.html.


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5135
---

diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml
index 27845b0634..a493240efd 100644
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@@ -102,9 +102,8 @@ with C++, is <computeroutput>-fno-inline</computeroutput>.  That
 makes it easier to see the function-call chain, which can help
 reduce confusion when navigating around large C++ apps.  For
 whatever it's worth, debugging OpenOffice.org with Memcheck is a
-bit easier when using this flag.</para>
-
-<para>You don't have to do this, but doing so helps Valgrind
+bit easier when using this flag.
+You don't have to do this, but doing so helps Valgrind
 produce more accurate and less confusing error reports.  Chances
 are you're set up like this already, if you intended to debug
 your program with GNU gdb, or some other debugger.</para>
@@ -169,7 +168,9 @@ whatever reason.</para>
 the commentary, so as to avoid flooding you with information of
 secondary importance.  If you want more information about what is
 happening, re-run, passing the
-<computeroutput>-v</computeroutput> flag to Valgrind.</para>
+<computeroutput>-v</computeroutput> flag to Valgrind.
+A second <computeroutput>-v</computeroutput> gives yet more detail.
+</para>
 
 <para>You can direct the commentary to three different
 places:</para>
@@ -183,6 +184,11 @@ places:</para>
    want to send it to some other file descriptor, for example
    number 9, you can specify
    <computeroutput>--log-fd=9</computeroutput>.</para>
+   <para>This is the simplest and most common arrangement, but can
+   cause problems when Valgrinding entire trees of
+   processes which expect specific file descriptors, particularly
+   stdin/stdout/stderr, to be available for their own use.
+   </para>
   </listitem>
 
   <listitem id="manual-core.out2file" 
@@ -204,15 +210,21 @@ places:</para>
 
    <para>If you want to specify precisely the file name to use,
    without the trailing
-   <computeroutput>.12345</computeroutput>part, you can instead use
+   <computeroutput>.12345</computeroutput> part, you can instead use
    <computeroutput>--log-file-exactly=filename</computeroutput>.
    </para>
 
    <para>You can also use the
    <computeroutput>--log-file-qualifier=&lt;VAR&gt;</computeroutput> option
-   to specify the filename via the environment variable
-   <computeroutput>$VAR</computeroutput>.  This is rarely needed, but
+   to modify the filename according to the environment variable
+   <computeroutput>VAR</computeroutput>.  This is rarely needed, but
    very useful in certain circumstances (eg. when running MPI programs).
+   In this case, the trailing <computeroutput>.12345</computeroutput>
+   part is replaced by the contents of
+   <computeroutput>$VAR</computeroutput>.  The idea is that you
+   specify a variable which will be set differently for each process
+   in the job, for example <computeroutput>BPROC_RANK</computeroutput>
+   or whatever is applicable in your MPI setup.
    </para>
   </listitem>
 
@@ -347,7 +359,7 @@ problem.</para>
 <para>The process of detecting duplicate errors is quite an
 expensive one and can become a significant performance overhead
 if your program generates huge quantities of errors.  To avoid
-serious problems here, Valgrind will simply stop collecting
+serious problems, Valgrind will simply stop collecting
 errors after 1000 different errors have been seen, or 100000 errors
 in total have been seen.  In this situation you might as well
 stop your program and fix it, because Valgrind won't tell you
@@ -359,7 +371,7 @@ if necessary.</para>
 <para>To avoid this cutoff you can use the
 <computeroutput>--error-limit=no</computeroutput> flag.  Then
 Valgrind will always show errors, regardless of how many there
-are.  Use this flag carefully, since it may have a dire effect on
+are.  Use this flag carefully, since it may have a bad effect on
 performance.</para>
 
 </sect1>
@@ -626,6 +638,17 @@ categories.</para>
     Repeating the flag increases the verbosity level.</para>
    </listitem>
 
+   <listitem>
+    <para><computeroutput>-d</computeroutput></para>
+    <para>Emit information for debugging Valgrind itself.  This
+    is usually only of interest to the Valgrind developers.
+    Repeating the flag produces more detailed output.  If you
+    want to send us a bug report, a log of the output
+    generated by <computeroutput>-v -v -d -d</computeroutput>
+    will make your report more useful.
+    </para>
+   </listitem>
+
    <listitem id="trace_children">
     <para><computeroutput>--trace-children=no</computeroutput>
     [default]</para>
@@ -684,10 +707,12 @@ categories.</para>
 
    <listitem id="log2file_qualifier">
     <para><computeroutput>--log-file-qualifier=&lt;VAR&gt;</computeroutput></para>
-    <para>Specifies that Valgrind should send all of its messages
-    to the file named by the environment variable
-    <computeroutput>$VAR</computeroutput>.  This is useful when running
+    <para>When used in conjunction with
+    <computeroutput>--log-file=</computeroutput>, causes the log file
+    name to be qualified using the contents of environment variable
+    <computeroutput>VAR</computeroutput>.  This is useful when running
     MPI programs.
+    For further details, see <xref linkend="manual-core.comment"/>.
     </para>
    </listitem>
 
@@ -737,7 +762,7 @@ errors, e.g. Memcheck, but not Cachegrind.</para>
     <para><computeroutput>--demangle=yes</computeroutput> [default]</para>
     <para>Disable/enable automatic demangling (decoding) of C++
     names.  Enabled by default.  When enabled, Valgrind will
-    attempt to translate encoded C++ procedure names back to
+    attempt to translate encoded C++ names back to
     something approaching the original.  The demangler handles
     symbols mangled by g++ versions 2.X and 3.X.</para>
 
@@ -751,8 +776,8 @@ errors, e.g. Memcheck, but not Cachegrind.</para>
    </listitem>
 
    <listitem id="num_callers">
-    <para><computeroutput>--num-callers=&lt;number&gt;</computeroutput> [default=4]</para>
-    <para>By default, Valgrind shows four levels of function call
+    <para><computeroutput>--num-callers=&lt;number&gt;</computeroutput> [default=12]</para>
+    <para>By default, Valgrind shows twelve levels of function call
     names to help you identify program locations.  You can change
     that number with this option.  This can help in determining
     the program's location in deeply-nested call chains.  Note
@@ -868,7 +893,7 @@ errors, e.g. Memcheck, but not Cachegrind.</para>
      this situation.</para>
     </formalpara>
     <para>1 May 2002: this is a historical relic which could be
-    easily fixed if it gets in your way.  Mail me and complain if
+    easily fixed if it gets in your way.  Mail us and complain if
     this is a problem for you.</para> <para>Nov 2002: if you're
     sending output to a logfile or to a network socket, I guess
     this option doesn't make any sense.  Caveat emptor.</para>
@@ -1150,12 +1175,11 @@ don't understand
 <para>Valgrind has a trapdoor mechanism via which the client
 program can pass all manner of requests and queries to Valgrind
 and the current tool.  Internally, this is used extensively to
-make malloc, free, signals, threads, etc, work, although you
-don't see that.</para>
+make malloc, free, etc, work, although you don't see that.</para>
 
 <para>For your convenience, a subset of these so-called client
 requests is provided to allow you to tell Valgrind facts about
-the behaviour of your program, and conversely to make queries.
+the behaviour of your program, and also to make queries.
 In particular, your program can tell Valgrind about changes in
 memory range permissions that Valgrind would not otherwise know
 about, and so allows clients to get Valgrind to do arbitrary
@@ -1177,7 +1201,9 @@ are not forced to run your program under Valgrind just because you
 use the macros in this file.  Also, you are not required to link your
 program with any extra supporting libraries.</para>
 
-<para>The code left in your binary has negligible performance impact.
+<para>The code left in your binary has negligible performance impact:
+on x86, amd64 and ppc32, the overhead is 6 simple integer instructions
+and is probably undetectable except in tight loops.
 However, if you really wish to compile out the client requests, you can
 compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
 <computeroutput>-DNDEBUG</computeroutput>'s effect on
@@ -1187,7 +1213,7 @@ compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
 <para>You are encouraged to copy the <filename>valgrind/*.h</filename> headers
 into your project's include directory, so your program doesn't have a
 compile-time dependency on Valgrind being installed.  The Valgrind headers,
-unlike the rest of the code, is under a BSD-style license so you may include
+unlike the rest of the code, are under a BSD-style license so you may include
 them without worrying about license incompatibility.</para>
 
 <para>Here is a brief description of the macros available in
@@ -1201,8 +1227,8 @@ tool-specific macros).</para>
    <term><computeroutput>RUNNING_ON_VALGRIND</computeroutput>:</term>
    <listitem>
     <para>returns 1 if running on Valgrind, 0 if running on the
-    real CPU.  If you are running Valgrind under itself, it will return the
-    number of layers of Valgrind emulation we're running under.
+    real CPU.  If you are running Valgrind on itself, it will return the
+    number of layers of Valgrind emulation we're running on.
     </para>
    </listitem>
   </varlistentry>
@@ -1215,19 +1241,19 @@ tool-specific macros).</para>
     dynamic code generation system.  After this call, attempts to
     execute code in the invalidated address range will cause
     Valgrind to make new translations of that code, which is
-    probably the semantics you want.  Note that this is
-    implemented naively, and involves checking all 200191 entries
-    in the translation table to see if any of them overlap the
-    specified address range.  So try not to call it often, or
-    performance will nosedive.  Note that you can be clever about
+    probably the semantics you want.  Note that code invalidations
+    are expensive because finding all the relevant translations
+    quickly is very difficult.  So try not to call it often.
+    Note that you can be clever about
     this: you only need to call it when an area which previously
     contained code is overwritten with new code.  You can choose
-    to write coode into fresh memory, and just call this
+    to write code into fresh memory, and just call this
     occasionally to discard large chunks of old code all at
     once.</para>
-
-    <para><command>Warning:</command> minimally tested,
-    especially for tools other than Memcheck.</para>
+    <para>
+    Alternatively, for transparent self-modifying-code support,
+    use <computeroutput>--smc-check=all</computeroutput>.
+    </para>
    </listitem>
   </varlistentry>
 
@@ -1415,27 +1441,14 @@ of thread executions than when run natively.  This in itself may
 cause your program to behave differently if you have some kind of
 concurrency, critical race, locking, or similar, bugs.</para>
 
-<!--
-<para>It works as follows: threaded apps are (dynamically) linked
-against <literal>libpthread.so</literal>.  Usually this is the
-one installed with your Linux distribution.  Valgrind, however,
-supplies its own <literal>libpthread.so</literal> and
-automatically connects your program to it instead.</para>
-
-<para>The fake <literal>libpthread.so</literal> and Valgrind
-cooperate to implement a user-space pthreads package.  This
-approach avoids the horrible implementation problems of
-implementing a truly multiprocessor version of Valgrind, but it
-does mean that threaded apps run only on one CPU, even if you
-have a multiprocessor machine.</para>
-
 <para>Your program will use the native
 <computeroutput>libpthread</computeroutput>, but not all of its facilities
-will work.  In particular, process-shared synchronization WILL NOT
-WORK.  They rely on special atomic instruction sequences which
-Valgrind does not emulate in a way which works between processes.
+will work.  In particular, synchronisation of processes via shared-memory
+segments will not work.  This relies on special atomic instruction sequences
+which Valgrind does not emulate in a way which works between processes.
 Unfortunately there's no way for Valgrind to warn when this is happening,
-and such calls will mostly work; it's only when there's a race will it fail.
+and such calls will mostly work; it's only when there's a race that
+it will fail.
 </para>
 
 <para>Valgrind also supports direct use of the
@@ -1444,64 +1457,12 @@ and such calls will mostly work; it's only when there's a race will it fail.
 <computeroutput>clone()</computeroutput> is supported where either
 everything is shared (a thread) or nothing is shared (fork-like); partial
 sharing will fail.  Again, any use of atomic instruction sequences in shared
-memory between processes will not work.
+memory between processes will not work reliably.
 </para>
 
-<para>Valgrind schedules your threads in a round-robin fashion,
-with all threads having equal priority.  It switches threads
-every 50000 basic blocks (on x86, typically around 300000
-instructions), which means you'll get a much finer interleaving
-of thread executions than when run natively.  This in itself may
-cause your program to behave differently if you have some kind of
-concurrency, critical race, locking, or similar, bugs.</para>
-
-<para>As of the Valgrind-1.0 release, the state of pthread
-support was as follows:</para>
-
- <itemizedlist>
-
-  <listitem>
-   <para>Mutexes, condition variables, thread-specific data,
-   <computeroutput>pthread_once</computeroutput>, reader-writer
-   locks, semaphores, cleanup stacks, cancellation and thread
-   detaching currently work.  Various attribute-like calls are
-   handled but ignored; you get a warning message.</para>
-  </listitem>
-
-  <listitem>
-   <para>Currently the following syscalls are thread-safe
-   (nonblocking): <literal>write</literal>,
-   <literal>read</literal>, <literal>nanosleep</literal>,
-   <literal>sleep</literal>, <literal>select</literal>,
-   <literal>poll</literal>, <literal>recvmsg</literal> and
-   <literal>accept</literal>.</para>
-  </listitem>
-
-  <listitem>
-   <para>Signals in pthreads are now handled properly(ish):
-   <literal>pthread_sigmask</literal>,
-   <literal>pthread_kill</literal>, <literal>sigwait</literal>
-   and <literal>raise</literal> are now implemented.  Each thread
-   has its own signal mask, as POSIX requires.  It's a bit
-   kludgey - there's a system-wide pending signal set, rather
-   than one for each thread.  But hey.</para>
-  </listitem>
-
- </itemizedlist>
-
-<formalpara>
-<title>Note:</title> 
-<para>As of 18 May 2002, the following threaded programs now work
-fine on my RedHat 7.2 box: Opera 6.0Beta2, KNode in KDE 3.0,
-Mozilla-0.9.2.1 and Galeon-0.11.3, both as supplied with RedHat
-7.2.  Also Mozilla 1.0RC2.  OpenOffice 1.0.  MySQL 3.something
-(the current stable release).</para>
-</formalpara>
--->
 
 </sect1>
 
-
 <sect1 id="manual-core.signals" xreflabel="Handling of Signals">
 <title>Handling of Signals</title>
 
@@ -1511,7 +1472,8 @@ able to cope with any valid use of signals.</para>
 <para>If you're using signals in clever ways (for example, catching
 SIGSEGV, modifying page state and restarting the instruction), you're
 probably relying on precise exceptions.  In this case, you will need
-to use <computeroutput>--single-step=yes</computeroutput>.</para>
+to use <computeroutput>--vex-iropt-precise-memory-exns=yes</computeroutput>.
+</para>
 
 <para>If your program dies as a result of a fatal core-dumping signal,
 Valgrind will generate its own core file
@@ -1532,22 +1494,21 @@ similar.  (Note: it will not generate a core if your core dump size limit is
 <computeroutput>make</computeroutput>, <computeroutput>make
 install</computeroutput> mechanism, and we have attempted to
 ensure that it works on machines with kernel 2.4 or 2.6 and glibc
-2.2.X, 2.3.X, 2.4.X.</para>
+2.2.X or 2.3.X.  You may then want to run the regression tests
+with <computeroutput>make regtest</computeroutput>.
+</para>
 
-<para>There are two options (in addition to the usual
+<para>There are three options (in addition to the usual
 <computeroutput>--prefix=</computeroutput>) which affect how Valgrind is built:
 <itemizedlist>
  <listitem>
-  <para><computeroutput>--enable-pie</computeroutput></para>
-  <para>PIE stands for "position-independent executable".
-  PIE allows Valgrind to place itself as high as possible in memory,
-  giving your program as much address space as possible.  It also allows
-  Valgrind to run under itself.  If PIE is disabled, Valgrind loads at a
-  default address which is suitable for most systems.  This is also
-  useful for debugging Valgrind itself.  It's not on by default because
-  it caused problems for some people.  Note that not all toolchaines
-  support PIEs, you need fairly recent version of the compiler, linker,
-  etc.</para>
+  <para><computeroutput>--enable-inner</computeroutput></para>
+  <para>This builds Valgrind with some special magic hacks which
+   make it possible to run it on a standard build of Valgrind
+   (what the developers call "self-hosting").  Ordinarily you
+   should not use this flag as various kinds of safety checks
+   are disabled.
+   </para>
  </listitem>
 
  <listitem>
@@ -1557,6 +1518,14 @@ ensure that it works on machines with kernel 2.4 or 2.6 and glibc
   if TLS is supported and enable this option.  Sometimes it cannot test for
   TLS, so this option allows you to override the automatic test.</para>
  </listitem>
+
+ <listitem>
+  <para><computeroutput>--with-vex=</computeroutput></para>
+  <para>Specifies the path to the underlying VEX dynamic-translation 
+   library.  By default this is taken to be in the VEX directory
+   off the root of the source tree.
+   </para>
+ </listitem>
 </itemizedlist>
 </para>
 
@@ -1589,9 +1558,9 @@ know if you have build problems.</para>
 limitations of Valgrind, and for a list of programs which are
 known not to work on it.</para>
 
-<para>The translator/instrumentor has a lot of assertions in it.
-They are permanently enabled, and I have no plans to disable
-them.  If one of these breaks, please mail us!</para>
+<para>All parts of the system make heavy use of assertions and 
+internal self-checks.  They are permanently enabled, and we have no 
+plans to disable them.  If one of them breaks, please mail us!</para>
 
 <para>If you get an assertion failure on the expression
 <computeroutput>blockSane(ch)</computeroutput> in
@@ -1613,24 +1582,31 @@ more advice about common problems, crashes, etc.</para>
 <sect1 id="manual-core.limits" xreflabel="Limitations">
 <title>Limitations</title>
 
-<para>The following list of limitations seems depressingly long.
+<para>The following list of limitations seems long.
 However, most programs actually work fine.</para>
 
-<para>Valgrind will run x86/Linux ELF dynamically linked
-binaries, on a kernel 2.4.X or 2.6.X system, subject to
+<para>Valgrind will run Linux ELF 
+binaries, on a kernel 2.4.X or 2.6.X system, on the x86, amd64
+and ppc32 architectures, subject to
 the following constraints:</para>
 
  <itemizedlist>
   <listitem>
-   <para>On x86 and AMD64, there is no support for 3DNow! instructions.  If
+   <para>On x86 and amd64, there is no support for 3DNow! instructions.  If
    the translator encounters these, Valgrind will generate a SIGILL when the
-   instruction is executed.  The same is true for Intel's SSE3 SIMD
-   instructions.</para>
+   instruction is executed.  Apart from that, on x86 and amd64, 
+   essentially all instructions are supported, up to and including SSE2.
+   Version 3.1.0 includes limited support for SSE3 on x86.  This could be 
+   improved
+   if necessary.</para>
+   <para>On ppc32, almost all integer, floating point and Altivec instructions
+   are supported.</para>
   </listitem>
 
   <listitem>
-   <para>Atomic instruction sequences are not supported, which will affect
-   any use of synchronization objects being shared between processes.  They
+   <para>Atomic instruction sequences are not properly supported, in the
+   sense that their atomicity is not preserved.  This will affect
+   any use of synchronization via memory shared between processes.  Such uses
    will appear to work, but fail sporadically.</para>
   </listitem>
 
@@ -1639,7 +1615,8 @@ the following constraints:</para>
    than using malloc/new/free/delete, it should still work, but
    Valgrind's error checking won't be so effective.  If you
    describe your program's memory management scheme using "client
-   requests" (Section 3.7 of this manual), Memcheck can do
+   requests" (see <xref linkend="manual-core.clientreq"/>), 
+   Memcheck can do
    better.  Nevertheless, using malloc/new and free/delete is
    still the best approach.</para>
   </listitem>
@@ -1668,8 +1645,8 @@ the following constraints:</para>
    amount of administrative information maintained behind the
    scenes.  Another cause is that Valgrind dynamically translates
    the original executable.  Translated, instrumented code is
-   14-16 times larger than the original (!) so you can easily end
-   up with 30+ MB of translations when running (eg) a web
+   12-18 times larger than the original so you can easily end
+   up with 50+ MB of translations when running (eg) a web
    browser.</para>
   </listitem>
 
@@ -1690,7 +1667,8 @@ the following constraints:</para>
    </para>
 
    <para>Precision: There is no support for 80 bit arithmetic.
-   Internally, Valgrind represents all FP numbers in 64 bits, and so
+   Internally, Valgrind represents all such "long double"
+   numbers in 64 bits, and so
    there may be some differences in results.  Whether or not this is
    critical remains to be seen.  Note, the x86/amd64 fldt/fstpt
    instructions (read/write 80-bit numbers) are correctly simulated,
@@ -1764,213 +1742,17 @@ the following constraints:</para>
  <itemizedlist>
   <listitem>
    <para>emacs starts up but immediately concludes it is out of
-   memory and aborts.  Emacs has it's own memory-management
-   scheme, but I don't understand why this should interact so
-   badly with Valgrind.  Emacs works fine if you build it to use
+   memory and aborts.  It may be that Memcheck does not provide
+   a good enough emulation of the 
+   <computeroutput>mallinfo</computeroutput> function.
+   Emacs works fine if you build it to use
    the standard malloc/free routines.</para>
   </listitem>
  </itemizedlist>
 
-
- <para>Known platform-specific limitations, as of release 2.4.0:</para>
- <itemizedlist>
-  <listitem>
-   <para>(none)</para>
-  </listitem>
- </itemizedlist>
-
 </sect1>
 
 
-
-<sect1 id="manual-core.howworks" xreflabel="How It Works - A Rough Overview">
-<title>How It Works -- A Rough Overview</title>
-
-<para>Some gory details, for those with a passion for gory
-details.  You don't need to read this section if all you want to
-do is use Valgrind.  What follows is an outline of the machinery.
-It is out of date, as the JITter has been completey rewritten in
-version 3.0, and so it works quite differently.
-A more detailed (and even more out of date) description is to be
-found <xref linkend="mc-tech-docs"/>.</para>
-
-<sect2 id="manual-core.startb" xreflabel="Getting Started">
-<title>Getting started</title>
-
-<para>Valgrind is compiled into two executables:
-<computeroutput>valgrind</computeroutput>, and
-<computeroutput>stage2</computeroutput>.
-<computeroutput>valgrind</computeroutput> is a statically-linked executable
-which loads at the normal address (0x8048000).
-<computeroutput>stage2</computeroutput> is a normal dynamically-linked
-executable; it is either linked to load at a high address (0xb8000000) or is
-a Position Independent Executable.</para>
-
-<para><computeroutput>Valgrind</computeroutput> (also known as <computeroutput>stage1</computeroutput>):
-<orderedlist>
- <listitem><para>Decides where to load stage2.</para></listitem>
- <listitem><para>Pads the address space with
-    <computeroutput>mmap</computeroutput>, leaving holes only where stage2
-    should load.</para></listitem>
- <listitem><para>Loads stage2 in the same manner as
-    <computeroutput>execve()</computeroutput> would, but
-    "manually".</para></listitem> 
- <listitem><para>Jumps to the start of stage2.</para></listitem>
-</orderedlist></para>
-
-<para>Once stage2 is loaded, it uses
-<computeroutput>dlopen()</computeroutput> to load the tool, unmaps all
-traces of stage1, initializes the client's state, and starts the synthetic
-CPU.</para>
-
-<para>Each thread runs in its own kernel thread, and loops in
-<computeroutput>VG_(schedule)</computeroutput> as it runs.  When the thread
-terminates, <computeroutput>VG_(schedule)</computeroutput> returns.  Once
-all the threads have terminated, Valgrind as a whole exits.</para>
-
-<para>Each thread also has two stacks.  One is the client's stack, which
-is manipulated with the client's instructions.  The other is
-Valgrind's internal stack, which is used by all Valgrind's code on
-behalf of that thread.  It is important to not get them confused.</para>
-
-</sect2>
-
-
-<sect2 id="manual-core.engine" 
-       xreflabel="The translation/instrumentation engine">
-<title>The translation/instrumentation engine</title>
-
-<para>Valgrind does not directly run any of the original
-program's code.  Only instrumented translations are run.
-Valgrind maintains a translation table, which allows it to find
-the translation quickly for any branch target (code address).  If
-no translation has yet been made, the translator - a just-in-time
-translator - is summoned.  This makes an instrumented
-translation, which is added to the collection of translations.
-Subsequent jumps to that address will use this
-translation.</para>
-
-<para>Valgrind no longer directly supports detection of
-self-modifying code.  Such checking is expensive, and in practice
-(fortunately) almost no applications need it.  However, to help
-people who are debugging dynamic code generation systems, there
-is a Client Request (basically a macro you can put in your
-program) which directs Valgrind to discard translations in a
-given address range.  So Valgrind can still work in this
-situation provided the client tells it when code has become
-out-of-date and needs to be retranslated.</para>
-
-<para>The JITter translates basic blocks -- blocks of
-straight-line-code -- as single entities.  To minimise the
-considerable difficulties of dealing with the x86 instruction
-set, x86 instructions are first translated to a RISC-like
-intermediate code, similar to sparc code, but with an infinite
-number of virtual integer registers.  Initially each insn is
-translated seperately, and there is no attempt at
-instrumentation.</para>
-
-<para>The intermediate code is improved, mostly so as to try and
-cache the simulated machine's registers in the real machine's
-registers over several simulated instructions.  This is often
-very effective.  Also, we try to remove redundant updates of the
-simulated machines's condition-code register.</para>
-
-<para>The intermediate code is then instrumented, giving more
-intermediate code.  There are a few extra intermediate-code
-operations to support instrumentation; it is all refreshingly
-simple.  After instrumentation there is a cleanup pass to remove
-redundant value checks.</para>
-
-<para>This gives instrumented intermediate code which mentions
-arbitrary numbers of virtual registers.  A linear-scan register
-allocator is used to assign real registers and possibly generate
-spill code.  All of this is still phrased in terms of the
-intermediate code.  This machinery is inspired by the work of
-Reuben Thomas (Mite).</para>
-
-<para>Then, and only then, is the final x86 code emitted.  The
-intermediate code is carefully designed so that x86 code can be
-generated from it without need for spare registers or other
-inconveniences.</para>
-
-<para>The translations are managed using a traditional LRU-based
-caching scheme.  The translation cache has a default size of
-about 14MB.</para>
-
-</sect2>
-
-
-<sect2 id="manual-core.track" 
-       xreflabel="Tracking the Status of Memory">
-<title>Tracking the Status of Memory</title>
-
-<para>Each byte in the process' address space has nine bits
-associated with it: one A bit and eight V bits.  The A and V bits
-for each byte are stored using a sparse array, which flexibly and
-efficiently covers arbitrary parts of the 32-bit address space
-without imposing significant space or performance overheads for
-the parts of the address space never visited.  The scheme used,
-and speedup hacks, are described in detail at the top of the
-source file <filename>coregrind/vg_memory.c</filename>, so you
-should read that for the gory details.</para>
-
-</sect2>
-
-
-
-<sect2 id="manual-core.syscalls" xreflabel="System calls">
-<title>System calls</title>
-
-<para>All system calls are intercepted.  The memory status map is
-consulted before and updated after each call.  It's all rather
-tiresome.  See <filename>coregrind/vg_syscalls.c</filename> for
-details.</para>
-
-</sect2>
-
-
-<sect2 id="manual-core.syssignals" xreflabel="Signals">
-<title>Signals</title>
-
-<para>All signal-related system calls are intercepted.  If the client
-program is trying to set a signal handler, Valgrind makes a note of the
-handler address and which signal it is for.  Valgrind then arranges for the
-same signal to be delivered to its own handler.</para>
-
-<para>When such a signal arrives, Valgrind's own handler catches
-it, and notes the fact.  At a convenient safe point in execution,
-Valgrind builds a signal delivery frame on the client's stack and
-runs its handler.  If the handler longjmp()s, there is nothing
-more to be said.  If the handler returns, Valgrind notices this,
-zaps the delivery frame, and carries on where it left off before
-delivering the signal.</para>
-
-<para>The purpose of this nonsense is that setting signal
-handlers essentially amounts to giving callback addresses to the
-Linux kernel.  We can't allow this to happen, because if it did,
-signal handlers would run on the real CPU, not the simulated one.
-This means the checking machinery would not operate during the
-handler run, and, worse, memory permissions maps would not be
-updated, which could cause spurious error reports once the
-handler had returned.</para>
-
-<para>An even worse thing would happen if the signal handler
-longjmp'd rather than returned: Valgrind would completely lose
-control of the client program.</para>
-
-<para>Upshot: we can't allow the client to install signal
-handlers directly.  Instead, Valgrind must catch, on behalf of
-the client, any signal the client asks to catch, and must
-delivery it to the client on the simulated CPU, not the real one.
-This involves considerable gruesome fakery; see
-<filename>coregrind/vg_signals.c</filename> for details.</para>
-
-</sect2>
-
-</sect1>
-
-
-
 <sect1 id="manual-core.example" xreflabel="An Example Run">
 <title>An Example Run</title>
 
@@ -1993,7 +1775,6 @@ sewardj@phoenix:~/newmat10$
 ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
 ==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
 ==25832== reading syms from /proc/self/exe
-==25832== loaded 5950 symbols, 142333 line number locations
 ==25832== 
 ==25832== Invalid read of size 4
 ==25832==    at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
@@ -2023,29 +1804,29 @@ shipped.</para>
  <itemizedlist>
 
   <listitem>
-   <para><computeroutput>More than 50 errors detected.
+   <para><computeroutput>More than 100 errors detected.
    Subsequent errors will still be recorded, but in less detail
    than before.</computeroutput></para>
-   <para>After 50 different errors have been shown, Valgrind
+   <para>After 100 different errors have been shown, Valgrind
    becomes more conservative about collecting them.  It then
    requires only the program counters in the top two stack frames
    to match when deciding whether or not two errors are really
    the same one.  Prior to this point, the PCs in the top four
    frames are required to match.  This hack has the effect of
-   slowing down the appearance of new errors after the first 50.
-   The 50 constant can be changed by recompiling Valgrind.</para>
+   slowing down the appearance of new errors after the first 100.
+   The 100 constant can be changed by recompiling Valgrind.</para>
   </listitem>
 
   <listitem>
-   <para><computeroutput>More than 300 errors detected.  I'm not
+   <para><computeroutput>More than 1000 errors detected.  I'm not
    reporting any more.  Final error counts may be inaccurate.  Go
    fix your program!</computeroutput></para>
-   <para>After 300 different errors have been detected, Valgrind
+   <para>After 1000 different errors have been detected, Valgrind
    ignores any more.  It seems unlikely that collecting even more
    different ones would be of practical help to anybody, and it
    avoids the danger that Valgrind spends more and more of its
    time comparing new errors against an ever-growing collection.
-   As above, the 300 number is a compile-time constant.</para>
+   As above, the 1000 number is a compile-time constant.</para>
   </listitem>
 
   <listitem>
diff --git a/docs/xml/manual-intro.xml b/docs/xml/manual-intro.xml
index b49f3b304f..e7c716cbf0 100644
--- a/docs/xml/manual-intro.xml
+++ b/docs/xml/manual-intro.xml
@@ -8,8 +8,9 @@
 <sect1 id="manual-intro.overview" xreflabel="An Overview of Valgrind">
 <title>An Overview of Valgrind</title>
 
-<para>Valgrind is a flexible system for debugging and profiling
-Linux executables.  The system consists of a core, which
+<para>Valgrind is a suite of simulation-based debugging and
+profiling tools for programs running on Linux (x86, amd64 and ppc32).
+The system consists of a core, which
 provides a synthetic CPU in software, and a series of tools,
 each of which performs some kind of debugging, profiling, or
 similar task.  The architecture is modular, so that new tools can
@@ -56,9 +57,7 @@ summary, these are:</para>
       <para>Overlapping <computeroutput>src</computeroutput> and
       <computeroutput>dst</computeroutput> pointers in
       <computeroutput>memcpy()</computeroutput> and related
-      functions</para></listitem> <listitem><para>Some misuses of
-      the POSIX pthreads API</para>
-     </listitem>
+      functions</para></listitem> 
     </itemizedlist>
 
     <para>Problems like these can be difficult to find by other
@@ -91,6 +90,10 @@ summary, these are:</para>
     stellar, it's quite usable, and it seems plausible to run KDE
     for long periods at a time like this, collecting up all the
     addressing errors that appear.</para>
+
+    <para>NOTE: Addrcheck is not available in Valgrind 3.1.X.  We hope
+    to reinstate its functionality in later releases.  For now, use 
+    Memcheck instead.</para>
    </listitem>
 
    <listitem>
@@ -135,27 +138,34 @@ summary, these are:</para>
     <para>Helgrind has been hacked on extensively by Jeremy
     Fitzhardinge, and we have him to thank for getting it to a
     releasable state.</para>
+
+    <para>NOTE: Helgrind is, unfortunately, not available in Valgrind 3.1.X,
+    as a result of threading changes that happened in the 2.4.0 release.
+    We hope to reinstate its functionality in a future 3.2.0 release.</para>
    </listitem>
 
 </orderedlist>
   
 
-<para>A number of minor tools (<command>Corecheck</command>,
-<command>Lackey</command> and <command>Nulgrind</command>) are
+<para>A couple of minor tools (<command>Lackey</command> 
+and <command>Nulgrind</command>) are
 also supplied.  These aren't particularly useful -- they exist to
 illustrate how to create simple tools and to help the valgrind
-developers in various ways.</para>
+developers in various ways.  Nulgrind is the null tool -- it adds 
+no instrumentation.  Lackey is a simple example tool 
+which counts instructions, memory accesses, and the number of
+integer and floating point operations your program does.</para>
 
 <para>Valgrind is closely tied to details of the CPU and operating
 system, and to a lesser extent, the compiler and basic C libraries.
-Nonetheless, as of version 3.0.0 it supports several platforms:  x86/Linux
-(mature), AMD64/Linux (immature but works well), and PPC32/Linux (very
-preliminary).  Valgrind uses the standard Unix
+Nonetheless, as of version 3.1.0 it supports several platforms: x86/Linux
+(mature), AMD64/Linux (maturing), and PPC32/Linux (immature but works well).
+Valgrind uses the standard Unix
 <computeroutput>./configure</computeroutput>,
 <computeroutput>make</computeroutput>, <computeroutput>make
 install</computeroutput> mechanism, and we have attempted to
 ensure that it works on machines with kernel 2.4 or 2.6 and glibc
-2.2.X--2.4.X.</para>
+2.2.X--2.3.X.</para>
 
 <para>Valgrind is licensed under the <xref linkend="license.gpl"/>,
 version 2.  The <computeroutput>valgrind/*.h</computeroutput> headers that
diff --git a/docs/xml/quick-start-guide.xml b/docs/xml/quick-start-guide.xml
index ea0d74ffe4..d652c23545 100644
--- a/docs/xml/quick-start-guide.xml
+++ b/docs/xml/quick-start-guide.xml
@@ -40,10 +40,13 @@ right for earlier versions.</para>
 <title>Preparing your program</title>
 <para>Compile your program with <computeroutput>-g</computeroutput> to include
 debugging information so that Memcheck's error messages include exact line
-numbers.  Using <computeroutput>-O0</computeroutput> is also a good idea;
-with <computeroutput>-O1</computeroutput> line numbers in error messages can
-be inaccurate, and with <computeroutput>-O2</computeroutput> Memcheck
-occasionally reports undefined error messages incorrectly.</para>
+numbers.  Using <computeroutput>-O0</computeroutput> is also a good idea, if
+you can tolerate the slowdown.  With <computeroutput>-O1</computeroutput> 
+line numbers in error messages can be inaccurate, although generally speaking
+Memchecking code compiled at <computeroutput>-O1</computeroutput> works 
+fairly well.  Use of <computeroutput>-O2</computeroutput> and above is
+not recommended as Memcheck occasionally reports uninitialised-value
+errors which don't really exist.</para>
 </sect1>
 
 <sect1 id="quick-start.mcrun" 
diff --git a/docs/xml/vg-entities.xml b/docs/xml/vg-entities.xml
index a9217a1af0..aaf5894821 100644
--- a/docs/xml/vg-entities.xml
+++ b/docs/xml/vg-entities.xml
@@ -7,6 +7,6 @@
 
 <!-- valgrind release + version stuff -->
 <!ENTITY rel-type    "Release">
-<!ENTITY rel-version "3.0.0">
-<!ENTITY rel-date    "August 3 2005">
+<!ENTITY rel-version "3.1.0">
+<!ENTITY rel-date    "November 15 2005">