From: Julian Seward <jseward@acm.org>
Date: Sun, 25 Nov 2007 00:55:11 +0000 (+0000)
Subject: Create a new chapter in the Valgrind Manual: a chapter containing info
X-Git-Tag: svn/VALGRIND_3_3_0~83
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=5738a8c1ac246ecdba072ee9c7ac63227ffaff1f;p=thirdparty%2Fvalgrind.git

Create a new chapter in the Valgrind Manual: a chapter containing info
on some advanced aspects of the core (client requests, function
wrapping) and move stuff from the main core manual into it.



git-svn-id: svn://svn.valgrind.org/valgrind/trunk@7208
---

diff --git a/docs/xml/manual-core-adv.xml b/docs/xml/manual-core-adv.xml
new file mode 100644
index 0000000000..9263c5db36
--- /dev/null
+++ b/docs/xml/manual-core-adv.xml
@@ -0,0 +1,667 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % vg-entities SYSTEM "vg-entities.xml"> %vg-entities; ]>
+
+
+<chapter id="manual-core-adv" xreflabel="Valgrind's core: advanced topics">
+<title>Using and understanding the Valgrind core: Advanced Topics</title>
+
+<para>This chapter describes advanced aspects of the Valgrind core
+services, which are mostly of interest to power users who wish to
+customise and modify Valgrind's default behaviours in certain useful
+ways.  The subjects covered are:</para>
+
+<itemizedlist>
+  <listitem><para>The "Client Request" mechanism</para></listitem>
+  <listitem><para>Function Wrapping</para></listitem>
+</itemizedlist>
+
+
+
+<sect1 id="manual-core-adv.clientreq" 
+       xreflabel="The Client Request mechanism">
+<title>The Client Request mechanism</title>
+
+<para>Valgrind has a trapdoor mechanism via which the client
+program can pass all manner of requests and queries to Valgrind
+and the current tool.  Internally, this is used extensively to
+make malloc, free, etc, work, although you don't see that.</para>
+
+<para>For your convenience, a subset of these so-called client
+requests is provided to allow you to tell Valgrind facts about
+the behaviour of your program, and also to make queries.
+In particular, your program can tell Valgrind about changes in
+memory range permissions that Valgrind would not otherwise know
+about, which allows clients to get Valgrind to do arbitrary
+custom checks.</para>
+
+<para>Clients need to include a header file to make this work.
+Which header file depends on which client requests you use.  Some
+client requests are handled by the core, and are defined in the
+header file <filename>valgrind/valgrind.h</filename>.  Tool-specific
+header files are named after the tool, e.g.
+<filename>valgrind/memcheck.h</filename>.  All header files can be found
+in the <literal>include/valgrind</literal> directory of wherever Valgrind
+was installed.</para>
+
+<para>The macros in these header files have the magical property
+that they generate code in-line which Valgrind can spot.
+However, the code does nothing when not run on Valgrind, so you
+are not forced to run your program under Valgrind just because you
+use the macros in these files.  Also, you are not required to link your
+program with any extra supporting libraries.</para>
+
+<para>The code added to your binary has negligible performance impact:
+on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions
+and is probably undetectable except in tight loops.
+However, if you really wish to compile out the client requests, you can
+compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
+<computeroutput>-DNDEBUG</computeroutput>'s effect on
+<computeroutput>assert()</computeroutput>).
+</para>
+
+<para>You are encouraged to copy the <filename>valgrind/*.h</filename> headers
+into your project's include directory, so your program doesn't have a
+compile-time dependency on Valgrind being installed.  The Valgrind headers,
+unlike most of the rest of the code, are under a BSD-style license so you may
+include them without worrying about license incompatibility.</para>
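+
+<para>As a minimal sketch of what using a client request looks like
+(the program itself is a made-up illustration, and it assumes the
+headers are installed so that
+<filename>valgrind/valgrind.h</filename> is on the include path),
+the following uses two of the macros described below to report
+whether it is running under Valgrind:</para>
+
+<programlisting><![CDATA[
+#include <stdio.h>
+#include <valgrind/valgrind.h>
+
+int main ( void )
+{
+   if (RUNNING_ON_VALGRIND) {
+      /* Goes to Valgrind's log, not to the program's stdout. */
+      VALGRIND_PRINTF("hello from the client\n");
+   } else {
+      printf("not running under Valgrind\n");
+   }
+   return 0;
+}
+]]></programlisting>
+
+<para>The same binary can be run with or without Valgrind; no extra
+library is linked in either case.</para>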
+
+<para>Here is a brief description of the macros available in
+<filename>valgrind.h</filename>, which work with more than one
+tool (see the tool-specific documentation for explanations of the
+tool-specific macros).</para>
+
+ <variablelist>
+
+  <varlistentry>
+   <term><command><computeroutput>RUNNING_ON_VALGRIND</computeroutput>:</command></term>
+   <listitem>
+    <para>Returns 1 if running on Valgrind, 0 if running on the
+    real CPU.  If you are running Valgrind on itself, returns the
+    number of layers of Valgrind emulation you're running on.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_DISCARD_TRANSLATIONS</computeroutput>:</command></term>
+   <listitem>
+    <para>Discards translations of code in the specified address
+    range.  Useful if you are debugging a JIT compiler or some other
+    dynamic code generation system.  After this call, attempts to
+    execute code in the invalidated address range will cause
+    Valgrind to make new translations of that code, which is
+    probably the semantics you want.  Note that code invalidations
+    are expensive because finding all the relevant translations
+    quickly is very difficult.  So try not to call it often.
+    Note that you can be clever about
+    this: you only need to call it when an area which previously
+    contained code is overwritten with new code.  You can choose
+    to write code into fresh memory, and just call this
+    occasionally to discard large chunks of old code all at
+    once.</para>
+    <para>
+    Alternatively, for transparent self-modifying-code support,
+    use <computeroutput>--smc-check=all</computeroutput>, or run
+    on ppc32/Linux or ppc64/Linux.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_COUNT_ERRORS</computeroutput>:</command></term>
+   <listitem>
+    <para>Returns the number of errors found so far by Valgrind.  Can be
+    useful in test harness code when combined with the
+    <option>--log-fd=-1</option> option; this runs Valgrind silently,
+    but the client program can detect when errors occur.  Only useful
+    for tools that report errors, e.g. it's useful for Memcheck, but for
+    Cachegrind it will always return zero because Cachegrind doesn't
+    report errors.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>:</command></term>
+   <listitem>
+    <para>If your program manages its own memory instead of using
+    the standard <computeroutput>malloc()</computeroutput> /
+    <computeroutput>new</computeroutput> /
+    <computeroutput>new[]</computeroutput>, tools that track
+    information about heap blocks will not do nearly as good a
+    job.  For example, Memcheck won't detect nearly as many
+    errors, and the error messages won't be as informative.  To
+    improve this situation, use this macro just after your custom
+    allocator allocates some new memory.  See the comments in
+    <filename>valgrind.h</filename> for information on how to use
+    it.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_FREELIKE_BLOCK</computeroutput>:</command></term>
+   <listitem>
+    <para>This should be used in conjunction with
+    <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>.
+    Again, see the comments in <filename>valgrind.h</filename> for
+    information on how to use it.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>:</command></term>
+   <listitem>
+    <para>This is similar to
+    <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>,
+    but is tailored towards code that uses memory pools.  See the
+    comments in <filename>valgrind.h</filename> for information
+    on how to use it.</para>
+   </listitem>
+  </varlistentry>
+  
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_DESTROY_MEMPOOL</computeroutput>:</command></term>
+   <listitem>
+    <para>This should be used in conjunction with
+    <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
+    Again, see the comments in <filename>valgrind.h</filename> for
+    information on how to use it.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_MEMPOOL_ALLOC</computeroutput>:</command></term>
+   <listitem>
+    <para>This should be used in conjunction with
+    <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
+    Again, see the comments in <filename>valgrind.h</filename> for
+    information on how to use it.</para>
+   </listitem>
+  </varlistentry>
+   
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_MEMPOOL_FREE</computeroutput>:</command></term>
+   <listitem>
+    <para>This should be used in conjunction with
+    <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
+    Again, see the comments in <filename>valgrind.h</filename> for
+    information on how to use it.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_NON_SIMD_CALL[0123]</computeroutput>:</command></term>
+   <listitem>
+    <para>Executes a function of 0, 1, 2 or 3 args in the client
+    program on the <emphasis>real</emphasis> CPU, not the virtual
+    CPU that Valgrind normally runs code on.  These are used in
+    various ways internally to Valgrind.  They might be useful to
+    client programs.</para> 
+
+    <para><command>Warning:</command> Only use these if you
+    <emphasis>really</emphasis> know what you are doing.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_PRINTF(format, ...)</computeroutput>:</command></term>
+   <listitem>
+    <para>Prints a printf-style message to the log file when running under
+    Valgrind.  Nothing is output if not running under Valgrind.
+    Returns the number of characters output.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_PRINTF_BACKTRACE(format, ...)</computeroutput>:</command></term>
+   <listitem>
+    <para>Prints a printf-style message to the log file along with a stack
+    backtrace when running under Valgrind.  Nothing is output if
+    not running under Valgrind.  Returns the number of characters
+    output.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_STACK_REGISTER(start, end)</computeroutput>:</command></term>
+   <listitem>
+    <para>Registers a new stack.  Informs Valgrind that the memory range
+    between start and end is a unique stack.  Returns a stack identifier
+    that can be used with other
+    <computeroutput>VALGRIND_STACK_*</computeroutput> calls.</para>
+    <para>Valgrind will use this information to determine if a change to
+    the stack pointer is an item pushed onto the stack or a change over
+    to a new stack.  Use this if you're using a user-level thread package
+    and are noticing spurious errors from Valgrind about uninitialized
+    memory reads.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_STACK_DEREGISTER(id)</computeroutput>:</command></term>
+   <listitem>
+    <para>Deregisters a previously registered stack.  Informs
+    Valgrind that the previously registered memory range with stack id
+    <computeroutput>id</computeroutput> is no longer a stack.</para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><command><computeroutput>VALGRIND_STACK_CHANGE(id, start, end)</computeroutput>:</command></term>
+   <listitem>
+    <para>Changes a previously registered stack.  Informs
+    Valgrind that the previously registered stack with stack id
+    <computeroutput>id</computeroutput> has changed its start and end
+    values.  Use this if your user-level thread package implements
+    stack growth.</para>
+   </listitem>
+  </varlistentry>
+
+ </variablelist>
+
+<para>Note that <filename>valgrind.h</filename> is included by
+all the tool-specific header files (such as
+<filename>memcheck.h</filename>), so you don't need to include it
+in your client if you include a tool-specific header.</para>
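+
+<para>To make the custom-allocator requests more concrete, here is a
+rough sketch of a bump allocator annotated with
+<computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput> and
+<computeroutput>VALGRIND_FREELIKE_BLOCK</computeroutput>.  The
+allocator (<computeroutput>arena</computeroutput>,
+<computeroutput>my_alloc</computeroutput>,
+<computeroutput>my_free</computeroutput>) is hypothetical and exists
+only for illustration; the argument order shown should be checked
+against the comments in <filename>valgrind.h</filename> for your
+version.</para>
+
+<programlisting><![CDATA[
+#include <stddef.h>
+#include <valgrind/valgrind.h>
+
+static char   arena[4096];    /* hypothetical backing store */
+static size_t arena_used = 0;
+
+void* my_alloc ( size_t n )
+{
+   void* p;
+   /* Reject oversized requests first, so the rounding cannot overflow. */
+   if (n > sizeof(arena) - arena_used) return NULL;
+   n = (n + 15) & ~(size_t)15;   /* round the size up to a multiple of 16 */
+   if (n > sizeof(arena) - arena_used) return NULL;
+   p = &arena[arena_used];
+   arena_used += n;
+   /* args: address, size in bytes, redzone size, is-zeroed flag */
+   VALGRIND_MALLOCLIKE_BLOCK(p, n, 0, 0);
+   return p;
+}
+
+void my_free ( void* p )
+{
+   /* args: address, redzone size (must match the MALLOCLIKE call) */
+   if (p != NULL) VALGRIND_FREELIKE_BLOCK(p, 0);
+   /* A bump allocator never reuses memory, so nothing else to do. */
+}
+]]></programlisting>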
+
+</sect1>
+
+
+
+
+
+<sect1 id="manual-core-adv.wrapping" xreflabel="Function Wrapping">
+<title>Function wrapping</title>
+
+<para>
+Valgrind versions 3.2.0 and above can do function wrapping on all
+supported targets.  In function wrapping, calls to some specified
+function are intercepted and rerouted to a different, user-supplied
+function.  This can do whatever it likes, typically examining the
+arguments, calling onwards to the original, and possibly examining the
+result.  Any number of functions may be wrapped.</para>
+
+<para>
+Function wrapping is useful for instrumenting an API in some way.  For
+example, wrapping functions in the POSIX pthreads API makes it
+possible to notify Valgrind of thread status changes, and wrapping
+functions in the MPI (message-passing) API allows notifying Valgrind
+of memory status changes associated with message arrival/departure.
+Such information is usually passed to Valgrind by using client
+requests in the wrapper functions, although that is not of relevance
+here.</para>
+
+<sect2 id="manual-core-adv.wrapping.example" xreflabel="A Simple Example">
+<title>A Simple Example</title>
+
+<para>Suppose we want to wrap some function:</para>
+
+<programlisting><![CDATA[
+int foo ( int x, int y ) { return x + y; }]]></programlisting>
+
+<para>A wrapper is a function of identical type, but with a special name
+which identifies it as the wrapper for <computeroutput>foo</computeroutput>.
+Wrappers need to include
+supporting macros from <computeroutput>valgrind.h</computeroutput>.
+Here is a simple wrapper which prints the arguments and return value:</para>
+
+<programlisting><![CDATA[
+#include <stdio.h>
+#include "valgrind.h"
+int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y )
+{
+   int    result;
+   OrigFn fn;
+   VALGRIND_GET_ORIG_FN(fn);
+   printf("foo's wrapper: args %d %d\n", x, y);
+   CALL_FN_W_WW(result, fn, x,y);
+   printf("foo's wrapper: result %d\n", result);
+   return result;
+}
+]]></programlisting>
+
+<para>To become active, the wrapper merely needs to be present in a text
+section somewhere in the same process' address space as the function
+it wraps, and for its ELF symbol name to be visible to Valgrind.  In
+practice, this means either compiling to a 
+<computeroutput>.o</computeroutput> and linking it in, or
+compiling to a <computeroutput>.so</computeroutput> and 
+<computeroutput>LD_PRELOAD</computeroutput>ing it in.  The latter is more
+convenient in that it doesn't require relinking.</para>
+
+<para>All wrappers have approximately the above form.  There are three
+crucial macros:</para>
+
+<para><computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>: 
+this generates the real name of the wrapper.
+This is an encoded name which Valgrind notices when reading symbol
+table information.  What it says is: I am the wrapper for any function
+named <computeroutput>foo</computeroutput> which is found in 
+an ELF shared object with an empty
+("<computeroutput>NONE</computeroutput>") soname field.  The specification 
+mechanism is powerful in
+that wildcards are allowed for both sonames and function names.  
+The details are discussed below.</para>
+
+<para><computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>: 
+once in the wrapper, the first priority is
+to get hold of the address of the original (and any other supporting
+information needed).  This is stored in a value of opaque 
+type <computeroutput>OrigFn</computeroutput>.
+The information is acquired using 
+<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>.  It is crucial
+to make this macro call before calling any other wrapped function
+in the same thread.</para>
+
+<para><computeroutput>CALL_FN_W_WW</computeroutput>: eventually we will
+want to call the function being
+wrapped.  Calling it directly does not work, since that just gets us
+back to the wrapper and tends to kill the program in short order by
+stack overflow.  Instead, the result lvalue, 
+<computeroutput>OrigFn</computeroutput> and arguments are
+handed to one of a family of macros of the form 
+<computeroutput>CALL_FN_*</computeroutput>.  These
+cause Valgrind to call the original and avoid recursion back to the
+wrapper.</para>
+</sect2>
+
+<sect2 id="manual-core-adv.wrapping.specs" xreflabel="Wrapping Specifications">
+<title>Wrapping Specifications</title>
+
+<para>This scheme has the advantage of being self-contained.  A library of
+wrappers can be compiled to object code in the normal way, and does
+not rely on an external script telling Valgrind which wrappers pertain
+to which originals.</para>
+
+<para>Each wrapper has a name which, in the most general case, says: I am the
+wrapper for any function whose name matches FNPATT and whose ELF
+"soname" matches SOPATT.  Both FNPATT and SOPATT may contain wildcards
+(asterisks) and other characters (spaces, dots, @, etc) which are not 
+generally regarded as valid C identifier names.</para> 
+
+<para>This flexibility is needed to write robust wrappers for POSIX pthread
+functions, where typically we are not completely sure of either the
+function name or the soname, or alternatively we want to wrap a whole
+set of functions at once.</para> 
+
+<para>For example, <computeroutput>pthread_create</computeroutput> 
+in GNU libpthread is usually a
+versioned symbol - one whose name ends in, eg, 
+<computeroutput>@GLIBC_2.3</computeroutput>.  Hence we
+are not sure what its real name is.  We also want to cover any soname
+of the form <computeroutput>libpthread.so*</computeroutput>.
+So the header of the wrapper will be</para>
+
+<programlisting><![CDATA[
+int I_WRAP_SONAME_FNNAME_ZZ(libpthreadZdsoZd0,pthreadZucreateZAZa)
+  ( ... formals ... )
+  { ... body ... }
+]]></programlisting>
+
+<para>In order to write unusual characters as valid C function names, a
+Z-encoding scheme is used.  Names are written literally, except that
+a capital Z acts as an escape character, with the following encoding:</para>
+
+<programlisting><![CDATA[
+     Za   encodes    *
+     Zp              +
+     Zc              :
+     Zd              .
+     Zu              _
+     Zh              -
+     Zs              (space)
+     ZA              @
+     ZZ              Z
+     ZL              (       # only in Valgrind 3.3.0 and later
+     ZR              )       # only in Valgrind 3.3.0 and later
+]]></programlisting>
+
+<para>Hence <computeroutput>libpthreadZdsoZd0</computeroutput> is an 
+encoding of the soname <computeroutput>libpthread.so.0</computeroutput>
+and <computeroutput>pthreadZucreateZAZa</computeroutput> is an encoding 
+of the function name <computeroutput>pthread_create@*</computeroutput>.
+</para>
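+
+<para>The encoding is mechanical enough to sketch in a few lines of C.
+The helper <computeroutput>z_encode</computeroutput> below is
+hypothetical (it is not part of Valgrind) and simply applies the table
+above character by character, which can be handy for checking a
+hand-written wrapper name:</para>
+
+<programlisting><![CDATA[
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical helper, not part of Valgrind: Z-encode 'in' into 'out'.
+   Output is truncated if 'out' (of size 'outsz') is too small. */
+static void z_encode ( const char* in, char* out, size_t outsz )
+{
+   static const char plain[] = "*+:._- @Z()";
+   static const char code[]  = "apcduhsAZLR";
+   size_t o = 0;
+   if (outsz == 0) return;
+   for (; *in != '\0' && o + 2 < outsz; in++) {
+      const char* p = strchr(plain, *in);
+      if (p != NULL) {
+         out[o++] = 'Z';               /* escape character */
+         out[o++] = code[p - plain];   /* code letter from the table */
+      } else {
+         out[o++] = *in;               /* ordinary characters pass through */
+      }
+   }
+   out[o] = '\0';
+}
+
+int main ( void )
+{
+   char buf[128];
+   z_encode("libpthread.so.0", buf, sizeof buf);
+   printf("%s\n", buf);   /* prints libpthreadZdsoZd0 */
+   z_encode("pthread_create@*", buf, sizeof buf);
+   printf("%s\n", buf);   /* prints pthreadZucreateZAZa */
+   return 0;
+}
+]]></programlisting>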
+
+<para>The macro <computeroutput>I_WRAP_SONAME_FNNAME_ZZ</computeroutput> 
+constructs a wrapper name in which
+both the soname (first component) and function name (second component)
+are Z-encoded.  Encoding the function name can be tiresome and is
+often unnecessary, so a second macro,
+<computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>, can be
+used instead.  The <computeroutput>_ZU</computeroutput> variant is 
+also useful for writing wrappers for
+C++ functions, in which the function name is usually already mangled
+using some other convention in which Z plays an important role.  Having
+to encode a second time quickly becomes confusing.</para>
+
+<para>Since the function name field may contain wildcards, it can be
+anything, including just <computeroutput>*</computeroutput>.
+The same is true for the soname.
+However, some ELF objects - specifically, main executables - do not
+have sonames.  Any object lacking a soname is treated as if its soname
+was <computeroutput>NONE</computeroutput>, which is why the original 
+example above had a name
+<computeroutput>I_WRAP_SONAME_FNNAME_ZU(NONE,foo)</computeroutput>.</para>
+
+<para>Note that the soname of an ELF object is not the same as its
+file name, although it is often similar.  You can find the soname of
+an object <computeroutput>libfoo.so</computeroutput> using the command
+<computeroutput>readelf -a libfoo.so | grep soname</computeroutput>.</para>
+</sect2>
+
+<sect2 id="manual-core-adv.wrapping.semantics" xreflabel="Wrapping Semantics">
+<title>Wrapping Semantics</title>
+
+<para>The ability for a wrapper to replace an infinite family of functions
+is powerful but brings complications in situations where ELF objects
+appear and disappear (are dlopen'd and dlclose'd) on the fly.
+Valgrind tries to maintain sensible behaviour in such situations.</para>
+
+<para>For example, suppose a process has dlopened (an ELF object with
+soname) <computeroutput>object1.so</computeroutput>, which contains 
+<computeroutput>function1</computeroutput>.  It starts to use
+<computeroutput>function1</computeroutput> immediately.</para>
+
+<para>After a while it dlopens <computeroutput>wrappers.so</computeroutput>,
+which contains a wrapper
+for <computeroutput>function1</computeroutput> in (soname) 
+<computeroutput>object1.so</computeroutput>.  All subsequent calls to 
+<computeroutput>function1</computeroutput> are rerouted to the wrapper.</para>
+
+<para>If <computeroutput>wrappers.so</computeroutput> is 
+later dlclose'd, calls to <computeroutput>function1</computeroutput> are 
+naturally routed back to the original.</para>
+
+<para>Alternatively, if <computeroutput>object1.so</computeroutput>
+is dlclose'd but <computeroutput>wrappers.so</computeroutput> remains,
+then the wrapper exported by <computeroutput>wrappers.so</computeroutput>
+becomes inactive, since there
+is no way to get to it - there is no original to call any more.  However,
+Valgrind remembers that the wrapper is still present.  If 
+<computeroutput>object1.so</computeroutput> is
+eventually dlopen'd again, the wrapper will become active again.</para>
+
+<para>In short, Valgrind inspects all code loading/unloading events to
+ensure that the set of currently active wrappers remains consistent.</para>
+
+<para>A second possible problem is that of conflicting wrappers.  It is 
+easily possible to load two or more wrappers, both of which claim
+to be wrappers for some third function.  In such cases Valgrind will
+complain about conflicting wrappers when the second one appears, and
+will honour only the first one.</para>
+</sect2>
+
+<sect2 id="manual-core-adv.wrapping.debugging" xreflabel="Debugging">
+<title>Debugging</title>
+
+<para>Figuring out what's going on given the dynamic nature of wrapping
+can be difficult.  The 
+<computeroutput>--trace-redir=yes</computeroutput> flag makes 
+this possible
+by showing the complete state of the redirection subsystem after
+every
+<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
+event affecting code (text).</para>
+
+<para>There are two central concepts:</para>
+
+<itemizedlist>
+
+  <listitem><para>A "redirection specification" is a binding of 
+  a (soname pattern, fnname pattern) pair to a code address.
+  These bindings are created by writing functions with names
+  made with the 
+  <computeroutput>I_WRAP_SONAME_FNNAME_{ZZ,ZU}</computeroutput>
+  macros.</para></listitem>
+
+  <listitem><para>An "active redirection" is a code-address to 
+  code-address binding currently in effect.</para></listitem>
+
+</itemizedlist>
+
+<para>The state of the wrapping-and-redirection subsystem comprises a set of
+specifications and a set of active bindings.  The specifications are
+acquired/discarded by watching all 
+<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
+events on code (text)
+sections.  The active binding set is (conceptually) recomputed from
+the specifications, and all known symbol names, following any change
+to the specification set.</para>
+
+<para><computeroutput>--trace-redir=yes</computeroutput> shows the contents 
+of both sets following any such event.</para>
+
+<para><computeroutput>-v</computeroutput> prints a line of text each 
+time an active specification is used for the first time.</para>
+
+<para>Hence for maximum debugging effectiveness you will need to use both
+flags.</para>
+
+<para>One final comment.  The function-wrapping facility is closely
+tied to Valgrind's ability to replace (redirect) specified
+functions, for example to redirect calls to 
+<computeroutput>malloc</computeroutput> to its
+own implementation.  Indeed, a replacement function can be
+regarded as a wrapper function which does not call the original.
+However, to make the implementation more robust, the two kinds
+of interception (wrapping vs replacement) are treated differently.
+</para>
+
+<para><computeroutput>--trace-redir=yes</computeroutput> shows 
+specifications and bindings for both
+replacement and wrapper functions.  To differentiate the 
+two, replacement bindings are printed using 
+<computeroutput>R-></computeroutput> whereas 
+wraps are printed using <computeroutput>W-></computeroutput>.
+</para>
+</sect2>
+
+
+<sect2 id="manual-core-adv.wrapping.limitations-cf" 
+       xreflabel="Limitations - control flow">
+<title>Limitations - control flow</title>
+
+<para>For the most part, the function wrapping implementation is robust.
+The only important caveat is: in a wrapper, get hold of
+the <computeroutput>OrigFn</computeroutput> information using 
+<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput> before calling any
+other wrapped function.  Once you have the 
+<computeroutput>OrigFn</computeroutput>, arbitrary
+calls between, recursion between, and longjumps out of wrappers
+should work correctly.  There is never any interaction between wrapped
+functions and merely replaced functions 
+(eg <computeroutput>malloc</computeroutput>), so you can call
+<computeroutput>malloc</computeroutput> etc safely from within wrappers.
+</para>
+
+<para>The above comments are true for {x86,amd64,ppc32}-linux.  On
+ppc64-linux function wrapping is more fragile due to the (arguably
+poorly designed) ppc64-linux ABI.  This mandates the use of a shadow
+stack which tracks entries/exits of both wrapper and replacement
+functions.  This gives two limitations: firstly, longjumping out of
+wrappers will rapidly lead to disaster, since the shadow stack will
+not get correctly cleared.  Secondly, since the shadow stack has
+finite size, recursion between wrapper/replacement functions is only
+possible to a limited depth, beyond which Valgrind has to abort the
+run.  This depth is currently 16 calls.</para>
+
+<para>For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above
+comments apply on a per-thread basis.  In other words, wrapping is
+thread-safe: each thread must individually observe the above
+restrictions, but there is no need for any kind of inter-thread
+cooperation.</para>
+</sect2>
+
+
+<sect2 id="manual-core-adv.wrapping.limitations-sigs" 
+       xreflabel="Limitations - original function signatures">
+<title>Limitations - original function signatures</title>
+
+<para>As shown in the above example, to call the original you must use a
+macro of the form <computeroutput>CALL_FN_*</computeroutput>.  
+For technical reasons it is impossible
+to create a single macro to deal with all argument types and numbers,
+so a family of macros covering the most common cases is supplied.  In
+what follows, 'W' denotes a machine-word-typed value (a pointer or a
+C <computeroutput>long</computeroutput>), 
+and 'v' denotes C's <computeroutput>void</computeroutput> type.
+The currently available macros are:</para>
+
+<programlisting><![CDATA[
+CALL_FN_v_v       -- call an original of type  void fn ( void )
+CALL_FN_W_v       -- call an original of type  long fn ( void )
+
+CALL_FN_v_W       -- void fn ( long )
+CALL_FN_W_W       -- long fn ( long )
+
+CALL_FN_v_WW      -- void fn ( long, long )
+CALL_FN_W_WW      -- long fn ( long, long )
+
+CALL_FN_v_WWW     -- void fn ( long, long, long )
+CALL_FN_W_WWW     -- long fn ( long, long, long )
+
+CALL_FN_W_WWWW    -- long fn ( long, long, long, long )
+CALL_FN_W_5W      -- long fn ( long, long, long, long, long )
+CALL_FN_W_6W      -- long fn ( long, long, long, long, long, long )
+and so on, up to 
+CALL_FN_W_12W
+]]></programlisting>
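+
+<para>As an illustration of one of the <computeroutput>v</computeroutput>
+variants, a wrapper for a function returning
+<computeroutput>void</computeroutput> might look roughly as follows.
+The wrapped function <computeroutput>bar</computeroutput> is
+hypothetical, and is assumed to live in the main executable (hence
+the <computeroutput>NONE</computeroutput> soname).  Note that the
+<computeroutput>v</computeroutput> variants take no result
+lvalue:</para>
+
+<programlisting><![CDATA[
+#include <stdio.h>
+#include "valgrind.h"
+
+/* Wraps a hypothetical   void bar ( long n )   in the main executable. */
+void I_WRAP_SONAME_FNNAME_ZU(NONE,bar)( long n )
+{
+   OrigFn fn;
+   VALGRIND_GET_ORIG_FN(fn);     /* must come before any other wrapped call */
+   printf("bar's wrapper: arg %ld\n", n);
+   CALL_FN_v_W(fn, n);           /* call the original; nothing is returned */
+}
+]]></programlisting>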
+
+<para>The set of supported types can be expanded as needed.  It is
+regrettable that this limitation exists.  Function wrapping has proven
+difficult to implement, with a certain apparently unavoidable level of
+ickyness.  After several implementation attempts, the present
+arrangement appears to be the least-worst tradeoff.  At least it works
+reliably in the presence of dynamic linking and dynamic code
+loading/unloading.</para>
+
+<para>You should not attempt to wrap a function of one type signature with a
+wrapper of a different type signature.  Such trickery will surely lead
+to crashes or strange behaviour.  This is not of course a limitation
+of the function wrapping implementation, merely a reflection of the
+fact that it gives you sweeping powers to shoot yourself in the foot
+if you are not careful.  Imagine the instant havoc you could wreak by
+writing a wrapper which matched any function name in any soname - in
+effect, one which claimed to be a wrapper for all functions in the
+process.</para>
+</sect2>
+
+<sect2 id="manual-core-adv.wrapping.examples" xreflabel="Examples">
+<title>Examples</title>
+
+<para>In the source tree, 
+<computeroutput>memcheck/tests/wrap[1-8].c</computeroutput> provide a series of
+examples, ranging from very simple to quite advanced.</para>
+
+<para><computeroutput>auxprogs/libmpiwrap.c</computeroutput> is an example 
+of wrapping a big, complex API (the MPI-2 interface).  This file defines 
+almost 300 different wrappers.</para>
+</sect2>
+
+</sect1>
+
+
+
+
+</chapter>
diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml
index 5ff421bfcc..86d1b0874e 100644
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@@ -7,11 +7,18 @@
 <chapter id="manual-core" xreflabel="Valgrind's core">
 <title>Using and understanding the Valgrind core</title>
 
-<para>This section describes the Valgrind core services, flags and
+<para>This chapter describes the Valgrind core services, flags and
 behaviours.  That means it is relevant regardless of what particular
-tool you are using.  A point of terminology: most references to
-"Valgrind" in the rest of this section refer to the Valgrind
-core services.</para>
+tool you are using.  The information should be sufficient for you to
+make effective day-to-day use of Valgrind.  Advanced topics related to
+the Valgrind core are described in <xref linkend="manual-core-adv"/>.
+</para>
+
+<para>
+A point of terminology: most references to "Valgrind" in this chapter
+refer to the Valgrind core services.  </para>
+
+
 
 <sect1 id="manual-core.whatdoes" 
        xreflabel="What Valgrind does with your program">
@@ -1781,648 +1788,6 @@ shipped.</para>
 
 
 
-<sect1 id="manual-core.clientreq" 
-       xreflabel="The Client Request mechanism">
-<title>The Client Request mechanism</title>
-
-<para>Valgrind has a trapdoor mechanism via which the client
-program can pass all manner of requests and queries to Valgrind
-and the current tool.  Internally, this is used extensively to
-make malloc, free, etc, work, although you don't see that.</para>
-
-<para>For your convenience, a subset of these so-called client
-requests is provided to allow you to tell Valgrind facts about
-the behaviour of your program, and also to make queries.
-In particular, your program can tell Valgrind about changes in
-memory range permissions that Valgrind would not otherwise know
-about, and so allows clients to get Valgrind to do arbitrary
-custom checks.</para>
-
-<para>Clients need to include a header file to make this work.
-Which header file depends on which client requests you use.  Some
-client requests are handled by the core, and are defined in the
-header file <filename>valgrind/valgrind.h</filename>.  Tool-specific
-header files are named after the tool, e.g.
-<filename>valgrind/memcheck.h</filename>.  All header files can be found
-in the <literal>include/valgrind</literal> directory of wherever Valgrind
-was installed.</para>
-
-<para>The macros in these header files have the magical property
-that they generate code in-line which Valgrind can spot.
-However, the code does nothing when not run on Valgrind, so you
-are not forced to run your program under Valgrind just because you
-use the macros in this file.  Also, you are not required to link your
-program with any extra supporting libraries.</para>
-
-<para>The code added to your binary has negligible performance impact:
-on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions
-and is probably undetectable except in tight loops.
-However, if you really wish to compile out the client requests, you can
-compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
-<computeroutput>-DNDEBUG</computeroutput>'s effect on
-<computeroutput>assert()</computeroutput>).
-</para>
-
-<para>You are encouraged to copy the <filename>valgrind/*.h</filename> headers
-into your project's include directory, so your program doesn't have a
-compile-time dependency on Valgrind being installed.  The Valgrind headers,
-unlike most of the rest of the code, are under a BSD-style license so you may
-include them without worrying about license incompatibility.</para>
-
-<para>Here is a brief description of the macros available in
-<filename>valgrind.h</filename>, which work with more than one
-tool (see the tool-specific documentation for explanations of the
-tool-specific macros).</para>
-
- <variablelist>
-
-  <varlistentry>
-   <term><command><computeroutput>RUNNING_ON_VALGRIND</computeroutput></command>:</term>
-   <listitem>
-    <para>Returns 1 if running on Valgrind, 0 if running on the
-    real CPU.  If you are running Valgrind on itself, returns the
-    number of layers of Valgrind emulation you're running on.
-    </para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_DISCARD_TRANSLATIONS</computeroutput>:</command></term>
-   <listitem>
-    <para>Discards translations of code in the specified address
-    range.  Useful if you are debugging a JIT compiler or some other
-    dynamic code generation system.  After this call, attempts to
-    execute code in the invalidated address range will cause
-    Valgrind to make new translations of that code, which is
-    probably the semantics you want.  Note that code invalidations
-    are expensive because finding all the relevant translations
-    quickly is very difficult.  So try not to call it often.
-    Note that you can be clever about
-    this: you only need to call it when an area which previously
-    contained code is overwritten with new code.  You can choose
-    to write code into fresh memory, and just call this
-    occasionally to discard large chunks of old code all at
-    once.</para>
-    <para>
-    Alternatively, for transparent self-modifying-code support,
-    use<computeroutput>--smc-check=all</computeroutput>, or run
-    on ppc32/Linux or ppc64/Linux.
-    </para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_COUNT_ERRORS</computeroutput>:</command></term>
-   <listitem>
-    <para>Returns the number of errors found so far by Valgrind.  Can be
-    useful in test harness code when combined with the
-    <option>--log-fd=-1</option> option; this runs Valgrind silently,
-    but the client program can detect when errors occur.  Only useful
-    for tools that report errors, e.g. it's useful for Memcheck, but for
-    Cachegrind it will always return zero because Cachegrind doesn't
-    report errors.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>:</command></term>
-   <listitem>
-    <para>If your program manages its own memory instead of using
-    the standard <computeroutput>malloc()</computeroutput> /
-    <computeroutput>new</computeroutput> /
-    <computeroutput>new[]</computeroutput>, tools that track
-    information about heap blocks will not do nearly as good a
-    job.  For example, Memcheck won't detect nearly as many
-    errors, and the error messages won't be as informative.  To
-    improve this situation, use this macro just after your custom
-    allocator allocates some new memory.  See the comments in
-    <filename>valgrind.h</filename> for information on how to use
-    it.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_FREELIKE_BLOCK</computeroutput>:</command></term>
-   <listitem>
-    <para>This should be used in conjunction with
-    <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>.
-    Again, see <filename>memcheck/memcheck.h</filename> for
-    information on how to use it.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>:</command></term>
-   <listitem>
-    <para>This is similar to
-    <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>,
-    but is tailored towards code that uses memory pools.  See the
-    comments in <filename>valgrind.h</filename> for information
-    on how to use it.</para>
-   </listitem>
-  </varlistentry>
-  
-  <varlistentry>
-  <term><command><computeroutput>VALGRIND_DESTROY_MEMPOOL</computeroutput>:</command></term>
-   <listitem>
-    <para>This should be used in conjunction with
-    <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
-    Again, see the comments in <filename>valgrind.h</filename> for
-    information on how to use it.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_MEMPOOL_ALLOC</computeroutput>:</command></term>
-   <listitem>
-    <para>This should be used in conjunction with
-    <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
-    Again, see the comments in <filename>valgrind.h</filename> for
-    information on how to use it.</para>
-   </listitem>
-  </varlistentry>
-   
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_MEMPOOL_FREE</computeroutput>:</command></term>
-   <listitem>
-    <para>This should be used in conjunction with
-    <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
-    Again, see the comments in <filename>valgrind.h</filename> for
-    information on how to use it.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_NON_SIMD_CALL[0123]</computeroutput>:</command></term>
-   <listitem>
-    <para>Executes a function of 0, 1, 2 or 3 args in the client
-    program on the <emphasis>real</emphasis> CPU, not the virtual
-    CPU that Valgrind normally runs code on.  These are used in
-    various ways internally to Valgrind.  They might be useful to
-    client programs.</para> 
-
-    <para><command>Warning:</command> Only use these if you
-    <emphasis>really</emphasis> know what you are doing.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_PRINTF(format, ...)</computeroutput>:</command></term>
-   <listitem>
-    <para>printf a message to the log file when running under
-    Valgrind.  Nothing is output if not running under Valgrind.
-    Returns the number of characters output.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_PRINTF_BACKTRACE(format, ...)</computeroutput>:</command></term>
-   <listitem>
-    <para>printf a message to the log file along with a stack
-    backtrace when running under Valgrind.  Nothing is output if
-    not running under Valgrind.  Returns the number of characters
-    output.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_STACK_REGISTER(start, end)</computeroutput>:</command></term>
-   <listitem>
-    <para>Registers a new stack.  Informs Valgrind that the memory range
-    between start and end is a unique stack.  Returns a stack identifier
-    that can be used with other
-    <computeroutput>VALGRIND_STACK_*</computeroutput> calls.</para>
-    <para>Valgrind will use this information to determine if a change to
-    the stack pointer is an item pushed onto the stack or a change over
-    to a new stack.  Use this if you're using a user-level thread package
-    and are noticing spurious errors from Valgrind about uninitialized
-    memory reads.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_STACK_DEREGISTER(id)</computeroutput>:</command></term>
-   <listitem>
-    <para>Deregisters a previously registered stack.  Informs
-    Valgrind that previously registered memory range with stack id
-    <computeroutput>id</computeroutput> is no longer a stack.</para>
-   </listitem>
-  </varlistentry>
-
-  <varlistentry>
-   <term><command><computeroutput>VALGRIND_STACK_CHANGE(id, start, end)</computeroutput>:</command></term>
-   <listitem>
-    <para>Changes a previously registered stack.  Informs
-    Valgrind that the previously registered stack with stack id
-    <computeroutput>id</computeroutput> has changed its start and end
-    values.  Use this if your user-level thread package implements
-    stack growth.</para>
-   </listitem>
-  </varlistentry>
-
- </variablelist>
-
-<para>Note that <filename>valgrind.h</filename> is included by
-all the tool-specific header files (such as
-<filename>memcheck.h</filename>), so you don't need to include it
-in your client if you include a tool-specific header.</para>
-
-</sect1>
-
-
-
-
-
-<sect1 id="manual-core.wrapping" xreflabel="Function Wrapping">
-<title>Function wrapping</title>
-
-<para>
-Valgrind versions 3.2.0 and above can do function wrapping on all
-supported targets.  In function wrapping, calls to some specified
-function are intercepted and rerouted to a different, user-supplied
-function.  This can do whatever it likes, typically examining the
-arguments, calling onwards to the original, and possibly examining the
-result.  Any number of functions may be wrapped.</para>
-
-<para>
-Function wrapping is useful for instrumenting an API in some way.  For
-example, wrapping functions in the POSIX pthreads API makes it
-possible to notify Valgrind of thread status changes, and wrapping
-functions in the MPI (message-passing) API allows notifying Valgrind
-of memory status changes associated with message arrival/departure.
-Such information is usually passed to Valgrind by using client
-requests in the wrapper functions, although that is not of relevance
-here.</para>
-
-<sect2 id="manual-core.wrapping.example" xreflabel="A Simple Example">
-<title>A Simple Example</title>
-
-<para>Supposing we want to wrap some function</para>
-
-<programlisting><![CDATA[
-int foo ( int x, int y ) { return x + y; }]]></programlisting>
-
-<para>A wrapper is a function of identical type, but with a special name
-which identifies it as the wrapper for <computeroutput>foo</computeroutput>.
-Wrappers need to include
-supporting macros from <computeroutput>valgrind.h</computeroutput>.
-Here is a simple wrapper which prints the arguments and return value:</para>
-
-<programlisting><![CDATA[
-#include <stdio.h>
-#include "valgrind.h"
-int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y )
-{
-   int    result;
-   OrigFn fn;
-   VALGRIND_GET_ORIG_FN(fn);
-   printf("foo's wrapper: args %d %d\n", x, y);
-   CALL_FN_W_WW(result, fn, x,y);
-   printf("foo's wrapper: result %d\n", result);
-   return result;
-}
-]]></programlisting>
-
-<para>To become active, the wrapper merely needs to be present in a text
-section somewhere in the same process' address space as the function
-it wraps, and for its ELF symbol name to be visible to Valgrind.  In
-practice, this means either compiling to a 
-<computeroutput>.o</computeroutput> and linking it in, or
-compiling to a <computeroutput>.so</computeroutput> and 
-<computeroutput>LD_PRELOAD</computeroutput>ing it in.  The latter is more
-convenient in that it doesn't require relinking.</para>
-
-<para>All wrappers have approximately the above form.  There are three
-crucial macros:</para>
-
-<para><computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>: 
-this generates the real name of the wrapper.
-This is an encoded name which Valgrind notices when reading symbol
-table information.  What it says is: I am the wrapper for any function
-named <computeroutput>foo</computeroutput> which is found in 
-an ELF shared object with an empty
-("<computeroutput>NONE</computeroutput>") soname field.  The specification 
-mechanism is powerful in
-that wildcards are allowed for both sonames and function names.  
-The details are discussed below.</para>
-
-<para><computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>: 
-once in the the wrapper, the first priority is
-to get hold of the address of the original (and any other supporting
-information needed).  This is stored in a value of opaque 
-type <computeroutput>OrigFn</computeroutput>.
-The information is acquired using 
-<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>.  It is crucial
-to make this macro call before calling any other wrapped function
-in the same thread.</para>
-
-<para><computeroutput>CALL_FN_W_WW</computeroutput>: eventually we will
-want to call the function being
-wrapped.  Calling it directly does not work, since that just gets us
-back to the wrapper and tends to kill the program in short order by
-stack overflow.  Instead, the result lvalue, 
-<computeroutput>OrigFn</computeroutput> and arguments are
-handed to one of a family of macros of the form 
-<computeroutput>CALL_FN_*</computeroutput>.  These
-cause Valgrind to call the original and avoid recursion back to the
-wrapper.</para>
-</sect2>
-
-<sect2 id="manual-core.wrapping.specs" xreflabel="Wrapping Specifications">
-<title>Wrapping Specifications</title>
-
-<para>This scheme has the advantage of being self-contained.  A library of
-wrappers can be compiled to object code in the normal way, and does
-not rely on an external script telling Valgrind which wrappers pertain
-to which originals.</para>
-
-<para>Each wrapper has a name which, in the most general case says: I am the
-wrapper for any function whose name matches FNPATT and whose ELF
-"soname" matches SOPATT.  Both FNPATT and SOPATT may contain wildcards
-(asterisks) and other characters (spaces, dots, @, etc) which are not 
-generally regarded as valid C identifier names.</para> 
-
-<para>This flexibility is needed to write robust wrappers for POSIX pthread
-functions, where typically we are not completely sure of either the
-function name or the soname, or alternatively we want to wrap a whole
-set of functions at once.</para> 
-
-<para>For example, <computeroutput>pthread_create</computeroutput> 
-in GNU libpthread is usually a
-versioned symbol - one whose name ends in, eg, 
-<computeroutput>@GLIBC_2.3</computeroutput>.  Hence we
-are not sure what its real name is.  We also want to cover any soname
-of the form <computeroutput>libpthread.so*</computeroutput>.
-So the header of the wrapper will be</para>
-
-<programlisting><![CDATA[
-int I_WRAP_SONAME_FNNAME_ZZ(libpthreadZdsoZd0,pthreadZucreateZAZa)
-  ( ... formals ... )
-  { ... body ... }
-]]></programlisting>
-
-<para>In order to write unusual characters as valid C function names, a
-Z-encoding scheme is used.  Names are written literally, except that
-a capital Z acts as an escape character, with the following encoding:</para>
-
-<programlisting><![CDATA[
-     Za   encodes    *
-     Zp              +
-     Zc              :
-     Zd              .
-     Zu              _
-     Zh              -
-     Zs              (space)
-     ZA              @
-     ZZ              Z
-     ZL              (       # only in valgrind 3.3.0 and later
-     ZR              )       # only in valgrind 3.3.0 and later
-]]></programlisting>
-
-<para>Hence <computeroutput>libpthreadZdsoZd0</computeroutput> is an 
-encoding of the soname <computeroutput>libpthread.so.0</computeroutput>
-and <computeroutput>pthreadZucreateZAZa</computeroutput> is an encoding 
-of the function name <computeroutput>pthread_create@*</computeroutput>.
-</para>
-
-<para>The macro <computeroutput>I_WRAP_SONAME_FNNAME_ZZ</computeroutput> 
-constructs a wrapper name in which
-both the soname (first component) and function name (second component)
-are Z-encoded.  Encoding the function name can be tiresome and is
-often unnecessary, so a second macro,
-<computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>, can be
-used instead.  The <computeroutput>_ZU</computeroutput> variant is 
-also useful for writing wrappers for
-C++ functions, in which the function name is usually already mangled
-using some other convention in which Z plays an important role.  Having
-to encode a second time quickly becomes confusing.</para>
-
-<para>Since the function name field may contain wildcards, it can be
-anything, including just <computeroutput>*</computeroutput>.
-The same is true for the soname.
-However, some ELF objects - specifically, main executables - do not
-have sonames.  Any object lacking a soname is treated as if its soname
-was <computeroutput>NONE</computeroutput>, which is why the original 
-example above had a name
-<computeroutput>I_WRAP_SONAME_FNNAME_ZU(NONE,foo)</computeroutput>.</para>
-
-<para>Note that the soname of an ELF object is not the same as its
-file name, although it is often similar.  You can find the soname of
-an object <computeroutput>libfoo.so</computeroutput> using the command
-<computeroutput>readelf -a libfoo.so | grep soname</computeroutput>.</para>
-</sect2>
-
-<sect2 id="manual-core.wrapping.semantics" xreflabel="Wrapping Semantics">
-<title>Wrapping Semantics</title>
-
-<para>The ability for a wrapper to replace an infinite family of functions
-is powerful but brings complications in situations where ELF objects
-appear and disappear (are dlopen'd and dlclose'd) on the fly.
-Valgrind tries to maintain sensible behaviour in such situations.</para>
-
-<para>For example, suppose a process has dlopened (an ELF object with
-soname) <computeroutput>object1.so</computeroutput>, which contains 
-<computeroutput>function1</computeroutput>.  It starts to use
-<computeroutput>function1</computeroutput> immediately.</para>
-
-<para>After a while it dlopens <computeroutput>wrappers.so</computeroutput>,
-which contains a wrapper
-for <computeroutput>function1</computeroutput> in (soname) 
-<computeroutput>object1.so</computeroutput>.  All subsequent calls to 
-<computeroutput>function1</computeroutput> are rerouted to the wrapper.</para>
-
-<para>If <computeroutput>wrappers.so</computeroutput> is 
-later dlclose'd, calls to <computeroutput>function1</computeroutput> are 
-naturally routed back to the original.</para>
-
-<para>Alternatively, if <computeroutput>object1.so</computeroutput>
-is dlclose'd but wrappers.so remains,
-then the wrapper exported by <computeroutput>wrapper.so</computeroutput>
-becomes inactive, since there
-is no way to get to it - there is no original to call any more.  However,
-Valgrind remembers that the wrapper is still present.  If 
-<computeroutput>object1.so</computeroutput> is
-eventually dlopen'd again, the wrapper will become active again.</para>
-
-<para>In short, valgrind inspects all code loading/unloading events to
-ensure that the set of currently active wrappers remains consistent.</para>
-
-<para>A second possible problem is that of conflicting wrappers.  It is 
-easily possible to load two or more wrappers, both of which claim
-to be wrappers for some third function.  In such cases Valgrind will
-complain about conflicting wrappers when the second one appears, and
-will honour only the first one.</para>
-</sect2>
-
-<sect2 id="manual-core.wrapping.debugging" xreflabel="Debugging">
-<title>Debugging</title>
-
-<para>Figuring out what's going on given the dynamic nature of wrapping
-can be difficult.  The 
-<computeroutput>--trace-redir=yes</computeroutput> flag makes 
-this possible
-by showing the complete state of the redirection subsystem after
-every
-<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
-event affecting code (text).</para>
-
-<para>There are two central concepts:</para>
-
-<itemizedlist>
-
-  <listitem><para>A "redirection specification" is a binding of 
-  a (soname pattern, fnname pattern) pair to a code address.
-  These bindings are created by writing functions with names
-  made with the 
-  <computeroutput>I_WRAP_SONAME_FNNAME_{ZZ,_ZU}</computeroutput>
-  macros.</para></listitem>
-
-  <listitem><para>An "active redirection" is code-address to 
-  code-address binding currently in effect.</para></listitem>
-
-</itemizedlist>
-
-<para>The state of the wrapping-and-redirection subsystem comprises a set of
-specifications and a set of active bindings.  The specifications are
-acquired/discarded by watching all 
-<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
-events on code (text)
-sections.  The active binding set is (conceptually) recomputed from
-the specifications, and all known symbol names, following any change
-to the specification set.</para>
-
-<para><computeroutput>--trace-redir=yes</computeroutput> shows the contents 
-of both sets following any such event.</para>
-
-<para><computeroutput>-v</computeroutput> prints a line of text each 
-time an active specification is used for the first time.</para>
-
-<para>Hence for maximum debugging effectiveness you will need to use both
-flags.</para>
-
-<para>One final comment.  The function-wrapping facility is closely
-tied to Valgrind's ability to replace (redirect) specified
-functions, for example to redirect calls to 
-<computeroutput>malloc</computeroutput> to its
-own implementation.  Indeed, a replacement function can be
-regarded as a wrapper function which does not call the original.
-However, to make the implementation more robust, the two kinds
-of interception (wrapping vs replacement) are treated differently.
-</para>
-
-<para><computeroutput>--trace-redir=yes</computeroutput> shows 
-specifications and bindings for both
-replacement and wrapper functions.  To differentiate the 
-two, replacement bindings are printed using 
-<computeroutput>R-></computeroutput> whereas 
-wraps are printed using <computeroutput>W-></computeroutput>.
-</para>
-</sect2>
-
-
-<sect2 id="manual-core.wrapping.limitations-cf" 
-       xreflabel="Limitations - control flow">
-<title>Limitations - control flow</title>
-
-<para>For the most part, the function wrapping implementation is robust.
-The only important caveat is: in a wrapper, get hold of
-the <computeroutput>OrigFn</computeroutput> information using 
-<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput> before calling any
-other wrapped function.  Once you have the 
-<computeroutput>OrigFn</computeroutput>, arbitrary
-calls between, recursion between, and longjumps out of wrappers
-should work correctly.  There is never any interaction between wrapped
-functions and merely replaced functions 
-(eg <computeroutput>malloc</computeroutput>), so you can call
-<computeroutput>malloc</computeroutput> etc safely from within wrappers.
-</para>
-
-<para>The above comments are true for {x86,amd64,ppc32}-linux.  On
-ppc64-linux function wrapping is more fragile due to the (arguably
-poorly designed) ppc64-linux ABI.  This mandates the use of a shadow
-stack which tracks entries/exits of both wrapper and replacement
-functions.  This gives two limitations: firstly, longjumping out of
-wrappers will rapidly lead to disaster, since the shadow stack will
-not get correctly cleared.  Secondly, since the shadow stack has
-finite size, recursion between wrapper/replacement functions is only
-possible to a limited depth, beyond which Valgrind has to abort the
-run.  This depth is currently 16 calls.</para>
-
-<para>For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above
-comments apply on a per-thread basis.  In other words, wrapping is
-thread-safe: each thread must individually observe the above
-restrictions, but there is no need for any kind of inter-thread
-cooperation.</para>
-</sect2>
-
-
-<sect2 id="manual-core.wrapping.limitations-sigs" 
-       xreflabel="Limitations - original function signatures">
-<title>Limitations - original function signatures</title>
-
-<para>As shown in the above example, to call the original you must use a
-macro of the form <computeroutput>CALL_FN_*</computeroutput>.  
-For technical reasons it is impossible
-to create a single macro to deal with all argument types and numbers,
-so a family of macros covering the most common cases is supplied.  In
-what follows, 'W' denotes a machine-word-typed value (a pointer or a
-C <computeroutput>long</computeroutput>), 
-and 'v' denotes C's <computeroutput>void</computeroutput> type.
-The currently available macros are:</para>
-
-<programlisting><![CDATA[
-CALL_FN_v_v       -- call an original of type  void fn ( void )
-CALL_FN_W_v       -- call an original of type  long fn ( void )
-
-CALL_FN_v_W       -- void fn ( long )
-CALL_FN_W_W       -- long fn ( long )
-
-CALL_FN_v_WW      -- void fn ( long, long )
-CALL_FN_W_WW      -- long fn ( long, long )
-
-CALL_FN_v_WWW     -- void fn ( long, long, long )
-CALL_FN_W_WWW     -- long fn ( long, long, long )
-
-CALL_FN_W_WWWW    -- long fn ( long, long, long, long )
-CALL_FN_W_5W      -- long fn ( long, long, long, long, long )
-CALL_FN_W_6W      -- long fn ( long, long, long, long, long, long )
-and so on, up to 
-CALL_FN_W_12W
-]]></programlisting>
-
-<para>The set of supported types can be expanded as needed.  It is
-regrettable that this limitation exists.  Function wrapping has proven
-difficult to implement, with a certain apparently unavoidable level of
-ickyness.  After several implementation attempts, the present
-arrangement appears to be the least-worst tradeoff.  At least it works
-reliably in the presence of dynamic linking and dynamic code
-loading/unloading.</para>
-
-<para>You should not attempt to wrap a function of one type signature with a
-wrapper of a different type signature.  Such trickery will surely lead
-to crashes or strange behaviour.  This is not of course a limitation
-of the function wrapping implementation, merely a reflection of the
-fact that it gives you sweeping powers to shoot yourself in the foot
-if you are not careful.  Imagine the instant havoc you could wreak by
-writing a wrapper which matched any function name in any soname - in
-effect, one which claimed to be a wrapper for all functions in the
-process.</para>
-</sect2>
-
-<sect2 id="manual-core.wrapping.examples" xreflabel="Examples">
-<title>Examples</title>
-
-<para>In the source tree, 
-<computeroutput>memcheck/tests/wrap[1-8].c</computeroutput> provide a series of
-examples, ranging from very simple to quite advanced.</para>
-
-<para><computeroutput>auxprogs/libmpiwrap.c</computeroutput> is an example 
-of wrapping a big, complex API (the MPI-2 interface).  This file defines 
-almost 300 different wrappers.</para>
-</sect2>
-
-</sect1>
-
 
 
 
diff --git a/docs/xml/manual.xml b/docs/xml/manual.xml
index 2b61c6e44f..2514199718 100644
--- a/docs/xml/manual.xml
+++ b/docs/xml/manual.xml
@@ -22,6 +22,8 @@
       xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="manual-core.xml" parse="xml"  
       xmlns:xi="http://www.w3.org/2001/XInclude" />
+  <xi:include href="manual-core-adv.xml" parse="xml"  
+      xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="../../memcheck/docs/mc-manual.xml" parse="xml"  
       xmlns:xi="http://www.w3.org/2001/XInclude" />
   <xi:include href="../../cachegrind/docs/cg-manual.xml" parse="xml"