chances of false positives or false negatives from Memcheck. Also, you
should compile your code with <computeroutput>-Wall</computeroutput> because
it can identify some or all of the problems that Valgrind can miss at the
-higher optimisations levels. (Using <computeroutput>-Wall</computeroutput>
+higher optimisation levels. (Using <computeroutput>-Wall</computeroutput>
is also a good idea in general.) All other tools (as far as we know) are
unaffected by optimisation level.</para>
</listitem>
</varlistentry>
+ <varlistentry id="opt.child-silent-after-fork"
+ xreflabel="--child-silent-after-fork">
+ <term>
+ <option><![CDATA[--child-silent-after-fork=<yes|no> [default: no] ]]></option>
+ </term>
+ <listitem>
+ <para>When enabled, Valgrind will not show any debugging or
+ logging output for the child process resulting from
+ a <varname>fork</varname> call. This can make the output less
+ confusing (although more misleading) when dealing with processes
+ that create children. It is particularly useful in conjunction
+ with <varname>--trace-children=</varname>. Use of this flag is also
+ strongly recommended if you are requesting XML output
+ (<varname>--xml=yes</varname>), since otherwise the XML from child and
+ parent may become mixed up, which usually makes it useless.
+ </para>
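+ <para>As a minimal sketch of the combination described above
+ (<computeroutput>./myprog</computeroutput> is a placeholder for
+ your own executable, not something Valgrind provides), an invocation
+ might look like this:</para>
+<programlisting><![CDATA[
+valgrind --trace-children=yes --child-silent-after-fork=yes ./myprog]]></programlisting>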
+ </listitem>
+ </varlistentry>
+
<varlistentry id="opt.track-fds" xreflabel="--track-fds">
<term>
<option><![CDATA[--track-fds=<yes|no> [default: no] ]]></option>
process to be debugged and each instance of <literal>%f</literal>
expands to the path to the executable for the process to be
debugged.</para>
+
+ <para>Since <computeroutput>&lt;command&gt;</computeroutput> is likely
+ to contain spaces, you will need to put this entire flag in
+ quotes to ensure it is correctly handled by the shell.</para>
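+
+ <para>For illustration only, and assuming the flag in question is
+ <option>--db-command=</option> used together with
+ <option>--db-attach=yes</option> and a GDB-style debugger (the
+ program name <computeroutput>./myprog</computeroutput> is a
+ placeholder), the quoting might look like this:</para>
+<programlisting><![CDATA[
+valgrind --db-attach=yes "--db-command=gdb -nw %f %p" ./myprog]]></programlisting>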
</listitem>
</varlistentry>
</sect1>
-<sect1 id="manual-core.clientreq"
- xreflabel="The Client Request mechanism">
-<title>The Client Request mechanism</title>
-
-<para>Valgrind has a trapdoor mechanism via which the client
-program can pass all manner of requests and queries to Valgrind
-and the current tool. Internally, this is used extensively to
-make malloc, free, etc, work, although you don't see that.</para>
-
-<para>For your convenience, a subset of these so-called client
-requests is provided to allow you to tell Valgrind facts about
-the behaviour of your program, and also to make queries.
-In particular, your program can tell Valgrind about changes in
-memory range permissions that Valgrind would not otherwise know
-about, and so allows clients to get Valgrind to do arbitrary
-custom checks.</para>
-
-<para>Clients need to include a header file to make this work.
-Which header file depends on which client requests you use. Some
-client requests are handled by the core, and are defined in the
-header file <filename>valgrind/valgrind.h</filename>. Tool-specific
-header files are named after the tool, e.g.
-<filename>valgrind/memcheck.h</filename>. All header files can be found
-in the <literal>include/valgrind</literal> directory of wherever Valgrind
-was installed.</para>
-
-<para>The macros in these header files have the magical property
-that they generate code in-line which Valgrind can spot.
-However, the code does nothing when not run on Valgrind, so you
-are not forced to run your program under Valgrind just because you
-use the macros in this file. Also, you are not required to link your
-program with any extra supporting libraries.</para>
-
-<para>The code added to your binary has negligible performance impact:
-on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions
-and is probably undetectable except in tight loops.
-However, if you really wish to compile out the client requests, you can
-compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
-<computeroutput>-DNDEBUG</computeroutput>'s effect on
-<computeroutput>assert()</computeroutput>).
-</para>
-
-<para>You are encouraged to copy the <filename>valgrind/*.h</filename> headers
-into your project's include directory, so your program doesn't have a
-compile-time dependency on Valgrind being installed. The Valgrind headers,
-unlike most of the rest of the code, are under a BSD-style license so you may
-include them without worrying about license incompatibility.</para>
-
-<para>Here is a brief description of the macros available in
-<filename>valgrind.h</filename>, which work with more than one
-tool (see the tool-specific documentation for explanations of the
-tool-specific macros).</para>
-
- <variablelist>
-
- <varlistentry>
- <term><command><computeroutput>RUNNING_ON_VALGRIND</computeroutput></command>:</term>
- <listitem>
- <para>Returns 1 if running on Valgrind, 0 if running on the
- real CPU. If you are running Valgrind on itself, returns the
- number of layers of Valgrind emulation you're running on.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_DISCARD_TRANSLATIONS</computeroutput>:</command></term>
- <listitem>
- <para>Discards translations of code in the specified address
- range. Useful if you are debugging a JIT compiler or some other
- dynamic code generation system. After this call, attempts to
- execute code in the invalidated address range will cause
- Valgrind to make new translations of that code, which is
- probably the semantics you want. Note that code invalidations
- are expensive because finding all the relevant translations
- quickly is very difficult. So try not to call it often.
- Note that you can be clever about
- this: you only need to call it when an area which previously
- contained code is overwritten with new code. You can choose
- to write code into fresh memory, and just call this
- occasionally to discard large chunks of old code all at
- once.</para>
- <para>
- Alternatively, for transparent self-modifying-code support,
- use<computeroutput>--smc-check=all</computeroutput>, or run
- on ppc32/Linux or ppc64/Linux.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_COUNT_ERRORS</computeroutput>:</command></term>
- <listitem>
- <para>Returns the number of errors found so far by Valgrind. Can be
- useful in test harness code when combined with the
- <option>--log-fd=-1</option> option; this runs Valgrind silently,
- but the client program can detect when errors occur. Only useful
- for tools that report errors, e.g. it's useful for Memcheck, but for
- Cachegrind it will always return zero because Cachegrind doesn't
- report errors.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>:</command></term>
- <listitem>
- <para>If your program manages its own memory instead of using
- the standard <computeroutput>malloc()</computeroutput> /
- <computeroutput>new</computeroutput> /
- <computeroutput>new[]</computeroutput>, tools that track
- information about heap blocks will not do nearly as good a
- job. For example, Memcheck won't detect nearly as many
- errors, and the error messages won't be as informative. To
- improve this situation, use this macro just after your custom
- allocator allocates some new memory. See the comments in
- <filename>valgrind.h</filename> for information on how to use
- it.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_FREELIKE_BLOCK</computeroutput>:</command></term>
- <listitem>
- <para>This should be used in conjunction with
- <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>.
- Again, see <filename>memcheck/memcheck.h</filename> for
- information on how to use it.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>:</command></term>
- <listitem>
- <para>This is similar to
- <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>,
- but is tailored towards code that uses memory pools. See the
- comments in <filename>valgrind.h</filename> for information
- on how to use it.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_DESTROY_MEMPOOL</computeroutput>:</command></term>
- <listitem>
- <para>This should be used in conjunction with
- <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
- Again, see the comments in <filename>valgrind.h</filename> for
- information on how to use it.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_MEMPOOL_ALLOC</computeroutput>:</command></term>
- <listitem>
- <para>This should be used in conjunction with
- <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
- Again, see the comments in <filename>valgrind.h</filename> for
- information on how to use it.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_MEMPOOL_FREE</computeroutput>:</command></term>
- <listitem>
- <para>This should be used in conjunction with
- <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
- Again, see the comments in <filename>valgrind.h</filename> for
- information on how to use it.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_NON_SIMD_CALL[0123]</computeroutput>:</command></term>
- <listitem>
- <para>Executes a function of 0, 1, 2 or 3 args in the client
- program on the <emphasis>real</emphasis> CPU, not the virtual
- CPU that Valgrind normally runs code on. These are used in
- various ways internally to Valgrind. They might be useful to
- client programs.</para>
-
- <para><command>Warning:</command> Only use these if you
- <emphasis>really</emphasis> know what you are doing.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_PRINTF(format, ...)</computeroutput>:</command></term>
- <listitem>
- <para>printf a message to the log file when running under
- Valgrind. Nothing is output if not running under Valgrind.
- Returns the number of characters output.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_PRINTF_BACKTRACE(format, ...)</computeroutput>:</command></term>
- <listitem>
- <para>printf a message to the log file along with a stack
- backtrace when running under Valgrind. Nothing is output if
- not running under Valgrind. Returns the number of characters
- output.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_STACK_REGISTER(start, end)</computeroutput>:</command></term>
- <listitem>
- <para>Registers a new stack. Informs Valgrind that the memory range
- between start and end is a unique stack. Returns a stack identifier
- that can be used with other
- <computeroutput>VALGRIND_STACK_*</computeroutput> calls.</para>
- <para>Valgrind will use this information to determine if a change to
- the stack pointer is an item pushed onto the stack or a change over
- to a new stack. Use this if you're using a user-level thread package
- and are noticing spurious errors from Valgrind about uninitialized
- memory reads.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_STACK_DEREGISTER(id)</computeroutput>:</command></term>
- <listitem>
- <para>Deregisters a previously registered stack. Informs
- Valgrind that previously registered memory range with stack id
- <computeroutput>id</computeroutput> is no longer a stack.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term><command><computeroutput>VALGRIND_STACK_CHANGE(id, start, end)</computeroutput>:</command></term>
- <listitem>
- <para>Changes a previously registered stack. Informs
- Valgrind that the previously registered stack with stack id
- <computeroutput>id</computeroutput> has changed its start and end
- values. Use this if your user-level thread package implements
- stack growth.</para>
- </listitem>
- </varlistentry>
-
- </variablelist>
-
-<para>Note that <filename>valgrind.h</filename> is included by
-all the tool-specific header files (such as
-<filename>memcheck.h</filename>), so you don't need to include it
-in your client if you include a tool-specific header.</para>
-
-</sect1>
-
<sect1 id="manual-core.pthreads" xreflabel="Support for Threads">
<para>Valgrind supports programs which use POSIX pthreads.
Getting this to work was technically challenging but it now works
-well enough for significant threaded applications to work.</para>
+well enough for significant threaded applications to run.</para>
<para>The main thing to point out is that although Valgrind works
with the standard Linux threads library (eg. NPTL or LinuxThreads), it
instructions), which means you'll get a much finer interleaving
of thread executions than when run natively. This in itself may
cause your program to behave differently if you have some kind of
-concurrency, critical race, locking, or similar, bugs.</para>
+concurrency, critical race, locking, or similar, bugs. In that case
+you might consider using Valgrind's Helgrind tool to track them down.</para>
<para>Your program will use the native
<computeroutput>libpthread</computeroutput>, but not all of its facilities
-<sect1 id="manual-core.wrapping" xreflabel="Function Wrapping">
-<title>Function wrapping</title>
-<para>
-Valgrind versions 3.2.0 and above can do function wrapping on all
-supported targets. In function wrapping, calls to some specified
-function are intercepted and rerouted to a different, user-supplied
-function. This can do whatever it likes, typically examining the
-arguments, calling onwards to the original, and possibly examining the
-result. Any number of functions may be wrapped.</para>
-<para>
-Function wrapping is useful for instrumenting an API in some way. For
-example, wrapping functions in the POSIX pthreads API makes it
-possible to notify Valgrind of thread status changes, and wrapping
-functions in the MPI (message-passing) API allows notifying Valgrind
-of memory status changes associated with message arrival/departure.
-Such information is usually passed to Valgrind by using client
-requests in the wrapper functions, although that is not of relevance
-here.</para>
-<sect2 id="manual-core.wrapping.example" xreflabel="A Simple Example">
-<title>A Simple Example</title>
-<para>Supposing we want to wrap some function</para>
-<programlisting><![CDATA[
-int foo ( int x, int y ) { return x + y; }]]></programlisting>
+<sect1 id="manual-core.install" xreflabel="Building and Installing">
+<title>Building and Installing Valgrind</title>
-<para>A wrapper is a function of identical type, but with a special name
-which identifies it as the wrapper for <computeroutput>foo</computeroutput>.
-Wrappers need to include
-supporting macros from <computeroutput>valgrind.h</computeroutput>.
-Here is a simple wrapper which prints the arguments and return value:</para>
+<para>We use the standard Unix
+<computeroutput>./configure</computeroutput>,
+<computeroutput>make</computeroutput>, <computeroutput>make
+install</computeroutput> mechanism, and we have attempted to
+ensure that it works on machines with kernel 2.4 or 2.6 and glibc
+2.2.X to 2.5.X. Once you have completed
+<computeroutput>make install</computeroutput> you may then want
+to run the regression tests
+with <computeroutput>make regtest</computeroutput>.
+</para>
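+<para>Put together, a typical session might look like the following
+sketch. The install prefix <computeroutput>/opt/valgrind</computeroutput>
+is only an example, and <computeroutput>make install</computeroutput>
+may need extra privileges depending on the prefix you choose:</para>
+<programlisting><![CDATA[
+./configure --prefix=/opt/valgrind
+make
+make install
+make regtest]]></programlisting>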
-<programlisting><![CDATA[
-#include <stdio.h>
-#include "valgrind.h"
-int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y )
-{
- int result;
- OrigFn fn;
- VALGRIND_GET_ORIG_FN(fn);
- printf("foo's wrapper: args %d %d\n", x, y);
- CALL_FN_W_WW(result, fn, x,y);
- printf("foo's wrapper: result %d\n", result);
- return result;
-}
-]]></programlisting>
+<para>There are five options (in addition to the usual
+<option>--prefix=</option>) which affect how Valgrind is built:
+<itemizedlist>
-<para>To become active, the wrapper merely needs to be present in a text
-section somewhere in the same process' address space as the function
-it wraps, and for its ELF symbol name to be visible to Valgrind. In
-practice, this means either compiling to a
-<computeroutput>.o</computeroutput> and linking it in, or
-compiling to a <computeroutput>.so</computeroutput> and
-<computeroutput>LD_PRELOAD</computeroutput>ing it in. The latter is more
-convenient in that it doesn't require relinking.</para>
+ <listitem>
+ <para><option>--enable-inner</option></para>
+ <para>This builds Valgrind with some special magic hacks which make
+ it possible to run it on a standard build of Valgrind (what the
+ developers call "self-hosting"). Ordinarily you should not use
+ this flag as various kinds of safety checks are disabled.
+ </para>
+ </listitem>
-<para>All wrappers have approximately the above form. There are three
-crucial macros:</para>
+ <listitem>
+ <para><option>--enable-tls</option></para>
+ <para>TLS (Thread Local Storage) is a relatively new mechanism which
+ requires compiler, linker and kernel support. Valgrind tries to
+ automatically test if TLS is supported and if so enables this option.
+ Sometimes it cannot test for TLS, so this option allows you to
+ override the automatic test.</para>
+ </listitem>
-<para><computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>:
-this generates the real name of the wrapper.
-This is an encoded name which Valgrind notices when reading symbol
-table information. What it says is: I am the wrapper for any function
-named <computeroutput>foo</computeroutput> which is found in
-an ELF shared object with an empty
-("<computeroutput>NONE</computeroutput>") soname field. The specification
-mechanism is powerful in
-that wildcards are allowed for both sonames and function names.
-The details are discussed below.</para>
+ <listitem>
+ <para><option>--with-vex=</option></para>
+ <para>Specifies the path to the underlying VEX dynamic-translation
+ library. By default this is taken to be in the VEX directory off
+ the root of the source tree.
+ </para>
+ </listitem>
-<para><computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>:
-once in the the wrapper, the first priority is
-to get hold of the address of the original (and any other supporting
-information needed). This is stored in a value of opaque
-type <computeroutput>OrigFn</computeroutput>.
-The information is acquired using
-<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>. It is crucial
-to make this macro call before calling any other wrapped function
-in the same thread.</para>
+ <listitem>
+ <para><option>--enable-only64bit</option></para>
+ <para><option>--enable-only32bit</option></para>
+ <para>On 64-bit
+ platforms (amd64-linux, ppc64-linux), Valgrind is by default built
+ in such a way that both 32-bit and 64-bit executables can be run.
+ Sometimes this cleverness is a problem for a variety of reasons.
+ These two flags allow for single-target builds in this situation.
+ If you issue both, the configure script will complain. Note they
+ are ignored on 32-bit-only platforms (x86-linux, ppc32-linux).
+ </para>
+ </listitem>
-<para><computeroutput>CALL_FN_W_WW</computeroutput>: eventually we will
-want to call the function being
-wrapped. Calling it directly does not work, since that just gets us
-back to the wrapper and tends to kill the program in short order by
-stack overflow. Instead, the result lvalue,
-<computeroutput>OrigFn</computeroutput> and arguments are
-handed to one of a family of macros of the form
-<computeroutput>CALL_FN_*</computeroutput>. These
-cause Valgrind to call the original and avoid recursion back to the
-wrapper.</para>
-</sect2>
+</itemizedlist>
+</para>
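+<para>For instance, a 64-bit-only build on an amd64-linux machine
+could be requested roughly as follows (a sketch; combine with any
+other options you need):</para>
+<programlisting><![CDATA[
+./configure --enable-only64bit
+make]]></programlisting>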
-<sect2 id="manual-core.wrapping.specs" xreflabel="Wrapping Specifications">
-<title>Wrapping Specifications</title>
+<para>The <computeroutput>configure</computeroutput> script tests
+the version of the X server indicated by the current
+<computeroutput>$DISPLAY</computeroutput>. This is a known bug.
+The intention was to detect the version of the current X
+client libraries, so that correct suppressions could be selected
+for them, but instead the test checks the server version. This
+is just plain wrong.</para>
-<para>This scheme has the advantage of being self-contained. A library of
-wrappers can be compiled to object code in the normal way, and does
-not rely on an external script telling Valgrind which wrappers pertain
-to which originals.</para>
+<para>If you are building a binary package of Valgrind for
+distribution, please read <literal>README_PACKAGERS</literal>
+<xref linkend="dist.readme-packagers"/>. It contains some
+important information.</para>
-<para>Each wrapper has a name which, in the most general case says: I am the
-wrapper for any function whose name matches FNPATT and whose ELF
-"soname" matches SOPATT. Both FNPATT and SOPATT may contain wildcards
-(asterisks) and other characters (spaces, dots, @, etc) which are not
-generally regarded as valid C identifier names.</para>
+<para>Apart from that, there's not much excitement here. Let us
+know if you have build problems.</para>
-<para>This flexibility is needed to write robust wrappers for POSIX pthread
-functions, where typically we are not completely sure of either the
-function name or the soname, or alternatively we want to wrap a whole
-set of functions at once.</para>
+</sect1>
-<para>For example, <computeroutput>pthread_create</computeroutput>
-in GNU libpthread is usually a
-versioned symbol - one whose name ends in, eg,
-<computeroutput>@GLIBC_2.3</computeroutput>. Hence we
-are not sure what its real name is. We also want to cover any soname
-of the form <computeroutput>libpthread.so*</computeroutput>.
-So the header of the wrapper will be</para>
-<programlisting><![CDATA[
-int I_WRAP_SONAME_FNNAME_ZZ(libpthreadZdsoZd0,pthreadZucreateZAZa)
- ( ... formals ... )
- { ... body ... }
-]]></programlisting>
-<para>In order to write unusual characters as valid C function names, a
-Z-encoding scheme is used. Names are written literally, except that
-a capital Z acts as an escape character, with the following encoding:</para>
+<sect1 id="manual-core.problems" xreflabel="If You Have Problems">
+<title>If You Have Problems</title>
-<programlisting><![CDATA[
- Za encodes *
- Zp +
- Zc :
- Zd .
- Zu _
- Zh -
- Zs (space)
- ZA @
- ZZ Z
- ZL ( # only in valgrind 3.3.0 and later
- ZR ) # only in valgrind 3.3.0 and later
-]]></programlisting>
+<para>Contact us at <ulink url="&vg-url;">&vg-url;</ulink>.</para>
-<para>Hence <computeroutput>libpthreadZdsoZd0</computeroutput> is an
-encoding of the soname <computeroutput>libpthread.so.0</computeroutput>
-and <computeroutput>pthreadZucreateZAZa</computeroutput> is an encoding
-of the function name <computeroutput>pthread_create@*</computeroutput>.
-</para>
+<para>See <xref linkend="manual-core.limits"/> for the known
+limitations of Valgrind, and for a list of programs which are
+known not to work on it.</para>
-<para>The macro <computeroutput>I_WRAP_SONAME_FNNAME_ZZ</computeroutput>
-constructs a wrapper name in which
-both the soname (first component) and function name (second component)
-are Z-encoded. Encoding the function name can be tiresome and is
-often unnecessary, so a second macro,
-<computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>, can be
-used instead. The <computeroutput>_ZU</computeroutput> variant is
-also useful for writing wrappers for
-C++ functions, in which the function name is usually already mangled
-using some other convention in which Z plays an important role. Having
-to encode a second time quickly becomes confusing.</para>
+<para>All parts of the system make heavy use of assertions and
+internal self-checks. They are permanently enabled, and we have no
+plans to disable them. If one of them breaks, please mail us!</para>
-<para>Since the function name field may contain wildcards, it can be
-anything, including just <computeroutput>*</computeroutput>.
-The same is true for the soname.
-However, some ELF objects - specifically, main executables - do not
-have sonames. Any object lacking a soname is treated as if its soname
-was <computeroutput>NONE</computeroutput>, which is why the original
-example above had a name
-<computeroutput>I_WRAP_SONAME_FNNAME_ZU(NONE,foo)</computeroutput>.</para>
+<para>If you get an assertion failure
+in <filename>m_mallocfree.c</filename>, this may have happened because
+your program wrote off the end of a malloc'd block, or before its
+beginning. Valgrind hopefully will have emitted a proper message to that
+effect before dying in this way. This is a known problem which
+we should fix.</para>
-<para>Note that the soname of an ELF object is not the same as its
-file name, although it is often similar. You can find the soname of
-an object <computeroutput>libfoo.so</computeroutput> using the command
-<computeroutput>readelf -a libfoo.so | grep soname</computeroutput>.</para>
-</sect2>
+<para>Read the <xref linkend="FAQ"/> for more advice about common problems,
+crashes, etc.</para>
-<sect2 id="manual-core.wrapping.semantics" xreflabel="Wrapping Semantics">
-<title>Wrapping Semantics</title>
+</sect1>
-<para>The ability for a wrapper to replace an infinite family of functions
-is powerful but brings complications in situations where ELF objects
-appear and disappear (are dlopen'd and dlclose'd) on the fly.
-Valgrind tries to maintain sensible behaviour in such situations.</para>
-<para>For example, suppose a process has dlopened (an ELF object with
-soname) <computeroutput>object1.so</computeroutput>, which contains
-<computeroutput>function1</computeroutput>. It starts to use
-<computeroutput>function1</computeroutput> immediately.</para>
-<para>After a while it dlopens <computeroutput>wrappers.so</computeroutput>,
-which contains a wrapper
-for <computeroutput>function1</computeroutput> in (soname)
-<computeroutput>object1.so</computeroutput>. All subsequent calls to
-<computeroutput>function1</computeroutput> are rerouted to the wrapper.</para>
+<sect1 id="manual-core.limits" xreflabel="Limitations">
+<title>Limitations</title>
-<para>If <computeroutput>wrappers.so</computeroutput> is
-later dlclose'd, calls to <computeroutput>function1</computeroutput> are
-naturally routed back to the original.</para>
+<para>The following list of limitations seems long. However, most
+programs actually work fine.</para>
-<para>Alternatively, if <computeroutput>object1.so</computeroutput>
-is dlclose'd but wrappers.so remains,
-then the wrapper exported by <computeroutput>wrapper.so</computeroutput>
-becomes inactive, since there
-is no way to get to it - there is no original to call any more. However,
-Valgrind remembers that the wrapper is still present. If
-<computeroutput>object1.so</computeroutput> is
-eventually dlopen'd again, the wrapper will become active again.</para>
+<para>Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X
+system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the
+following constraints:</para>
-<para>In short, valgrind inspects all code loading/unloading events to
-ensure that the set of currently active wrappers remains consistent.</para>
+ <itemizedlist>
+ <listitem>
+ <para>On x86 and amd64, there is no support for 3DNow! instructions.
+ If the translator encounters these, Valgrind will generate a SIGILL
+ when the instruction is executed. Apart from that, on x86 and amd64,
+ essentially all instructions are supported, up to and including SSE3.
+ </para>
-<para>A second possible problem is that of conflicting wrappers. It is
-easily possible to load two or more wrappers, both of which claim
-to be wrappers for some third function. In such cases Valgrind will
-complain about conflicting wrappers when the second one appears, and
-will honour only the first one.</para>
-</sect2>
+ <para>On ppc32 and ppc64, almost all integer, floating point and Altivec
+ instructions are supported. Specifically: integer and FP insns that are
+ mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts,
+ stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and
+ the Altivec (also known as VMX) SIMD instruction set, are supported.</para>
+ </listitem>
-<sect2 id="manual-core.wrapping.debugging" xreflabel="Debugging">
-<title>Debugging</title>
+ <listitem>
+ <para>Atomic instruction sequences are not properly supported, in the
+ sense that their atomicity is not preserved. This will affect any
+ use of synchronization via memory shared between processes. Such
+ programs will appear to work, but fail sporadically.</para>
+ </listitem>
-<para>Figuring out what's going on given the dynamic nature of wrapping
-can be difficult. The
-<computeroutput>--trace-redir=yes</computeroutput> flag makes
-this possible
-by showing the complete state of the redirection subsystem after
-every
-<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
-event affecting code (text).</para>
+ <listitem>
+ <para>If your program does its own memory management, rather than
+ using malloc/new/free/delete, it should still work, but Memcheck's
+ error checking won't be so effective. If you describe your program's
+ memory management scheme using "client requests"
+ (see <xref linkend="manual-core.clientreq"/>), Memcheck can do
+ better. Nevertheless, using malloc/new and free/delete is still the
+ best approach.</para>
+ </listitem>
-<para>There are two central concepts:</para>
+ <listitem>
+ <para>Valgrind's signal simulation is not as robust as it could be.
+ Basic POSIX-compliant sigaction and sigprocmask functionality is
+ supplied, but it's conceivable that things could go badly awry if you
+ do weird things with signals. Workaround: don't. Programs that do
+ non-POSIX signal tricks are in any case inherently unportable, so
+ should be avoided if possible.</para>
+ </listitem>
-<itemizedlist>
+ <listitem>
+ <para>Machine instructions, and system calls, have been implemented
+ on demand. So it's possible, although unlikely, that a program will
+ fall over with a message to that effect. If this happens, please
+ report all the details printed out, so we can try and implement the
+ missing feature.</para>
+ </listitem>
- <listitem><para>A "redirection specification" is a binding of
- a (soname pattern, fnname pattern) pair to a code address.
- These bindings are created by writing functions with names
- made with the
- <computeroutput>I_WRAP_SONAME_FNNAME_{ZZ,_ZU}</computeroutput>
- macros.</para></listitem>
+ <listitem>
+ <para>Memory consumption of your program is greatly increased whilst
+ running under Valgrind. This is due to the large amount of
+ administrative information maintained behind the scenes. Another
+ cause is that Valgrind dynamically translates the original
+ executable. Translated, instrumented code is 12-18 times larger than
+ the original so you can easily end up with 50+ MB of translations
+ when running (eg) a web browser.</para>
+ </listitem>
- <listitem><para>An "active redirection" is code-address to
- code-address binding currently in effect.</para></listitem>
+ <listitem>
+ <para>Valgrind can handle dynamically-generated code just fine. If
+ you regenerate code over the top of old code (ie. at the same memory
+ addresses) and the code is on the stack, Valgrind will realise the
+ code has changed, and work correctly. This is necessary to handle
+ the trampolines GCC uses to implement nested functions. If you
+ regenerate code somewhere other than the stack, you will need to use
+ the <option>--smc-check=all</option> flag, and Valgrind will run more
+ slowly than normal.</para>
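+ <para>A minimal sketch of such a run, where
+ <computeroutput>./myjit</computeroutput> is a hypothetical program
+ that overwrites code in non-stack memory:</para>
+<programlisting><![CDATA[
+valgrind --smc-check=all ./myjit]]></programlisting>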
+ </listitem>
-</itemizedlist>
+ <listitem>
+ <para>As of version 3.0.0, Valgrind has the following limitations
+ in its implementation of x86/AMD64 floating point relative to
+ IEEE754.</para>
-<para>The state of the wrapping-and-redirection subsystem comprises a set of
-specifications and a set of active bindings. The specifications are
-acquired/discarded by watching all
-<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
-events on code (text)
-sections. The active binding set is (conceptually) recomputed from
-the specifications, and all known symbol names, following any change
-to the specification set.</para>
+ <para>Precision: There is no support for 80-bit arithmetic.
+ Internally, Valgrind represents all such "long double" numbers in 64
+ bits, and so there may be some differences in results. Whether or
+ not this is critical remains to be seen. Note, the x86/amd64
+ fldt/fstpt instructions (read/write 80-bit numbers) are correctly
+ simulated, using conversions to/from 64 bits, so that in-memory
+ images of 80-bit numbers look correct if anyone wants to see.</para>
-<para><computeroutput>--trace-redir=yes</computeroutput> shows the contents
-of both sets following any such event.</para>
+ <para>The impression observed from many FP regression tests is that
+ the accuracy differences aren't significant. Generally speaking, if
+ a program relies on 80-bit precision, there may be difficulties
+ porting it to non-x86/amd64 platforms which only support 64-bit FP
+ precision. Even on x86/amd64, the program may get different results
+ depending on whether it is compiled to use SSE2 instructions (64-bit
+ only), or x87 instructions (80-bit). The net effect is to make FP
+ programs behave as if they had been run on a machine with 64-bit IEEE
+ floats, for example PowerPC. On amd64 FP arithmetic is done by
+ default on SSE2, so amd64 looks more like PowerPC than x86 from an FP
+ perspective, and there are far fewer noticeable accuracy differences
+ than with x86.</para>
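+ <para>A minimal C sketch of how the precision difference can show up,
+ assuming an x86 build where <computeroutput>long double</computeroutput>
+ is the 80-bit x87 format (the variable names are illustrative only).
+ Run natively, the sum differs from 1.0; if the arithmetic is carried
+ out in 64 bits, as described above, the two values may instead compare
+ equal:</para>
+<programlisting><![CDATA[
+#include <float.h>
+#include <stdio.h>
+
+int main(void)
+{
+    /* volatile stops the compiler folding the sum at compile time */
+    volatile long double one  = 1.0L;
+    volatile long double tiny = LDBL_EPSILON;  /* smallest step above 1.0L */
+    volatile long double sum  = one + tiny;
+
+    printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);
+    printf("1.0L + LDBL_EPSILON %s 1.0L\n",
+           sum != one ? "differs from" : "compares equal to");
+    return 0;
+}
+]]></programlisting>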
-<para><computeroutput>-v</computeroutput> prints a line of text each
-time an active specification is used for the first time.</para>
+ <para>Rounding: Valgrind does observe the 4 IEEE-mandated rounding
+ modes (to nearest, to +infinity, to -infinity, to zero) for the
+ following conversions: float to integer, integer to float where
+ there is a possibility of loss of precision, and float-to-float
+ rounding. For all other FP operations, only the IEEE default mode
+ (round to nearest) is supported.</para>
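+ <para>The following C sketch illustrates the kind of operation this
+ affects. It assumes a C99 <computeroutput>fenv.h</computeroutput> that
+ provides <computeroutput>FE_UPWARD</computeroutput> (on some systems you
+ must also link with <computeroutput>-lm</computeroutput>). Natively, the
+ addition is rounded towards +infinity; where only round-to-nearest is
+ honoured for arithmetic, the result may equal 1.0 instead:</para>
+<programlisting><![CDATA[
+#include <fenv.h>
+#include <float.h>
+#include <stdio.h>
+
+int main(void)
+{
+    /* volatile keeps the addition from being folded at compile time */
+    volatile double one  = 1.0;
+    volatile double tiny = DBL_EPSILON / 4;  /* well below half an ulp of 1.0 */
+    volatile double sum;
+
+    if (fesetround(FE_UPWARD) != 0) {
+        puts("FE_UPWARD not available");
+        return 1;
+    }
+    sum = one + tiny;          /* rounds up to 1.0 + DBL_EPSILON natively */
+    fesetround(FE_TONEAREST);  /* restore the IEEE default mode */
+
+    printf("sum %s 1.0\n", sum > one ? "is above" : "equals");
+    return 0;
+}
+]]></programlisting>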
-<para>Hence for maximum debugging effectiveness you will need to use both
-flags.</para>
+ <para>Numeric exceptions in FP code: IEEE754 defines five types of
+ numeric exception that can happen: invalid operation (sqrt of
+ negative number, etc), division by zero, overflow, underflow,
+ inexact (loss of precision).</para>
-<para>One final comment. The function-wrapping facility is closely
-tied to Valgrind's ability to replace (redirect) specified
-functions, for example to redirect calls to
-<computeroutput>malloc</computeroutput> to its
-own implementation. Indeed, a replacement function can be
-regarded as a wrapper function which does not call the original.
-However, to make the implementation more robust, the two kinds
-of interception (wrapping vs replacement) are treated differently.
-</para>
+ <para>For each exception, two courses of action are defined by IEEE754:
+ either (1) a user-defined exception handler may be called, or (2) a
+ default action is defined, which "fixes things up" and allows the
+ computation to proceed without throwing an exception.</para>
-<para><computeroutput>--trace-redir=yes</computeroutput> shows
-specifications and bindings for both
-replacement and wrapper functions. To differentiate the
-two, replacement bindings are printed using
-<computeroutput>R-></computeroutput> whereas
-wraps are printed using <computeroutput>W-></computeroutput>.
-</para>
-</sect2>
+ <para>Currently Valgrind only supports the default fixup actions.
+ Again, feedback on the importance of exception support would be
+ appreciated.</para>
+ <para>When Valgrind detects that the program is trying to exceed any
+ of these limitations (setting exception handlers, rounding mode, or
+ precision control), it can print a message giving a traceback of
+ where this has happened, and continue execution. This behaviour used
+ to be the default, but the messages are annoying and so showing them
+ is now disabled by default. Use <option>--show-emwarns=yes</option> to see
+ them.</para>
-<sect2 id="manual-core.wrapping.limitations-cf"
- xreflabel="Limitations - control flow">
-<title>Limitations - control flow</title>
+ <para>The above limitations define precisely the IEEE754 'default'
+ behaviour: default fixup on all exceptions, round-to-nearest
+ operations, and 64-bit precision.</para>
+ </listitem>
+
+ <listitem>
+ <para>As of version 3.0.0, Valgrind has the following limitations in
+ its implementation of x86/AMD64 SSE2 FP arithmetic, relative to
+ IEEE754.</para>
-<para>For the most part, the function wrapping implementation is robust.
-The only important caveat is: in a wrapper, get hold of
-the <computeroutput>OrigFn</computeroutput> information using
-<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput> before calling any
-other wrapped function. Once you have the
-<computeroutput>OrigFn</computeroutput>, arbitrary
-calls between, recursion between, and longjumps out of wrappers
-should work correctly. There is never any interaction between wrapped
-functions and merely replaced functions
-(eg <computeroutput>malloc</computeroutput>), so you can call
-<computeroutput>malloc</computeroutput> etc safely from within wrappers.
-</para>
+ <para>Essentially the same: no exceptions, and limited observance of
+ rounding mode. Also, SSE2 has control bits which make it treat
+ denormalised numbers as zero (DAZ) and a related action, flush
+ denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be
+ less accurate than IEEE requires. Valgrind detects, ignores, and can
+ warn about, attempts to enable either mode.</para>
+ </listitem>
-<para>The above comments are true for {x86,amd64,ppc32}-linux. On
-ppc64-linux function wrapping is more fragile due to the (arguably
-poorly designed) ppc64-linux ABI. This mandates the use of a shadow
-stack which tracks entries/exits of both wrapper and replacement
-functions. This gives two limitations: firstly, longjumping out of
-wrappers will rapidly lead to disaster, since the shadow stack will
-not get correctly cleared. Secondly, since the shadow stack has
-finite size, recursion between wrapper/replacement functions is only
-possible to a limited depth, beyond which Valgrind has to abort the
-run. This depth is currently 16 calls.</para>
+ <listitem>
+ <para>As of version 3.2.0, Valgrind has the following limitations
+ in its implementation of PPC32 and PPC64 floating point
+ arithmetic, relative to IEEE754.</para>
-<para>For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above
-comments apply on a per-thread basis. In other words, wrapping is
-thread-safe: each thread must individually observe the above
-restrictions, but there is no need for any kind of inter-thread
-cooperation.</para>
-</sect2>
+ <para>Scalar (non-Altivec): Valgrind provides a bit-exact emulation of
+ all floating point instructions, except for "fre" and "fres", which are
+ done more precisely than required by the PowerPC architecture specification.
+ All floating point operations observe the current rounding mode.
+ </para>
+ <para>However, fpscr[FPRF] is not set after each operation. That could
+ be done but would give measurable performance overheads, and so far
+ no need for it has been found.</para>
-<sect2 id="manual-core.wrapping.limitations-sigs"
- xreflabel="Limitations - original function signatures">
-<title>Limitations - original function signatures</title>
+ <para>As on x86/AMD64, IEEE754 exceptions are not supported: all floating
+ point exceptions are handled using the default IEEE fixup actions.
+ Valgrind detects, ignores, and can warn about, attempts to unmask
+ the 5 IEEE FP exception kinds by writing to the floating-point status
+ and control register (fpscr).
+ </para>
-<para>As shown in the above example, to call the original you must use a
-macro of the form <computeroutput>CALL_FN_*</computeroutput>.
-For technical reasons it is impossible
-to create a single macro to deal with all argument types and numbers,
-so a family of macros covering the most common cases is supplied. In
-what follows, 'W' denotes a machine-word-typed value (a pointer or a
-C <computeroutput>long</computeroutput>),
-and 'v' denotes C's <computeroutput>void</computeroutput> type.
-The currently available macros are:</para>
-
-<programlisting><![CDATA[
-CALL_FN_v_v -- call an original of type void fn ( void )
-CALL_FN_W_v -- call an original of type long fn ( void )
-
-CALL_FN_v_W -- void fn ( long )
-CALL_FN_W_W -- long fn ( long )
-
-CALL_FN_v_WW -- void fn ( long, long )
-CALL_FN_W_WW -- long fn ( long, long )
+ <para>Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2:
+ no exceptions, and limited observance of rounding mode.
+ For Altivec, FP arithmetic
+ is done in IEEE/Java mode, which is more accurate than the Linux default
+ setting. "More accurate" means that denormals are handled properly,
+ rather than simply being flushed to zero.</para>
+ </listitem>
+ </itemizedlist>
-CALL_FN_v_WWW -- void fn ( long, long, long )
-CALL_FN_W_WWW -- long fn ( long, long, long )
+ <para>Programs which are known not to work are:</para>
+ <itemizedlist>
+ <listitem>
+ <para>emacs starts up but immediately concludes it is out of
+ memory and aborts. It may be that Memcheck does not provide
+ a good enough emulation of the
+ <computeroutput>mallinfo</computeroutput> function.
+ Emacs works fine if you build it to use
+ the standard malloc/free routines.</para>
+ </listitem>
+ </itemizedlist>
-CALL_FN_W_WWWW -- long fn ( long, long, long, long )
-CALL_FN_W_5W -- long fn ( long, long, long, long, long )
-CALL_FN_W_6W -- long fn ( long, long, long, long, long, long )
-and so on, up to
-CALL_FN_W_12W
-]]></programlisting>
+</sect1>
-<para>The set of supported types can be expanded as needed. It is
-regrettable that this limitation exists. Function wrapping has proven
-difficult to implement, with a certain apparently unavoidable level of
-ickyness. After several implementation attempts, the present
-arrangement appears to be the least-worst tradeoff. At least it works
-reliably in the presence of dynamic linking and dynamic code
-loading/unloading.</para>
-<para>You should not attempt to wrap a function of one type signature with a
-wrapper of a different type signature. Such trickery will surely lead
-to crashes or strange behaviour. This is not of course a limitation
-of the function wrapping implementation, merely a reflection of the
-fact that it gives you sweeping powers to shoot yourself in the foot
-if you are not careful. Imagine the instant havoc you could wreak by
-writing a wrapper which matched any function name in any soname - in
-effect, one which claimed to be a wrapper for all functions in the
-process.</para>
-</sect2>
+<sect1 id="manual-core.example" xreflabel="An Example Run">
+<title>An Example Run</title>
-<sect2 id="manual-core.wrapping.examples" xreflabel="Examples">
-<title>Examples</title>
+<para>This is the log for a run of a small program using Memcheck.
+The program is in fact correct, and the reported error is the
+result of a potentially serious code generation bug in GNU g++
+(snapshot 20010527).</para>
-<para>In the source tree,
-<computeroutput>memcheck/tests/wrap[1-8].c</computeroutput> provide a series of
-examples, ranging from very simple to quite advanced.</para>
+<programlisting><![CDATA[
+sewardj@phoenix:~/newmat10$ ~/Valgrind-6/valgrind -v ./bogon
+==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
+==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
+==25832== Startup, with flags:
+==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
+==25832== reading syms from /lib/ld-linux.so.2
+==25832== reading syms from /lib/libc.so.6
+==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
+==25832== reading syms from /lib/libm.so.6
+==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
+==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
+==25832== reading syms from /proc/self/exe
+==25832==
+==25832== Invalid read of size 4
+==25832== at 0x8048724: BandMatrix::ReSize(int,int,int) (bogon.cpp:45)
+==25832== by 0x80487AF: main (bogon.cpp:66)
+==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
+==25832==
+==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
+==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
+==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
+==25832== For a detailed leak analysis, rerun with: --leak-check=yes
+]]></programlisting>
-<para><computeroutput>auxprogs/libmpiwrap.c</computeroutput> is an example
-of wrapping a big, complex API (the MPI-2 interface). This file defines
-almost 300 different wrappers.</para>
-</sect2>
+<para>The GCC folks fixed this about a week before gcc-3.0
+shipped.</para>
</sect1>
+<sect1 id="manual-core.warnings" xreflabel="Warning Messages">
+<title>Warning Messages You Might See</title>
-<sect1 id="manual-core.install" xreflabel="Building and Installing">
-<title>Building and Installing Valgrind</title>
-
-<para>We use the standard Unix
-<computeroutput>./configure</computeroutput>,
-<computeroutput>make</computeroutput>, <computeroutput>make
-install</computeroutput> mechanism, and we have attempted to
-ensure that it works on machines with kernel 2.4 or 2.6 and glibc
-2.2.X to 2.5.X. Once you have completed
-<computeroutput>make install</computeroutput> you may then want
-to run the regression tests
-with <computeroutput>make regtest</computeroutput>.
-</para>
+<para>Most of these only appear if you run in verbose mode
+(enabled by <computeroutput>-v</computeroutput>):</para>
-<para>There are five options (in addition to the usual
-<option>--prefix=</option> which affect how Valgrind is built:
-<itemizedlist>
+ <itemizedlist>
<listitem>
- <para><option>--enable-inner</option></para>
- <para>This builds Valgrind with some special magic hacks which make
- it possible to run it on a standard build of Valgrind (what the
- developers call "self-hosting"). Ordinarily you should not use
- this flag as various kinds of safety checks are disabled.
- </para>
+ <para><computeroutput>More than 100 errors detected. Subsequent
+ errors will still be recorded, but in less detail than
+ before.</computeroutput></para>
+
+ <para>After 100 different errors have been shown, Valgrind becomes
+ more conservative about collecting them. It then requires only the
+ program counters in the top two stack frames to match when deciding
+ whether or not two errors are really the same one. Prior to this
+ point, the PCs in the top four frames are required to match. This
+ hack has the effect of slowing down the appearance of new errors
+ after the first 100. The 100 constant can be changed by recompiling
+ Valgrind.</para>
</listitem>
<listitem>
- <para><option>--enable-tls</option></para>
- <para>TLS (Thread Local Storage) is a relatively new mechanism which
- requires compiler, linker and kernel support. Valgrind tries to
- automatically test if TLS is supported and if so enables this option.
- Sometimes it cannot test for TLS, so this option allows you to
- override the automatic test.</para>
+ <para><computeroutput>More than 1000 errors detected. I'm not
+ reporting any more. Final error counts may be inaccurate. Go fix
+ your program!</computeroutput></para>
+
+ <para>After 1000 different errors have been detected, Valgrind
+ ignores any more. It seems unlikely that collecting even more
+ different ones would be of practical help to anybody, and it avoids
+ the danger that Valgrind spends more and more of its time comparing
+ new errors against an ever-growing collection. As above, the 1000
+ number is a compile-time constant.</para>
</listitem>
<listitem>
- <para><option>--with-vex=</option></para>
- <para>Specifies the path to the underlying VEX dynamic-translation
- library. By default this is taken to be in the VEX directory off
- the root of the source tree.
- </para>
+ <para><computeroutput>Warning: client switching stacks?</computeroutput></para>
+
+ <para>Valgrind spotted such a large change in the stack pointer
+ that it guesses the client is switching to
+ a different stack. At this point it makes a kludgey guess where the
+ base of the new stack is, and sets memory permissions accordingly.
+ You may get many bogus error messages following this, if Valgrind
+ guesses wrong. At the moment "large change" is defined as a change
+ of more than 2000000 in the value of the
+ stack pointer register.</para>
</listitem>
<listitem>
- <para><option>--enable-only64bit</option></para>
- <para><option>--enable-only32bit</option></para>
- <para>On 64-bit
- platforms (amd64-linux, ppc64-linux), Valgrind is by default built
- in such a way that both 32-bit and 64-bit executables can be run.
- Sometimes this cleverness is a problem for a variety of reasons.
- These two flags allow for single-target builds in this situation.
- If you issue both, the configure script will complain. Note they
- are ignored on 32-bit-only platforms (x86-linux, ppc32-linux).
- </para>
+ <para><computeroutput>Warning: client attempted to close Valgrind's
+ logfile fd &lt;number&gt;</computeroutput></para>
+
+ <para>Valgrind doesn't allow the client to close the logfile,
+ because you'd never see any diagnostic information after that point.
+ If you see this message, you may want to use the
+ <option>--log-fd=&lt;number&gt;</option> option to specify a
+ different logfile file-descriptor number.</para>
</listitem>
-</itemizedlist>
-</para>
+ <listitem>
+ <para><computeroutput>Warning: noted but unhandled ioctl
+ &lt;number&gt;</computeroutput></para>
-<para>The <computeroutput>configure</computeroutput> script tests
-the version of the X server currently indicated by the current
-<computeroutput>$DISPLAY</computeroutput>. This is a known bug.
-The intention was to detect the version of the current X
-client libraries, so that correct suppressions could be selected
-for them, but instead the test checks the server version. This
-is just plain wrong.</para>
+ <para>Valgrind observed a call to one of the vast family of
+ <computeroutput>ioctl</computeroutput> system calls, but did not
+ modify its memory status info (because nobody has yet written a
+ suitable wrapper). The call will still have gone through, but you may get
+ spurious errors after this as a result of the non-update of the
+ memory info.</para>
+ </listitem>
-<para>If you are building a binary package of Valgrind for
-distribution, please read <literal>README_PACKAGERS</literal>
-<xref linkend="dist.readme-packagers"/>. It contains some
-important information.</para>
+ <listitem>
+ <para><computeroutput>Warning: set address range perms: large range
+ &lt;number&gt;</computeroutput></para>
-<para>Apart from that, there's not much excitement here. Let us
-know if you have build problems.</para>
+ <para>Diagnostic message, mostly for the benefit of the Valgrind
+ developers, to do with memory permissions.</para>
+ </listitem>
+
+ </itemizedlist>
</sect1>
-<sect1 id="manual-core.problems" xreflabel="If You Have Problems">
-<title>If You Have Problems</title>
+<sect1 id="manual-core.clientreq"
+ xreflabel="The Client Request mechanism">
+<title>The Client Request mechanism</title>
-<para>Contact us at <ulink url="&vg-url;">&vg-url;</ulink>.</para>
+<para>Valgrind has a trapdoor mechanism via which the client
+program can pass all manner of requests and queries to Valgrind
+and the current tool. Internally, this is used extensively to
+make malloc, free, etc, work, although you don't see that.</para>
-<para>See <xref linkend="manual-core.limits"/> for the known
-limitations of Valgrind, and for a list of programs which are
-known not to work on it.</para>
+<para>For your convenience, a subset of these so-called client
+requests is provided to allow you to tell Valgrind facts about
+the behaviour of your program, and also to make queries.
+In particular, your program can tell Valgrind about changes in
+memory range permissions that Valgrind would not otherwise know
+about, which allows clients to get Valgrind to do arbitrary
+custom checks.</para>
-<para>All parts of the system make heavy use of assertions and
-internal self-checks. They are permanently enabled, and we have no
-plans to disable them. If one of them breaks, please mail us!</para>
+<para>Clients need to include a header file to make this work.
+Which header file depends on which client requests you use. Some
+client requests are handled by the core, and are defined in the
+header file <filename>valgrind/valgrind.h</filename>. Tool-specific
+header files are named after the tool, e.g.
+<filename>valgrind/memcheck.h</filename>. All header files can be found
+in the <literal>include/valgrind</literal> directory of wherever Valgrind
+was installed.</para>
-<para>If you get an assertion failure
-in <filename>m_mallocfree.c</filename>, this may have happened because
-your program wrote off the end of a malloc'd block, or before its
-beginning. Valgrind hopefully will have emitted a proper message to that
-effect before dying in this way. This is a known problem which
-we should fix.</para>
-
-<para>Read the <xref linkend="FAQ"/> for more advice about common problems,
-crashes, etc.</para>
-
-</sect1>
-
-
-
-<sect1 id="manual-core.limits" xreflabel="Limitations">
-<title>Limitations</title>
-
-<para>The following list of limitations seems long. However, most
-programs actually work fine.</para>
-
-<para>Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X
-system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the
-following constraints:</para>
-
- <itemizedlist>
- <listitem>
- <para>On x86 and amd64, there is no support for 3DNow! instructions.
- If the translator encounters these, Valgrind will generate a SIGILL
- when the instruction is executed. Apart from that, on x86 and amd64,
- essentially all instructions are supported, up to and including SSE3.
- </para>
-
- <para>On ppc32 and ppc64, almost all integer, floating point and Altivec
- instructions are supported. Specifically: integer and FP insns that are
- mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts,
- stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and
- the Altivec (also known as VMX) SIMD instruction set, are supported.</para>
- </listitem>
-
- <listitem>
- <para>Atomic instruction sequences are not properly supported, in the
- sense that their atomicity is not preserved. This will affect any
- use of synchronization via memory shared between processes. They
- will appear to work, but fail sporadically.</para>
- </listitem>
-
- <listitem>
- <para>If your program does its own memory management, rather than
- using malloc/new/free/delete, it should still work, but Valgrind's
- error checking won't be so effective. If you describe your program's
- memory management scheme using "client requests"
- (see <xref linkend="manual-core.clientreq"/>), Memcheck can do
- better. Nevertheless, using malloc/new and free/delete is still the
- best approach.</para>
- </listitem>
-
- <listitem>
- <para>Valgrind's signal simulation is not as robust as it could be.
- Basic POSIX-compliant sigaction and sigprocmask functionality is
- supplied, but it's conceivable that things could go badly awry if you
- do weird things with signals. Workaround: don't. Programs that do
- non-POSIX signal tricks are in any case inherently unportable, so
- should be avoided if possible.</para>
- </listitem>
-
- <listitem>
- <para>Machine instructions, and system calls, have been implemented
- on demand. So it's possible, although unlikely, that a program will
- fall over with a message to that effect. If this happens, please
- report all the details printed out, so we can try and implement the
- missing feature.</para>
- </listitem>
+<para>The macros in these header files have the magical property
+that they generate code in-line which Valgrind can spot.
+However, the code does nothing when not run on Valgrind, so you
+are not forced to run your program under Valgrind just because you
+use the macros in these files. Also, you are not required to link your
+program with any extra supporting libraries.</para>
- <listitem>
- <para>Memory consumption of your program is majorly increased whilst
- running under Valgrind. This is due to the large amount of
- administrative information maintained behind the scenes. Another
- cause is that Valgrind dynamically translates the original
- executable. Translated, instrumented code is 12-18 times larger than
- the original so you can easily end up with 50+ MB of translations
- when running (eg) a web browser.</para>
- </listitem>
+<para>The code added to your binary has negligible performance impact:
+on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions
+and is probably undetectable except in tight loops.
+However, if you really wish to compile out the client requests, you can
+compile with <computeroutput>-DNVALGRIND</computeroutput> (analogous to
+<computeroutput>-DNDEBUG</computeroutput>'s effect on
+<computeroutput>assert()</computeroutput>).
+</para>
- <listitem>
- <para>Valgrind can handle dynamically-generated code just fine. If
- you regenerate code over the top of old code (ie. at the same memory
- addresses), if the code is on the stack Valgrind will realise the
- code has changed, and work correctly. This is necessary to handle
- the trampolines GCC uses to implemented nested functions. If you
- regenerate code somewhere other than the stack, you will need to use
- the <option>--smc-check=all</option> flag, and Valgrind will run more
- slowly than normal.</para>
- </listitem>
+<para>You are encouraged to copy the <filename>valgrind/*.h</filename> headers
+into your project's include directory, so your program doesn't have a
+compile-time dependency on Valgrind being installed. The Valgrind headers,
+unlike most of the rest of the code, are under a BSD-style license so you may
+include them without worrying about license incompatibility.</para>
- <listitem>
- <para>As of version 3.0.0, Valgrind has the following limitations
- in its implementation of x86/AMD64 floating point relative to
- IEEE754.</para>
+<para>Here is a brief description of the macros available in
+<filename>valgrind.h</filename>, which work with more than one
+tool (see the tool-specific documentation for explanations of the
+tool-specific macros).</para>
- <para>Precision: There is no support for 80 bit arithmetic.
- Internally, Valgrind represents all such "long double" numbers in 64
- bits, and so there may be some differences in results. Whether or
- not this is critical remains to be seen. Note, the x86/amd64
- fldt/fstpt instructions (read/write 80-bit numbers) are correctly
- simulated, using conversions to/from 64 bits, so that in-memory
- images of 80-bit numbers look correct if anyone wants to see.</para>
+ <variablelist>
- <para>The impression observed from many FP regression tests is that
- the accuracy differences aren't significant. Generally speaking, if
- a program relies on 80-bit precision, there may be difficulties
- porting it to non x86/amd64 platforms which only support 64-bit FP
- precision. Even on x86/amd64, the program may get different results
- depending on whether it is compiled to use SSE2 instructions (64-bits
- only), or x87 instructions (80-bit). The net effect is to make FP
- programs behave as if they had been run on a machine with 64-bit IEEE
- floats, for example PowerPC. On amd64 FP arithmetic is done by
- default on SSE2, so amd64 looks more like PowerPC than x86 from an FP
- perspective, and there are far fewer noticeable accuracy differences
- than with x86.</para>
+ <varlistentry>
+ <term><command><computeroutput>RUNNING_ON_VALGRIND</computeroutput>:</command></term>
+ <listitem>
+ <para>Returns 1 if running on Valgrind, 0 if running on the
+ real CPU. If you are running Valgrind on itself, returns the
+ number of layers of Valgrind emulation you're running on.
+ </para>
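+ <para>A minimal usage sketch, assuming the headers are installed so
+ that <filename>valgrind/valgrind.h</filename> is on the include path.
+ It reports whether the program is running natively or under one or
+ more layers of Valgrind:</para>
+<programlisting><![CDATA[
+#include <stdio.h>
+#include <valgrind/valgrind.h>
+
+int main(void)
+{
+    /* 0 on the real CPU, otherwise the number of Valgrind layers */
+    unsigned int layers = RUNNING_ON_VALGRIND;
+
+    if (layers == 0)
+        puts("running natively");
+    else
+        printf("running under %u layer(s) of Valgrind\n", layers);
+    return 0;
+}
+]]></programlisting>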
+ </listitem>
+ </varlistentry>
- <para>Rounding: Valgrind does observe the 4 IEEE-mandated rounding
- modes (to nearest, to +infinity, to -infinity, to zero) for the
- following conversions: float to integer, integer to float where
- there is a possibility of loss of precision, and float-to-float
- rounding. For all other FP operations, only the IEEE default mode
- (round to nearest) is supported.</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_DISCARD_TRANSLATIONS</computeroutput>:</command></term>
+ <listitem>
+ <para>Discards translations of code in the specified address
+ range. Useful if you are debugging a JIT compiler or some other
+ dynamic code generation system. After this call, attempts to
+ execute code in the invalidated address range will cause
+ Valgrind to make new translations of that code, which is
+ probably the semantics you want. Note that code invalidations
+ are expensive because finding all the relevant translations
+ quickly is very difficult. So try not to call it often.
+ Note that you can be clever about
+ this: you only need to call it when an area which previously
+ contained code is overwritten with new code. You can choose
+ to write code into fresh memory, and just call this
+ occasionally to discard large chunks of old code all at
+ once.</para>
+ <para>
+ Alternatively, for transparent self-modifying-code support,
+ use <computeroutput>--smc-check=all</computeroutput>, or run
+ on ppc32/Linux or ppc64/Linux.
+ </para>
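+ <para>A sketch of where the call might go in a code generator. The
+ buffer <computeroutput>code_buf</computeroutput> and the helper
+ <computeroutput>jit_overwrite</computeroutput> are hypothetical names
+ used only for illustration, and the buffer is never actually
+ executed here:</para>
+<programlisting><![CDATA[
+#include <stddef.h>
+#include <string.h>
+#include <valgrind/valgrind.h>
+
+/* Hypothetical JIT output buffer; a real JIT would make it executable. */
+static unsigned char code_buf[4096];
+
+/* Hypothetical helper: overwrite previously generated code in place. */
+static int jit_overwrite(const unsigned char *new_code, size_t len)
+{
+    if (len > sizeof code_buf)
+        return -1;
+    memcpy(code_buf, new_code, len);
+    /* Translations made from this address range are now stale. */
+    VALGRIND_DISCARD_TRANSLATIONS(code_buf, len);
+    return 0;
+}
+
+int main(void)
+{
+    static const unsigned char ret_insn[] = { 0xC3 };  /* x86 'ret' */
+    return jit_overwrite(ret_insn, sizeof ret_insn) == 0 ? 0 : 1;
+}
+]]></programlisting>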
+ </listitem>
+ </varlistentry>
- <para>Numeric exceptions in FP code: IEEE754 defines five types of
- numeric exception that can happen: invalid operation (sqrt of
- negative number, etc), division by zero, overflow, underflow,
- inexact (loss of precision).</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_COUNT_ERRORS</computeroutput>:</command></term>
+ <listitem>
+ <para>Returns the number of errors found so far by Valgrind. Can be
+ useful in test harness code when combined with the
+ <option>--log-fd=-1</option> option; this runs Valgrind silently,
+ but the client program can detect when errors occur. Only useful
+ for tools that report errors, e.g. it's useful for Memcheck, but for
+ Cachegrind it will always return zero because Cachegrind doesn't
+ report errors.</para>
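+ <para>One way a test harness might use this, sketched under the
+ assumption that the code under test lives in a function here called
+ <computeroutput>run_test_case</computeroutput> (a hypothetical name).
+ When the program is not run on Valgrind the count simply stays at
+ zero:</para>
+<programlisting><![CDATA[
+#include <stdio.h>
+#include <valgrind/valgrind.h>
+
+/* Hypothetical code under test. */
+static void run_test_case(void)
+{
+}
+
+int main(void)
+{
+    unsigned int before = VALGRIND_COUNT_ERRORS;
+    run_test_case();
+    unsigned int after = VALGRIND_COUNT_ERRORS;
+
+    if (after > before) {
+        printf("FAIL: %u new error(s) reported by the tool\n",
+               after - before);
+        return 1;
+    }
+    puts("PASS");
+    return 0;
+}
+]]></programlisting>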
+ </listitem>
+ </varlistentry>
- <para>For each exception, two courses of action are defined by IEEE754:
- either (1) a user-defined exception handler may be called, or (2) a
- default action is defined, which "fixes things up" and allows the
- computation to proceed without throwing an exception.</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>:</command></term>
+ <listitem>
+ <para>If your program manages its own memory instead of using
+ the standard <computeroutput>malloc()</computeroutput> /
+ <computeroutput>new</computeroutput> /
+ <computeroutput>new[]</computeroutput>, tools that track
+ information about heap blocks will not do nearly as good a
+ job. For example, Memcheck won't detect nearly as many
+ errors, and the error messages won't be as informative. To
+ improve this situation, use this macro just after your custom
+ allocator allocates some new memory. See the comments in
+ <filename>valgrind.h</filename> for information on how to use
+ it.</para>
+ </listitem>
+ </varlistentry>
- <para>Currently Valgrind only supports the default fixup actions.
- Again, feedback on the importance of exception support would be
- appreciated.</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_FREELIKE_BLOCK</computeroutput>:</command></term>
+ <listitem>
+ <para>This should be used in conjunction with
+ <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>.
+ Again, see the comments in <filename>valgrind.h</filename> for
+ information on how to use it.</para>
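+ <para>A sketch of how the two macros might be paired in a simple
+ custom allocator. The arena and the functions
+ <computeroutput>arena_alloc</computeroutput> and
+ <computeroutput>arena_free</computeroutput> are hypothetical; the
+ redzone size of 0 and <computeroutput>is_zeroed</computeroutput>
+ value of 0 are illustrative choices, not recommendations:</para>
+<programlisting><![CDATA[
+#include <stddef.h>
+#include <valgrind/valgrind.h>
+
+static char   arena[1 << 16];   /* hypothetical backing store */
+static size_t arena_used;
+
+static void *arena_alloc(size_t n)
+{
+    if (n == 0 || n > sizeof arena)     /* also rules out overflow below */
+        return NULL;
+    n = (n + 15) & ~(size_t)15;         /* keep blocks 16-byte aligned */
+    if (n > sizeof arena - arena_used)
+        return NULL;
+
+    void *p = arena + arena_used;
+    arena_used += n;
+    /* addr, size in bytes, redzone bytes, is_zeroed */
+    VALGRIND_MALLOCLIKE_BLOCK(p, n, 0, 0);
+    return p;
+}
+
+static void arena_free(void *p)
+{
+    /* This toy arena never reuses memory; it only informs the tool. */
+    if (p != NULL)
+        VALGRIND_FREELIKE_BLOCK(p, 0);
+}
+
+int main(void)
+{
+    char *buf = arena_alloc(100);
+    if (buf == NULL)
+        return 1;
+    buf[0] = 'x';
+    arena_free(buf);
+    return 0;
+}
+]]></programlisting>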
+ </listitem>
+ </varlistentry>
- <para>When Valgrind detects that the program is trying to exceed any
- of these limitations (setting exception handlers, rounding mode, or
- precision control), it can print a message giving a traceback of
- where this has happened, and continue execution. This behaviour used
- to be the default, but the messages are annoying and so showing them
- is now disabled by default. Use <option>--show-emwarns=yes</option> to see
- them.</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>:</command></term>
+ <listitem>
+ <para>This is similar to
+ <computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput>,
+ but is tailored towards code that uses memory pools. See the
+ comments in <filename>valgrind.h</filename> for information
+ on how to use it.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_DESTROY_MEMPOOL</computeroutput>:</command></term>
+ <listitem>
+ <para>This should be used in conjunction with
+ <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
+ Again, see the comments in <filename>valgrind.h</filename> for
+ information on how to use it.</para>
+ </listitem>
+ </varlistentry>
- <para>The above limitations define precisely the IEEE754 'default'
- behaviour: default fixup on all exceptions, round-to-nearest
- operations, and 64-bit precision.</para>
- </listitem>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_MEMPOOL_ALLOC</computeroutput>:</command></term>
+ <listitem>
+ <para>This should be used in conjunction with
+ <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
+ Again, see the comments in <filename>valgrind.h</filename> for
+ information on how to use it.</para>
+ </listitem>
+ </varlistentry>
- <listitem>
- <para>As of version 3.0.0, Valgrind has the following limitations in
- its implementation of x86/AMD64 SSE2 FP arithmetic, relative to
- IEEE754.</para>
-
- <para>Essentially the same: no exceptions, and limited observance of
- rounding mode. Also, SSE2 has control bits which make it treat
- denormalised numbers as zero (DAZ) and a related action, flush
- denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be
- less accurate than IEEE requires. Valgrind detects, ignores, and can
- warn about, attempts to enable either mode.</para>
- </listitem>
-
- <listitem>
- <para>As of version 3.2.0, Valgrind has the following limitations
- in its implementation of PPC32 and PPC64 floating point
- arithmetic, relative to IEEE754.</para>
-
- <para>Scalar (non-Altivec): Valgrind provides a bit-exact emulation of
- all floating point instructions, except for "fre" and "fres", which are
- done more precisely than required by the PowerPC architecture specification.
- All floating point operations observe the current rounding mode.
- </para>
-
- <para>However, fpscr[FPRF] is not set after each operation. That could
- be done but would give measurable performance overheads, and so far
- no need for it has been found.</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_MEMPOOL_FREE</computeroutput>:</command></term>
+ <listitem>
+ <para>This should be used in conjunction with
+ <computeroutput>VALGRIND_CREATE_MEMPOOL</computeroutput>.
+ Again, see the comments in <filename>valgrind.h</filename> for
+ information on how to use it.</para>
+ </listitem>
+ </varlistentry>
- <para>As on x86/AMD64, IEEE754 exceptions are not supported: all floating
- point exceptions are handled using the default IEEE fixup actions.
- Valgrind detects, ignores, and can warn about, attempts to unmask
- the 5 IEEE FP exception kinds by writing to the floating-point status
- and control register (fpscr).
- </para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_NON_SIMD_CALL[0123]</computeroutput>:</command></term>
+ <listitem>
+ <para>Executes a function of 0, 1, 2 or 3 args in the client
+ program on the <emphasis>real</emphasis> CPU, not the virtual
+ CPU that Valgrind normally runs code on. These are used in
+ various ways internally to Valgrind. They might be useful to
+ client programs.</para>
- <para>Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2:
- no exceptions, and limited observance of rounding mode.
- For Altivec, FP arithmetic
- is done in IEEE/Java mode, which is more accurate than the Linux default
- setting. "More accurate" means that denormals are handled properly,
- rather than simply being flushed to zero.</para>
- </listitem>
- </itemizedlist>
+ <para><command>Warning:</command> Only use these if you
+ <emphasis>really</emphasis> know what you are doing.</para>
+ </listitem>
+ </varlistentry>
- <para>Programs which are known not to work are:</para>
- <itemizedlist>
- <listitem>
- <para>emacs starts up but immediately concludes it is out of
- memory and aborts. It may be that Memcheck does not provide
- a good enough emulation of the
- <computeroutput>mallinfo</computeroutput> function.
- Emacs works fine if you build it to use
- the standard malloc/free routines.</para>
- </listitem>
- </itemizedlist>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_PRINTF(format, ...)</computeroutput>:</command></term>
+ <listitem>
+ <para>printf a message to the log file when running under
+ Valgrind. Nothing is output if not running under Valgrind.
+ Returns the number of characters output.</para>
+ </listitem>
+ </varlistentry>
-</sect1>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_PRINTF_BACKTRACE(format, ...)</computeroutput>:</command></term>
+ <listitem>
+ <para>printf a message to the log file along with a stack
+ backtrace when running under Valgrind. Nothing is output if
+ not running under Valgrind. Returns the number of characters
+ output.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_STACK_REGISTER(start, end)</computeroutput>:</command></term>
+ <listitem>
+ <para>Registers a new stack. Informs Valgrind that the memory range
+ between start and end is a unique stack. Returns a stack identifier
+ that can be used with other
+ <computeroutput>VALGRIND_STACK_*</computeroutput> calls.</para>
+ <para>Valgrind will use this information to determine if a change to
+ the stack pointer is an item pushed onto the stack or a change over
+ to a new stack. Use this if you're using a user-level thread package
+ and are noticing spurious errors from Valgrind about uninitialized
+ memory reads.</para>
+ </listitem>
+ </varlistentry>
-<sect1 id="manual-core.example" xreflabel="An Example Run">
-<title>An Example Run</title>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_STACK_DEREGISTER(id)</computeroutput>:</command></term>
+ <listitem>
+ <para>Deregisters a previously registered stack. Informs
+ Valgrind that the previously registered memory range with stack id
+ <computeroutput>id</computeroutput> is no longer a stack.</para>
+ </listitem>
+ </varlistentry>
-<para>This is the log for a run of a small program using Memcheck.
-The program is in fact correct, and the reported error is as the
-result of a potentially serious code generation bug in GNU g++
-(snapshot 20010527).</para>
+ <varlistentry>
+ <term><command><computeroutput>VALGRIND_STACK_CHANGE(id, start, end)</computeroutput>:</command></term>
+ <listitem>
+ <para>Changes a previously registered stack. Informs
+ Valgrind that the previously registered stack with stack id
+ <computeroutput>id</computeroutput> has changed its start and end
+ values. Use this if your user-level thread package implements
+ stack growth.</para>
+ </listitem>
+ </varlistentry>
-<programlisting><![CDATA[
-sewardj@phoenix:~/newmat10$ ~/Valgrind-6/valgrind -v ./bogon
-==25832== Valgrind 0.10, a memory error detector for x86 RedHat 7.1.
-==25832== Copyright (C) 2000-2001, and GNU GPL'd, by Julian Seward.
-==25832== Startup, with flags:
-==25832== --suppressions=/home/sewardj/Valgrind/redhat71.supp
-==25832== reading syms from /lib/ld-linux.so.2
-==25832== reading syms from /lib/libc.so.6
-==25832== reading syms from /mnt/pima/jrs/Inst/lib/libgcc_s.so.0
-==25832== reading syms from /lib/libm.so.6
-==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
-==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
-==25832== reading syms from /proc/self/exe
-==25832==
-==25832== Invalid read of size 4
-==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
-==25832== by 0x80487AF: main (bogon.cpp:66)
-==25832== Address 0xBFFFF74C is not stack'd, malloc'd or free'd
-==25832==
-==25832== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
-==25832== malloc/free: in use at exit: 0 bytes in 0 blocks.
-==25832== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
-==25832== For a detailed leak analysis, rerun with: --leak-check=yes
-==25832==
-==25832== exiting, did 1881 basic blocks, 0 misses.
-==25832== 223 translations, 3626 bytes in, 56801 bytes out.]]></programlisting>
+ </variablelist>
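+<para>As a rough illustration of how
+<computeroutput>VALGRIND_MALLOCLIKE_BLOCK</computeroutput> and
+<computeroutput>VALGRIND_FREELIKE_BLOCK</computeroutput> might be placed in
+a custom allocator, here is a minimal sketch. The arena allocator and the
+names <computeroutput>arena_alloc</computeroutput> and
+<computeroutput>arena_release</computeroutput> are hypothetical, and the
+no-op fallback definitions are an assumption made so that the code still
+builds when <filename>valgrind.h</filename> is not installed. The comments
+in <filename>valgrind.h</filename> remain the authoritative description of
+the macro arguments.</para>
+<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Use the real client requests if valgrind.h is available; otherwise
   fall back to no-ops (an assumption of this sketch, not part of Valgrind). */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef VALGRIND_MALLOCLIKE_BLOCK
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed) ((void)0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rzB) ((void)0)
#endif

#define ARENA_SIZE 4096

/* Hypothetical bump allocator: blocks are carved from one static arena. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hand out n bytes, or NULL if n is zero or the arena is exhausted. */
void *arena_alloc(size_t n)
{
    size_t rounded = (n + 15u) & ~(size_t)15u;  /* round up to 16 bytes */
    if (n == 0 || rounded < n || rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *p = arena + arena_used;
    arena_used += rounded;

    /* Tell the tool that [p, p+n) is now a heap-like block:
       no red zones (0) and contents not zeroed (0). */
    VALGRIND_MALLOCLIKE_BLOCK(p, n, 0, 0);
    return p;
}

/* Mark a block as released. A bump allocator cannot reuse the space,
   but the tool can now flag later accesses to the block. */
void arena_release(void *p)
{
    assert(p != NULL);
    VALGRIND_FREELIKE_BLOCK(p, 0);  /* red-zone size must match the alloc */
}
```
+]]></programlisting>
+<para>The request is made just after the allocator has chosen the block,
+as described above, and the matching release request uses the same
+red-zone size. A real allocator would probably also need to tell Memcheck
+that the unused part of the arena is inaccessible, which is outside the
+scope of this sketch.</para>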
-<para>The GCC folks fixed this about a week before gcc-3.0
-shipped.</para>
+<para>Note that <filename>valgrind.h</filename> is included by
+all the tool-specific header files (such as
+<filename>memcheck.h</filename>), so you don't need to include it
+in your client if you include a tool-specific header.</para>
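+<para>The stack-registration requests listed above could be used by a
+user-level thread package roughly as follows. This is only a sketch: the
+<computeroutput>coro_stack</computeroutput> type and its two functions are
+hypothetical, and the fallback macro definitions are an assumption that
+lets the code compile without <filename>valgrind.h</filename>.</para>
+<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Use the real client requests if valgrind.h is available; otherwise
   fall back to no-ops (an assumption of this sketch, not part of Valgrind). */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef VALGRIND_STACK_REGISTER
#  define VALGRIND_STACK_REGISTER(start, end) (0u)
#  define VALGRIND_STACK_DEREGISTER(id) ((void)(id))
#endif

/* Hypothetical per-coroutine stack descriptor. */
typedef struct {
    char    *base;         /* lowest address of the stack memory */
    size_t   size;         /* size of the stack in bytes */
    unsigned valgrind_id;  /* identifier returned by VALGRIND_STACK_REGISTER */
} coro_stack;

/* Allocate a stack and register its address range.
   Returns 0 on success, -1 if the allocation fails. */
int coro_stack_init(coro_stack *s, size_t size)
{
    s->base = malloc(size);
    if (s->base == NULL)
        return -1;
    s->size = size;
    s->valgrind_id = VALGRIND_STACK_REGISTER(s->base, s->base + size);
    return 0;
}

/* Deregister the range before the memory is returned to the heap. */
void coro_stack_destroy(coro_stack *s)
{
    VALGRIND_STACK_DEREGISTER(s->valgrind_id);
    free(s->base);
    s->base = NULL;
    s->size = 0;
}
```
+]]></programlisting>
+<para>Keeping the returned identifier next to the stack it describes makes
+it straightforward to deregister the stack later, or to pass the identifier
+to <computeroutput>VALGRIND_STACK_CHANGE</computeroutput> if the stack
+grows.</para>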
</sect1>
-<sect1 id="manual-core.warnings" xreflabel="Warning Messages">
-<title>Warning Messages You Might See</title>
-<para>Most of these only appear if you run in verbose mode
-(enabled by <computeroutput>-v</computeroutput>):</para>
- <itemizedlist>
- <listitem>
- <para><computeroutput>More than 100 errors detected. Subsequent
- errors will still be recorded, but in less detail than
- before.</computeroutput></para>
+<sect1 id="manual-core.wrapping" xreflabel="Function Wrapping">
+<title>Function wrapping</title>
- <para>After 100 different errors have been shown, Valgrind becomes
- more conservative about collecting them. It then requires only the
- program counters in the top two stack frames to match when deciding
- whether or not two errors are really the same one. Prior to this
- point, the PCs in the top four frames are required to match. This
- hack has the effect of slowing down the appearance of new errors
- after the first 100. The 100 constant can be changed by recompiling
- Valgrind.</para>
- </listitem>
+<para>
+Valgrind versions 3.2.0 and above can do function wrapping on all
+supported targets. In function wrapping, calls to some specified
+function are intercepted and rerouted to a different, user-supplied
+function. This can do whatever it likes, typically examining the
+arguments, calling onwards to the original, and possibly examining the
+result. Any number of functions may be wrapped.</para>
- <listitem>
- <para><computeroutput>More than 1000 errors detected. I'm not
- reporting any more. Final error counts may be inaccurate. Go fix
- your program!</computeroutput></para>
+<para>
+Function wrapping is useful for instrumenting an API in some way. For
+example, wrapping functions in the POSIX pthreads API makes it
+possible to notify Valgrind of thread status changes, and wrapping
+functions in the MPI (message-passing) API allows notifying Valgrind
+of memory status changes associated with message arrival/departure.
+Such information is usually passed to Valgrind by using client
+requests in the wrapper functions, although that is not of relevance
+here.</para>
- <para>After 1000 different errors have been detected, Valgrind
- ignores any more. It seems unlikely that collecting even more
- different ones would be of practical help to anybody, and it avoids
- the danger that Valgrind spends more and more of its time comparing
- new errors against an ever-growing collection. As above, the 1000
- number is a compile-time constant.</para>
- </listitem>
+<sect2 id="manual-core.wrapping.example" xreflabel="A Simple Example">
+<title>A Simple Example</title>
- <listitem>
- <para><computeroutput>Warning: client switching stacks?</computeroutput></para>
+<para>Suppose we want to wrap some function:</para>
- <para>Valgrind spotted such a large change in the stack pointer
- that it guesses the client is switching to
- a different stack. At this point it makes a kludgey guess where the
- base of the new stack is, and sets memory permissions accordingly.
- You may get many bogus error messages following this, if Valgrind
- guesses wrong. At the moment "large change" is defined as a change
- of more that 2000000 in the value of the
- stack pointer register.</para>
- </listitem>
+<programlisting><![CDATA[
+int foo ( int x, int y ) { return x + y; }]]></programlisting>
- <listitem>
- <para><computeroutput>Warning: client attempted to close Valgrind's
- logfile fd <number></computeroutput></para>
+<para>A wrapper is a function of identical type, but with a special name
+which identifies it as the wrapper for <computeroutput>foo</computeroutput>.
+Wrappers need to include
+supporting macros from <computeroutput>valgrind.h</computeroutput>.
+Here is a simple wrapper which prints the arguments and return value:</para>
- <para>Valgrind doesn't allow the client to close the logfile,
- because you'd never see any diagnostic information after that point.
- If you see this message, you may want to use the
- <option>--log-fd=<number></option> option to specify a
- different logfile file-descriptor number.</para>
- </listitem>
+<programlisting><![CDATA[
+#include <stdio.h>
+#include "valgrind.h"
+int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y )
+{
+ int result;
+ OrigFn fn;
+ VALGRIND_GET_ORIG_FN(fn);
+ printf("foo's wrapper: args %d %d\n", x, y);
+ CALL_FN_W_WW(result, fn, x,y);
+ printf("foo's wrapper: result %d\n", result);
+ return result;
+}
+]]></programlisting>
- <listitem>
- <para><computeroutput>Warning: noted but unhandled ioctl
- <number></computeroutput></para>
+<para>To become active, the wrapper merely needs to be present in a text
+section somewhere in the same process' address space as the function
+it wraps, and for its ELF symbol name to be visible to Valgrind. In
+practice, this means either compiling to a
+<computeroutput>.o</computeroutput> and linking it in, or
+compiling to a <computeroutput>.so</computeroutput> and
+<computeroutput>LD_PRELOAD</computeroutput>ing it in. The latter is more
+convenient in that it doesn't require relinking.</para>
- <para>Valgrind observed a call to one of the vast family of
- <computeroutput>ioctl</computeroutput> system calls, but did not
- modify its memory status info (because nobody has yet written a
- suitable wrapper). The call will still have gone through, but you may get
- spurious errors after this as a result of the non-update of the
- memory info.</para>
- </listitem>
+<para>All wrappers have approximately the above form. There are three
+crucial macros:</para>
- <listitem>
- <para><computeroutput>Warning: set address range perms: large range
- <number></computeroutput></para>
+<para><computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>:
+this generates the real name of the wrapper.
+This is an encoded name which Valgrind notices when reading symbol
+table information. What it says is: I am the wrapper for any function
+named <computeroutput>foo</computeroutput> which is found in
+an ELF shared object with an empty
+("<computeroutput>NONE</computeroutput>") soname field. The specification
+mechanism is powerful in
+that wildcards are allowed for both sonames and function names.
+The details are discussed below.</para>
- <para>Diagnostic message, mostly for benefit of the Valgrind
- developers, to do with memory permissions.</para>
- </listitem>
+<para><computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>:
+once in the wrapper, the first priority is
+to get hold of the address of the original (and any other supporting
+information needed). This is stored in a value of opaque
+type <computeroutput>OrigFn</computeroutput>.
+The information is acquired using
+<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput>. It is crucial
+to make this macro call before calling any other wrapped function
+in the same thread.</para>
- </itemizedlist>
+<para><computeroutput>CALL_FN_W_WW</computeroutput>: eventually we will
+want to call the function being
+wrapped. Calling it directly does not work, since that just gets us
+back to the wrapper and tends to kill the program in short order by
+stack overflow. Instead, the result lvalue,
+<computeroutput>OrigFn</computeroutput> and arguments are
+handed to one of a family of macros of the form
+<computeroutput>CALL_FN_*</computeroutput>. These
+cause Valgrind to call the original and avoid recursion back to the
+wrapper.</para>
+</sect2>
+
+<sect2 id="manual-core.wrapping.specs" xreflabel="Wrapping Specifications">
+<title>Wrapping Specifications</title>
+
+<para>This scheme has the advantage of being self-contained. A library of
+wrappers can be compiled to object code in the normal way, and does
+not rely on an external script telling Valgrind which wrappers pertain
+to which originals.</para>
-</sect1>
+<para>Each wrapper has a name which, in the most general case, says: I am the
+wrapper for any function whose name matches FNPATT and whose ELF
+"soname" matches SOPATT. Both FNPATT and SOPATT may contain wildcards
+(asterisks) and other characters (spaces, dots, @, etc) which are not
+generally regarded as valid C identifier names.</para>
+<para>This flexibility is needed to write robust wrappers for POSIX pthread
+functions, where typically we are not completely sure of either the
+function name or the soname, or alternatively we want to wrap a whole
+set of functions at once.</para>
-<sect1 id="manual-core.mpiwrap" xreflabel="MPI Wrappers">
-<title>Debugging MPI Parallel Programs with Valgrind</title>
-
-<para> Valgrind supports debugging of distributed-memory applications
-which use the MPI message passing standard. This support consists of a
-library of wrapper functions for the
-<computeroutput>PMPI_*</computeroutput> interface. When incorporated
-into the application's address space, either by direct linking or by
-<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
-calls to <computeroutput>PMPI_Send</computeroutput>,
-<computeroutput>PMPI_Recv</computeroutput>, etc. They then
-use client requests to inform Valgrind of memory state changes caused
-by the function being wrapped. This reduces the number of false
-positives that Memcheck otherwise typically reports for MPI
-applications.</para>
-
-<para>The wrappers also take the opportunity to carefully check
-size and definedness of buffers passed as arguments to MPI functions, hence
-detecting errors such as passing undefined data to
-<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
-buffer which is too small.</para>
-
-<para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
-BSD-style license, so you can link it into any code base you like.
-See the top of <computeroutput>auxprogs/libmpiwrap.c</computeroutput>
-for license details.</para>
-
-
-<sect2 id="manual-core.mpiwrap.build" xreflabel="Building MPI Wrappers">
-<title>Building and installing the wrappers</title>
-
-<para> The wrapper library will be built automatically if possible.
-Valgrind's configure script will look for a suitable
-<computeroutput>mpicc</computeroutput> to build it with. This must be
-the same <computeroutput>mpicc</computeroutput> you use to build the
-MPI application you want to debug. By default, Valgrind tries
-<computeroutput>mpicc</computeroutput>, but you can specify a
-different one by using the configure-time flag
-<computeroutput>--with-mpicc=</computeroutput>. Currently the
-wrappers are only buildable with
-<computeroutput>mpicc</computeroutput>s which are based on GNU
-<computeroutput>gcc</computeroutput> or Intel's
-<computeroutput>icc</computeroutput>.</para>
-
-<para>Check that the configure script prints a line like this:</para>
+<para>For example, <computeroutput>pthread_create</computeroutput>
+in GNU libpthread is usually a
+versioned symbol - one whose name ends in, eg,
+<computeroutput>@GLIBC_2.3</computeroutput>. Hence we
+are not sure what its real name is. We also want to cover any soname
+of the form <computeroutput>libpthread.so*</computeroutput>.
+So the header of the wrapper will be</para>
<programlisting><![CDATA[
-checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
+int I_WRAP_SONAME_FNNAME_ZZ(libpthreadZdsoZd0,pthreadZucreateZAZa)
+ ( ... formals ... )
+ { ... body ... }
]]></programlisting>
-<para>If it says <computeroutput>... no</computeroutput>, your
-<computeroutput>mpicc</computeroutput> has failed to compile and link
-a test MPI2 program.</para>
-
-<para>If the configure test succeeds, continue in the usual way with
-<computeroutput>make</computeroutput> and <computeroutput>make
-install</computeroutput>. The final install tree should then contain
-<computeroutput>libmpiwrap.so</computeroutput>.
-</para>
-
-<para>Compile up a test MPI program (eg, MPI hello-world) and try
-this:</para>
+<para>In order to write unusual characters as valid C function names, a
+Z-encoding scheme is used. Names are written literally, except that
+a capital Z acts as an escape character, with the following encoding:</para>
<programlisting><![CDATA[
-LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
- mpirun [args] $prefix/bin/valgrind ./hello
+ Za encodes *
+ Zp +
+ Zc :
+ Zd .
+ Zu _
+ Zh -
+ Zs (space)
+ ZA @
+ ZZ Z
+ ZL ( # only in valgrind 3.3.0 and later
+ ZR ) # only in valgrind 3.3.0 and later
]]></programlisting>
-<para>You should see something similar to the following</para>
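+<para>Hence <computeroutput>libpthreadZdsoZd0</computeroutput> is an
+encoding of the soname <computeroutput>libpthread.so.0</computeroutput>
+and <computeroutput>pthreadZucreateZAZa</computeroutput> is an encoding
+of the function name <computeroutput>pthread_create@*</computeroutput>.
+Note that the soname in this header is not a wildcard: by the table above,
+a pattern covering every soname of the form
+<computeroutput>libpthread.so*</computeroutput> would be written
+<computeroutput>libpthreadZdsoZa</computeroutput>.
+</para>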
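+<para>The encoding table above is mechanical enough to apply with a small
+helper. The following sketch is not part of Valgrind; the function name
+<computeroutput>z_encode</computeroutput> and its interface are
+hypothetical. It encodes only the characters listed in the table and
+copies every other character through unchanged.</para>
+<programlisting><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/* Z-encode `name` into `out` (capacity `out_size`, including the NUL).
   Returns 0 on success, or -1 if the result does not fit.
   Characters not listed in the table are copied unchanged. */
int z_encode(const char *name, char *out, size_t out_size)
{
    size_t used = 0;

    for (; *name != '\0'; name++) {
        char esc = 0;  /* second character of the escape, or 0 if none */

        switch (*name) {
            case '*': esc = 'a'; break;
            case '+': esc = 'p'; break;
            case ':': esc = 'c'; break;
            case '.': esc = 'd'; break;
            case '_': esc = 'u'; break;
            case '-': esc = 'h'; break;
            case ' ': esc = 's'; break;
            case '@': esc = 'A'; break;
            case 'Z': esc = 'Z'; break;
            case '(': esc = 'L'; break;  /* valgrind 3.3.0 and later */
            case ')': esc = 'R'; break;  /* valgrind 3.3.0 and later */
            default:  break;
        }

        /* Leave room for this character (1 or 2 bytes) plus the NUL. */
        size_t need = esc ? 2 : 1;
        if (used + need + 1 > out_size)
            return -1;

        if (esc)
            out[used++] = 'Z';
        out[used++] = esc ? esc : *name;
    }

    if (used + 1 > out_size)  /* only reachable for an empty name */
        return -1;
    out[used] = '\0';
    return 0;
}
```
+]]></programlisting>
+<para>Given the two names from the example,
+<computeroutput>libpthread.so.0</computeroutput> and
+<computeroutput>pthread_create@*</computeroutput>, this helper produces
+<computeroutput>libpthreadZdsoZd0</computeroutput> and
+<computeroutput>pthreadZucreateZAZa</computeroutput> respectively.</para>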
-<programlisting><![CDATA[
-valgrind MPI wrappers 31901: Active for pid 31901
-valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
-]]></programlisting>
+<para>The macro <computeroutput>I_WRAP_SONAME_FNNAME_ZZ</computeroutput>
+constructs a wrapper name in which
+both the soname (first component) and function name (second component)
+are Z-encoded. Encoding the function name can be tiresome and is
+often unnecessary, so a second macro,
+<computeroutput>I_WRAP_SONAME_FNNAME_ZU</computeroutput>, can be
+used instead. The <computeroutput>_ZU</computeroutput> variant is
+also useful for writing wrappers for
+C++ functions, in which the function name is usually already mangled
+using some other convention in which Z plays an important role. Having
+to encode a second time quickly becomes confusing.</para>
-<para>repeated for every process in the group. If you do not see
-these, there is an build/installation problem of some kind.</para>
+<para>Since the function name field may contain wildcards, it can be
+anything, including just <computeroutput>*</computeroutput>.
+The same is true for the soname.
+However, some ELF objects - specifically, main executables - do not
+have sonames. Any object lacking a soname is treated as if its soname
+was <computeroutput>NONE</computeroutput>, which is why the original
+example above had a name
+<computeroutput>I_WRAP_SONAME_FNNAME_ZU(NONE,foo)</computeroutput>.</para>
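+<para>To make the matching rule concrete, here is a small model of how a
+soname pattern could be compared against an object's soname. It is an
+illustration only and makes no claim about how Valgrind implements
+matching internally; the names <computeroutput>glob_match</computeroutput>
+and <computeroutput>soname_matches</computeroutput> are hypothetical. An
+object without a soname is represented by a null pointer and treated as
+<computeroutput>NONE</computeroutput>.</para>
+<programlisting><![CDATA[
```c
#include <stddef.h>

/* Return 1 if `str` matches `pat`, where '*' in `pat` matches any run
   of characters (including none) and all other characters are literal. */
int glob_match(const char *pat, const char *str)
{
    if (*pat == '\0')
        return *str == '\0';

    if (*pat == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of str in turn. */
        for (;; str++) {
            if (glob_match(pat + 1, str))
                return 1;
            if (*str == '\0')
                return 0;
        }
    }

    return *str == *pat && glob_match(pat + 1, str + 1);
}

/* Match a (decoded) soname pattern against an object's soname.
   Pass NULL for objects that have no soname, such as main executables. */
int soname_matches(const char *sopatt, const char *soname)
{
    return glob_match(sopatt, soname != NULL ? soname : "NONE");
}
```
+]]></programlisting>
+<para>Under this model the pattern
+<computeroutput>libpthread.so*</computeroutput> matches the soname
+<computeroutput>libpthread.so.0</computeroutput>, and both
+<computeroutput>NONE</computeroutput> and
+<computeroutput>*</computeroutput> match an object that has no soname.
+</para>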
-<para> The MPI functions to be wrapped are assumed to be in an ELF
-shared object with soname matching
-<computeroutput>libmpi.so*</computeroutput>. This is known to be
-correct at least for Open MPI and Quadrics MPI, and can easily be
-changed if required.</para>
+<para>Note that the soname of an ELF object is not the same as its
+file name, although it is often similar. You can find the soname of
+an object <computeroutput>libfoo.so</computeroutput> using the command
+<computeroutput>readelf -a libfoo.so | grep soname</computeroutput>.</para>
</sect2>
+<sect2 id="manual-core.wrapping.semantics" xreflabel="Wrapping Semantics">
+<title>Wrapping Semantics</title>
-<sect2 id="manual-core.mpiwrap.gettingstarted"
- xreflabel="Getting started with MPI Wrappers">
-<title>Getting started</title>
+<para>The ability for a wrapper to replace an infinite family of functions
+is powerful but brings complications in situations where ELF objects
+appear and disappear (are dlopen'd and dlclose'd) on the fly.
+Valgrind tries to maintain sensible behaviour in such situations.</para>
-<para>Compile your MPI application as usual, taking care to link it
-using the same <computeroutput>mpicc</computeroutput> that your
-Valgrind build was configured with.</para>
+<para>For example, suppose a process has dlopened (an ELF object with
+soname) <computeroutput>object1.so</computeroutput>, which contains
+<computeroutput>function1</computeroutput>. It starts to use
+<computeroutput>function1</computeroutput> immediately.</para>
-<para>
-Use the following basic scheme to run your application on Valgrind with
-the wrappers engaged:</para>
+<para>After a while it dlopens <computeroutput>wrappers.so</computeroutput>,
+which contains a wrapper
+for <computeroutput>function1</computeroutput> in (soname)
+<computeroutput>object1.so</computeroutput>. All subsequent calls to
+<computeroutput>function1</computeroutput> are rerouted to the wrapper.</para>
-<programlisting><![CDATA[
-MPIWRAP_DEBUG=[wrapper-args] \
- LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
- mpirun [mpirun-args] \
- $prefix/bin/valgrind [valgrind-args] \
- [application] [app-args]
-]]></programlisting>
+<para>If <computeroutput>wrappers.so</computeroutput> is
+later dlclose'd, calls to <computeroutput>function1</computeroutput> are
+naturally routed back to the original.</para>
+
+<para>Alternatively, if <computeroutput>object1.so</computeroutput>
+is dlclose'd but <computeroutput>wrappers.so</computeroutput> remains,
+then the wrapper exported by <computeroutput>wrappers.so</computeroutput>
+becomes inactive, since there
+is no way to get to it - there is no original to call any more. However,
+Valgrind remembers that the wrapper is still present. If
+<computeroutput>object1.so</computeroutput> is
+eventually dlopen'd again, the wrapper will become active again.</para>
+
+<para>In short, Valgrind inspects all code loading/unloading events to
+ensure that the set of currently active wrappers remains consistent.</para>
-<para>As an alternative to
-<computeroutput>LD_PRELOAD</computeroutput>ing
-<computeroutput>libmpiwrap.so</computeroutput>, you can simply link it
-to your application if desired. This should not disturb native
-behaviour of your application in any way.</para>
+<para>A second possible problem is that of conflicting wrappers. It is
+easily possible to load two or more wrappers, both of which claim
+to be wrappers for some third function. In such cases Valgrind will
+complain about conflicting wrappers when the second one appears, and
+will honour only the first one.</para>
</sect2>
+<sect2 id="manual-core.wrapping.debugging" xreflabel="Debugging">
+<title>Debugging</title>
-<sect2 id="manual-core.mpiwrap.controlling"
- xreflabel="Controlling the MPI Wrappers">
-<title>Controlling the wrapper library</title>
+<para>Figuring out what's going on given the dynamic nature of wrapping
+can be difficult. The
+<computeroutput>--trace-redir=yes</computeroutput> flag makes
+this possible
+by showing the complete state of the redirection subsystem after
+every
+<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
+event affecting code (text).</para>
-<para>Environment variable
-<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
-startup. The default behaviour is to print a starting banner</para>
+<para>There are two central concepts:</para>
-<programlisting><![CDATA[
-valgrind MPI wrappers 16386: Active for pid 16386
-valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
-]]></programlisting>
+<itemizedlist>
-<para> and then be relatively quiet.</para>
+ <listitem><para>A "redirection specification" is a binding of
+ a (soname pattern, fnname pattern) pair to a code address.
+ These bindings are created by writing functions with names
+ made with the
+ <computeroutput>I_WRAP_SONAME_FNNAME_{ZZ,ZU}</computeroutput>
+ macros.</para></listitem>
-<para>You can give a list of comma-separated options in
-<computeroutput>MPIWRAP_DEBUG</computeroutput>. These are</para>
+ <listitem><para>An "active redirection" is a code-address to
+ code-address binding currently in effect.</para></listitem>
-<itemizedlist>
- <listitem>
- <para><computeroutput>verbose</computeroutput>:
- show entries/exits of all wrappers. Also show extra
- debugging info, such as the status of outstanding
- <computeroutput>MPI_Request</computeroutput>s resulting
- from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
- </listitem>
- <listitem>
- <para><computeroutput>quiet</computeroutput>:
- opposite of <computeroutput>verbose</computeroutput>, only print
- anything when the wrappers want
- to report a detected programming error, or in case of catastrophic
- failure of the wrappers.</para>
- </listitem>
- <listitem>
- <para><computeroutput>warn</computeroutput>:
- by default, functions which lack proper wrappers
- are not commented on, just silently
- ignored. This causes a warning to be printed for each unwrapped
- function used, up to a maximum of three warnings per function.</para>
- </listitem>
- <listitem>
- <para><computeroutput>strict</computeroutput>:
- print an error message and abort the program if
- a function lacking a wrapper is used.</para>
- </listitem>
</itemizedlist>
-<para> If you want to use Valgrind's XML output facility
-(<computeroutput>--xml=yes</computeroutput>), you should pass
-<computeroutput>quiet</computeroutput> in
-<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
-extraneous printing from the wrappers.</para>
+<para>The state of the wrapping-and-redirection subsystem comprises a set of
+specifications and a set of active bindings. The specifications are
+acquired/discarded by watching all
+<computeroutput>mmap</computeroutput>/<computeroutput>munmap</computeroutput>
+events on code (text)
+sections. The active binding set is (conceptually) recomputed from
+the specifications, and all known symbol names, following any change
+to the specification set.</para>
-</sect2>
+<para><computeroutput>--trace-redir=yes</computeroutput> shows the contents
+of both sets following any such event.</para>
+<para><computeroutput>-v</computeroutput> prints a line of text each
+time an active specification is used for the first time.</para>
-<sect2 id="manual-core.mpiwrap.limitations"
- xreflabel="Abilities and Limitations of MPI Wrappers">
-<title>Abilities and limitations</title>
+<para>Hence for maximum debugging effectiveness you will need to use both
+flags.</para>
-<sect3>
-<title>Functions</title>
+<para>One final comment. The function-wrapping facility is closely
+tied to Valgrind's ability to replace (redirect) specified
+functions, for example to redirect calls to
+<computeroutput>malloc</computeroutput> to its
+own implementation. Indeed, a replacement function can be
+regarded as a wrapper function which does not call the original.
+However, to make the implementation more robust, the two kinds
+of interception (wrapping vs replacement) are treated differently.
+</para>
-<para>All MPI2 functions except
-<computeroutput>MPI_Wtick</computeroutput>,
-<computeroutput>MPI_Wtime</computeroutput> and
-<computeroutput>MPI_Pcontrol</computeroutput> have wrappers. The
-first two are not wrapped because they return a
-<computeroutput>double</computeroutput>, and Valgrind's
-function-wrap mechanism cannot handle that (it could easily enough be
-extended to). <computeroutput>MPI_Pcontrol</computeroutput> cannot be
-wrapped as it has variable arity:
-<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
+<para><computeroutput>--trace-redir=yes</computeroutput> shows
+specifications and bindings for both
+replacement and wrapper functions. To differentiate the
+two, replacement bindings are printed using
+<computeroutput>R-></computeroutput> whereas
+wraps are printed using <computeroutput>W-></computeroutput>.
+</para>
+</sect2>
-<para>Most functions are wrapped with a default wrapper which does
-nothing except complain or abort if it is called, depending on
-settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
-above. The following functions have "real", do-something-useful
-wrappers:</para>
-<programlisting><![CDATA[
-PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
+<sect2 id="manual-core.wrapping.limitations-cf"
+ xreflabel="Limitations - control flow">
+<title>Limitations - control flow</title>
-PMPI_Recv PMPI_Get_count
+<para>For the most part, the function wrapping implementation is robust.
+The only important caveat is: in a wrapper, get hold of
+the <computeroutput>OrigFn</computeroutput> information using
+<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput> before calling any
+other wrapped function. Once you have the
+<computeroutput>OrigFn</computeroutput>, arbitrary
+calls between, recursion between, and longjumps out of wrappers
+should work correctly. There is never any interaction between wrapped
+functions and merely replaced functions
+(e.g. <computeroutput>malloc</computeroutput>), so you can call
+<computeroutput>malloc</computeroutput> etc. safely from within wrappers.
+</para>
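+<para>The ordering rule can be illustrated with two wrappers, one of
+which calls a function that is itself wrapped. This is a minimal
+sketch under stated assumptions: <computeroutput>helper</computeroutput>
+and <computeroutput>target</computeroutput> are hypothetical functions
+in the main executable (soname <computeroutput>NONE</computeroutput>),
+and the header is assumed to be installed as
+<computeroutput>&lt;valgrind/valgrind.h&gt;</computeroutput>. If the
+header is not found, the wrappers are left out and the file still
+builds.</para>

```c
#include <assert.h>
#include <stdio.h>

/* Use valgrind.h when the compiler can find it; the install path
   <valgrind/valgrind.h> is an assumption about the local setup. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif

/* Two hypothetical functions in the main executable. */
long helper(long x) { return x + 1; }
long target(long x) { return 2 * x; }

#ifdef HAVE_VALGRIND_H
/* Wrapper for helper(): passes the call straight through. */
long I_WRAP_SONAME_FNNAME_ZU(NONE,helper)(long x)
{
   OrigFn fn;
   long   r;
   VALGRIND_GET_ORIG_FN(fn);
   CALL_FN_W_W(r, fn, x);
   return r;
}

/* Wrapper for target(): also calls helper(), which is itself wrapped. */
long I_WRAP_SONAME_FNNAME_ZU(NONE,target)(long x)
{
   OrigFn fn;
   long   r, h;
   VALGRIND_GET_ORIG_FN(fn);   /* fetch this before the call below   */
   h = helper(x);              /* another wrapped function: now safe */
   assert(h == x + 1);
   CALL_FN_W_W(r, fn, x);
   fprintf(stderr, "target(%ld) returned %ld\n", x, r);
   return r;
}
#endif
```

+<para>Both wrappers are transparent, so the program computes the same
+results whether or not it runs under Valgrind. Moving the call to
+<computeroutput>helper</computeroutput> above
+<computeroutput>VALGRIND_GET_ORIG_FN</computeroutput> would violate the
+caveat stated above.</para>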
-PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
+<para>The above comments are true for {x86,amd64,ppc32}-linux. On
+ppc64-linux function wrapping is more fragile due to the (arguably
+poorly designed) ppc64-linux ABI. This mandates the use of a shadow
+stack which tracks entries/exits of both wrapper and replacement
+functions. This gives two limitations. First, longjumping out of
+wrappers will rapidly lead to disaster, since the shadow stack will
+not get correctly cleared. Second, since the shadow stack has
+finite size, recursion between wrapper/replacement functions is only
+possible to a limited depth, beyond which Valgrind has to abort the
+run. This depth is currently 16 calls.</para>
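+<para>Both ppc64-linux limitations follow from the shape of a
+fixed-size stack, which the following self-contained model tries to
+show. It is a conceptual sketch only: the
+<computeroutput>ShadowStack</computeroutput> type and the
+<computeroutput>shadow_*</computeroutput> functions are hypothetical
+names, and the real data structure inside Valgrind may look quite
+different.</para>

```c
#include <assert.h>
#include <stdbool.h>

#define SHADOW_DEPTH 16   /* the depth limit quoted in the text */

/* Hypothetical model of a per-thread shadow stack; not Valgrind's code. */
typedef struct {
   int         depth;               /* number of live entries        */
   const void *ret[SHADOW_DEPTH];   /* saved return info, per entry  */
} ShadowStack;

void shadow_init(ShadowStack *s) { s->depth = 0; }

/* Called on entry to a wrapper or replacement function. Returns false
   when the stack is full, the point at which a run would be aborted. */
bool shadow_push(ShadowStack *s, const void *ret)
{
   if (s->depth == SHADOW_DEPTH)
      return false;
   s->ret[s->depth++] = ret;
   return true;
}

/* Called on normal exit from a wrapper or replacement function. */
const void *shadow_pop(ShadowStack *s)
{
   assert(s->depth > 0);
   return s->ret[--s->depth];
}
```

+<para>In this model a <computeroutput>longjmp</computeroutput> out of a
+wrapper skips the matching <computeroutput>shadow_pop</computeroutput>,
+so stale entries stay on the stack, and a seventeenth nested
+<computeroutput>shadow_push</computeroutput> fails because no slot is
+left.</para>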
-PMPI_Irecv
-PMPI_Wait PMPI_Waitall
-PMPI_Test PMPI_Testall
+<para>For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above
+comments apply on a per-thread basis. In other words, wrapping is
+thread-safe: each thread must individually observe the above
+restrictions, but there is no need for any kind of inter-thread
+cooperation.</para>
+</sect2>
-PMPI_Iprobe PMPI_Probe
-PMPI_Cancel
+<sect2 id="manual-core.wrapping.limitations-sigs"
+ xreflabel="Limitations - original function signatures">
+<title>Limitations - original function signatures</title>
-PMPI_Sendrecv
+<para>As shown in the above example, to call the original you must use a
+macro of the form <computeroutput>CALL_FN_*</computeroutput>.
+For technical reasons it is impossible
+to create a single macro to deal with all argument types and numbers,
+so a family of macros covering the most common cases is supplied. In
+what follows, 'W' denotes a machine-word-typed value (a pointer or a
+C <computeroutput>long</computeroutput>),
+and 'v' denotes C's <computeroutput>void</computeroutput> type.
+The currently available macros are:</para>
-PMPI_Type_commit PMPI_Type_free
+<programlisting><![CDATA[
+CALL_FN_v_v -- call an original of type void fn ( void )
+CALL_FN_W_v -- call an original of type long fn ( void )
-PMPI_Pack PMPI_Unpack
+CALL_FN_v_W -- void fn ( long )
+CALL_FN_W_W -- long fn ( long )
-PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
-PMPI_Reduce PMPI_Allreduce PMPI_Op_create
+CALL_FN_v_WW -- void fn ( long, long )
+CALL_FN_W_WW -- long fn ( long, long )
-PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
+CALL_FN_v_WWW -- void fn ( long, long, long )
+CALL_FN_W_WWW -- long fn ( long, long, long )
-PMPI_Error_string
-PMPI_Init PMPI_Initialized PMPI_Finalize
+CALL_FN_W_WWWW -- long fn ( long, long, long, long )
+CALL_FN_W_5W -- long fn ( long, long, long, long, long )
+CALL_FN_W_6W -- long fn ( long, long, long, long, long, long )
+and so on, up to
+CALL_FN_W_12W
]]></programlisting>
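+<para>As an illustration of matching a macro to a signature, the sketch
+below wraps one function that returns nothing and one that returns a
+pointer, both taking a pointer and a
+<computeroutput>long</computeroutput>. The functions
+<computeroutput>fill</computeroutput> and
+<computeroutput>skip</computeroutput> are hypothetical, the soname
+<computeroutput>NONE</computeroutput> assumes they live in the main
+executable, and the header location
+<computeroutput>&lt;valgrind/valgrind.h&gt;</computeroutput> is an
+assumption about the local installation.</para>

```c
#include <assert.h>
#include <string.h>

/* Use valgrind.h when the compiler can find it; the install path
   <valgrind/valgrind.h> is an assumption about the local setup. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif

/* Hypothetical originals in the main executable. */
void fill(char *buf, long n)
{
   assert(n >= 0);
   memset(buf, 'x', (size_t)n);
}

char *skip(char *s, long n) { return s + n; }

#ifdef HAVE_VALGRIND_H
/* void fn(pointer, long): no result, two word-sized arguments. */
void I_WRAP_SONAME_FNNAME_ZU(NONE,fill)(char *buf, long n)
{
   OrigFn fn;
   VALGRIND_GET_ORIG_FN(fn);
   CALL_FN_v_WW(fn, buf, n);
}

/* pointer fn(pointer, long): word-sized result, two word-sized arguments. */
char *I_WRAP_SONAME_FNNAME_ZU(NONE,skip)(char *s, long n)
{
   OrigFn fn;
   char  *r;
   VALGRIND_GET_ORIG_FN(fn);
   CALL_FN_W_WW(r, fn, s, n);
   return r;
}
#endif
```

+<para>The pointer arguments and the pointer result are both treated as
+'W' values here, which is why the same
+<computeroutput>_WW</computeroutput> suffix serves both wrappers and
+only the result letter differs.</para>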
-<para> A few functions such as
-<computeroutput>PMPI_Address</computeroutput> are listed as
-<computeroutput>HAS_NO_WRAPPER</computeroutput>. They have no wrapper
-at all as there is nothing worth checking, and giving a no-op wrapper
-would reduce performance for no reason.</para>
-
-<para> Note that the wrapper library itself can itself generate large
-numbers of calls to the MPI implementation, especially when walking
-complex types. The most common functions called are
-<computeroutput>PMPI_Extent</computeroutput>,
-<computeroutput>PMPI_Type_get_envelope</computeroutput>,
-<computeroutput>PMPI_Type_get_contents</computeroutput>, and
-<computeroutput>PMPI_Type_free</computeroutput>. </para>
-</sect3>
-
-<sect3>
-<title>Types</title>
-
-<para> MPI-1.1 structured types are supported, and walked exactly.
-The currently supported combiners are
-<computeroutput>MPI_COMBINER_NAMED</computeroutput>,
-<computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
-<computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
-<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>
-<computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
-<computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
-<computeroutput>MPI_COMBINER_STRUCT</computeroutput>. This should
-cover all MPI-1.1 types. The mechanism (function
-<computeroutput>walk_type</computeroutput>) should extend easily to
-cover MPI2 combiners.</para>
-
-<para>MPI defines some named structured types
-(<computeroutput>MPI_FLOAT_INT</computeroutput>,
-<computeroutput>MPI_DOUBLE_INT</computeroutput>,
-<computeroutput>MPI_LONG_INT</computeroutput>,
-<computeroutput>MPI_2INT</computeroutput>,
-<computeroutput>MPI_SHORT_INT</computeroutput>,
-<computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
-of some basic type and a C <computeroutput>int</computeroutput>.
-Unfortunately the MPI specification makes it impossible to look inside
-these types and see where the fields are. Therefore these wrappers
-assume the types are laid out as <computeroutput>struct { float val;
-int loc; }</computeroutput> (for
-<computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
-accordingly. This appears to be correct at least for Open MPI 1.0.2
-and for Quadrics MPI.</para>
-
-<para>If <computeroutput>strict</computeroutput> is an option specified
-in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
-will abort if an unhandled type is encountered. Otherwise, the
-application will print a warning message and continue.</para>
-
-<para>Some effort is made to mark/check memory ranges corresponding to
-arrays of values in a single pass. This is important for performance
-since asking Valgrind to mark/check any range, no matter how small,
-carries quite a large constant cost. This optimisation is applied to
-arrays of primitive types (<computeroutput>double</computeroutput>,
-<computeroutput>float</computeroutput>,
-<computeroutput>int</computeroutput>,
-<computeroutput>long</computeroutput>, <computeroutput>long
-long</computeroutput>, <computeroutput>short</computeroutput>,
-<computeroutput>char</computeroutput>, and <computeroutput>long
-double</computeroutput> on platforms where <computeroutput>sizeof(long
-double) == 8</computeroutput>). For arrays of all other types, the
-wrappers handle each element individually and so there can be a very
-large performance cost.</para>
-
-</sect3>
+<para>The set of supported types can be expanded as needed. It is
+regrettable that this limitation exists. Function wrapping has proven
+difficult to implement, with a certain apparently unavoidable level of
+ickiness. After several implementation attempts, the present
+arrangement appears to be the least-worst tradeoff. At least it works
+reliably in the presence of dynamic linking and dynamic code
+loading/unloading.</para>
+<para>You should not attempt to wrap a function of one type signature with a
+wrapper of a different type signature. Such trickery will surely lead
+to crashes or strange behaviour. This is not, of course, a limitation
+of the function wrapping implementation, merely a reflection of the
+fact that it gives you sweeping powers to shoot yourself in the foot
+if you are not careful. Imagine the instant havoc you could wreak by
+writing a wrapper which matched any function name in any soname - in
+effect, one which claimed to be a wrapper for all functions in the
+process.</para>
</sect2>
+<sect2 id="manual-core.wrapping.examples" xreflabel="Examples">
+<title>Examples</title>
-<sect2 id="manual-core.mpiwrap.writingwrappers"
- xreflabel="Writing new MPI Wrappers">
-<title>Writing new wrappers</title>
+<para>In the source tree,
+<computeroutput>memcheck/tests/wrap[1-8].c</computeroutput> provide a series of
+examples, ranging from very simple to quite advanced.</para>
-<para>
-For the most part the wrappers are straightforward. The only
-significant complexity arises with nonblocking receives.</para>
-
-<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
-states the recv buffer and returns immediately, giving a handle
-(<computeroutput>MPI_Request</computeroutput>) for the transaction.
-Later the user will have to poll for completion with
-<computeroutput>MPI_Wait</computeroutput> etc, and when the
-transaction completes successfully, the wrappers have to paint the
-recv buffer. But the recv buffer details are not presented to
-<computeroutput>MPI_Wait</computeroutput> -- only the handle is. The
-library therefore maintains a shadow table which associates
-uncompleted <computeroutput>MPI_Request</computeroutput>s with the
-corresponding buffer address/count/type. When an operation completes,
-the table is searched for the associated address/count/type info, and
-memory is marked accordingly.</para>
-
-<para>Access to the table is guarded by a (POSIX pthreads) lock, so as
-to make the library thread-safe.</para>
-
-<para>The table is allocated with
-<computeroutput>malloc</computeroutput> and never
-<computeroutput>free</computeroutput>d, so it will show up in leak
-checks.</para>
-
-<para>Writing new wrappers should be fairly easy. The source file is
-<computeroutput>auxprogs/libmpiwrap.c</computeroutput>. If possible,
-find an existing wrapper for a function of similar behaviour to the
-one you want to wrap, and use it as a starting point. The wrappers
-are organised in sections in the same order as the MPI 1.1 spec, to
-aid navigation. When adding a wrapper, remember to comment out the
-definition of the default wrapper in the long list of defaults at the
-bottom of the file (do not remove it, just comment it out).</para>
+<para><computeroutput>auxprogs/libmpiwrap.c</computeroutput> is an example
+of wrapping a big, complex API (the MPI-2 interface). This file defines
+almost 300 different wrappers.</para>
</sect2>
-<sect2 id="manual-core.mpiwrap.whattoexpect"
- xreflabel="What to expect with MPI Wrappers">
-<title>What to expect when using the wrappers</title>
-
-<para>The wrappers should reduce Memcheck's false-error rate on MPI
-applications. Because the wrapping is done at the MPI interface,
-there will still potentially be a large number of errors reported in
-the MPI implementation below the interface. The best you can do is
-try to suppress them.</para>
-
-<para>You may also find that the input-side (buffer
-length/definedness) checks find errors in your MPI use, for example
-passing too short a buffer to
-<computeroutput>MPI_Recv</computeroutput>.</para>
-
-<para>Functions which are not wrapped may increase the false
-error rate. A possible approach is to run with
-<computeroutput>MPI_DEBUG</computeroutput> containing
-<computeroutput>warn</computeroutput>. This will show you functions
-which lack proper wrappers but which are nevertheless used. You can
-then write wrappers for them.
-</para>
+</sect1>
-<para>A known source of potential false errors are the
-<computeroutput>PMPI_Reduce</computeroutput> family of functions, when
-using a custom (user-defined) reduction function. In a reduction
-operation, each node notionally sends data to a "central point" which
-uses the specified reduction function to merge the data items into a
-single item. Hence, in general, data is passed between nodes and fed
-to the reduction function, but the wrapper library cannot mark the
-transferred data as initialised before it is handed to the reduction
-function, because all that happens "inside" the
-<computeroutput>PMPI_Reduce</computeroutput> call. As a result you
-may see false positives reported in your reduction function.</para>
-</sect2>
-</sect1>
</chapter>