From: Julian Seward
Date: Thu, 22 Nov 2007 01:21:56 +0000 (+0000)
Subject: Update documents in preparation for 3.3.0, and restructure them
X-Git-Tag: svn/VALGRIND_3_3_0~92
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=9101880b1f2b8eb5a228c7c7d7cc61e8a20e6c4a;p=thirdparty%2Fvalgrind.git

Update documents in preparation for 3.3.0, and restructure them
somewhat to move less relevant material out of the way to some extent.
The main changes are:

* Update date and version info

* Mention other tools in the quick-start guide

* Document --child-silent-after-fork

* Rearrange order of sections in the Valgrind Core chapter, to move
  advanced stuff (client requests) to the end, and compact stuff
  relevant to the majority of users towards the front

* Move MPI debugging stuff from the Core manual (a nonsensical place
  for it) to the Memcheck chapter

* Update the manual's introductory chapter a bit

* Connect up new tech docs summary page, and disconnect old and
  very out of date valgrind/memcheck tech docs

* Add section tags to the Cachegrind manual, to stop xsltproc
  complaining about their absence

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@7199
---

diff --git a/ACKNOWLEDGEMENTS b/ACKNOWLEDGEMENTS
index 0026d9c3f4..81ffdcf8f6 100644
--- a/ACKNOWLEDGEMENTS
+++ b/ACKNOWLEDGEMENTS
@@ -6,8 +6,9 @@ dynamic-translation framework.
 
 Jeremy Fitzhardinge, jeremy@valgrind.org
 
-Jeremy wrote Helgrind and totally overhauled low-level syscall/signal
-and address space layout stuff, among many other improvements.
+Jeremy wrote Helgrind (in the 2.X line) and totally overhauled
+low-level syscall/signal and address space layout stuff, among many
+other improvements.
 
 Tom Hughes, tom@valgrind.org
 
diff --git a/AUTHORS b/AUTHORS
index eeb0549b72..3c68c2fa07 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -2,8 +2,9 @@
 Cerion Armour-Brown worked on PowerPC instruction set support
 using the Vex dynamic-translation framework.
 
-Jeremy Fitzhardinge wrote Helgrind and totally overhauled low-level -syscall/signal and address space layout stuff, among many other things. +Jeremy Fitzhardinge wrote Helgrind (in the 2.X line) and totally +overhauled low-level syscall/signal and address space layout stuff, +among many other things. Tom Hughes did a vast number of bug fixes, and helped out with support for more recent Linux/glibc versions. diff --git a/cachegrind/docs/cg-manual.xml b/cachegrind/docs/cg-manual.xml index 2a643ecc06..59b70e1d8c 100644 --- a/cachegrind/docs/cg-manual.xml +++ b/cachegrind/docs/cg-manual.xml @@ -937,7 +937,7 @@ way as for C/C++ programs. - + Warnings There are a couple of situations in which @@ -969,7 +969,8 @@ warnings. - + Things to watch out for Some odd things that can occur during annotation: @@ -1084,7 +1085,7 @@ rare. - + Accuracy Valgrind's cache profiling has a number of @@ -1221,7 +1222,8 @@ fail these checks. - + Acting on Cachegrind's information So, you've managed to profile your program with Cachegrind. Now what? @@ -1260,14 +1262,16 @@ yourself. But at least you have the information! - + Implementation details This section talks about details you don't need to know about in order to use Cachegrind, but may be of interest to some people. - + How Cachegrind works The best reference for understanding how Cachegrind works is chapter 3 of "Dynamic Binary Analysis and Instrumentation", by Nicholas Nethercote. It @@ -1275,7 +1279,8 @@ is available on the Valgrind publications page. 
- + Cachegrind output file format The file format is fairly straightforward, basically giving the cost centre for every line, grouped by files and diff --git a/docs/xml/Makefile.am b/docs/xml/Makefile.am index 7958b25887..791287221b 100644 --- a/docs/xml/Makefile.am +++ b/docs/xml/Makefile.am @@ -7,5 +7,6 @@ EXTRA_DIST = \ manual-writing-tools.xml\ quick-start-guide.xml \ tech-docs.xml \ + new-tech-docs.xml \ vg-entities.xml \ xml_help.txt diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml index addb2bdd20..a7f5d22f8a 100644 --- a/docs/xml/manual-core.xml +++ b/docs/xml/manual-core.xml @@ -119,7 +119,7 @@ benefits of higher optimisation levels whilst keeping relatively small the chances of false positives or false negatives from Memcheck. Also, you should compile your code with -Wall because it can identify some or all of the problems that Valgrind can miss at the -higher optimisations levels. (Using -Wall +higher optimisation levels. (Using -Wall is also a good idea in general.) All other tools (as far as we know) are unaffected by optimisation level. @@ -657,6 +657,25 @@ categories. + + + + + + When enabled, Valgrind will not show any debugging or + logging output for the child process resulting from + a fork call. This can make the output less + confusing (although more misleading) when dealing with processes + that create children. It is particularly useful in conjunction + with --trace-children=. Use of this flag is also + strongly recommended if you are requesting XML output + (--xml=yes), since otherwise the XML from child and + parent may become mixed up, which usually makes it useless. + + + + @@ -988,6 +1007,10 @@ that can report errors, e.g. Memcheck, but not Cachegrind. process to be debugged and each instance of %f expands to the path to the executable for the process to be debugged. + + Since <command> is likely + to contain spaces, you will need to put this entire flag in + quotes to ensure it is correctly handled by the shell. 
@@ -1273,254 +1296,6 @@ don't understand - -The Client Request mechanism - -Valgrind has a trapdoor mechanism via which the client -program can pass all manner of requests and queries to Valgrind -and the current tool. Internally, this is used extensively to -make malloc, free, etc, work, although you don't see that. - -For your convenience, a subset of these so-called client -requests is provided to allow you to tell Valgrind facts about -the behaviour of your program, and also to make queries. -In particular, your program can tell Valgrind about changes in -memory range permissions that Valgrind would not otherwise know -about, and so allows clients to get Valgrind to do arbitrary -custom checks. - -Clients need to include a header file to make this work. -Which header file depends on which client requests you use. Some -client requests are handled by the core, and are defined in the -header file valgrind/valgrind.h. Tool-specific -header files are named after the tool, e.g. -valgrind/memcheck.h. All header files can be found -in the include/valgrind directory of wherever Valgrind -was installed. - -The macros in these header files have the magical property -that they generate code in-line which Valgrind can spot. -However, the code does nothing when not run on Valgrind, so you -are not forced to run your program under Valgrind just because you -use the macros in this file. Also, you are not required to link your -program with any extra supporting libraries. - -The code added to your binary has negligible performance impact: -on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions -and is probably undetectable except in tight loops. -However, if you really wish to compile out the client requests, you can -compile with -DNVALGRIND (analogous to --DNDEBUG's effect on -assert()). 
- - -You are encouraged to copy the valgrind/*.h headers -into your project's include directory, so your program doesn't have a -compile-time dependency on Valgrind being installed. The Valgrind headers, -unlike most of the rest of the code, are under a BSD-style license so you may -include them without worrying about license incompatibility. - -Here is a brief description of the macros available in -valgrind.h, which work with more than one -tool (see the tool-specific documentation for explanations of the -tool-specific macros). - - - - - RUNNING_ON_VALGRIND: - - Returns 1 if running on Valgrind, 0 if running on the - real CPU. If you are running Valgrind on itself, returns the - number of layers of Valgrind emulation you're running on. - - - - - - VALGRIND_DISCARD_TRANSLATIONS: - - Discards translations of code in the specified address - range. Useful if you are debugging a JIT compiler or some other - dynamic code generation system. After this call, attempts to - execute code in the invalidated address range will cause - Valgrind to make new translations of that code, which is - probably the semantics you want. Note that code invalidations - are expensive because finding all the relevant translations - quickly is very difficult. So try not to call it often. - Note that you can be clever about - this: you only need to call it when an area which previously - contained code is overwritten with new code. You can choose - to write code into fresh memory, and just call this - occasionally to discard large chunks of old code all at - once. - - Alternatively, for transparent self-modifying-code support, - use--smc-check=all, or run - on ppc32/Linux or ppc64/Linux. - - - - - - VALGRIND_COUNT_ERRORS: - - Returns the number of errors found so far by Valgrind. Can be - useful in test harness code when combined with the - option; this runs Valgrind silently, - but the client program can detect when errors occur. Only useful - for tools that report errors, e.g. 
it's useful for Memcheck, but for - Cachegrind it will always return zero because Cachegrind doesn't - report errors. - - - - - VALGRIND_MALLOCLIKE_BLOCK: - - If your program manages its own memory instead of using - the standard malloc() / - new / - new[], tools that track - information about heap blocks will not do nearly as good a - job. For example, Memcheck won't detect nearly as many - errors, and the error messages won't be as informative. To - improve this situation, use this macro just after your custom - allocator allocates some new memory. See the comments in - valgrind.h for information on how to use - it. - - - - - VALGRIND_FREELIKE_BLOCK: - - This should be used in conjunction with - VALGRIND_MALLOCLIKE_BLOCK. - Again, see memcheck/memcheck.h for - information on how to use it. - - - - - VALGRIND_CREATE_MEMPOOL: - - This is similar to - VALGRIND_MALLOCLIKE_BLOCK, - but is tailored towards code that uses memory pools. See the - comments in valgrind.h for information - on how to use it. - - - - - VALGRIND_DESTROY_MEMPOOL: - - This should be used in conjunction with - VALGRIND_CREATE_MEMPOOL. - Again, see the comments in valgrind.h for - information on how to use it. - - - - - VALGRIND_MEMPOOL_ALLOC: - - This should be used in conjunction with - VALGRIND_CREATE_MEMPOOL. - Again, see the comments in valgrind.h for - information on how to use it. - - - - - VALGRIND_MEMPOOL_FREE: - - This should be used in conjunction with - VALGRIND_CREATE_MEMPOOL. - Again, see the comments in valgrind.h for - information on how to use it. - - - - - VALGRIND_NON_SIMD_CALL[0123]: - - Executes a function of 0, 1, 2 or 3 args in the client - program on the real CPU, not the virtual - CPU that Valgrind normally runs code on. These are used in - various ways internally to Valgrind. They might be useful to - client programs. - - Warning: Only use these if you - really know what you are doing. 
- - - - - VALGRIND_PRINTF(format, ...): - - printf a message to the log file when running under - Valgrind. Nothing is output if not running under Valgrind. - Returns the number of characters output. - - - - - VALGRIND_PRINTF_BACKTRACE(format, ...): - - printf a message to the log file along with a stack - backtrace when running under Valgrind. Nothing is output if - not running under Valgrind. Returns the number of characters - output. - - - - - VALGRIND_STACK_REGISTER(start, end): - - Registers a new stack. Informs Valgrind that the memory range - between start and end is a unique stack. Returns a stack identifier - that can be used with other - VALGRIND_STACK_* calls. - Valgrind will use this information to determine if a change to - the stack pointer is an item pushed onto the stack or a change over - to a new stack. Use this if you're using a user-level thread package - and are noticing spurious errors from Valgrind about uninitialized - memory reads. - - - - - VALGRIND_STACK_DEREGISTER(id): - - Deregisters a previously registered stack. Informs - Valgrind that previously registered memory range with stack id - id is no longer a stack. - - - - - VALGRIND_STACK_CHANGE(id, start, end): - - Changes a previously registered stack. Informs - Valgrind that the previously registered stack with stack id - id has changed its start and end - values. Use this if your user-level thread package implements - stack growth. - - - - - -Note that valgrind.h is included by -all the tool-specific header files (such as -memcheck.h), so you don't need to include it -in your client if you include a tool-specific header. - - - @@ -1528,7 +1303,7 @@ in your client if you include a tool-specific header. Valgrind supports programs which use POSIX pthreads. Getting this to work was technically challenging but it now works -well enough for significant threaded applications to work. +well enough for significant threaded applications to run. 
The main thing to point out is that although Valgrind works with the standard Linux threads library (eg. NPTL or LinuxThreads), it @@ -1544,7 +1319,8 @@ every 100000 basic blocks (on x86, typically around 600000 instructions), which means you'll get a much finer interleaving of thread executions than when run natively. This in itself may cause your program to behave differently if you have some kind of -concurrency, critical race, locking, or similar, bugs. +concurrency, critical race, locking, or similar, bugs. In that case +you might consider using Valgrind's Helgrind tool to track them down. Your program will use the native libpthread, but not all of its facilities @@ -1595,1203 +1371,1085 @@ will create a core dump in the usual way. - -Function wrapping - -Valgrind versions 3.2.0 and above can do function wrapping on all -supported targets. In function wrapping, calls to some specified -function are intercepted and rerouted to a different, user-supplied -function. This can do whatever it likes, typically examining the -arguments, calling onwards to the original, and possibly examining the -result. Any number of functions may be wrapped. - -Function wrapping is useful for instrumenting an API in some way. For -example, wrapping functions in the POSIX pthreads API makes it -possible to notify Valgrind of thread status changes, and wrapping -functions in the MPI (message-passing) API allows notifying Valgrind -of memory status changes associated with message arrival/departure. -Such information is usually passed to Valgrind by using client -requests in the wrapper functions, although that is not of relevance -here. - -A Simple Example -Supposing we want to wrap some function - + +Building and Installing Valgrind -A wrapper is a function of identical type, but with a special name -which identifies it as the wrapper for foo. -Wrappers need to include -supporting macros from valgrind.h. 
-Here is a simple wrapper which prints the arguments and return value: +We use the standard Unix +./configure, +make, make +install mechanism, and we have attempted to +ensure that it works on machines with kernel 2.4 or 2.6 and glibc +2.2.X to 2.5.X. Once you have completed +make install you may then want +to run the regression tests +with make regtest. + - -#include "valgrind.h" -int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y ) -{ - int result; - OrigFn fn; - VALGRIND_GET_ORIG_FN(fn); - printf("foo's wrapper: args %d %d\n", x, y); - CALL_FN_W_WW(result, fn, x,y); - printf("foo's wrapper: result %d\n", result); - return result; -} -]]> +There are five options (in addition to the usual + which affect how Valgrind is built: + -To become active, the wrapper merely needs to be present in a text -section somewhere in the same process' address space as the function -it wraps, and for its ELF symbol name to be visible to Valgrind. In -practice, this means either compiling to a -.o and linking it in, or -compiling to a .so and -LD_PRELOADing it in. The latter is more -convenient in that it doesn't require relinking. + + + This builds Valgrind with some special magic hacks which make + it possible to run it on a standard build of Valgrind (what the + developers call "self-hosting"). Ordinarily you should not use + this flag as various kinds of safety checks are disabled. + + -All wrappers have approximately the above form. There are three -crucial macros: + + + TLS (Thread Local Storage) is a relatively new mechanism which + requires compiler, linker and kernel support. Valgrind tries to + automatically test if TLS is supported and if so enables this option. + Sometimes it cannot test for TLS, so this option allows you to + override the automatic test. + -I_WRAP_SONAME_FNNAME_ZU: -this generates the real name of the wrapper. -This is an encoded name which Valgrind notices when reading symbol -table information. 
What it says is: I am the wrapper for any function -named foo which is found in -an ELF shared object with an empty -("NONE") soname field. The specification -mechanism is powerful in -that wildcards are allowed for both sonames and function names. -The details are discussed below. + + + Specifies the path to the underlying VEX dynamic-translation + library. By default this is taken to be in the VEX directory off + the root of the source tree. + + -VALGRIND_GET_ORIG_FN: -once in the the wrapper, the first priority is -to get hold of the address of the original (and any other supporting -information needed). This is stored in a value of opaque -type OrigFn. -The information is acquired using -VALGRIND_GET_ORIG_FN. It is crucial -to make this macro call before calling any other wrapped function -in the same thread. + + + + On 64-bit + platforms (amd64-linux, ppc64-linux), Valgrind is by default built + in such a way that both 32-bit and 64-bit executables can be run. + Sometimes this cleverness is a problem for a variety of reasons. + These two flags allow for single-target builds in this situation. + If you issue both, the configure script will complain. Note they + are ignored on 32-bit-only platforms (x86-linux, ppc32-linux). + + -CALL_FN_W_WW: eventually we will -want to call the function being -wrapped. Calling it directly does not work, since that just gets us -back to the wrapper and tends to kill the program in short order by -stack overflow. Instead, the result lvalue, -OrigFn and arguments are -handed to one of a family of macros of the form -CALL_FN_*. These -cause Valgrind to call the original and avoid recursion back to the -wrapper. - + + - -Wrapping Specifications +The configure script tests +the version of the X server currently indicated by the current +$DISPLAY. This is a known bug. 
+The intention was to detect the version of the current X +client libraries, so that correct suppressions could be selected +for them, but instead the test checks the server version. This +is just plain wrong. -This scheme has the advantage of being self-contained. A library of -wrappers can be compiled to object code in the normal way, and does -not rely on an external script telling Valgrind which wrappers pertain -to which originals. +If you are building a binary package of Valgrind for +distribution, please read README_PACKAGERS +. It contains some +important information. -Each wrapper has a name which, in the most general case says: I am the -wrapper for any function whose name matches FNPATT and whose ELF -"soname" matches SOPATT. Both FNPATT and SOPATT may contain wildcards -(asterisks) and other characters (spaces, dots, @, etc) which are not -generally regarded as valid C identifier names. +Apart from that, there's not much excitement here. Let us +know if you have build problems. -This flexibility is needed to write robust wrappers for POSIX pthread -functions, where typically we are not completely sure of either the -function name or the soname, or alternatively we want to wrap a whole -set of functions at once. + -For example, pthread_create -in GNU libpthread is usually a -versioned symbol - one whose name ends in, eg, -@GLIBC_2.3. Hence we -are not sure what its real name is. We also want to cover any soname -of the form libpthread.so*. -So the header of the wrapper will be - -In order to write unusual characters as valid C function names, a -Z-encoding scheme is used. Names are written literally, except that -a capital Z acts as an escape character, with the following encoding: + +If You Have Problems - +Contact us at &vg-url;. -Hence libpthreadZdsoZd0 is an -encoding of the soname libpthread.so.0 -and pthreadZucreateZAZa is an encoding -of the function name pthread_create@*. 
- +See for the known +limitations of Valgrind, and for a list of programs which are +known not to work on it. -The macro I_WRAP_SONAME_FNNAME_ZZ -constructs a wrapper name in which -both the soname (first component) and function name (second component) -are Z-encoded. Encoding the function name can be tiresome and is -often unnecessary, so a second macro, -I_WRAP_SONAME_FNNAME_ZU, can be -used instead. The _ZU variant is -also useful for writing wrappers for -C++ functions, in which the function name is usually already mangled -using some other convention in which Z plays an important role. Having -to encode a second time quickly becomes confusing. +All parts of the system make heavy use of assertions and +internal self-checks. They are permanently enabled, and we have no +plans to disable them. If one of them breaks, please mail us! -Since the function name field may contain wildcards, it can be -anything, including just *. -The same is true for the soname. -However, some ELF objects - specifically, main executables - do not -have sonames. Any object lacking a soname is treated as if its soname -was NONE, which is why the original -example above had a name -I_WRAP_SONAME_FNNAME_ZU(NONE,foo). +If you get an assertion failure +in m_mallocfree.c, this may have happened because +your program wrote off the end of a malloc'd block, or before its +beginning. Valgrind hopefully will have emitted a proper message to that +effect before dying in this way. This is a known problem which +we should fix. -Note that the soname of an ELF object is not the same as its -file name, although it is often similar. You can find the soname of -an object libfoo.so using the command -readelf -a libfoo.so | grep soname. - +Read the for more advice about common problems, +crashes, etc. 
- -Wrapping Semantics + -The ability for a wrapper to replace an infinite family of functions -is powerful but brings complications in situations where ELF objects -appear and disappear (are dlopen'd and dlclose'd) on the fly. -Valgrind tries to maintain sensible behaviour in such situations. -For example, suppose a process has dlopened (an ELF object with -soname) object1.so, which contains -function1. It starts to use -function1 immediately. -After a while it dlopens wrappers.so, -which contains a wrapper -for function1 in (soname) -object1.so. All subsequent calls to -function1 are rerouted to the wrapper. + +Limitations -If wrappers.so is -later dlclose'd, calls to function1 are -naturally routed back to the original. +The following list of limitations seems long. However, most +programs actually work fine. -Alternatively, if object1.so -is dlclose'd but wrappers.so remains, -then the wrapper exported by wrapper.so -becomes inactive, since there -is no way to get to it - there is no original to call any more. However, -Valgrind remembers that the wrapper is still present. If -object1.so is -eventually dlopen'd again, the wrapper will become active again. +Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X +system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the +following constraints: -In short, valgrind inspects all code loading/unloading events to -ensure that the set of currently active wrappers remains consistent. + + + On x86 and amd64, there is no support for 3DNow! instructions. + If the translator encounters these, Valgrind will generate a SIGILL + when the instruction is executed. Apart from that, on x86 and amd64, + essentially all instructions are supported, up to and including SSE3. + -A second possible problem is that of conflicting wrappers. It is -easily possible to load two or more wrappers, both of which claim -to be wrappers for some third function. 
In such cases Valgrind will -complain about conflicting wrappers when the second one appears, and -will honour only the first one. - + On ppc32 and ppc64, almost all integer, floating point and Altivec + instructions are supported. Specifically: integer and FP insns that are + mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts, + stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and + the Altivec (also known as VMX) SIMD instruction set, are supported. + - -Debugging + + Atomic instruction sequences are not properly supported, in the + sense that their atomicity is not preserved. This will affect any + use of synchronization via memory shared between processes. They + will appear to work, but fail sporadically. + -Figuring out what's going on given the dynamic nature of wrapping -can be difficult. The ---trace-redir=yes flag makes -this possible -by showing the complete state of the redirection subsystem after -every -mmap/munmap -event affecting code (text). + + If your program does its own memory management, rather than + using malloc/new/free/delete, it should still work, but Memcheck's + error checking won't be so effective. If you describe your program's + memory management scheme using "client requests" + (see ), Memcheck can do + better. Nevertheless, using malloc/new and free/delete is still the + best approach. + -There are two central concepts: + + Valgrind's signal simulation is not as robust as it could be. + Basic POSIX-compliant sigaction and sigprocmask functionality is + supplied, but it's conceivable that things could go badly awry if you + do weird things with signals. Workaround: don't. Programs that do + non-POSIX signal tricks are in any case inherently unportable, so + should be avoided if possible. + - + + Machine instructions, and system calls, have been implemented + on demand. So it's possible, although unlikely, that a program will + fall over with a message to that effect. 
If this happens, please + report all the details printed out, so we can try and implement the + missing feature. + - A "redirection specification" is a binding of - a (soname pattern, fnname pattern) pair to a code address. - These bindings are created by writing functions with names - made with the - I_WRAP_SONAME_FNNAME_{ZZ,_ZU} - macros. + + Memory consumption of your program is majorly increased whilst + running under Valgrind. This is due to the large amount of + administrative information maintained behind the scenes. Another + cause is that Valgrind dynamically translates the original + executable. Translated, instrumented code is 12-18 times larger than + the original so you can easily end up with 50+ MB of translations + when running (eg) a web browser. + - An "active redirection" is code-address to - code-address binding currently in effect. + + Valgrind can handle dynamically-generated code just fine. If + you regenerate code over the top of old code (ie. at the same memory + addresses), if the code is on the stack Valgrind will realise the + code has changed, and work correctly. This is necessary to handle + the trampolines GCC uses to implemented nested functions. If you + regenerate code somewhere other than the stack, you will need to use + the flag, and Valgrind will run more + slowly than normal. + - + + As of version 3.0.0, Valgrind has the following limitations + in its implementation of x86/AMD64 floating point relative to + IEEE754. -The state of the wrapping-and-redirection subsystem comprises a set of -specifications and a set of active bindings. The specifications are -acquired/discarded by watching all -mmap/munmap -events on code (text) -sections. The active binding set is (conceptually) recomputed from -the specifications, and all known symbol names, following any change -to the specification set. + Precision: There is no support for 80 bit arithmetic. 
+ Internally, Valgrind represents all such "long double" numbers in 64 + bits, and so there may be some differences in results. Whether or + not this is critical remains to be seen. Note, the x86/amd64 + fldt/fstpt instructions (read/write 80-bit numbers) are correctly + simulated, using conversions to/from 64 bits, so that in-memory + images of 80-bit numbers look correct if anyone wants to see. ---trace-redir=yes shows the contents -of both sets following any such event. + The impression observed from many FP regression tests is that + the accuracy differences aren't significant. Generally speaking, if + a program relies on 80-bit precision, there may be difficulties + porting it to non x86/amd64 platforms which only support 64-bit FP + precision. Even on x86/amd64, the program may get different results + depending on whether it is compiled to use SSE2 instructions (64-bits + only), or x87 instructions (80-bit). The net effect is to make FP + programs behave as if they had been run on a machine with 64-bit IEEE + floats, for example PowerPC. On amd64 FP arithmetic is done by + default on SSE2, so amd64 looks more like PowerPC than x86 from an FP + perspective, and there are far fewer noticeable accuracy differences + than with x86. --v prints a line of text each -time an active specification is used for the first time. + Rounding: Valgrind does observe the 4 IEEE-mandated rounding + modes (to nearest, to +infinity, to -infinity, to zero) for the + following conversions: float to integer, integer to float where + there is a possibility of loss of precision, and float-to-float + rounding. For all other FP operations, only the IEEE default mode + (round to nearest) is supported. -Hence for maximum debugging effectiveness you will need to use both -flags. 
+ Numeric exceptions in FP code: IEEE754 defines five types of + numeric exception that can happen: invalid operation (sqrt of + negative number, etc), division by zero, overflow, underflow, + inexact (loss of precision). -One final comment. The function-wrapping facility is closely -tied to Valgrind's ability to replace (redirect) specified -functions, for example to redirect calls to -malloc to its -own implementation. Indeed, a replacement function can be -regarded as a wrapper function which does not call the original. -However, to make the implementation more robust, the two kinds -of interception (wrapping vs replacement) are treated differently. - + For each exception, two courses of action are defined by IEEE754: + either (1) a user-defined exception handler may be called, or (2) a + default action is defined, which "fixes things up" and allows the + computation to proceed without throwing an exception. ---trace-redir=yes shows -specifications and bindings for both -replacement and wrapper functions. To differentiate the -two, replacement bindings are printed using -R-> whereas -wraps are printed using W->. - - + Currently Valgrind only supports the default fixup actions. + Again, feedback on the importance of exception support would be + appreciated. + When Valgrind detects that the program is trying to exceed any + of these limitations (setting exception handlers, rounding mode, or + precision control), it can print a message giving a traceback of + where this has happened, and continue execution. This behaviour used + to be the default, but the messages are annoying and so showing them + is now disabled by default. Use to see + them. - -Limitations - control flow + The above limitations define precisely the IEEE754 'default' + behaviour: default fixup on all exceptions, round-to-nearest + operations, and 64-bit precision. 
+ + + + As of version 3.0.0, Valgrind has the following limitations in + its implementation of x86/AMD64 SSE2 FP arithmetic, relative to + IEEE754. -For the most part, the function wrapping implementation is robust. -The only important caveat is: in a wrapper, get hold of -the OrigFn information using -VALGRIND_GET_ORIG_FN before calling any -other wrapped function. Once you have the -OrigFn, arbitrary -calls between, recursion between, and longjumps out of wrappers -should work correctly. There is never any interaction between wrapped -functions and merely replaced functions -(eg malloc), so you can call -malloc etc safely from within wrappers. - + Essentially the same: no exceptions, and limited observance of + rounding mode. Also, SSE2 has control bits which make it treat + denormalised numbers as zero (DAZ) and a related action, flush + denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be + less accurate than IEEE requires. Valgrind detects, ignores, and can + warn about, attempts to enable either mode. + -The above comments are true for {x86,amd64,ppc32}-linux. On -ppc64-linux function wrapping is more fragile due to the (arguably -poorly designed) ppc64-linux ABI. This mandates the use of a shadow -stack which tracks entries/exits of both wrapper and replacement -functions. This gives two limitations: firstly, longjumping out of -wrappers will rapidly lead to disaster, since the shadow stack will -not get correctly cleared. Secondly, since the shadow stack has -finite size, recursion between wrapper/replacement functions is only -possible to a limited depth, beyond which Valgrind has to abort the -run. This depth is currently 16 calls. + + As of version 3.2.0, Valgrind has the following limitations + in its implementation of PPC32 and PPC64 floating point + arithmetic, relative to IEEE754. -For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above -comments apply on a per-thread basis. 
In other words, wrapping is -thread-safe: each thread must individually observe the above -restrictions, but there is no need for any kind of inter-thread -cooperation. - + Scalar (non-Altivec): Valgrind provides a bit-exact emulation of + all floating point instructions, except for "fre" and "fres", which are + done more precisely than required by the PowerPC architecture specification. + All floating point operations observe the current rounding mode. + + However, fpscr[FPRF] is not set after each operation. That could + be done but would give measurable performance overheads, and so far + no need for it has been found. - -Limitations - original function signatures + As on x86/AMD64, IEEE754 exceptions are not supported: all floating + point exceptions are handled using the default IEEE fixup actions. + Valgrind detects, ignores, and can warn about, attempts to unmask + the 5 IEEE FP exception kinds by writing to the floating-point status + and control register (fpscr). + -As shown in the above example, to call the original you must use a -macro of the form CALL_FN_*. -For technical reasons it is impossible -to create a single macro to deal with all argument types and numbers, -so a family of macros covering the most common cases is supplied. In -what follows, 'W' denotes a machine-word-typed value (a pointer or a -C long), -and 'v' denotes C's void type. -The currently available macros are: - -Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2: + no exceptions, and limited observance of rounding mode. + For Altivec, FP arithmetic + is done in IEEE/Java mode, which is more accurate than the Linux default + setting. "More accurate" means that denormals are handled properly, + rather than simply being flushed to zero. + + -CALL_FN_v_WWW -- void fn ( long, long, long ) -CALL_FN_W_WWW -- long fn ( long, long, long ) + Programs which are known not to work are: + + + emacs starts up but immediately concludes it is out of + memory and aborts. 
It may be that Memcheck does not provide + a good enough emulation of the + mallinfo function. + Emacs works fine if you build it to use + the standard malloc/free routines. + + -CALL_FN_W_WWWW -- long fn ( long, long, long, long ) -CALL_FN_W_5W -- long fn ( long, long, long, long, long ) -CALL_FN_W_6W -- long fn ( long, long, long, long, long, long ) -and so on, up to -CALL_FN_W_12W -]]> + -The set of supported types can be expanded as needed. It is -regrettable that this limitation exists. Function wrapping has proven -difficult to implement, with a certain apparently unavoidable level of -ickyness. After several implementation attempts, the present -arrangement appears to be the least-worst tradeoff. At least it works -reliably in the presence of dynamic linking and dynamic code -loading/unloading. -You should not attempt to wrap a function of one type signature with a -wrapper of a different type signature. Such trickery will surely lead -to crashes or strange behaviour. This is not of course a limitation -of the function wrapping implementation, merely a reflection of the -fact that it gives you sweeping powers to shoot yourself in the foot -if you are not careful. Imagine the instant havoc you could wreak by -writing a wrapper which matched any function name in any soname - in -effect, one which claimed to be a wrapper for all functions in the -process. - + +An Example Run - -Examples +This is the log for a run of a small program using Memcheck. +The program is in fact correct, and the reported error is as the +result of a potentially serious code generation bug in GNU g++ +(snapshot 20010527). -In the source tree, -memcheck/tests/wrap[1-8].c provide a series of -examples, ranging from very simple to quite advanced. + -auxprogs/libmpiwrap.c is an example -of wrapping a big, complex API (the MPI-2 interface). This file defines -almost 300 different wrappers. - +The GCC folks fixed this about a week before gcc-3.0 +shipped. 
+ +Warning Messages You Might See - -Building and Installing Valgrind - -We use the standard Unix -./configure, -make, make -install mechanism, and we have attempted to -ensure that it works on machines with kernel 2.4 or 2.6 and glibc -2.2.X to 2.5.X. Once you have completed -make install you may then want -to run the regression tests -with make regtest. - +Most of these only appear if you run in verbose mode +(enabled by -v): -There are five options (in addition to the usual - which affect how Valgrind is built: - + - - This builds Valgrind with some special magic hacks which make - it possible to run it on a standard build of Valgrind (what the - developers call "self-hosting"). Ordinarily you should not use - this flag as various kinds of safety checks are disabled. - + More than 100 errors detected. Subsequent + errors will still be recorded, but in less detail than + before. + + After 100 different errors have been shown, Valgrind becomes + more conservative about collecting them. It then requires only the + program counters in the top two stack frames to match when deciding + whether or not two errors are really the same one. Prior to this + point, the PCs in the top four frames are required to match. This + hack has the effect of slowing down the appearance of new errors + after the first 100. The 100 constant can be changed by recompiling + Valgrind. - - TLS (Thread Local Storage) is a relatively new mechanism which - requires compiler, linker and kernel support. Valgrind tries to - automatically test if TLS is supported and if so enables this option. - Sometimes it cannot test for TLS, so this option allows you to - override the automatic test. + More than 1000 errors detected. I'm not + reporting any more. Final error counts may be inaccurate. Go fix + your program! + + After 1000 different errors have been detected, Valgrind + ignores any more. 
It seems unlikely that collecting even more + different ones would be of practical help to anybody, and it avoids + the danger that Valgrind spends more and more of its time comparing + new errors against an ever-growing collection. As above, the 1000 + number is a compile-time constant. - - Specifies the path to the underlying VEX dynamic-translation - library. By default this is taken to be in the VEX directory off - the root of the source tree. - + Warning: client switching stacks? + + Valgrind spotted such a large change in the stack pointer + that it guesses the client is switching to + a different stack. At this point it makes a kludgey guess where the + base of the new stack is, and sets memory permissions accordingly. + You may get many bogus error messages following this, if Valgrind + guesses wrong. At the moment "large change" is defined as a change + of more than 2000000 in the value of the + stack pointer register. - - - On 64-bit - platforms (amd64-linux, ppc64-linux), Valgrind is by default built - in such a way that both 32-bit and 64-bit executables can be run. - Sometimes this cleverness is a problem for a variety of reasons. - These two flags allow for single-target builds in this situation. - If you issue both, the configure script will complain. Note they - are ignored on 32-bit-only platforms (x86-linux, ppc32-linux). - + Warning: client attempted to close Valgrind's + logfile fd <number> + + Valgrind doesn't allow the client to close the logfile, + because you'd never see any diagnostic information after that point. + If you see this message, you may want to use the + option to specify a + different logfile file-descriptor number. - - + + Warning: noted but unhandled ioctl + <number> -The configure script tests -the version of the X server currently indicated by the current -$DISPLAY. This is a known bug.
-The intention was to detect the version of the current X -client libraries, so that correct suppressions could be selected -for them, but instead the test checks the server version. This -is just plain wrong. + Valgrind observed a call to one of the vast family of + ioctl system calls, but did not + modify its memory status info (because nobody has yet written a + suitable wrapper). The call will still have gone through, but you may get + spurious errors after this as a result of the non-update of the + memory info. + -If you are building a binary package of Valgrind for -distribution, please read README_PACKAGERS -. It contains some -important information. + + Warning: set address range perms: large range + <number> -Apart from that, there's not much excitement here. Let us -know if you have build problems. + Diagnostic message, mostly for benefit of the Valgrind + developers, to do with memory permissions. + + + - -If You Have Problems + +The Client Request mechanism -Contact us at &vg-url;. +Valgrind has a trapdoor mechanism via which the client +program can pass all manner of requests and queries to Valgrind +and the current tool. Internally, this is used extensively to +make malloc, free, etc, work, although you don't see that. -See for the known -limitations of Valgrind, and for a list of programs which are -known not to work on it. +For your convenience, a subset of these so-called client +requests is provided to allow you to tell Valgrind facts about +the behaviour of your program, and also to make queries. +In particular, your program can tell Valgrind about changes in +memory range permissions that Valgrind would not otherwise know +about, and so allows clients to get Valgrind to do arbitrary +custom checks. -All parts of the system make heavy use of assertions and -internal self-checks. They are permanently enabled, and we have no -plans to disable them. If one of them breaks, please mail us! +Clients need to include a header file to make this work. 
+Which header file depends on which client requests you use. Some +client requests are handled by the core, and are defined in the +header file valgrind/valgrind.h. Tool-specific +header files are named after the tool, e.g. +valgrind/memcheck.h. All header files can be found +in the include/valgrind directory of wherever Valgrind +was installed. -If you get an assertion failure -in m_mallocfree.c, this may have happened because -your program wrote off the end of a malloc'd block, or before its -beginning. Valgrind hopefully will have emitted a proper message to that -effect before dying in this way. This is a known problem which -we should fix. - -Read the for more advice about common problems, -crashes, etc. - - - - - - -Limitations - -The following list of limitations seems long. However, most -programs actually work fine. - -Valgrind will run Linux ELF binaries, on a kernel 2.4.X or 2.6.X -system, on the x86, amd64, ppc32 and ppc64 architectures, subject to the -following constraints: - - - - On x86 and amd64, there is no support for 3DNow! instructions. - If the translator encounters these, Valgrind will generate a SIGILL - when the instruction is executed. Apart from that, on x86 and amd64, - essentially all instructions are supported, up to and including SSE3. - - - On ppc32 and ppc64, almost all integer, floating point and Altivec - instructions are supported. Specifically: integer and FP insns that are - mandatory for PowerPC, the "General-purpose optional" group (fsqrt, fsqrts, - stfiwx), the "Graphics optional" group (fre, fres, frsqrte, frsqrtes), and - the Altivec (also known as VMX) SIMD instruction set, are supported. - - - - Atomic instruction sequences are not properly supported, in the - sense that their atomicity is not preserved. This will affect any - use of synchronization via memory shared between processes. They - will appear to work, but fail sporadically. 
- - - - If your program does its own memory management, rather than - using malloc/new/free/delete, it should still work, but Valgrind's - error checking won't be so effective. If you describe your program's - memory management scheme using "client requests" - (see ), Memcheck can do - better. Nevertheless, using malloc/new and free/delete is still the - best approach. - - - - Valgrind's signal simulation is not as robust as it could be. - Basic POSIX-compliant sigaction and sigprocmask functionality is - supplied, but it's conceivable that things could go badly awry if you - do weird things with signals. Workaround: don't. Programs that do - non-POSIX signal tricks are in any case inherently unportable, so - should be avoided if possible. - - - - Machine instructions, and system calls, have been implemented - on demand. So it's possible, although unlikely, that a program will - fall over with a message to that effect. If this happens, please - report all the details printed out, so we can try and implement the - missing feature. - +The macros in these header files have the magical property +that they generate code in-line which Valgrind can spot. +However, the code does nothing when not run on Valgrind, so you +are not forced to run your program under Valgrind just because you +use the macros in this file. Also, you are not required to link your +program with any extra supporting libraries. - - Memory consumption of your program is majorly increased whilst - running under Valgrind. This is due to the large amount of - administrative information maintained behind the scenes. Another - cause is that Valgrind dynamically translates the original - executable. Translated, instrumented code is 12-18 times larger than - the original so you can easily end up with 50+ MB of translations - when running (eg) a web browser. 
- +The code added to your binary has negligible performance impact: +on x86, amd64, ppc32 and ppc64, the overhead is 6 simple integer instructions +and is probably undetectable except in tight loops. +However, if you really wish to compile out the client requests, you can +compile with -DNVALGRIND (analogous to +-DNDEBUG's effect on +assert()). + - - Valgrind can handle dynamically-generated code just fine. If - you regenerate code over the top of old code (ie. at the same memory - addresses), if the code is on the stack Valgrind will realise the - code has changed, and work correctly. This is necessary to handle - the trampolines GCC uses to implemented nested functions. If you - regenerate code somewhere other than the stack, you will need to use - the flag, and Valgrind will run more - slowly than normal. - +You are encouraged to copy the valgrind/*.h headers +into your project's include directory, so your program doesn't have a +compile-time dependency on Valgrind being installed. The Valgrind headers, +unlike most of the rest of the code, are under a BSD-style license so you may +include them without worrying about license incompatibility. - - As of version 3.0.0, Valgrind has the following limitations - in its implementation of x86/AMD64 floating point relative to - IEEE754. +Here is a brief description of the macros available in +valgrind.h, which work with more than one +tool (see the tool-specific documentation for explanations of the +tool-specific macros). - Precision: There is no support for 80 bit arithmetic. - Internally, Valgrind represents all such "long double" numbers in 64 - bits, and so there may be some differences in results. Whether or - not this is critical remains to be seen. Note, the x86/amd64 - fldt/fstpt instructions (read/write 80-bit numbers) are correctly - simulated, using conversions to/from 64 bits, so that in-memory - images of 80-bit numbers look correct if anyone wants to see. 
+ - The impression observed from many FP regression tests is that - the accuracy differences aren't significant. Generally speaking, if - a program relies on 80-bit precision, there may be difficulties - porting it to non x86/amd64 platforms which only support 64-bit FP - precision. Even on x86/amd64, the program may get different results - depending on whether it is compiled to use SSE2 instructions (64-bits - only), or x87 instructions (80-bit). The net effect is to make FP - programs behave as if they had been run on a machine with 64-bit IEEE - floats, for example PowerPC. On amd64 FP arithmetic is done by - default on SSE2, so amd64 looks more like PowerPC than x86 from an FP - perspective, and there are far fewer noticeable accuracy differences - than with x86. + + RUNNING_ON_VALGRIND: + + Returns 1 if running on Valgrind, 0 if running on the + real CPU. If you are running Valgrind on itself, returns the + number of layers of Valgrind emulation you're running on. + + + - Rounding: Valgrind does observe the 4 IEEE-mandated rounding - modes (to nearest, to +infinity, to -infinity, to zero) for the - following conversions: float to integer, integer to float where - there is a possibility of loss of precision, and float-to-float - rounding. For all other FP operations, only the IEEE default mode - (round to nearest) is supported. + + VALGRIND_DISCARD_TRANSLATIONS: + + Discards translations of code in the specified address + range. Useful if you are debugging a JIT compiler or some other + dynamic code generation system. After this call, attempts to + execute code in the invalidated address range will cause + Valgrind to make new translations of that code, which is + probably the semantics you want. Note that code invalidations + are expensive because finding all the relevant translations + quickly is very difficult. So try not to call it often. 
+ Note that you can be clever about + this: you only need to call it when an area which previously + contained code is overwritten with new code. You can choose + to write code into fresh memory, and just call this + occasionally to discard large chunks of old code all at + once. + + Alternatively, for transparent self-modifying-code support, + use --smc-check=all, or run + on ppc32/Linux or ppc64/Linux. + + + - Numeric exceptions in FP code: IEEE754 defines five types of - numeric exception that can happen: invalid operation (sqrt of - negative number, etc), division by zero, overflow, underflow, - inexact (loss of precision). + + VALGRIND_COUNT_ERRORS: + + Returns the number of errors found so far by Valgrind. Can be + useful in test harness code when combined with the + option; this runs Valgrind silently, + but the client program can detect when errors occur. Only useful + for tools that report errors, e.g. it's useful for Memcheck, but for + Cachegrind it will always return zero because Cachegrind doesn't + report errors. + + - For each exception, two courses of action are defined by IEEE754: - either (1) a user-defined exception handler may be called, or (2) a - default action is defined, which "fixes things up" and allows the - computation to proceed without throwing an exception. + + VALGRIND_MALLOCLIKE_BLOCK: + + If your program manages its own memory instead of using + the standard malloc() / + new / + new[], tools that track + information about heap blocks will not do nearly as good a + job. For example, Memcheck won't detect nearly as many + errors, and the error messages won't be as informative. To + improve this situation, use this macro just after your custom + allocator allocates some new memory. See the comments in + valgrind.h for information on how to use + it. + + - Currently Valgrind only supports the default fixup actions. - Again, feedback on the importance of exception support would be - appreciated.
+ + VALGRIND_FREELIKE_BLOCK: + + This should be used in conjunction with + VALGRIND_MALLOCLIKE_BLOCK. + Again, see memcheck/memcheck.h for + information on how to use it. + + - When Valgrind detects that the program is trying to exceed any - of these limitations (setting exception handlers, rounding mode, or - precision control), it can print a message giving a traceback of - where this has happened, and continue execution. This behaviour used - to be the default, but the messages are annoying and so showing them - is now disabled by default. Use to see - them. + + VALGRIND_CREATE_MEMPOOL: + + This is similar to + VALGRIND_MALLOCLIKE_BLOCK, + but is tailored towards code that uses memory pools. See the + comments in valgrind.h for information + on how to use it. + + + + + VALGRIND_DESTROY_MEMPOOL: + + This should be used in conjunction with + VALGRIND_CREATE_MEMPOOL. + Again, see the comments in valgrind.h for + information on how to use it. + + - The above limitations define precisely the IEEE754 'default' - behaviour: default fixup on all exceptions, round-to-nearest - operations, and 64-bit precision. - + + VALGRIND_MEMPOOL_ALLOC: + + This should be used in conjunction with + VALGRIND_CREATE_MEMPOOL. + Again, see the comments in valgrind.h for + information on how to use it. + + - - As of version 3.0.0, Valgrind has the following limitations in - its implementation of x86/AMD64 SSE2 FP arithmetic, relative to - IEEE754. - - Essentially the same: no exceptions, and limited observance of - rounding mode. Also, SSE2 has control bits which make it treat - denormalised numbers as zero (DAZ) and a related action, flush - denormals to zero (FTZ). Both of these cause SSE2 arithmetic to be - less accurate than IEEE requires. Valgrind detects, ignores, and can - warn about, attempts to enable either mode. - - - - As of version 3.2.0, Valgrind has the following limitations - in its implementation of PPC32 and PPC64 floating point - arithmetic, relative to IEEE754. 
- - Scalar (non-Altivec): Valgrind provides a bit-exact emulation of - all floating point instructions, except for "fre" and "fres", which are - done more precisely than required by the PowerPC architecture specification. - All floating point operations observe the current rounding mode. - - - However, fpscr[FPRF] is not set after each operation. That could - be done but would give measurable performance overheads, and so far - no need for it has been found. + + VALGRIND_MEMPOOL_FREE: + + This should be used in conjunction with + VALGRIND_CREATE_MEMPOOL. + Again, see the comments in valgrind.h for + information on how to use it. + + - As on x86/AMD64, IEEE754 exceptions are not supported: all floating - point exceptions are handled using the default IEEE fixup actions. - Valgrind detects, ignores, and can warn about, attempts to unmask - the 5 IEEE FP exception kinds by writing to the floating-point status - and control register (fpscr). - + + VALGRIND_NON_SIMD_CALL[0123]: + + Executes a function of 0, 1, 2 or 3 args in the client + program on the real CPU, not the virtual + CPU that Valgrind normally runs code on. These are used in + various ways internally to Valgrind. They might be useful to + client programs. - Vector (Altivec, VMX): essentially as with x86/AMD64 SSE/SSE2: - no exceptions, and limited observance of rounding mode. - For Altivec, FP arithmetic - is done in IEEE/Java mode, which is more accurate than the Linux default - setting. "More accurate" means that denormals are handled properly, - rather than simply being flushed to zero. - - + Warning: Only use these if you + really know what you are doing. + + - Programs which are known not to work are: - - - emacs starts up but immediately concludes it is out of - memory and aborts. It may be that Memcheck does not provide - a good enough emulation of the - mallinfo function. - Emacs works fine if you build it to use - the standard malloc/free routines. 
- - + + VALGRIND_PRINTF(format, ...): + + printf a message to the log file when running under + Valgrind. Nothing is output if not running under Valgrind. + Returns the number of characters output. + + - + + VALGRIND_PRINTF_BACKTRACE(format, ...): + + printf a message to the log file along with a stack + backtrace when running under Valgrind. Nothing is output if + not running under Valgrind. Returns the number of characters + output. + + + + VALGRIND_STACK_REGISTER(start, end): + + Registers a new stack. Informs Valgrind that the memory range + between start and end is a unique stack. Returns a stack identifier + that can be used with other + VALGRIND_STACK_* calls. + Valgrind will use this information to determine if a change to + the stack pointer is an item pushed onto the stack or a change over + to a new stack. Use this if you're using a user-level thread package + and are noticing spurious errors from Valgrind about uninitialized + memory reads. + + - -An Example Run + + VALGRIND_STACK_DEREGISTER(id): + + Deregisters a previously registered stack. Informs + Valgrind that previously registered memory range with stack id + id is no longer a stack. + + -This is the log for a run of a small program using Memcheck. -The program is in fact correct, and the reported error is as the -result of a potentially serious code generation bug in GNU g++ -(snapshot 20010527). + + VALGRIND_STACK_CHANGE(id, start, end): + + Changes a previously registered stack. Informs + Valgrind that the previously registered stack with stack id + id has changed its start and end + values. Use this if your user-level thread package implements + stack growth. + + - + -The GCC folks fixed this about a week before gcc-3.0 -shipped. +Note that valgrind.h is included by +all the tool-specific header files (such as +memcheck.h), so you don't need to include it +in your client if you include a tool-specific header. 
- -Warning Messages You Might See -Most of these only appear if you run in verbose mode -(enabled by -v): - - - More than 100 errors detected. Subsequent - errors will still be recorded, but in less detail than - before. + +Function wrapping - After 100 different errors have been shown, Valgrind becomes - more conservative about collecting them. It then requires only the - program counters in the top two stack frames to match when deciding - whether or not two errors are really the same one. Prior to this - point, the PCs in the top four frames are required to match. This - hack has the effect of slowing down the appearance of new errors - after the first 100. The 100 constant can be changed by recompiling - Valgrind. - + +Valgrind versions 3.2.0 and above can do function wrapping on all +supported targets. In function wrapping, calls to some specified +function are intercepted and rerouted to a different, user-supplied +function. This can do whatever it likes, typically examining the +arguments, calling onwards to the original, and possibly examining the +result. Any number of functions may be wrapped. - - More than 1000 errors detected. I'm not - reporting any more. Final error counts may be inaccurate. Go fix - your program! + +Function wrapping is useful for instrumenting an API in some way. For +example, wrapping functions in the POSIX pthreads API makes it +possible to notify Valgrind of thread status changes, and wrapping +functions in the MPI (message-passing) API allows notifying Valgrind +of memory status changes associated with message arrival/departure. +Such information is usually passed to Valgrind by using client +requests in the wrapper functions, although that is not of relevance +here. - After 1000 different errors have been detected, Valgrind - ignores any more. 
It seems unlikely that collecting even more - different ones would be of practical help to anybody, and it avoids - the danger that Valgrind spends more and more of its time comparing - new errors against an ever-growing collection. As above, the 1000 - number is a compile-time constant. - + +A Simple Example - - Warning: client switching stacks? +Supposing we want to wrap some function - Valgrind spotted such a large change in the stack pointer - that it guesses the client is switching to - a different stack. At this point it makes a kludgey guess where the - base of the new stack is, and sets memory permissions accordingly. - You may get many bogus error messages following this, if Valgrind - guesses wrong. At the moment "large change" is defined as a change - of more that 2000000 in the value of the - stack pointer register. - + - - Warning: client attempted to close Valgrind's - logfile fd <number> +A wrapper is a function of identical type, but with a special name +which identifies it as the wrapper for foo. +Wrappers need to include +supporting macros from valgrind.h. +Here is a simple wrapper which prints the arguments and return value: - Valgrind doesn't allow the client to close the logfile, - because you'd never see any diagnostic information after that point. - If you see this message, you may want to use the - option to specify a - different logfile file-descriptor number. - + +#include "valgrind.h" +int I_WRAP_SONAME_FNNAME_ZU(NONE,foo)( int x, int y ) +{ + int result; + OrigFn fn; + VALGRIND_GET_ORIG_FN(fn); + printf("foo's wrapper: args %d %d\n", x, y); + CALL_FN_W_WW(result, fn, x,y); + printf("foo's wrapper: result %d\n", result); + return result; +} +]]> - - Warning: noted but unhandled ioctl - <number> +To become active, the wrapper merely needs to be present in a text +section somewhere in the same process' address space as the function +it wraps, and for its ELF symbol name to be visible to Valgrind. 
In +practice, this means either compiling to a +.o and linking it in, or +compiling to a .so and +LD_PRELOADing it in. The latter is more +convenient in that it doesn't require relinking. - Valgrind observed a call to one of the vast family of - ioctl system calls, but did not - modify its memory status info (because nobody has yet written a - suitable wrapper). The call will still have gone through, but you may get - spurious errors after this as a result of the non-update of the - memory info. - +All wrappers have approximately the above form. There are three +crucial macros: - - Warning: set address range perms: large range - <number> +I_WRAP_SONAME_FNNAME_ZU: +this generates the real name of the wrapper. +This is an encoded name which Valgrind notices when reading symbol +table information. What it says is: I am the wrapper for any function +named foo which is found in +an ELF shared object with an empty +("NONE") soname field. The specification +mechanism is powerful in +that wildcards are allowed for both sonames and function names. +The details are discussed below. - Diagnostic message, mostly for benefit of the Valgrind - developers, to do with memory permissions. - +VALGRIND_GET_ORIG_FN: +once in the wrapper, the first priority is +to get hold of the address of the original (and any other supporting +information needed). This is stored in a value of opaque +type OrigFn. +The information is acquired using +VALGRIND_GET_ORIG_FN. It is crucial +to make this macro call before calling any other wrapped function +in the same thread. - +CALL_FN_W_WW: eventually we will +want to call the function being +wrapped. Calling it directly does not work, since that just gets us +back to the wrapper and tends to kill the program in short order by +stack overflow. Instead, the result lvalue, +OrigFn and arguments are +handed to one of a family of macros of the form +CALL_FN_*. These +cause Valgrind to call the original and avoid recursion back to the +wrapper.
+ + + +Wrapping Specifications + +This scheme has the advantage of being self-contained. A library of +wrappers can be compiled to object code in the normal way, and does +not rely on an external script telling Valgrind which wrappers pertain +to which originals. - +Each wrapper has a name which, in the most general case says: I am the +wrapper for any function whose name matches FNPATT and whose ELF +"soname" matches SOPATT. Both FNPATT and SOPATT may contain wildcards +(asterisks) and other characters (spaces, dots, @, etc) which are not +generally regarded as valid C identifier names. +This flexibility is needed to write robust wrappers for POSIX pthread +functions, where typically we are not completely sure of either the +function name or the soname, or alternatively we want to wrap a whole +set of functions at once. - -Debugging MPI Parallel Programs with Valgrind - - Valgrind supports debugging of distributed-memory applications -which use the MPI message passing standard. This support consists of a -library of wrapper functions for the -PMPI_* interface. When incorporated -into the application's address space, either by direct linking or by -LD_PRELOAD, the wrappers intercept -calls to PMPI_Send, -PMPI_Recv, etc. They then -use client requests to inform Valgrind of memory state changes caused -by the function being wrapped. This reduces the number of false -positives that Memcheck otherwise typically reports for MPI -applications. - -The wrappers also take the opportunity to carefully check -size and definedness of buffers passed as arguments to MPI functions, hence -detecting errors such as passing undefined data to -PMPI_Send, or receiving data into a -buffer which is too small. - -Unlike most of the rest of Valgrind, the wrapper library is subject to a -BSD-style license, so you can link it into any code base you like. -See the top of auxprogs/libmpiwrap.c -for license details. 
- - - -Building and installing the wrappers - - The wrapper library will be built automatically if possible. -Valgrind's configure script will look for a suitable -mpicc to build it with. This must be -the same mpicc you use to build the -MPI application you want to debug. By default, Valgrind tries -mpicc, but you can specify a -different one by using the configure-time flag ---with-mpicc=. Currently the -wrappers are only buildable with -mpiccs which are based on GNU -gcc or Intel's -icc. - -Check that the configure script prints a line like this: +For example, pthread_create +in GNU libpthread is usually a +versioned symbol - one whose name ends in, eg, +@GLIBC_2.3. Hence we +are not sure what its real name is. We also want to cover any soname +of the form libpthread.so*. +So the header of the wrapper will be -If it says ... no, your -mpicc has failed to compile and link -a test MPI2 program. - -If the configure test succeeds, continue in the usual way with -make and make -install. The final install tree should then contain -libmpiwrap.so. - - -Compile up a test MPI program (eg, MPI hello-world) and try -this: +In order to write unusual characters as valid C function names, a +Z-encoding scheme is used. Names are written literally, except that +a capital Z acts as an escape character, with the following encoding: /libmpiwrap.so \ - mpirun [args] $prefix/bin/valgrind ./hello + Za encodes * + Zp + + Zc : + Zd . + Zu _ + Zh - + Zs (space) + ZA @ + ZZ Z + ZL ( # only in valgrind 3.3.0 and later + ZR ) # only in valgrind 3.3.0 and later ]]> -You should see something similar to the following +Hence libpthreadZdsoZd0 is an +encoding of the soname libpthread.so.0 +and pthreadZucreateZAZa is an encoding +of the function name pthread_create@*. + - +The macro I_WRAP_SONAME_FNNAME_ZZ +constructs a wrapper name in which +both the soname (first component) and function name (second component) +are Z-encoded. 
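The encoding table above is mechanical enough to sketch in C. The helper below is hypothetical and is not part of valgrind.h or any Valgrind API; it only shows how a soname or function name could be turned into its Z-encoded form by hand, assuming the table as listed (including ZL and ZR, which the table says exist only in Valgrind 3.3.0 and later).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the letter that follows 'Z' for character c,
   or 0 if c is written literally. Mirrors the table above. */
static char z_escape_letter(char c)
{
    switch (c) {
    case '*': return 'a';
    case '+': return 'p';
    case ':': return 'c';
    case '.': return 'd';
    case '_': return 'u';
    case '-': return 'h';
    case ' ': return 's';
    case '@': return 'A';
    case 'Z': return 'Z';
    case '(': return 'L';   /* only in Valgrind 3.3.0 and later */
    case ')': return 'R';   /* only in Valgrind 3.3.0 and later */
    default:  return 0;
    }
}

/* Hypothetical helper: Z-encode src into out, whose capacity is
   out_size bytes including the terminating NUL.
   Returns 0 on success, -1 if out is too small. */
int z_encode(const char *src, char *out, size_t out_size)
{
    size_t n = 0;
    assert(src != NULL && out != NULL);
    for (; *src != '\0'; src++) {
        char esc = z_escape_letter(*src);
        size_t need = esc ? 2 : 1;
        if (n + need + 1 > out_size)    /* keep room for the NUL */
            return -1;
        if (esc) {
            out[n++] = 'Z';
            out[n++] = esc;
        } else {
            out[n++] = *src;
        }
    }
    if (n + 1 > out_size)               /* empty input, zero-sized buffer */
        return -1;
    out[n] = '\0';
    return 0;
}
```

Encoding libpthread.so.0 and pthread_create@* with this helper reproduces the two encoded names given in the text.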
Encoding the function name can be tiresome and is +often unnecessary, so a second macro, +I_WRAP_SONAME_FNNAME_ZU, can be +used instead. The _ZU variant is +also useful for writing wrappers for +C++ functions, in which the function name is usually already mangled +using some other convention in which Z plays an important role. Having +to encode a second time quickly becomes confusing. -repeated for every process in the group. If you do not see -these, there is a build/installation problem of some kind. +Since the function name field may contain wildcards, it can be +anything, including just *. +The same is true for the soname. +However, some ELF objects - specifically, main executables - do not +have sonames. Any object lacking a soname is treated as if its soname +was NONE, which is why the original +example above had a name +I_WRAP_SONAME_FNNAME_ZU(NONE,foo). - The MPI functions to be wrapped are assumed to be in an ELF -shared object with soname matching -libmpi.so*. This is known to be -correct at least for Open MPI and Quadrics MPI, and can easily be -changed if required. +Note that the soname of an ELF object is not the same as its +file name, although it is often similar. You can find the soname of +an object libfoo.so using the command +readelf -a libfoo.so | grep soname. + +Wrapping Semantics - -Getting started +The ability for a wrapper to replace an infinite family of functions +is powerful but brings complications in situations where ELF objects +appear and disappear (are dlopen'd and dlclose'd) on the fly. +Valgrind tries to maintain sensible behaviour in such situations. -Compile your MPI application as usual, taking care to link it -using the same mpicc that your -Valgrind build was configured with. +For example, suppose a process has dlopened (an ELF object with +soname) object1.so, which contains +function1. It starts to use +function1 immediately.
- -Use the following basic scheme to run your application on Valgrind with -the wrappers engaged: +After a while it dlopens wrappers.so, +which contains a wrapper +for function1 in (soname) +object1.so. All subsequent calls to +function1 are rerouted to the wrapper. -/libmpiwrap.so \ - mpirun [mpirun-args] \ - $prefix/bin/valgrind [valgrind-args] \ - [application] [app-args] -]]> +If wrappers.so is +later dlclose'd, calls to function1 are +naturally routed back to the original. + +Alternatively, if object1.so +is dlclose'd but wrappers.so remains, +then the wrapper exported by wrappers.so +becomes inactive, since there +is no way to get to it - there is no original to call any more. However, +Valgrind remembers that the wrapper is still present. If +object1.so is +eventually dlopen'd again, the wrapper will become active again. + +In short, Valgrind inspects all code loading/unloading events to +ensure that the set of currently active wrappers remains consistent. -As an alternative to -LD_PRELOADing -libmpiwrap.so, you can simply link it -to your application if desired. This should not disturb native -behaviour of your application in any way. +A second possible problem is that of conflicting wrappers. It is +easily possible to load two or more wrappers, both of which claim +to be wrappers for some third function. In such cases Valgrind will +complain about conflicting wrappers when the second one appears, and +will honour only the first one. + +Debugging - -Controlling the wrapper library +Figuring out what's going on given the dynamic nature of wrapping +can be difficult. The +--trace-redir=yes flag makes +this possible +by showing the complete state of the redirection subsystem after +every +mmap/munmap +event affecting code (text). -Environment variable -MPIWRAP_DEBUG is consulted at -startup. The default behaviour is to print a starting banner +There are two central concepts: - + - and then be relatively quiet.
+ A "redirection specification" is a binding of + a (soname pattern, fnname pattern) pair to a code address. + These bindings are created by writing functions with names + made with the + I_WRAP_SONAME_FNNAME_{ZZ,ZU} + macros. -You can give a list of comma-separated options in -MPIWRAP_DEBUG. These are + An "active redirection" is a code-address to + code-address binding currently in effect. - - - verbose: - show entries/exits of all wrappers. Also show extra - debugging info, such as the status of outstanding - MPI_Requests resulting - from uncompleted MPI_Irecvs. - - - quiet: - opposite of verbose, only print - anything when the wrappers want - to report a detected programming error, or in case of catastrophic - failure of the wrappers. - - - warn: - by default, functions which lack proper wrappers - are not commented on, just silently - ignored. This causes a warning to be printed for each unwrapped - function used, up to a maximum of three warnings per function. - - - strict: - print an error message and abort the program if - a function lacking a wrapper is used. - - If you want to use Valgrind's XML output facility -(--xml=yes), you should pass -quiet in -MPIWRAP_DEBUG so as to get rid of any -extraneous printing from the wrappers. +The state of the wrapping-and-redirection subsystem comprises a set of +specifications and a set of active bindings. The specifications are +acquired/discarded by watching all +mmap/munmap +events on code (text) +sections. The active binding set is (conceptually) recomputed from +the specifications, and all known symbol names, following any change +to the specification set. - +--trace-redir=yes shows the contents +of both sets following any such event. +-v prints a line of text each +time an active specification is used for the first time. - -Abilities and limitations +Hence for maximum debugging effectiveness you will need to use both +flags. - -Functions +One final comment.
The function-wrapping facility is closely +tied to Valgrind's ability to replace (redirect) specified +functions, for example to redirect calls to +malloc to its +own implementation. Indeed, a replacement function can be +regarded as a wrapper function which does not call the original. +However, to make the implementation more robust, the two kinds +of interception (wrapping vs replacement) are treated differently. + -All MPI2 functions except -MPI_Wtick, -MPI_Wtime and -MPI_Pcontrol have wrappers. The -first two are not wrapped because they return a -double, and Valgrind's -function-wrap mechanism cannot handle that (it could easily enough be -extended to). MPI_Pcontrol cannot be -wrapped as it has variable arity: -int MPI_Pcontrol(const int level, ...) +--trace-redir=yes shows +specifications and bindings for both +replacement and wrapper functions. To differentiate the +two, replacement bindings are printed using +R-> whereas +wraps are printed using W->. + + -Most functions are wrapped with a default wrapper which does -nothing except complain or abort if it is called, depending on -settings in MPIWRAP_DEBUG listed -above. The following functions have "real", do-something-useful -wrappers: - +Limitations - control flow -PMPI_Recv PMPI_Get_count +For the most part, the function wrapping implementation is robust. +The only important caveat is: in a wrapper, get hold of +the OrigFn information using +VALGRIND_GET_ORIG_FN before calling any +other wrapped function. Once you have the +OrigFn, arbitrary +calls between, recursion between, and longjumps out of wrappers +should work correctly. There is never any interaction between wrapped +functions and merely replaced functions +(eg malloc), so you can call +malloc etc safely from within wrappers. + -PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend +The above comments are true for {x86,amd64,ppc32}-linux. On +ppc64-linux function wrapping is more fragile due to the (arguably +poorly designed) ppc64-linux ABI. 
This mandates the use of a shadow +stack which tracks entries/exits of both wrapper and replacement +functions. This gives two limitations: firstly, longjumping out of +wrappers will rapidly lead to disaster, since the shadow stack will +not get correctly cleared. Secondly, since the shadow stack has +finite size, recursion between wrapper/replacement functions is only +possible to a limited depth, beyond which Valgrind has to abort the +run. This depth is currently 16 calls. -PMPI_Irecv -PMPI_Wait PMPI_Waitall -PMPI_Test PMPI_Testall +For all platforms ({x86,amd64,ppc32,ppc64}-linux) all the above +comments apply on a per-thread basis. In other words, wrapping is +thread-safe: each thread must individually observe the above +restrictions, but there is no need for any kind of inter-thread +cooperation. + -PMPI_Iprobe PMPI_Probe -PMPI_Cancel + +Limitations - original function signatures -PMPI_Sendrecv +As shown in the above example, to call the original you must use a +macro of the form CALL_FN_*. +For technical reasons it is impossible +to create a single macro to deal with all argument types and numbers, +so a family of macros covering the most common cases is supplied. In +what follows, 'W' denotes a machine-word-typed value (a pointer or a +C long), +and 'v' denotes C's void type. +The currently available macros are: -PMPI_Type_commit PMPI_Type_free + - A few functions such as -PMPI_Address are listed as -HAS_NO_WRAPPER. They have no wrapper -at all as there is nothing worth checking, and giving a no-op wrapper -would reduce performance for no reason. - - Note that the wrapper library can itself generate large -numbers of calls to the MPI implementation, especially when walking -complex types. The most common functions called are -PMPI_Extent, -PMPI_Type_get_envelope, -PMPI_Type_get_contents, and -PMPI_Type_free. - - - -Types - - MPI-1.1 structured types are supported, and walked exactly.
-The currently supported combiners are -MPI_COMBINER_NAMED, -MPI_COMBINER_CONTIGUOUS, -MPI_COMBINER_VECTOR, -MPI_COMBINER_HVECTOR -MPI_COMBINER_INDEXED, -MPI_COMBINER_HINDEXED and -MPI_COMBINER_STRUCT. This should -cover all MPI-1.1 types. The mechanism (function -walk_type) should extend easily to -cover MPI2 combiners. - -MPI defines some named structured types -(MPI_FLOAT_INT, -MPI_DOUBLE_INT, -MPI_LONG_INT, -MPI_2INT, -MPI_SHORT_INT, -MPI_LONG_DOUBLE_INT) which are pairs -of some basic type and a C int. -Unfortunately the MPI specification makes it impossible to look inside -these types and see where the fields are. Therefore these wrappers -assume the types are laid out as struct { float val; -int loc; } (for -MPI_FLOAT_INT), etc, and act -accordingly. This appears to be correct at least for Open MPI 1.0.2 -and for Quadrics MPI. - -If strict is an option specified -in MPIWRAP_DEBUG, the application -will abort if an unhandled type is encountered. Otherwise, the -application will print a warning message and continue. - -Some effort is made to mark/check memory ranges corresponding to -arrays of values in a single pass. This is important for performance -since asking Valgrind to mark/check any range, no matter how small, -carries quite a large constant cost. This optimisation is applied to -arrays of primitive types (double, -float, -int, -long, long -long, short, -char, and long -double on platforms where sizeof(long -double) == 8). For arrays of all other types, the -wrappers handle each element individually and so there can be a very -large performance cost. - - +The set of supported types can be expanded as needed. It is +regrettable that this limitation exists. Function wrapping has proven +difficult to implement, with a certain apparently unavoidable level of +ickyness. After several implementation attempts, the present +arrangement appears to be the least-worst tradeoff. 
At least it works +reliably in the presence of dynamic linking and dynamic code +loading/unloading. +You should not attempt to wrap a function of one type signature with a +wrapper of a different type signature. Such trickery will surely lead +to crashes or strange behaviour. This is not of course a limitation +of the function wrapping implementation, merely a reflection of the +fact that it gives you sweeping powers to shoot yourself in the foot +if you are not careful. Imagine the instant havoc you could wreak by +writing a wrapper which matched any function name in any soname - in +effect, one which claimed to be a wrapper for all functions in the +process. + +Examples - -Writing new wrappers +In the source tree, +memcheck/tests/wrap[1-8].c provide a series of +examples, ranging from very simple to quite advanced. - -For the most part the wrappers are straightforward. The only -significant complexity arises with nonblocking receives. - -The issue is that MPI_Irecv -states the recv buffer and returns immediately, giving a handle -(MPI_Request) for the transaction. -Later the user will have to poll for completion with -MPI_Wait etc, and when the -transaction completes successfully, the wrappers have to paint the -recv buffer. But the recv buffer details are not presented to -MPI_Wait -- only the handle is. The -library therefore maintains a shadow table which associates -uncompleted MPI_Requests with the -corresponding buffer address/count/type. When an operation completes, -the table is searched for the associated address/count/type info, and -memory is marked accordingly. - -Access to the table is guarded by a (POSIX pthreads) lock, so as -to make the library thread-safe. - -The table is allocated with -malloc and never -freed, so it will show up in leak -checks. - -Writing new wrappers should be fairly easy. The source file is -auxprogs/libmpiwrap.c. 
If possible, -find an existing wrapper for a function of similar behaviour to the -one you want to wrap, and use it as a starting point. The wrappers -are organised in sections in the same order as the MPI 1.1 spec, to -aid navigation. When adding a wrapper, remember to comment out the -definition of the default wrapper in the long list of defaults at the -bottom of the file (do not remove it, just comment it out). +auxprogs/libmpiwrap.c is an example +of wrapping a big, complex API (the MPI-2 interface). This file defines +almost 300 different wrappers. - -What to expect when using the wrappers - -The wrappers should reduce Memcheck's false-error rate on MPI -applications. Because the wrapping is done at the MPI interface, -there will still potentially be a large number of errors reported in -the MPI implementation below the interface. The best you can do is -try to suppress them. - -You may also find that the input-side (buffer -length/definedness) checks find errors in your MPI use, for example -passing too short a buffer to -MPI_Recv. - -Functions which are not wrapped may increase the false -error rate. A possible approach is to run with -MPIWRAP_DEBUG containing -warn. This will show you functions -which lack proper wrappers but which are nevertheless used. You can -then write wrappers for them. - + -A known source of potential false errors is the -PMPI_Reduce family of functions, when -using a custom (user-defined) reduction function. In a reduction -operation, each node notionally sends data to a "central point" which -uses the specified reduction function to merge the data items into a -single item. Hence, in general, data is passed between nodes and fed -to the reduction function, but the wrapper library cannot mark the -transferred data as initialised before it is handed to the reduction -function, because all that happens "inside" the -PMPI_Reduce call. As a result you -may see false positives reported in your reduction function.
- - diff --git a/docs/xml/manual-intro.xml b/docs/xml/manual-intro.xml index 7a4152d0eb..a43fae5be8 100644 --- a/docs/xml/manual-intro.xml +++ b/docs/xml/manual-intro.xml @@ -11,7 +11,7 @@ Valgrind is a suite of simulation-based debugging and profiling tools for programs running on Linux (x86, amd64, ppc32 and ppc64). The system consists of a core, which provides a synthetic CPU in -software, and a series of tools, each of which performs some kind of +software, and a set of tools, each of which performs some kind of debugging, profiling, or similar task. The architecture is modular, so that new tools can be created easily and without disturbing the existing structure. @@ -106,6 +106,30 @@ summary, these are: paging needed. + + Helgrind detects synchronisation errors + in programs that use the POSIX pthreads threading primitives. It + detects the following three classes of errors: + + + + Misuses of the POSIX pthreads API. + + + Potential deadlocks arising from lock ordering + problems. + + + Data races -- accessing memory without adequate locking. + + + + Problems like these often result in unreproducible, + timing-dependent crashes, deadlocks and other misbehaviour, and + can be difficult to find by other means. + + + @@ -119,19 +143,22 @@ integer and floating point operations your program does. Valgrind is closely tied to details of the CPU and operating system, and to a lesser extent, the compiler and basic C libraries. -Nonetheless, as of version 3.2.0 it supports several platforms: +Nonetheless, as of version 3.3.0 it supports several platforms: x86/Linux (mature), amd64/Linux (maturing), ppc32/Linux and -ppc64/Linux (less mature but work well). Valgrind uses the standard Unix +ppc64/Linux (less mature but work well). There is also experimental +support for ppc32/AIX5 and ppc64/AIX5 (AIX 5.2 and 5.3 only). 
+Valgrind uses the standard Unix ./configure, make, make install mechanism, and we have attempted to ensure that it works on machines with Linux kernel 2.4.X or 2.6.X and glibc -2.2.X to 2.5.X. +2.2.X to 2.7.X. Valgrind is licensed under the , version 2. The valgrind/*.h headers that you may wish to include in your code (eg. -valgrind.h, memcheck.h) are +valgrind.h, memcheck.h, +helgrind.h) are distributed under a BSD-style license, so you may include them in your code without worrying about license conflicts. Some of the PThreads test cases, pth_*.c, are taken from "Pthreads @@ -139,6 +166,13 @@ Programming" by Bradford Nichols, Dick Buttlar & Jacqueline Proulx Farrell, ISBN 1-56592-115-1, published by O'Reilly & Associates, Inc. +If you contribute code to Valgrind, please ensure your +contributions are licensed as "GPLv2, or (at your option) any later +version." This is so as to allow the possibility of easily upgrading +the license to GPLv3 in future. If you want to modify code in the VEX +subdirectory, please also see VEX/HACKING.README. + + @@ -158,11 +192,15 @@ want to run the Memcheck tool. The final chapter explains how to write a new tool. Be aware that the core understands some command line flags, and -the tools have their own flags which they know about. This means there -is no central place describing all the flags that are accepted -- you -have to read the flags documentation both for +the tools have their own flags which they know about. This means +there is no central place describing all the flags that are +accepted -- you have to read the flags documentation both for and for the tool you want to use. +The manual is quite big and complex. If you are looking for a +quick getting-started guide, have a look at +. 
+ diff --git a/docs/xml/quick-start-guide.xml b/docs/xml/quick-start-guide.xml index 69655bdbf0..773871bb7e 100644 --- a/docs/xml/quick-start-guide.xml +++ b/docs/xml/quick-start-guide.xml @@ -32,24 +32,64 @@ memory errors such as: - touching memory you shouldn't (eg. overrunning heap block - boundaries); + Touching memory you shouldn't (eg. overrunning heap block + boundaries, or reading/writing freed memory). - using values before they have been initialized; + Using values before they have been initialized. - incorrect freeing of memory, such as double-freeing heap - blocks; + Incorrect freeing of memory, such as double-freeing heap + blocks. - memory leaks. + Memory leaks. +Memcheck is only one of the tools in the Valgrind suite. +Other tools you may find useful are: + + + + Cachegrind: a profiling tool which produces detailed data on + cache (miss) and branch (misprediction) events. Statistics are + gathered for the entire program, for each function, for each line + of code, and even for each instruction, if you need that level of + detail. + + + Callgrind: a heavyweight profiling tool similar to + Cachegrind, but which also shows cost relationships across + function calls. Information gathered by Callgrind can be viewed + using the KCachegrind GUI. KCachegrind is not part of the + Valgrind suite - it is part of the KDE Desktop Environment. + + + Massif: a space profiling tool. It allows you to explore + in detail which parts of your program allocate memory. + + + Helgrind: a debugging tool for threaded programs. Helgrind + looks for various kinds of synchronisation errors in code that uses + the POSIX PThreads API. + + + In addition, there are a number of "experimental" tools in + the codebase. They can be distinguished by the "exp-" prefix on + their names. Experimental tools are not subject to the same + quality control standards that apply to our production-grade tools + (Memcheck, Cachegrind, Callgrind, Massif and Helgrind). 
+ + + +The rest of this guide discusses only the Memcheck tool. For +full documentation on the other tools, see the Valgrind User +Manual. + What follows is the minimum information you need to start detecting memory errors in your program with Memcheck. Note that this -guide applies to Valgrind version 2.4.0 and later. Some of the +guide applies to Valgrind version 3.3.0 and later. Some of the information is not quite right for earlier versions. @@ -162,8 +202,9 @@ Things to notice: -It's worth fixing errors in the order they are reported, as later errors -can be caused by earlier errors. +It's worth fixing errors in the order they are reported, as later +errors can be caused by earlier errors. Failing to do this is a +common cause of difficulty with Memcheck. Memory leak messages look like this: @@ -219,6 +260,15 @@ that are allocated statically or on the stack. But it should detect many errors that could crash your program (eg. cause a segmentation fault). +Try to make your program so clean that Memcheck reports no +errors. Once you achieve this state, it is much easier to see when +changes to the program cause Memcheck to report new errors. +Experience from several years of Memcheck use shows that it is +possible to make even huge programs run Memcheck-clean. For example, +large parts of KDE 3.5.X, and recent versions of OpenOffice.org +(2.3.0) are Memcheck-clean, or very close to it. 
+ + diff --git a/docs/xml/tech-docs.xml b/docs/xml/tech-docs.xml index 8615c1d807..552631331a 100644 --- a/docs/xml/tech-docs.xml +++ b/docs/xml/tech-docs.xml @@ -17,11 +17,14 @@ - - + + diff --git a/docs/xml/vg-entities.xml b/docs/xml/vg-entities.xml index 19f95e6127..d56e957a91 100644 --- a/docs/xml/vg-entities.xml +++ b/docs/xml/vg-entities.xml @@ -2,13 +2,13 @@ - + - - + + diff --git a/memcheck/docs/mc-manual.xml b/memcheck/docs/mc-manual.xml index b8c36a7ce9..f8444c76a2 100644 --- a/memcheck/docs/mc-manual.xml +++ b/memcheck/docs/mc-manual.xml @@ -1287,6 +1287,393 @@ inform Memcheck about changes to the state of a mempool: + + + + + + + + + +Debugging MPI Parallel Programs with Valgrind + + Valgrind supports debugging of distributed-memory applications +which use the MPI message passing standard. This support consists of a +library of wrapper functions for the +PMPI_* interface. When incorporated +into the application's address space, either by direct linking or by +LD_PRELOAD, the wrappers intercept +calls to PMPI_Send, +PMPI_Recv, etc. They then +use client requests to inform Valgrind of memory state changes caused +by the function being wrapped. This reduces the number of false +positives that Memcheck otherwise typically reports for MPI +applications. + +The wrappers also take the opportunity to carefully check +size and definedness of buffers passed as arguments to MPI functions, hence +detecting errors such as passing undefined data to +PMPI_Send, or receiving data into a +buffer which is too small. + +Unlike most of the rest of Valgrind, the wrapper library is subject to a +BSD-style license, so you can link it into any code base you like. +See the top of auxprogs/libmpiwrap.c +for license details. + + + +Building and installing the wrappers + + The wrapper library will be built automatically if possible. +Valgrind's configure script will look for a suitable +mpicc to build it with. 
This must be +the same mpicc you use to build the +MPI application you want to debug. By default, Valgrind tries +mpicc, but you can specify a +different one by using the configure-time flag +--with-mpicc=. Currently the +wrappers are only buildable with +mpiccs which are based on GNU +gcc or Intel's +icc. + +Check that the configure script prints a line like this: + + + +If it says ... no, your +mpicc has failed to compile and link +a test MPI2 program. + +If the configure test succeeds, continue in the usual way with +make and make +install. The final install tree should then contain +libmpiwrap.so. + + +Compile up a test MPI program (eg, MPI hello-world) and try +this: + +/libmpiwrap.so \ + mpirun [args] $prefix/bin/valgrind ./hello +]]> + +You should see something similar to the following + + + +repeated for every process in the group. If you do not see +these, there is a build/installation problem of some kind. + + The MPI functions to be wrapped are assumed to be in an ELF +shared object with soname matching +libmpi.so*. This is known to be +correct at least for Open MPI and Quadrics MPI, and can easily be +changed if required. + + + + +Getting started + +Compile your MPI application as usual, taking care to link it +using the same mpicc that your +Valgrind build was configured with. + + +Use the following basic scheme to run your application on Valgrind with +the wrappers engaged: + +/libmpiwrap.so \ + mpirun [mpirun-args] \ + $prefix/bin/valgrind [valgrind-args] \ + [application] [app-args] +]]> + +As an alternative to +LD_PRELOADing +libmpiwrap.so, you can simply link it +to your application if desired. This should not disturb native +behaviour of your application in any way. + + + + +Controlling the wrapper library + +Environment variable +MPIWRAP_DEBUG is consulted at +startup. The default behaviour is to print a starting banner + + + + and then be relatively quiet. + +You can give a list of comma-separated options in +MPIWRAP_DEBUG.
These are + + + + verbose: + show entries/exits of all wrappers. Also show extra + debugging info, such as the status of outstanding + MPI_Requests resulting + from uncompleted MPI_Irecvs. + + + quiet: + opposite of verbose, only print + anything when the wrappers want + to report a detected programming error, or in case of catastrophic + failure of the wrappers. + + + warn: + by default, functions which lack proper wrappers + are not commented on, just silently + ignored. This causes a warning to be printed for each unwrapped + function used, up to a maximum of three warnings per function. + + + strict: + print an error message and abort the program if + a function lacking a wrapper is used. + + + + If you want to use Valgrind's XML output facility +(--xml=yes), you should pass +quiet in +MPIWRAP_DEBUG so as to get rid of any +extraneous printing from the wrappers. + + + + + +Abilities and limitations + + +Functions + +All MPI2 functions except +MPI_Wtick, +MPI_Wtime and +MPI_Pcontrol have wrappers. The +first two are not wrapped because they return a +double, and Valgrind's +function-wrap mechanism cannot handle that (it could easily enough be +extended to). MPI_Pcontrol cannot be +wrapped as it has variable arity: +int MPI_Pcontrol(const int level, ...) + +Most functions are wrapped with a default wrapper which does +nothing except complain or abort if it is called, depending on +settings in MPIWRAP_DEBUG listed +above. The following functions have "real", do-something-useful +wrappers: + + + + A few functions such as +PMPI_Address are listed as +HAS_NO_WRAPPER. They have no wrapper +at all as there is nothing worth checking, and giving a no-op wrapper +would reduce performance for no reason. + + Note that the wrapper library can itself generate large +numbers of calls to the MPI implementation, especially when walking +complex types. The most common functions called are +PMPI_Extent, +PMPI_Type_get_envelope, +PMPI_Type_get_contents, and +PMPI_Type_free.
+ + + +Types + + MPI-1.1 structured types are supported, and walked exactly. +The currently supported combiners are +MPI_COMBINER_NAMED, +MPI_COMBINER_CONTIGUOUS, +MPI_COMBINER_VECTOR, +MPI_COMBINER_HVECTOR, +MPI_COMBINER_INDEXED, +MPI_COMBINER_HINDEXED and +MPI_COMBINER_STRUCT. This should +cover all MPI-1.1 types. The mechanism (function +walk_type) should extend easily to +cover MPI2 combiners. + +MPI defines some named structured types +(MPI_FLOAT_INT, +MPI_DOUBLE_INT, +MPI_LONG_INT, +MPI_2INT, +MPI_SHORT_INT, +MPI_LONG_DOUBLE_INT) which are pairs +of some basic type and a C int. +Unfortunately the MPI specification makes it impossible to look inside +these types and see where the fields are. Therefore these wrappers +assume the types are laid out as struct { float val; +int loc; } (for +MPI_FLOAT_INT), etc, and act +accordingly. This appears to be correct at least for Open MPI 1.0.2 +and for Quadrics MPI. + +If strict is an option specified +in MPIWRAP_DEBUG, the application +will abort if an unhandled type is encountered. Otherwise, the +application will print a warning message and continue. + +Some effort is made to mark/check memory ranges corresponding to +arrays of values in a single pass. This is important for performance +since asking Valgrind to mark/check any range, no matter how small, +carries quite a large constant cost. This optimisation is applied to +arrays of primitive types (double, +float, +int, +long, long +long, short, +char, and long +double on platforms where sizeof(long +double) == 8). For arrays of all other types, the +wrappers handle each element individually and so there can be a very +large performance cost. + + + + + + + +Writing new wrappers + + +For the most part the wrappers are straightforward. The only +significant complexity arises with nonblocking receives. + +The issue is that MPI_Irecv +states the recv buffer and returns immediately, giving a handle +(MPI_Request) for the transaction. 
+Later the user will have to poll for completion with +MPI_Wait etc, and when the +transaction completes successfully, the wrappers have to paint the +recv buffer. But the recv buffer details are not presented to +MPI_Wait -- only the handle is. The +library therefore maintains a shadow table which associates +uncompleted MPI_Requests with the +corresponding buffer address/count/type. When an operation completes, +the table is searched for the associated address/count/type info, and +memory is marked accordingly. + +Access to the table is guarded by a (POSIX pthreads) lock, so as +to make the library thread-safe. + +The table is allocated with +malloc and never +freed, so it will show up in leak +checks. + +Writing new wrappers should be fairly easy. The source file is +auxprogs/libmpiwrap.c. If possible, +find an existing wrapper for a function of similar behaviour to the +one you want to wrap, and use it as a starting point. The wrappers +are organised in sections in the same order as the MPI 1.1 spec, to +aid navigation. When adding a wrapper, remember to comment out the +definition of the default wrapper in the long list of defaults at the +bottom of the file (do not remove it, just comment it out). + + + +What to expect when using the wrappers + +The wrappers should reduce Memcheck's false-error rate on MPI +applications. Because the wrapping is done at the MPI interface, +there will still potentially be a large number of errors reported in +the MPI implementation below the interface. The best you can do is +try to suppress them. + +You may also find that the input-side (buffer +length/definedness) checks find errors in your MPI use, for example +passing too short a buffer to +MPI_Recv. + +Functions which are not wrapped may increase the false +error rate. A possible approach is to run with +MPIWRAP_DEBUG containing +warn. This will show you functions +which lack proper wrappers but which are nevertheless used. You can +then write wrappers for them. 
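As a rough illustration of how a library might interpret an option string such as the value of MPIWRAP_DEBUG, here is a minimal C sketch. It assumes options are recognised by simple substring search, so any separator works. The names WrapOptions, parse_wrap_options and options_from_env are hypothetical and are not taken from libmpiwrap.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag set; one field per option named in the text. */
typedef struct {
    int verbose;
    int quiet;
    int warn;
    int strict;
} WrapOptions;

/* Parse an option string by substring search (an assumption, not
   necessarily what the real wrapper library does). A NULL string
   means no options were given. */
WrapOptions parse_wrap_options(const char *s)
{
    WrapOptions o = {0, 0, 0, 0};
    if (s == NULL)
        return o;
    o.verbose = strstr(s, "verbose") != NULL;
    o.quiet   = strstr(s, "quiet")   != NULL;
    o.warn    = strstr(s, "warn")    != NULL;
    o.strict  = strstr(s, "strict")  != NULL;
    return o;
}

/* Read the options from the environment variable named in the text. */
WrapOptions options_from_env(void)
{
    return parse_wrap_options(getenv("MPIWRAP_DEBUG"));
}
```

With this sketch, a string containing both warn and strict sets both flags and leaves verbose and quiet clear.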
+ + +A known source of potential false errors is the +PMPI_Reduce family of functions, when +using a custom (user-defined) reduction function. In a reduction +operation, each node notionally sends data to a "central point" which +uses the specified reduction function to merge the data items into a +single item. Hence, in general, data is passed between nodes and fed +to the reduction function, but the wrapper library cannot mark the +transferred data as initialised before it is handed to the reduction +function, because all that happens "inside" the +PMPI_Reduce call. As a result you +may see false positives reported in your reduction function. + + + + + + +
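The shadow table described under "Writing new wrappers" can be pictured with the following minimal C sketch. It is not the code in auxprogs/libmpiwrap.c: FakeRequest and FakeDatatype stand in for MPI_Request and MPI_Datatype so that the sketch builds without an MPI installation, and the names ShadowEntry, shadow_add and shadow_complete, as well as the grow-by-doubling policy, are assumptions made for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-ins for MPI_Request and MPI_Datatype (hypothetical). */
typedef int FakeRequest;
typedef int FakeDatatype;

/* One outstanding nonblocking receive: handle plus buffer details. */
typedef struct {
    int          in_use;
    FakeRequest  req;
    void        *buf;
    int          count;
    FakeDatatype type;
} ShadowEntry;

static ShadowEntry    *table      = NULL;   /* heap-allocated, never freed */
static size_t          table_size = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record an uncompleted request. Returns 0 on success, -1 if out of memory. */
int shadow_add(FakeRequest req, void *buf, int count, FakeDatatype type)
{
    int rc = -1;
    size_t i;
    pthread_mutex_lock(&table_lock);
    for (i = 0; i < table_size; i++)
        if (!table[i].in_use)
            break;
    if (i == table_size) {
        /* No free slot: grow the table (doubling is an assumption). */
        size_t new_size = table_size ? 2 * table_size : 4;
        ShadowEntry *grown = realloc(table, new_size * sizeof *grown);
        if (grown == NULL)
            goto out;
        for (size_t j = table_size; j < new_size; j++)
            grown[j].in_use = 0;
        table = grown;
        table_size = new_size;
    }
    table[i].in_use = 1;
    table[i].req    = req;
    table[i].buf    = buf;
    table[i].count  = count;
    table[i].type   = type;
    rc = 0;
out:
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* On completion, fetch and remove the entry for req. Returns 1 if found. */
int shadow_complete(FakeRequest req, ShadowEntry *out)
{
    int found = 0;
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < table_size; i++) {
        if (table[i].in_use && table[i].req == req) {
            *out = table[i];
            table[i].in_use = 0;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}
```

In this picture, a wrapper for a nonblocking receive would call shadow_add after the real call returns, and a wrapper for a wait-style call would call shadow_complete and then mark the returned buffer range as initialised.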