From: Julian Seward
Date: Tue, 15 Nov 2005 19:51:04 +0000 (+0000)
Subject: Update manual for 3.1.0, sections <= manual-core.html.
X-Git-Tag: svn/VALGRIND_3_1_0~97
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=86c998a8def23ecdf279dd05baf3565845313b31;p=thirdparty%2Fvalgrind.git

Update manual for 3.1.0, sections <= manual-core.html.

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@5135
---

diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml
index 27845b0634..a493240efd 100644
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@@ -102,9 +102,8 @@ with C++, is -fno-inline. That makes it easier to see the function-call chain, which can help reduce confusion when navigating around large C++ apps. For whatever it's worth, debugging OpenOffice.org with Memcheck is a
-bit easier when using this flag.
-
-You don't have to do this, but doing so helps Valgrind
+bit easier when using this flag.
+You don't have to do this, but doing so helps Valgrind
 produce more accurate and less confusing error reports. Chances are you're set up like this already, if you intended to debug your program with GNU gdb, or some other debugger.
@@ -169,7 +168,9 @@ whatever reason. the commentary, so as to avoid flooding you with information of secondary importance. If you want more information about what is happening, re-run, passing the
--v flag to Valgrind.
+-v flag to Valgrind.
+A second -v gives yet more detail.
+
 You can direct the commentary to three different places:
@@ -183,6 +184,11 @@ places: want to send it to some other file descriptor, for example number 9, you can specify --log-fd=9.
+ This is the simplest and most common arrangement, but can
+ cause problems when Valgrinding entire trees of
+ processes which expect specific file descriptors, particularly
+ stdin/stdout/stderr, to be available for their own use.
+
 If you want to specify precisely the file name to use, without the trailing
- .12345part, you can instead use
+ .12345 part, you can instead use
 --log-file-exactly=filename. You can also use the --log-file-qualifier=<VAR> option
- to specify the filename via the environment variable
- $VAR. This is rarely needed, but
+ to modify the filename according to the environment variable
+ VAR. This is rarely needed, but
 very useful in certain circumstances (eg. when running MPI programs).
+ In this case, the trailing .12345
+ part is replaced by the contents of
+ $VAR. The idea is that you
+ specify a variable which will be set differently for each process
+ in the job, for example BPROC_RANK
+ or whatever is applicable in your MPI setup.
@@ -347,7 +359,7 @@ problem. The process of detecting duplicate errors is quite an expensive one and can become a significant performance overhead if your program generates huge quantities of errors. To avoid
-serious problems here, Valgrind will simply stop collecting
+serious problems, Valgrind will simply stop collecting
 errors after 1000 different errors have been seen, or 100000 errors in total have been seen. In this situation you might as well stop your program and fix it, because Valgrind won't tell you
@@ -359,7 +371,7 @@ if necessary. To avoid this cutoff you can use the --error-limit=no flag. Then Valgrind will always show errors, regardless of how many there
-are. Use this flag carefully, since it may have a dire effect on
+are. Use this flag carefully, since it may have a bad effect on
 performance.
@@ -626,6 +638,17 @@ categories. Repeating the flag increases the verbosity level.
+
+ -d
+ Emit information for debugging Valgrind itself. This
+ is usually only of interest to the Valgrind developers.
+ Repeating the flag produces more detailed output. If you
+ want to send us a bug report, a log of the output
+ generated by -v -v -d -d
+ will make your report more useful.
+
+
+
 --trace-children=no [default]
@@ -684,10 +707,12 @@ categories. --log-file-qualifier=<VAR>
- Specifies that Valgrind should send all of its messages
- to the file named by the environment variable
- $VAR. This is useful when running
+ When used in conjunction with
+ --log-file=, causes the log file
+ name to be qualified using the contents of environment variable
+ VAR. This is useful when running
 MPI programs.
+ For further details, see .
@@ -737,7 +762,7 @@ errors, e.g. Memcheck, but not Cachegrind. --demangle=yes [default] Disable/enable automatic demangling (decoding) of C++ names. Enabled by default. When enabled, Valgrind will
- attempt to translate encoded C++ procedure names back to
+ attempt to translate encoded C++ names back to
 something approaching the original. The demangler handles symbols mangled by g++ versions 2.X and 3.X.
@@ -751,8 +776,8 @@ errors, e.g. Memcheck, but not Cachegrind.
- --num-callers=<number> [default=4]
- By default, Valgrind shows four levels of function call
+ --num-callers=<number> [default=12]
+ By default, Valgrind shows twelve levels of function call
 names to help you identify program locations. You can change that number with this option. This can help in determining the program's location in deeply-nested call chains. Note
@@ -868,7 +893,7 @@ errors, e.g. Memcheck, but not Cachegrind. this situation. 1 May 2002: this is a historical relic which could be
- easily fixed if it gets in your way. Mail me and complain if
+ easily fixed if it gets in your way. Mail us and complain if
 this is a problem for you. Nov 2002: if you're sending output to a logfile or to a network socket, I guess this option doesn't make any sense. Caveat emptor.
@@ -1150,12 +1175,11 @@ don't understand Valgrind has a trapdoor mechanism via which the client program can pass all manner of requests and queries to Valgrind and the current tool.
 Internally, this is used extensively to
-make malloc, free, signals, threads, etc, work, although you
-don't see that.
+make malloc, free, etc, work, although you don't see that.
 For your convenience, a subset of these so-called client requests is provided to allow you to tell Valgrind facts about
-the behaviour of your program, and conversely to make queries.
+the behaviour of your program, and also to make queries.
 In particular, your program can tell Valgrind about changes in memory range permissions that Valgrind would not otherwise know about, and so allows clients to get Valgrind to do arbitrary
@@ -1177,7 +1201,9 @@ are not forced to run your program under Valgrind just because you use the macros in this file. Also, you are not required to link your program with any extra supporting libraries.
-The code left in your binary has negligible performance impact.
+The code left in your binary has negligible performance impact:
+on x86, amd64 and ppc32, the overhead is 6 simple integer instructions
+and is probably undetectable except in tight loops.
 However, if you really wish to compile out the client requests, you can compile with -DNVALGRIND (analogous to -DNDEBUG's effect on
@@ -1187,7 +1213,7 @@ compile with -DNVALGRIND (analogous to You are encouraged to copy the valgrind/*.h headers into your project's include directory, so your program doesn't have a compile-time dependency on Valgrind being installed. The Valgrind headers,
-unlike the rest of the code, is under a BSD-style license so you may include
+unlike the rest of the code, are under a BSD-style license so you may include
 them without worrying about license incompatibility. Here is a brief description of the macros available in
@@ -1201,8 +1227,8 @@ tool-specific macros). RUNNING_ON_VALGRIND: returns 1 if running on Valgrind, 0 if running on the
- real CPU. If you are running Valgrind under itself, it will return the
- number of layers of Valgrind emulation we're running under.
+ real CPU. If you are running Valgrind on itself, it will return the
+ number of layers of Valgrind emulation we're running on.
@@ -1215,19 +1241,19 @@ tool-specific macros). dynamic code generation system. After this call, attempts to execute code in the invalidated address range will cause Valgrind to make new translations of that code, which is
- probably the semantics you want. Note that this is
- implemented naively, and involves checking all 200191 entries
- in the translation table to see if any of them overlap the
- specified address range. So try not to call it often, or
- performance will nosedive. Note that you can be clever about
+ probably the semantics you want. Note that code invalidations
+ are expensive because finding all the relevant translations
+ quickly is very difficult. So try not to call it often.
+ Note that you can be clever about
 this: you only need to call it when an area which previously contained code is overwritten with new code. You can choose
- to write coode into fresh memory, and just call this
+ to write code into fresh memory, and just call this
 occasionally to discard large chunks of old code all at once.
-
- Warning: minimally tested,
- especially for tools other than Memcheck.
+
+ Alternatively, for transparent self-modifying-code support,
+ use --smc-check=all.
+
@@ -1415,27 +1441,14 @@ of thread executions than when run natively. This in itself may cause your program to behave differently if you have some kind of concurrency, critical race, locking, or similar, bugs.
-
-
 Handling of Signals
@@ -1511,7 +1472,8 @@ able to cope with any valid use of signals. If you're using signals in clever ways (for example, catching SIGSEGV, modifying page state and restarting the instruction), you're probably relying on precise exceptions. In this case, you will need
-to use --single-step=yes.
+to use --vex-iropt-precise-memory-exns=yes.
+
 If your program dies as a result of a fatal core-dumping signal, Valgrind will generate its own core file
@@ -1532,22 +1494,21 @@ similar. (Note: it will not generate a core if your core dump size limit is make, make install mechanism, and we have attempted to ensure that it works on machines with kernel 2.4 or 2.6 and glibc
-2.2.X, 2.3.X, 2.4.X.
+2.2.X or 2.3.X. You may then want to run the regression tests
+with make regtest.
+
-There are two options (in addition to the usual
+There are three options (in addition to the usual
 --prefix=) which affect how Valgrind is built:
- --enable-pie
- PIE stands for "position-independent executable".
- PIE allows Valgrind to place itself as high as possible in memory,
- giving your program as much address space as possible. It also allows
- Valgrind to run under itself. If PIE is disabled, Valgrind loads at a
- default address which is suitable for most systems. This is also
- useful for debugging Valgrind itself. It's not on by default because
- it caused problems for some people. Note that not all toolchaines
- support PIEs, you need fairly recent version of the compiler, linker,
- etc.
+ --enable-inner
+ This builds Valgrind with some special magic hacks which
+ make it possible to run it on a standard build of Valgrind
+ (what the developers call "self-hosting"). Ordinarily you
+ should not use this flag as various kinds of safety checks
+ are disabled.
+
@@ -1557,6 +1518,14 @@ ensure that it works on machines with kernel 2.4 or 2.6 and glibc if TLS is supported and enable this option. Sometimes it cannot test for TLS, so this option allows you to override the automatic test.
+
+
+ --with-vex=
+ Specifies the path to the underlying VEX dynamic-translation
+ library. By default this is taken to be in the VEX directory
+ off the root of the source tree.
+
+
@@ -1589,9 +1558,9 @@ know if you have build problems. limitations of Valgrind, and for a list of programs which are known not to work on it.
-The translator/instrumentor has a lot of assertions in it.
-They are permanently enabled, and I have no plans to disable
-them. If one of these breaks, please mail us!
+All parts of the system make heavy use of assertions and
+internal self-checks. They are permanently enabled, and we have no
+plans to disable them. If one of them breaks, please mail us!
 If you get an assertion failure on the expression blockSane(ch) in
@@ -1613,24 +1582,31 @@ more advice about common problems, crashes, etc. Limitations
-The following list of limitations seems depressingly long.
+The following list of limitations seems long.
 However, most programs actually work fine.
-Valgrind will run x86/Linux ELF dynamically linked
-binaries, on a kernel 2.4.X or 2.6.X system, subject to
+Valgrind will run Linux ELF
+binaries, on a kernel 2.4.X or 2.6.X system, on the x86, amd64
+and ppc32 architectures, subject to
 the following constraints:
- On x86 and AMD64, there is no support for 3DNow! instructions. If
+ On x86 and amd64, there is no support for 3DNow! instructions. If
 the translator encounters these, Valgrind will generate a SIGILL when the
- instruction is executed. The same is true for Intel's SSE3 SIMD
- instructions.
+ instruction is executed. Apart from that, on x86 and amd64,
+ essentially all instructions are supported, up to and including SSE2.
+ Version 3.1.0 includes limited support for SSE3 on x86. This could be
+ improved
+ if necessary.
+ On ppc32, almost all integer, floating point and Altivec instructions
+ are supported.
- Atomic instruction sequences are not supported, which will affect
- any use of synchronization objects being shared between processes. They
+ Atomic instruction sequences are not properly supported, in the
+ sense that their atomicity is not preserved. This will affect
+ any use of synchronization via memory shared between processes. They
 will appear to work, but fail sporadically.
@@ -1639,7 +1615,8 @@ the following constraints: than using malloc/new/free/delete, it should still work, but Valgrind's error checking won't be so effective. If you describe your program's memory management scheme using "client
- requests" (Section 3.7 of this manual), Memcheck can do
+ requests" (see ),
+ Memcheck can do
 better. Nevertheless, using malloc/new and free/delete is still the best approach.
@@ -1668,8 +1645,8 @@ the following constraints: amount of administrative information maintained behind the scenes. Another cause is that Valgrind dynamically translates the original executable. Translated, instrumented code is
- 14-16 times larger than the original (!) so you can easily end
- up with 30+ MB of translations when running (eg) a web
+ 12-18 times larger than the original so you can easily end
+ up with 50+ MB of translations when running (eg) a web
 browser.
@@ -1690,7 +1667,8 @@ the following constraints: Precision: There is no support for 80 bit arithmetic.
- Internally, Valgrind represents all FP numbers in 64 bits, and so
+ Internally, Valgrind represents all such "long double"
+ numbers in 64 bits, and so
 there may be some differences in results. Whether or not this is critical remains to be seen. Note, the x86/amd64 fldt/fstpt instructions (read/write 80-bit numbers) are correctly simulated,
@@ -1764,213 +1742,17 @@ the following constraints: emacs starts up but immediately concludes it is out of
- memory and aborts. Emacs has it's own memory-management
- scheme, but I don't understand why this should interact so
- badly with Valgrind. Emacs works fine if you build it to use
+ memory and aborts. It may be that Memcheck does not provide
+ a good enough emulation of the
+ mallinfo function.
+ Emacs works fine if you build it to use
 the standard malloc/free routines.
-
- Known platform-specific limitations, as of release 2.4.0:
-
-
- (none)
-
-
-
-
-
-How It Works -- A Rough Overview
-
-Some gory details, for those with a passion for gory
-details. You don't need to read this section if all you want to
-do is use Valgrind. What follows is an outline of the machinery.
-It is out of date, as the JITter has been completey rewritten in
-version 3.0, and so it works quite differently.
-A more detailed (and even more out of date) description is to be
-found .
-
-
-Getting started
-
-Valgrind is compiled into two executables:
-valgrind, and
-stage2.
-valgrind is a statically-linked executable
-which loads at the normal address (0x8048000).
-stage2 is a normal dynamically-linked
-executable; it is either linked to load at a high address (0xb8000000) or is
-a Position Independent Executable.
-
-Valgrind (also known as stage1):
-
- Decides where to load stage2.
- Pads the address space with
- mmap, leaving holes only where stage2
- should load.
- Loads stage2 in the same manner as
- execve() would, but
- "manually".
- Jumps to the start of stage2.
-
-
-Once stage2 is loaded, it uses
-dlopen() to load the tool, unmaps all
-traces of stage1, initializes the client's state, and starts the synthetic
-CPU.
-
-Each thread runs in its own kernel thread, and loops in
-VG_(schedule) as it runs. When the thread
-terminates, VG_(schedule) returns. Once
-all the threads have terminated, Valgrind as a whole exits.
-
-Each thread also has two stacks. One is the client's stack, which
-is manipulated with the client's instructions. The other is
-Valgrind's internal stack, which is used by all Valgrind's code on
-behalf of that thread. It is important to not get them confused.
-
-
-
-
-
-The translation/instrumentation engine
-
-Valgrind does not directly run any of the original
-program's code. Only instrumented translations are run.
-Valgrind maintains a translation table, which allows it to find
-the translation quickly for any branch target (code address). If
-no translation has yet been made, the translator - a just-in-time
-translator - is summoned. This makes an instrumented
-translation, which is added to the collection of translations.
-Subsequent jumps to that address will use this
-translation.
-
-Valgrind no longer directly supports detection of
-self-modifying code. Such checking is expensive, and in practice
-(fortunately) almost no applications need it. However, to help
-people who are debugging dynamic code generation systems, there
-is a Client Request (basically a macro you can put in your
-program) which directs Valgrind to discard translations in a
-given address range. So Valgrind can still work in this
-situation provided the client tells it when code has become
-out-of-date and needs to be retranslated.
-
-The JITter translates basic blocks -- blocks of
-straight-line-code -- as single entities. To minimise the
-considerable difficulties of dealing with the x86 instruction
-set, x86 instructions are first translated to a RISC-like
-intermediate code, similar to sparc code, but with an infinite
-number of virtual integer registers. Initially each insn is
-translated seperately, and there is no attempt at
-instrumentation.
-
-The intermediate code is improved, mostly so as to try and
-cache the simulated machine's registers in the real machine's
-registers over several simulated instructions. This is often
-very effective. Also, we try to remove redundant updates of the
-simulated machines's condition-code register.
-
-The intermediate code is then instrumented, giving more
-intermediate code. There are a few extra intermediate-code
-operations to support instrumentation; it is all refreshingly
-simple. After instrumentation there is a cleanup pass to remove
-redundant value checks.
-
-This gives instrumented intermediate code which mentions
-arbitrary numbers of virtual registers. A linear-scan register
-allocator is used to assign real registers and possibly generate
-spill code. All of this is still phrased in terms of the
-intermediate code. This machinery is inspired by the work of
-Reuben Thomas (Mite).
-
-Then, and only then, is the final x86 code emitted. The
-intermediate code is carefully designed so that x86 code can be
-generated from it without need for spare registers or other
-inconveniences.
-
-The translations are managed using a traditional LRU-based
-caching scheme. The translation cache has a default size of
-about 14MB.
-
-
-
-
-
-Tracking the Status of Memory
-
-Each byte in the process' address space has nine bits
-associated with it: one A bit and eight V bits. The A and V bits
-for each byte are stored using a sparse array, which flexibly and
-efficiently covers arbitrary parts of the 32-bit address space
-without imposing significant space or performance overheads for
-the parts of the address space never visited. The scheme used,
-and speedup hacks, are described in detail at the top of the
-source file coregrind/vg_memory.c, so you
-should read that for the gory details.
-
-
-
-
-
-
-System calls
-
-All system calls are intercepted. The memory status map is
-consulted before and updated after each call. It's all rather
-tiresome. See coregrind/vg_syscalls.c for
-details.
-
-
-
-
-
-Signals
-
-All signal-related system calls are intercepted. If the client
-program is trying to set a signal handler, Valgrind makes a note of the
-handler address and which signal it is for. Valgrind then arranges for the
-same signal to be delivered to its own handler.
-
-When such a signal arrives, Valgrind's own handler catches
-it, and notes the fact. At a convenient safe point in execution,
-Valgrind builds a signal delivery frame on the client's stack and
-runs its handler. If the handler longjmp()s, there is nothing
-more to be said. If the handler returns, Valgrind notices this,
-zaps the delivery frame, and carries on where it left off before
-delivering the signal.
-
-The purpose of this nonsense is that setting signal
-handlers essentially amounts to giving callback addresses to the
-Linux kernel. We can't allow this to happen, because if it did,
-signal handlers would run on the real CPU, not the simulated one.
-This means the checking machinery would not operate during the
-handler run, and, worse, memory permissions maps would not be
-updated, which could cause spurious error reports once the
-handler had returned.
-
-An even worse thing would happen if the signal handler
-longjmp'd rather than returned: Valgrind would completely lose
-control of the client program.
-
-Upshot: we can't allow the client to install signal
-handlers directly. Instead, Valgrind must catch, on behalf of
-the client, any signal the client asks to catch, and must
-delivery it to the client on the simulated CPU, not the real one.
-This involves considerable gruesome fakery; see
-coregrind/vg_signals.c for details.
-
-
-
-
-
-
-
 An Example Run
@@ -1993,7 +1775,6 @@ sewardj@phoenix:~/newmat10$
 ==25832== reading syms from /mnt/pima/jrs/Inst/lib/libstdc++.so.3
 ==25832== reading syms from /home/sewardj/Valgrind/valgrind.so
 ==25832== reading syms from /proc/self/exe
-==25832== loaded 5950 symbols, 142333 line number locations
 ==25832==
 ==25832== Invalid read of size 4
 ==25832== at 0x8048724: _ZN10BandMatrix6ReSizeEiii (bogon.cpp:45)
@@ -2023,29 +1804,29 @@ shipped.
- More than 50 errors detected.
+ More than 100 errors detected.
 Subsequent errors will still be recorded, but in less detail than before.
- After 50 different errors have been shown, Valgrind
+ After 100 different errors have been shown, Valgrind
 becomes more conservative about collecting them. It then requires only the program counters in the top two stack frames to match when deciding whether or not two errors are really the same one. Prior to this point, the PCs in the top four frames are required to match.
 This hack has the effect of
- slowing down the appearance of new errors after the first 50.
- The 50 constant can be changed by recompiling Valgrind.
+ slowing down the appearance of new errors after the first 100.
+ The 100 constant can be changed by recompiling Valgrind.
- More than 300 errors detected. I'm not
+ More than 1000 errors detected. I'm not
 reporting any more. Final error counts may be inaccurate. Go fix your program!
- After 300 different errors have been detected, Valgrind
+ After 1000 different errors have been detected, Valgrind
 ignores any more. It seems unlikely that collecting even more different ones would be of practical help to anybody, and it avoids the danger that Valgrind spends more and more of its time comparing new errors against an ever-growing collection.
- As above, the 300 number is a compile-time constant.
+ As above, the 1000 number is a compile-time constant.
diff --git a/docs/xml/manual-intro.xml b/docs/xml/manual-intro.xml
index b49f3b304f..e7c716cbf0 100644
--- a/docs/xml/manual-intro.xml
+++ b/docs/xml/manual-intro.xml
@@ -8,8 +8,9 @@ An Overview of Valgrind
-Valgrind is a flexible system for debugging and profiling
-Linux executables. The system consists of a core, which
+Valgrind is a suite of simulation-based debugging and
+profiling tools for programs running on Linux (x86, amd64 and ppc32).
+The system consists of a core, which
 provides a synthetic CPU in software, and a series of tools, each of which performs some kind of debugging, profiling, or similar task. The architecture is modular, so that new tools can
@@ -56,9 +57,7 @@ summary, these are: Overlapping src and dst pointers in memcpy() and related
- functions Some misuses of
- the POSIX pthreads API
-
+ functions
 Problems like these can be difficult to find by other
@@ -91,6 +90,10 @@ summary, these are: stellar, it's quite usable, and it seems plausible to run KDE for long periods at a time like this, collecting up all the addressing errors that appear.
+
+ NOTE: Addrcheck is not available in Valgrind 3.1.X. We hope
+ to reinstate its functionality in later releases. For now, use
+ Memcheck instead.
@@ -135,27 +138,34 @@ summary, these are: Helgrind has been hacked on extensively by Jeremy Fitzhardinge, and we have him to thank for getting it to a releasable state.
+
+ NOTE: Helgrind is, unfortunately, not available in Valgrind 3.1.X,
+ as a result of threading changes that happened in the 2.4.0 release.
+ We hope to reinstate its functionality in a future 3.2.0 release.
-A number of minor tools (Corecheck,
-Lackey and Nulgrind) are
+A couple of minor tools (Lackey
+and Nulgrind) are
 also supplied. These aren't particularly useful -- they exist to illustrate how to create simple tools and to help the valgrind
-developers in various ways.
+developers in various ways. Nulgrind is the null tool -- it adds
+no instrumentation. Lackey is a simple example tool
+which counts instructions, memory accesses, and the number of
+integer and floating point operations your program does.
 Valgrind is closely tied to details of the CPU and operating system, and to a lesser extent, the compiler and basic C libraries.
-Nonetheless, as of version 3.0.0 it supports several platforms: x86/Linux
-(mature), AMD64/Linux (immature but works well), and PPC32/Linux (very
-preliminary). Valgrind uses the standard Unix
+Nonetheless, as of version 3.1.0 it supports several platforms: x86/Linux
+(mature), AMD64/Linux (maturing), and PPC32/Linux (immature but works well).
+Valgrind uses the standard Unix
 ./configure, make, make install mechanism, and we have attempted to ensure that it works on machines with kernel 2.4 or 2.6 and glibc
-2.2.X--2.4.X.
+2.2.X--2.3.X.
 Valgrind is licensed under the , version 2.
 The valgrind/*.h headers that
diff --git a/docs/xml/quick-start-guide.xml b/docs/xml/quick-start-guide.xml
index ea0d74ffe4..d652c23545 100644
--- a/docs/xml/quick-start-guide.xml
+++ b/docs/xml/quick-start-guide.xml
@@ -40,10 +40,13 @@ right for earlier versions. Preparing your program Compile your program with -g to include debugging information so that Memcheck's error messages include exact line
-numbers. Using -O0 is also a good idea;
-with -O1 line numbers in error messages can
-be inaccurate, and with -O2 Memcheck
-occasionally reports undefined error messages incorrectly.
+numbers. Using -O0 is also a good idea, if
+you can tolerate the slowdown. With -O1
+line numbers in error messages can be inaccurate, although generally speaking
+Memchecking code compiled at -O1 works
+fairly well. Use of -O2 and above is
+not recommended as Memcheck occasionally reports uninitialised-value
+errors which don't really exist.
-
-
+
+
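
Illustrative note (not part of the commit): the manual-core.xml hunks above raise the error-collection thresholds to 100 and 1000 and describe how duplicate errors are recognised by comparing program counters in the top stack frames. The sketch below is one way that policy could be modelled in C. It is an assumption-laden illustration, not Valgrind's actual code: the names ErrorTable, ErrorRecord and record_error are hypothetical, and only the constants 100, 1000, 100000 and the two-frame/four-frame rule come from the manual text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Thresholds quoted in the manual text; every identifier here is hypothetical. */
enum {
    MAX_FRAMES          = 4,      /* PCs compared before the first threshold */
    LOW_DETAIL_AFTER    = 100,    /* after this many distinct errors, compare 2 PCs */
    STOP_AFTER_DISTINCT = 1000,   /* stop collecting after this many distinct errors */
    STOP_AFTER_TOTAL    = 100000  /* ... or after this many errors in total */
};

typedef struct {
    unsigned long pcs[MAX_FRAMES]; /* program counters, innermost frame first */
    unsigned long count;           /* how many times this error was seen */
} ErrorRecord;

typedef struct {
    ErrorRecord records[STOP_AFTER_DISTINCT];
    size_t n_distinct;
    unsigned long n_total;
} ErrorTable;

/* Record one error. Returns false once a collection limit has been reached. */
bool record_error(ErrorTable *t, const unsigned long pcs[MAX_FRAMES])
{
    if (t->n_distinct >= STOP_AFTER_DISTINCT || t->n_total >= STOP_AFTER_TOTAL)
        return false;

    /* Past the first threshold, only the top two frames must match. */
    size_t depth = (t->n_distinct >= LOW_DETAIL_AFTER) ? 2 : MAX_FRAMES;

    t->n_total++;
    for (size_t i = 0; i < t->n_distinct; i++) {
        if (memcmp(t->records[i].pcs, pcs, depth * sizeof pcs[0]) == 0) {
            t->records[i].count++;  /* duplicate of an existing error */
            return true;
        }
    }

    /* Not seen before: store it as a new distinct error. */
    memcpy(t->records[t->n_distinct].pcs, pcs, sizeof t->records[0].pcs);
    t->records[t->n_distinct].count = 1;
    t->n_distinct++;
    return true;
}
```

The linear scan over all stored records mirrors the cost the manual warns about: every new error is compared against an ever-growing collection, which is why a hard cutoff is useful. Relaxing the match from four frames to two makes more errors fold into existing records, which slows the growth of the table.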