From: Julian Seward Date: Sun, 5 Aug 2012 13:44:15 +0000 (+0000) Subject: Doc updates for 3.8.0. X-Git-Tag: svn/VALGRIND_3_8_0~30 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=9461b8012f205c5dee9cf73ede9a64da8c9bad2f;p=thirdparty%2Fvalgrind.git Doc updates for 3.8.0. git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12838 --- diff --git a/AUTHORS b/AUTHORS index 740a724d26..5140aad0db 100644 --- a/AUTHORS +++ b/AUTHORS @@ -65,8 +65,14 @@ Philippe Waroquiers wrote and maintains the embedded GDB server. He also made a bunch of performance and memory-reduction fixes across diverse parts of the system. -Maynard Johnson contributed IBM Power6 and Power7 support, and generally -deals with ppc64-linux issues. +Carl Love and Maynard Johnson contributed IBM Power6 and Power7 +support, and generally deal with ppc{32,64}-linux issues. + +Petar Jovanovic and Dejan Jevtic wrote and maintain the mips32-linux +port. + +Dragos Tatulea modified the arm-android port so it also works on +x86-android. Many, many people sent bug reports, patches, and helpful feedback. diff --git a/README b/README index 4122991f78..9af6be278e 100644 --- a/README +++ b/README @@ -32,7 +32,7 @@ a lesser extent, compiler and basic C libraries. This makes it difficult to make it portable. Nonetheless, it is available for the following platforms: -- x86/Linux +- X86/Linux - AMD64/Linux - PPC32/Linux - PPC64/Linux @@ -40,8 +40,9 @@ platforms: - x86/MacOSX - AMD64/MacOSX - S390X/Linux +- MIPS32/Linux -Note that AMD64 is just another name for x86-64, and Valgrind runs fine +Note that AMD64 is just another name for x86_64, and Valgrind runs fine on Intel processors. Also note that the core of MacOSX is called "Darwin" and this name is used sometimes. diff --git a/README.android b/README.android index fd06c59a73..138f644872 100644 --- a/README.android +++ b/README.android @@ -4,18 +4,16 @@ How to cross-compile for Android. 
These notes were last updated on This is known to work at least for : ARM: -#### Android 4.0.3 running on a (rooted, AOSP build) Nexus S. Android 4.0.3 running on Motorola Xoom. Android 4.0.3 running on android arm emulator. Android 4.1 running on android emulator. -Android 2.3.4 on Nexus S worked at some time in the past. + Android 2.3.4 on Nexus S worked at some time in the past. x86: -#### Android 4.0.3 running on android x86 emulator. -On android, GDBserver might insert breaks at wrong addresses. +On android-arm, GDBserver might insert breaks at wrong addresses. Feedback on this welcome. Other configurations and toolchains might work, but haven't been tested. @@ -62,14 +60,12 @@ cd /path/to/valgrind/source/tree # Set up toolchain paths. # -For ARM -####### +# For ARM export AR=$NDKROOT/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-ar export LD=$NDKROOT/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-ld export CC=$NDKROOT/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-gcc -For x86 -####### +# For x86 export AR=$NDKROOT/toolchains/x86-4.4.3/prebuilt/linux-x86/bin/i686-android-linux-ar export LD=$NDKROOT/toolchains/x86-4.4.3/prebuilt/linux-x86/bin/i686-android-linux-ld export CC=$NDKROOT/toolchains/x86-4.4.3/prebuilt/linux-x86/bin/i686-android-linux-gcc @@ -101,9 +97,11 @@ CPPFLAGS="--sysroot=$NDKROOT/platforms/android-9/arch-x86 -DANDROID_HARDWARE_$HW # At the end of the configure run, a few lines of details # are printed. 
Make sure that you see these two lines: +# # For ARM: # Platform variant: android # Primary -DVGPV string: -DVGPV_arm_linux_android=1 +# # For x86: # Platform variant: android # Primary -DVGPV string: -DVGPV_x86_linux_android=1 diff --git a/callgrind/docs/cl-manual.xml b/callgrind/docs/cl-manual.xml index 994ddcbacb..ab8d9bb07f 100644 --- a/callgrind/docs/cl-manual.xml +++ b/callgrind/docs/cl-manual.xml @@ -93,11 +93,11 @@ below describes the features supported in addition to Cachegrind's features. Callgrind's ability to detect function calls and returns depends -on the instruction set of the platform it is run on. It works best -on x86 and amd64, and unfortunately currently does not work so well -on PowerPC code. This is because there are no explicit call or return -instructions in the PowerPC instruction set, so Callgrind has to rely -on heuristics to detect calls and returns. +on the instruction set of the platform it is run on. It works best on +x86 and amd64, and unfortunately currently does not work so well on +PowerPC, ARM, Thumb or MIPS code. This is because there are no explicit +call or return instructions in these instruction sets, so Callgrind +has to rely on heuristics to detect calls and returns. diff --git a/docs/xml/manual-core-adv.xml b/docs/xml/manual-core-adv.xml index d3c43e8e20..f30b7d5f7b 100644 --- a/docs/xml/manual-core-adv.xml +++ b/docs/xml/manual-core-adv.xml @@ -728,10 +728,10 @@ the lower part starts with an x, the upper part starts with an y and has an h before the shadow postfix. -The special presentation of the AVX shadow registers is due -to the fact that GDB retrieves independently the lower and upper half -of the ymm registers. GDB however -does not know that the shadow half registers have to be shown combined. +The special presentation of the AVX shadow registers is due to +the fact that GDB independently retrieves the lower and upper half of +the ymm registers. 
GDB does not +however know that the shadow half registers have to be shown combined. @@ -1716,7 +1716,8 @@ functions and merely replaced functions malloc etc safely from within wrappers. -The above comments are true for {x86,amd64,ppc32,arm}-linux. On +The above comments are true for {x86,amd64,ppc32,arm,mips32,s390}-linux. +On ppc64-linux function wrapping is more fragile due to the (arguably poorly designed) ppc64-linux ABI. This mandates the use of a shadow stack which tracks entries/exits of both wrapper and replacement @@ -1727,7 +1728,8 @@ finite size, recursion between wrapper/replacement functions is only possible to a limited depth, beyond which Valgrind has to abort the run. This depth is currently 16 calls. -For all platforms ({x86,amd64,ppc32,ppc64,arm}-linux) all the above +For all platforms ({x86,amd64,ppc32,ppc64,arm,mips32,s390}-linux) +all the above comments apply on a per-thread basis. In other words, wrapping is thread-safe: each thread must individually observe the above restrictions, but there is no need for any kind of inter-thread diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml index fe11c17da5..109ad06ad1 100644 --- a/docs/xml/manual-core.xml +++ b/docs/xml/manual-core.xml @@ -1436,18 +1436,19 @@ Massif, Helgrind, DRD), the following options apply. - Valgrind's malloc, realloc, etc, add padding - blocks before and after each block allocated for the client. Such padding - blocks are called redzones. - The default value for the redzone size depends on the tool. - For example, Memcheck adds and protects a minimum of 16 bytes before and - after each block allocated by the client to detect block overrun or - underrun. + Valgrind's malloc, realloc, etc, add + padding blocks before and after each heap block allocated by the + program being run. Such padding blocks are called redzones. The + default value for the redzone size depends on the tool. 
For + example, Memcheck adds and protects a minimum of 16 bytes before + and after each block allocated by the client. This allows it to + detect block underruns or overruns of up to 16 bytes. - Increasing the redzone size allows to detect more cases of - blocks overrun or underrun. Decreasing the redzone size will - reduce the memory needed by Valgrind but reduces the chance to - detect block overrun/underrun. + Increasing the redzone size makes it possible to detect + overruns of larger distances, but increases the amount of memory + used by Valgrind. Decreasing the redzone size will reduce the + memory needed by Valgrind but also reduces the chances of + detecting over/underruns, so is not recommended. @@ -1463,7 +1464,7 @@ Massif, Helgrind, DRD), the following options apply. These options apply to all tools, as they affect certain obscure workings of the Valgrind core. Most people won't -need to use these. +need to use them. @@ -1514,14 +1515,14 @@ need to use these. takes advantage of this observation, limiting the overhead of checking to code which is likely to be JIT generated. - Some architectures (including ppc32, ppc64 and ARM) require - programs which create code at runtime to flush the instruction - cache in between code generation and first use. Valgrind - observes and honours such instructions. Hence, on ppc32/Linux, - ppc64/Linux and ARM/Linux, Valgrind always provides complete, transparent - support for self-modifying code. It is only on platforms such as - x86/Linux, AMD64/Linux, x86/Darwin and AMD64/Darwin - that you need to use this option. + Some architectures (including ppc32, ppc64, ARM and MIPS) + require programs which create code at runtime to flush the + instruction cache in between code generation and first use. + Valgrind observes and honours such instructions. Hence, on + ppc32/Linux, ppc64/Linux and ARM/Linux, Valgrind always provides + complete, transparent support for self-modifying code. 
It is + only on platforms such as x86/Linux, AMD64/Linux, x86/Darwin and + AMD64/Darwin that you need to use this option. @@ -1693,33 +1694,39 @@ need to use these. - The controls the - locking mechanism used by Valgrind to serialise thread - execution. The locking mechanism differs in the way the threads - are scheduled, giving a different trade-off between fairness and - performance. For more details about the Valgrind thread - serialisation principle and its impact on performance and thread - scheduling, see . + The option controls + the locking mechanism used by Valgrind to serialise thread + execution. The locking mechanism controls the way the threads + are scheduled, and different settings give different trade-offs + between fairness and performance. For more details about the + Valgrind thread serialisation scheme and its impact on + performance and thread scheduling, see + . The value - activates a fair scheduling. Basically, if multiple threads are + activates a fair scheduler. In short, if multiple threads are ready to run, the threads will be scheduled in a round robin fashion. This mechanism is not available on all platforms or - linux versions. If not available, + Linux versions. If not available, using will cause Valgrind to terminate with an error. + You may find this setting improves overall + responsiveness if you are running an interactive + multithreaded program, for example a web browser, on + Valgrind. The value - activates the fair scheduling if available on the - platform. Otherwise, it will automatically fallback + activates fair scheduling if available on the + platform. Otherwise, it will automatically fall back to . The value activates - a scheduling mechanism which does not guarantee fairness - between threads ready to run. + a scheduler which does not guarantee fairness + between threads ready to run, but which in general gives the + highest performance. @@ -1813,10 +1820,10 @@ need to use these. 
- When a shared library is loaded, Valgrind examines if some - functions of this library must be replaced or wrapped. - For example, memcheck is replacing the malloc related - functions (malloc, free, calloc, ...). + When a shared library is loaded, Valgrind checks for + functions in the library that must be replaced or wrapped. + For example, Memcheck replaces all malloc related + functions (malloc, free, calloc, ...) with its own versions. Such replacements are done by default only in shared libraries whose soname matches a predefined soname pattern (e.g. libc.so* on linux). @@ -1826,7 +1833,7 @@ need to use these. to specify one additional synonym pattern, giving flexibility in the replacement. - Currently, this flexibility is only allowed for the + Currently, this flexibility is only allowed for the malloc related functions, using the synonym somalloc. This synonym is usable for all tools doing standard replacement of malloc related functions @@ -1859,6 +1866,14 @@ need to use these. that a NONE pattern will match the main executable and any shared library having no soname. + + + To run a "default" Firefox build for Linux, in which + JEMalloc is linked in to the main executable, + use . + + + @@ -1985,79 +2000,89 @@ sharing will fail. Scheduling and Multi-Thread Performance -A thread executes some code only when it holds the lock. After -executing a certain nr of instructions, the running thread will release -the lock. All threads ready to run will compete to acquire the lock. - -The option controls the locking mechanism -used to serialise the thread execution. - - The default pipe based locking -() is available on all platforms. The -pipe based locking does not guarantee fairness between threads : it is -very well possible that the thread that has just released the lock -gets it back directly. When using the pipe based locking, different -execution of the same multithreaded application might give very different -thread scheduling. 
- - The futex based locking is available on some platforms. -If available, it is activated by or -. The futex based locking ensures -fairness between threads : if multiple threads are ready to run, the lock -will be given to the thread which first requested the lock. Note that a thread -which is blocked in a system call (e.g. in a blocking read system call) has -not (yet) requested the lock: such a thread requests the lock only after the -system call is finished. - - The fairness of the futex based locking ensures a better reproducibility -of the thread scheduling for different executions of a multithreaded -application. This fairness/better reproducibility is particularly -interesting when using Helgrind or DRD. - - The Valgrind thread serialisation implies that only one thread -is running at a time. On a multiprocessor/multicore system, the +A thread executes code only when it holds the abovementioned +lock. After executing some number of instructions, the running thread +will release the lock. All threads ready to run will then compete to +acquire the lock. + +The option controls the locking mechanism +used to serialise thread execution. + +The default pipe based locking mechanism +() is available on all +platforms. Pipe based locking does not guarantee fairness between +threads: it is quite likely that a thread that has just released the +lock reacquires it immediately, even though other threads are ready to +run. When using pipe based locking, different runs of the same +multithreaded application might give very different thread +scheduling. + +An alternative locking mechanism, based on futexes, is available +on some platforms. If available, it is activated +by or +. Futex based locking ensures +fairness (round-robin scheduling) between threads: if multiple threads +are ready to run, the lock will be given to the thread which first +requested the lock. Note that a thread which is blocked in a system +call (e.g. 
in a blocking read system call) has not (yet) requested the +lock: such a thread requests the lock only after the system call is +finished. + + The fairness of the futex based locking produces better +reproducibility of thread scheduling for different executions of a +multithreaded application. This better reproducibility is particularly +helpful when using Helgrind or DRD. + +Valgrind's use of thread serialisation implies that only one +thread at a time may run. On a multiprocessor/multicore system, the running thread is assigned to one of the CPUs by the OS kernel -scheduler. When a thread acquires the lock, sometimes the thread will +scheduler. When a thread acquires the lock, sometimes the thread will be assigned to the same CPU as the thread that just released the -lock. Sometimes, the thread will be assigned to another CPU. When -using the pipe based locking, the thread that just acquired the lock -will often be scheduled on the same CPU as the thread that just -released the lock. With the futex based mechanism, the thread that +lock. Sometimes, the thread will be assigned to another CPU. When +using pipe based locking, the thread that just acquired the lock +will usually be scheduled on the same CPU as the thread that just +released the lock. With the futex based mechanism, the thread that just acquired the lock will more often be scheduled on another -CPU. +CPU. -The Valgrind thread serialisation and CPU assignment by the OS -kernel scheduler can badly interact with the CPU frequency scaling -available on many modern CPUs : to decrease power consumption, the +Valgrind's thread serialisation and CPU assignment by the OS +kernel scheduler can interact badly with the CPU frequency scaling +available on many modern CPUs. To decrease power consumption, the frequency of a CPU or core is automatically decreased if the CPU/core has not been used recently. 
If the OS kernel often assigns the thread -which just acquired the lock to another CPU/core, there is quite some -chance that this CPU/core is currently at a low frequency. The -frequency of this CPU will be increased after some time. However, -during this time, the (only) running thread will have run at a low -frequency. Once this thread has run during some time, it will release -the lock. Another thread will acquire this lock, and might be -scheduled again on another CPU whose clock frequency was decreased in -the meantime. - -The futex based locking causes threads to more often switch of -CPU/core. So, if CPU frequency scaling is activated, the futex based -locking might decrease significantly (up to 50% degradation has been -observed) the performance of a multithreaded app running under -Valgrind. The pipe based locking also somewhat interacts badly with -CPU frequency scaling. Up to 10..20% performance degradation has been -observed. - -To avoid this performance degradation, you can indicate to the -kernel that all CPUs/cores should always run at maximum clock -speed. Depending on your linux distribution, CPU frequency scaling -might be controlled using a graphical interface or using command line +which just acquired the lock to another CPU/core, it is quite likely +that this CPU/core is currently at a low frequency. The frequency of +this CPU will be increased after some time. However, during this +time, the (only) running thread will have run at the low frequency. +Once this thread has run for some time, it will release the lock. +Another thread will acquire this lock, and might be scheduled again on +another CPU whose clock frequency was decreased in the +meantime. + +The futex based locking causes threads to change CPUs/cores more +often. So, if CPU frequency scaling is activated, the futex based +locking might significantly decrease the performance of a +multithreaded app running under Valgrind. 
Performance losses of up to +50% have been observed, as compared to running on a +machine for which CPU frequency scaling has been disabled. The pipe +based locking scheme also interacts badly with CPU frequency +scaling, with performance losses in the range 10..20% having been +observed. + +To avoid such performance degradation, you should indicate to +the kernel that all CPUs/cores should always run at maximum clock +speed. Depending on your Linux distribution, CPU frequency scaling +may be controlled using a graphical interface or using command line +tools such as cpufreq-selector or +cpufreq-set. + + +An alternative way to avoid these problems is to tell the +OS scheduler to tie a Valgrind process to a specific (fixed) CPU using the +taskset command. This should ensure +that the selected CPU does not fall below its maximum frequency +setting so long as any thread of the program has work to do. @@ -2202,11 +2227,10 @@ subject to the following constraints: instructions. If the translator encounters these, Valgrind will generate a SIGILL when the instruction is executed. Apart from that, on x86 and amd64, essentially all instructions are supported, - up to and including SSE4.2 in 64-bit mode and SSSE3 in 32-bit mode. - Some exceptions: SSE4.2 AES instructions are not supported in - 64-bit mode, and 32-bit mode does in fact support the bare minimum - SSE4 instructions to needed to run programs on MacOSX 10.6 on - 32-bit targets. + up to and including AVX and AES in 64-bit mode and SSSE3 in 32-bit + mode. 32-bit mode does in fact support the bare minimum SSE4 + instructions needed to run programs on MacOSX 10.6 on 32-bit + targets. 
@@ -2262,7 +2286,7 @@ subject to the following constraints: large amount of administrative information maintained behind the scenes. Another cause is that Valgrind dynamically translates the original executable. Translated, instrumented code is 12-18 times - larger than the original so you can easily end up with 100+ MB of + larger than the original so you can easily end up with 150+ MB of translations when running (eg) a web browser. diff --git a/docs/xml/vg-entities.xml b/docs/xml/vg-entities.xml index 1b5adb1a15..d5532d6e7f 100644 --- a/docs/xml/vg-entities.xml +++ b/docs/xml/vg-entities.xml @@ -2,12 +2,12 @@ - + - - + +
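The manual-core.xml text in this patch recommends tying a Valgrind process to one CPU with taskset, so that CPU frequency scaling does not slow down the serialised threads. As an illustrative sketch only (the helper name build_vg_cmd and the program ./my_mt_app are hypothetical, not part of this commit; taskset -c and --fair-sched=try are the real flags the text refers to), the invocation can be assembled like this:

```shell
#!/bin/sh
# Illustrative sketch: build the taskset + Valgrind command line the
# manual describes.  build_vg_cmd and ./my_mt_app are made-up names.
build_vg_cmd() {
    cpu="$1"; shift
    # Pin to one CPU (taskset -c) and request the fair scheduler where
    # available (--fair-sched=try falls back to basic otherwise).
    printf 'taskset -c %s valgrind --fair-sched=try %s\n' "$cpu" "$*"
}
build_vg_cmd 0 ./my_mt_app
```

Running the printed command keeps all of the program's threads on CPU 0, which per the text above should keep that CPU at its maximum frequency as long as any thread has work to do.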