From: Bart Van Assche Date: Fri, 27 Jun 2008 14:56:06 +0000 (+0000) Subject: Continued working on the DRD documentation. X-Git-Tag: svn/VALGRIND_3_4_0~427 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=af97950f2366e9fcbe85081332cfcc9882569d5b;p=thirdparty%2Fvalgrind.git Continued working on the DRD documentation. git-svn-id: svn://svn.valgrind.org/valgrind/trunk@8287 --- diff --git a/exp-drd/docs/drd-manual.xml b/exp-drd/docs/drd-manual.xml index 4f5a998e12..75a546faec 100644 --- a/exp-drd/docs/drd-manual.xml +++ b/exp-drd/docs/drd-manual.xml @@ -11,221 +11,491 @@ --tool=exp-drd on the Valgrind command line. + -Introduction +Background DRD is a Valgrind tool for detecting errors in multithreaded C and C++ shared-memory programs. The tool works for any program that uses the -POSIX threading primitives or a threading library built on top of the -POSIX threading primitives. POSIX threads, also known as Pthreads, is -the most widely available threading library on Unix systems. +POSIX threading primitives or that uses threading concepts built on +top of the POSIX threading primitives. + +Multithreaded Programming Paradigms + -Multithreaded programming is error prone. Depending on how multithreading is -expressed in a program, one or more of the following problems can pop up in a -multithreaded program: +For many applications multithreading is a necessity. There are two +reasons why the use of threads may be required: - A data race, i.e. one or more threads access the same memory - location without sufficient locking. + To model concurrent activities. Managing the state of one + activity per thread can be a great simplification compared to + multiplexing the states of multiple activities in a single + thread. This is why most server and embedded software is + multithreaded. - Lock contention: one thread blocks the progress of another thread - by holding a lock too long. + To let computations run on multiple CPU cores + simultaneously. 
This is why many High Performance Computing + (HPC) applications are multithreaded. + + + + + + +Multithreaded programs can use one or more of the following +paradigms. Which paradigm is appropriate a.o. depends on the +application type -- modeling concurrent activities versus HPC. + + + + Locking. Data that is shared between threads may only be + accessed after a lock is obtained on the mutex associated with + the shared data item. A.o. the POSIX threads library, the Qt + library and the Boost.Thread library support this paradigm + directly. - Deadlock: two or more threads wait for each other indefinitely. + Message passing. No data is shared between threads, but threads + exchange data by passing messages to each other. Well known + implementations of the message passing paradigm are MPI and + CORBA. - False sharing: threads on two different processors access different - variables in the same cache line frequently, causing frequent exchange - of cache lines and slowing down both threads. + Software Transactional Memory (STM). Data is shared between + threads, and shared data is updated via transactions. After each + transaction it is verified whether there were conflicting + transactions. If there were conflicts, the transaction is + aborted, otherwise it is committed. This is a so-called + optimistic approach. There is a prototype of the Intel C + Compiler (icc) available that + supports STM. Research is ongoing about the addition of STM + support to gcc. - Improper use of the POSIX threads API. + Automatic parallelization. A compiler converts a sequential + program into a multithreaded program. The original program may + or may not contain parallelization hints. As an example, + gcc supports OpenMP from + version 4.3.0 on. OpenMP is a set of compiler directives which + tell a compiler how to parallelize a C, C++ or Fortran program. 
-Although the likelihood of some classes of multithreaded programming -errors can be reduced by a disciplined programming style, a tool for -automatic detection of runtime threading errors is always a great help -when developing multithreaded software. +DRD supports any combination of multithreaded programming paradigms as +long as the implementation of these paradigms is based on the POSIX +threads primitives. DRD however does not support programs that use +e.g. Linux' futexes directly. Attempts to analyze such programs with +DRD will result in false positives. - -The remainder of this manual is organized as follows. In the next -section it is discussed which multithreading programming -paradigms exist. - + -Then there is a -summary of command-line -options. - - -DRD can detect three classes of errors, which are discussed in detail: - + +POSIX Threads Programming Model - - - Data races. - - - Lock contention. - - - - - Misuse of the POSIX threads API. - - - -Finally, there is a section about the current -limitations -of DRD. + +POSIX threads, also known as Pthreads, is the most widely available +threading library on Unix systems. - - - - -Multithreaded Programming Paradigms - -For many applications multithreading is a necessity. There are two -reasons why the use of threads may be required: +The POSIX threads programming model is based on the following abstractions: - To model concurrent activities. Managing the state of one activity - per thread is a simpler programming model than multiplexing the states - of multiple activities in a single thread. This is why most server and - embedded software is multithreaded. + A shared address space. All threads running within the same + process share the same address space. All data, whether shared or + not, is identified by its address. - To let computations run on multiple CPU cores simultaneously. This is - why many High Performance Computing (HPC) applications are multithreaded. 
+ Regular load and store operations, which read values from or + write values to the memory shared by all threads running in the + same process. + + + + + Atomic store and load-modify-store operations. While these + are not mentioned in the POSIX threads standard, most + microprocessors support atomic memory operations. And some + compilers provide direct support for atomic memory operations + through built-in functions like + e.g. __sync_fetch_and_add(). + + + + + Threads. Each thread represents a concurrent activity. + + + + + Synchronization objects and operations on these synchronization + objects. The following types of synchronization objects are + defined in the POSIX threads standard: mutexes, condition + variables, semaphores, reader-writer locks, barriers and + spinlocks. -Multithreaded programs can be developed by using one or more of the -following paradigms. Which paradigm is appropriate also depends on the -application type -- modeling concurrent activities versus HPC. +Which source code statements generate which memory accesses depends on +the memory model of the programming language being used. There is not +yet a definitive memory model for the C and C++ languages. For a +draft memory model, see also document +WG21/N2338. + + + +For more information about POSIX threads, see also the Single UNIX +Specification version 3, also known as + +IEEE Std 1003.1. + + + + + + +Multithreaded Programming Problems + + +Depending on how multithreading is expressed in a program, one or more +of the following problems can be triggered by a multithreaded program: - Locking: data that is shared between threads may only be accessed - after a lock is obtained on the mutex(es) associated with the - shared data item. The POSIX threads library, the Qt library - and the Boost.Thread library support this paradigm directly. + Data races. One or more threads access the same memory + location without sufficient locking. + + + + + Lock contention. 
One thread blocks the progress of one or more other + threads by holding a lock too long. - Message passing: any data that has to be passed from one thread to - another is sent via a message to that other thread. No data is explicitly - shared. Well known implementations of the message passing paradigm are - MPI and CORBA. + Improper use of the POSIX threads API. The most popular POSIX + threads implementation, NPTL, is optimized for speed. NPTL + does not complain about certain errors, e.g. when a mutex is locked + in one thread and unlocked in another thread. - Software Transactional Memory (STM). Just like the locking - paradigm, with STM data is shared between threads. While the - locking paradigm requires that all associated mutexes are locked - before the shared data is accessed, with the STM paradigm after - each transaction it is verified whether there were conflicting - transactions. If there were conflicts, the transaction is aborted, - otherwise it is committed. This is a so-called optimistic - approach. Not all C, C++ and Fortran compilers already support STM. + Deadlock. A deadlock occurs when two or more threads wait for + each other indefinitely. - Automatic parallelization: a compiler converts a sequential - program into a multithreaded program. The original program can - contain parallelization hints. As an example, gcc version 4.3.0 - and later supports OpenMP, a set of standardized compiler - directives which tell a compiler how to parallelize a C, C++ or - Fortran program. + False sharing. If threads that run on different processor cores + frequently access different variables located in the same cache + line, the frequent exchange of cache lines slows down the + threads involved considerably. -Next to the above paradigms, most CPU instruction sets support atomic -memory accesses. Such operations are the most efficient way to update -a single value on a system with multiple CPU cores. 
+Although the likelihood of the occurrence of data races can be reduced +by a disciplined programming style, a tool for automatic detection of +data races is a necessity when developing multithreaded software. DRD +can detect these, as well as lock contention and improper use of the +POSIX threads API. + + + + +Data Race Detection by DRD versus Helgrind + -DRD supports any combination of multithreaded programming paradigms -and atomic memory accesses, as long as the libraries that implement -the paradigms are based on POSIX threads. Direct use of e.g. Linux' -futexes is not recognized by DRD and will result in false positives. +Synchronization operations impose an order on interthread memory +accesses. This order is also known as the happens-before relationship. + +A multithreaded program is data-race free if all interthread memory +accesses are ordered by synchronization operations. + + + +A well known way to ensure that a multithreaded program is data-race +free is to ensure that a locking discipline is followed. It is e.g. +possible to associate a mutex with each shared data item, and to hold +a lock on the associated mutex while the shared data is accessed. + + + +All programs that follow a locking discipline are data-race free, but +not all data-race free programs follow a locking discipline. There +exist multithreaded programs where access to shared data is arbitrated +via condition variables, semaphores or barriers. As an example, a +certain class of HPC applications consists of a sequence of +computation steps separated in time by barriers, and where these +barriers are the only means of synchronization. + + + +There exist two different algorithms for verifying the correctness of +multithreaded programs at runtime. The so-called Eraser algorithm +verifies whether all shared memory accesses follow a consistent +locking strategy. 
And the happens-before data race detectors verify +directly whether all interthread memory accesses are ordered by +synchronization operations. While the happens-before data race +detection algorithm is more complex to implement, and while it is more +sensitive to OS scheduling, it is a general approach that works for +all classes of multithreaded programs. Furthermore, the happens-before +data race detection algorithm does not report any false positives. + + + +DRD is based on the happens-before algorithm, while Helgrind uses a +variant of the Eraser algorithm. + + + + + - + +Using DRD + + Command Line Options -The following end-user options are available: +The following command-line options are available for controlling the +behavior of the DRD tool itself: + + + + + + + Controls whether DRD reports data races + for stack variables. This is disabled by default in order to + accelerate data race detection. Most programs do not share + stack variables over threads. + + + + + + + + + + Print an error message if any mutex or writer lock is held + longer than the specified time (in milliseconds). This option + is intended to allow detection of lock contention. + + + + + + + + + + Controls segment merging. Segment merging is an algorithm to + limit memory usage of the data race detection + algorithm. Disabling segment merging may improve the accuracy + of the so-called 'other segments' displayed in race reports + but can also trigger an out of memory error. + + + + + + + + + + Print an error message if a reader lock is held longer than + the specified time (in milliseconds). This option is intended + to allow detection of lock contention. + + + + + + + + + + Show conflicting segments in race reports. Since this + information can help to find the cause of a data race, this + option is enabled by default. Disabling this option makes the + output of DRD more compact. + + + + + + + + + + Print stack usage at thread exit time. 
When there is a large + number of threads created in a program it becomes important to + limit the amount of virtual memory allocated for thread + stacks. This option makes it possible to observe the maximum + number of bytes that has been used by the client program for + thread stacks. Note: the DRD tool allocates some temporary + data on the client thread stack. The space needed for this + temporary data is not reported via this option. + + + + + + + + + + Display the names of global, static and stack variables when a + data race is reported. While this information can be very + helpful, by default it is not loaded into memory since for big + programs reading in all debug information at once may cause an + out of memory error. + + + -In addition, the following debugging options are available for -DRD: + +The following options are available for monitoring the behavior of the +process being analyzed with DRD: + + + + + + + + + Trace all load and store activity for the specified + address. This option may be specified more than once. + + + + + + + + + + Trace all barrier activity. + + + + + + + + + + Trace all condition variable activity. + + + + + + + + + + Trace all thread creation and all thread termination events. + + + + + + + + + + Trace all mutex activity. + + + + + + + + + + Trace all reader-writer lock activity. + + + + + + + + + + Trace all semaphore activity. + + + - + - + Data Races - + - + Lock Contention - + - + Misuse of the POSIX threads API - + - + Client Requests @@ -233,10 +503,10 @@ Just as for other Valgrind tools it is possible to pass information from a client program to the DRD tool. - + - + Debugging OpenMP Programs With DRD @@ -244,6 +514,14 @@ Just as for other Valgrind tools it is possible to pass information from a client program to the DRD tool. + +For more information about OpenMP, see also +openmp.org. + + + + + @@ -253,30 +531,74 @@ from a client program to the DRD tool. 
DRD currently has the following limitations: - DRD has only been tested on the Linux operating - system, and not on any of the other operating systems supported by - Valgrind. + + + DRD has only been tested on the Linux operating system, and not + on any of the other operating systems supported by + Valgrind. + - Of the two POSIX threads implementations for Linux, - only the NPTL (Native POSIX Thread Library) is supported. The older - LinuxThreads library is not supported. + + + Of the two POSIX threads implementations for Linux, only the + NPTL (Native POSIX Thread Library) is supported. The older + LinuxThreads library is not supported. + - When running DRD on a PowerPC CPU, DRD will report - false positives on atomic operations. See also KDE bug 162354. - - DRD, just like memcheck, will refuse to - start on Linux distributions where all symbol information has been - removed from ld.so. This is e.g. the case for openSUSE 10.3 -- see - also - Novell bug 396197. - - If you compile the DRD source code yourself, you need - gcc 3.0 or later. gcc 2.95 is not supported. + + + When running DRD on a PowerPC CPU, DRD will report false + positives on atomic operations. See also Valgrind bug + 162354. + + + + + DRD, just like memcheck, will refuse to start on Linux + distributions where all symbol information has been removed from + ld.so. This is a.o. the case for the PPC editions of openSUSE + and Gentoo. You will have to install the glibc debuginfo package + on these platforms before you can use DRD. See also openSUSE bug + + 396197 and Gentoo bug 214065. + + + + + When DRD prints a report about a data race detected on a stack + variable in a parallel section of an OpenMP program, the report + will contain no information about the context of the data race + location (Allocation context: + unknown). It's not yet clear whether this + behavior is caused by Valgrind or by gcc. + + + + + If you compile the DRD source code yourself, you need gcc 3.0 or + later. 
gcc 2.95 is not supported. + + + +Feedback + + +If you have any comments, suggestions, feedback or bug reports about +DRD, feel free to either post a message on the Valgrind users mailing +list or to file a bug report. See also &vg-url; for more information about the +Valgrind mailing lists and how to file a bug report. + + + + +