From: Julian Seward Date: Wed, 13 Oct 2010 14:06:00 +0000 (+0000) Subject: Add documentation for exp-dhat. X-Git-Tag: svn/VALGRIND_3_6_0~19 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=32dd8e857c6f3c01c51acb6fda8a2045110b7c8e;p=thirdparty%2Fvalgrind.git Add documentation for exp-dhat. git-svn-id: svn://svn.valgrind.org/valgrind/trunk@11438 --- diff --git a/docs/xml/manual.xml b/docs/xml/manual.xml index 5a01af2a73..b1ec6d5c3f 100644 --- a/docs/xml/manual.xml +++ b/docs/xml/manual.xml @@ -36,6 +36,8 @@ xmlns:xi="http://www.w3.org/2001/XInclude" /> + + %vg-entities; ]> + + + + DHAT: a dynamic heap analysis tool + +To use this tool, you must specify + on the Valgrind +command line. + + + + +Overview + +DHAT is a tool for examining how programs use their heap +allocations. + +It tracks the allocated blocks, and inspects every memory access +to find which block, if any, it is to. The following data is +collected and presented per allocation point (allocation +stack): + + + Total allocation (number of bytes and + blocks) + + maximum live volume (number of bytes and + blocks) + + average block lifetime (number of instructions + between allocation and freeing) + + average number of reads and writes to each byte in + the block ("access ratios") + + for allocation points which always allocate blocks + only of one size, and that size is 4096 bytes or less: counts + showing how often each byte offset inside the block is + accessed. + + +Using these statistics it is possible to identify allocation +points with the following characteristics: + + + + potential process-lifetime leaks: blocks allocated + by the point just accumulate, and are freed only at the end of the + run. 
+ + excessive turnover: points which chew through a lot + of heap, even if it is not held onto for very long + + excessively transient: points which allocate very + short lived blocks + + useless or underused allocations: blocks which are + allocated but not completely filled in, or are filled in but not + subsequently read. + + blocks with inefficient layout -- areas never + accessed, or with hot fields scattered throughout the + block. + + +As with the Massif heap profiler, DHAT measures program progress +by counting instructions, and so presents all age/time related figures +as instruction counts. This sounds a little odd at first, but it +makes runs repeatable in a way which is not possible if CPU time is +used. + + + + + + + +Understanding DHAT's output + + +DHAT provides a lot of useful information on dynamic heap usage. +Most of the art of using it is in interpretation of the resulting +numbers. That is best illustrated via a set of examples. + + + +Interpreting the max-live, tot-alloc and deaths fields + +A simple example + + + +Over the entire run of the program, this stack (allocation +point) allocated 29,520 blocks in total, containing 1,904,700 bytes in +total. By looking at the max-live data, we see that not many blocks +were simultaneously live, though: at the peak, there were 63,490 +allocated bytes in 984 blocks. This tells us that the program is +steadily freeing such blocks as it runs, rather than hanging on to all +of them until the end and freeing them all. + +The deaths entry tells us that 29,520 blocks allocated by this stack +died (were freed) during the run of the program. Since 29,520 is +also the number of blocks allocated in total, that tells us that +all allocated blocks were freed by the end of the program. + +It also tells us that the average age at death was 22,227,424 +instructions. 
From the summary statistics we see that the program ran +for 1,045,339,534 instructions, and so the average age at death is +about 2% of the program's total run time. + +Example of a potential process-lifetime leak + +This next example (from a different program than the above) +shows a potential process-lifetime leak. A process-lifetime leak +occurs when a program keeps allocating data, but only frees the +data just before it exits. Hence the program's heap grows constantly +in size, yet Memcheck reports no leak, because the program has +freed up everything at exit. This is particularly a hazard for +long running programs. + + + +There are two tell-tale signs that this might be a +process-lifetime leak. Firstly, the max-live and tot-alloc numbers +are identical. The only way that can happen is if these blocks are +all allocated and then all deallocated. + +Secondly, the average age at death (300 million insns) is 71% of +the total program lifetime (419 million insns), hence this is not a +transient allocation-free spike -- rather, it is spread out over a +large part of the entire run. One interpretation is, roughly, that +all 254 blocks were allocated in the first half of the run, held onto +for the second half, and then freed just before exit. + + + + + +Interpreting the acc-ratios fields + + +A fairly harmless allocation point record + + + +The acc-ratios field tells us that each byte in the blocks +allocated here is read an average of 2.13 times before the block is +deallocated. Given that the blocks have an average age at death of +34,611,026 instructions, that's roughly one read of each byte every 16 +million instructions. So from that standpoint the blocks aren't +"working" very hard. + +More interesting is the write ratio: each byte is written an +average of 0.91 times. This tells us that some parts of the allocated +blocks are never written, at least 9% on average. 
To completely +initialise the block would require writing each byte at least once, +and that would give a write ratio of 1.0. The fact that some block +areas are evidently unused might point to data alignment holes or +other layout inefficiencies. + +Well, at least all the blocks are freed (24,240 allocations, +24,240 deaths). + +If all the blocks had been the same size, DHAT would also show +the access counts by block offset, so we could see where exactly these +unused areas are. However, that isn't the case: the blocks have +varying sizes, so DHAT can't perform such an analysis. We can see +that they must have varying sizes since the average block size, 61.13, +isn't a whole number. + + +A more suspicious looking example + + + +Here, both the read and write access ratios are zero. Hence +this point is allocating blocks which are never used, neither read nor +written. Indeed, they are also not freed ("deaths: none") and are +simply leaked. So, here is 180k of completely useless allocation that +could be removed. + +Re-running with Memcheck does indeed report the same leak. What +DHAT can tell us, that Memcheck can't, is that not only are the blocks +leaked, they are also never used. + +Another suspicious example + +Here's one where blocks are allocated, written to, +but never read from. We see this immediately from the zero read +access ratio. They do get freed, though: + + + +In the previous two examples, it is easy to see blocks that are +never written to, or never read from, or some combination of both. +Unfortunately, in C++ code, the situation is less clear. That's +because an object's constructor will write to the underlying block, +and its destructor will read from it. So the block's read and write +ratios will be non-zero even if the object, once constructed, is never +used, but only eventually destructed. + +Really, what we want is to measure only memory accesses in +between the end of an object's construction and the start of its +destruction. 
Unfortunately I do not know of a reliable way to +determine when those transitions are made. + + + + + +Interpreting "Aggregated access counts by offset" data + +For allocation points that always allocate blocks of the same +size, and which are 4096 bytes or smaller, DHAT counts accesses +per offset, for example: + + + +This is fairly typical, for C++ code running on a 64-bit +platform. Here, we have aggregated access statistics for 5668 blocks, +all of size 56 bytes. Each byte has been accessed at least 5668 +times, except for offsets 12--15, 36--39 and 52--55. These are likely +to be alignment holes. + +Careful interpretation of the numbers reveals useful information. +Groups of N consecutive identical numbers that begin at an N-aligned +offset, for N being 2, 4 or 8, are likely to indicate an N-byte object +in the structure at that point. For example, the first 32 bytes of +this object are likely to have the layout + + + +As a counterexample, it's also clear that, whatever is at offset 32, +it is not a 32-bit value. That's because the last number of the group +(37422) is not the same as the first three (18883 18883 18883). + +This example leads one to enquire (by reading the source code) +whether the zeroes at 12--15 and 52--55 are alignment holes, and +whether 48--51 is indeed a 32-bit type. If so, it might be possible +to place what's at 48--51 at 12--15 instead, which would reduce +the object size from 56 to 48 bytes. + +Bear in mind that the above inferences are all only "maybes". That's +because they are based on dynamic data, not static analysis of the +object layout. For example, the zeroes might not be alignment +holes, but rather just parts of the structure which were not used +at all for this particular run. Experience shows that's unlikely +to be the case, but it could happen. 
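The grouping heuristic described above (runs of N identical counts starting at an N-aligned offset, for N being 2, 4 or 8) can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of DHAT: the function name `guess_fields` and the sample counts are invented here, and the result is subject to the same "maybes" as a manual reading of the numbers.

```python
def guess_fields(counts):
    """Greedily split per-offset access counts into likely fields.

    Returns a list of (offset, size, count) tuples. A run of N equal
    counts starting at an N-aligned offset (N = 8, 4, 2) is reported
    as one N-byte field; anything else is reported byte by byte.
    """
    fields = []
    offset = 0
    while offset < len(counts):
        size = 1
        for n in (8, 4, 2):
            run = counts[offset:offset + n]
            if offset % n == 0 and len(run) == n and len(set(run)) == 1:
                size = n
                break
        fields.append((offset, size, counts[offset]))
        offset += size
    return fields


# Made-up counts for a 16-byte block, not taken from any real DHAT run.
example_counts = [500] * 8 + [250] * 4 + [0] * 4

for offset, size, count in guess_fields(example_counts):
    kind = "never accessed (possible hole)" if count == 0 else "likely field"
    print(f"offset {offset:2d}: {size} bytes, count {count}, {kind}")
```

For the invented data this reports an 8-byte field at offset 0, a 4-byte field at offset 8, and four never-accessed bytes at offset 12, which would then be checked against the source code as described above.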
+ + + + + + + + + + + + +DHAT Command-line Options + +DHAT-specific command-line options are: + + + + + + + + + + At the end of the run, DHAT sorts the accumulated + allocation points according to some metric, and shows the + highest scoring entries. --show-top-n + controls how many entries are shown. The default of 10 is + quite small. For realistic applications you will probably need + to set it much higher, at least several hundred. + + + + + + + + + At the end of the run, DHAT sorts the accumulated + allocation points according to some metric, and shows the + highest scoring entries. --sort-by + selects the metric used for sorting: + max-bytes-live maximum live bytes [default] + tot-bytes-allocd total allocation (turnover) + max-blocks-live maximum live blocks + This controls the order in which allocation points are + displayed. You can choose to look at allocation points with + the highest maximum liveness, or the highest total turnover, or + the highest number of live blocks. These give usefully + different pictures of program behaviour. For example, sorting + by maximum live blocks tends to show up allocation points + creating large numbers of small objects. + + + + + +One important point to note is that each allocation stack counts +as a separate allocation point. Because stacks by default have 12 +frames, this tends to spread data out over multiple allocation points. +You may want to use the flag --num-callers=4 or some such small +number, to reduce the spreading. + + + + + +
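To see why a smaller stack depth reduces the spreading, here is a minimal Python sketch. It is not how DHAT is implemented; the helper `merge_by_depth`, the stack contents and the byte counts are all invented for illustration. It treats each distinct stack, truncated to a given number of frames, as one allocation point.

```python
from collections import defaultdict


def merge_by_depth(allocations, num_callers):
    """Sum bytes per allocation point, keeping only the innermost frames."""
    totals = defaultdict(int)
    for stack, nbytes in allocations:
        totals[stack[:num_callers]] += nbytes
    return dict(totals)


# Invented stacks, innermost frame first; not real DHAT output.
allocations = [
    (("malloc", "make_node", "parse", "main"), 64),
    (("malloc", "make_node", "reload", "main"), 64),
    (("malloc", "read_file", "main"), 4096),
]

print(len(merge_by_depth(allocations, 12)))  # 3 distinct allocation points
print(len(merge_by_depth(allocations, 2)))   # 2 after truncating the stacks
```

With deep stacks, the two `make_node` callers count as separate points; with two frames they merge into one point holding 128 bytes, which is the effect a small --num-callers value is meant to produce.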