From: Nicholas Nethercote Date: Thu, 7 May 2020 01:50:55 +0000 (+1000) Subject: Fix reads and writes counts in DHAT. X-Git-Tag: VALGRIND_3_16_0~32 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=968bddcd4b4a6565fcd120523c7b33bfd77b5c65;p=thirdparty%2Fvalgrind.git Fix reads and writes counts in DHAT. If you do `malloc(100)` followed by `realloc(200)`, DHAT now adds 100 bytes to the read and write counts for the implicit `memcpy`. This gives more reasonable results. I have long been surprised by low writes-per-byte values of around 0.35 for vectors that are grown by doubling. Counting the implicit `memcpy` increases those numbers to well above 0.5, which is what you'd expect. The commit also adds a section to the DHAT docs about `realloc`, because there is some non-obvious behaviour, some of which confused me just a couple of days ago. --- diff --git a/NEWS b/NEWS index 5179c7f894..53908689a6 100644 --- a/NEWS +++ b/NEWS @@ -33,6 +33,10 @@ support for X86/macOS 10.13, AMD64/macOS 10.13 and nanoMIPS/Linux. * DHAT: + - The implicit memcpy done by each call to realloc now counts towards the + read and write counts of resized heap blocks, making those counts higher + and more accurate. + * Cachegrind: - cg_annotate's --auto and --show-percs options now default to 'yes', because diff --git a/dhat/dh_main.c b/dhat/dh_main.c index e13d8b8e3b..565b0ae2d8 100644 --- a/dhat/dh_main.c +++ b/dhat/dh_main.c @@ -62,7 +62,7 @@ static ULong g_max_blocks = 0; static ULong g_max_bytes = 0; static ULong g_max_instrs = 0; -// Values for the entire run. Computed at the end. +// Values for the entire run. Updated each time a block is retired. static ULong g_reads_bytes = 0; static ULong g_writes_bytes = 0; @@ -616,6 +616,13 @@ void* renew_block ( ThreadId tid, void* p_old, SizeT new_req_szB ) // New size is smaller or same; block not moved. resize_Block(bk->ap, bk->req_szB, new_req_szB); bk->req_szB = new_req_szB; + + // Update reads/writes for the implicit copy. Even though we didn't + // actually do a copy, we act like we did, to match up with the fact + // that we treat this as an additional allocation. + bk->reads_bytes += new_req_szB; + bk->writes_bytes += new_req_szB; + return p_old; } else { @@ -636,15 +643,19 @@ void* renew_block ( ThreadId tid, void* p_old, SizeT new_req_szB ) // interval tree at the new place. Do this by removing // and re-adding it. delete_Block_starting_at( (Addr)p_old ); - // now 'bk' is no longer in the tree, but the Block itself - // is still alive + // Now 'bk' is no longer in the tree, but the Block itself + // is still alive. + + // Update reads/writes for the copy. + bk->reads_bytes += bk->req_szB; + bk->writes_bytes += bk->req_szB; // Update the metadata. resize_Block(bk->ap, bk->req_szB, new_req_szB); bk->payload = (Addr)p_new; bk->req_szB = new_req_szB; - // and re-add + // And re-add it to the interval tree. Bool present = VG_(addToFM)( interval_tree, (UWord)bk, (UWord)0/*no val*/); tl_assert(!present); diff --git a/dhat/docs/dh-manual.xml b/dhat/docs/dh-manual.xml index da71720c33..bd318f563e 100644 --- a/dhat/docs/dh-manual.xml +++ b/dhat/docs/dh-manual.xml @@ -620,6 +620,53 @@ optimization. + +Treatment of <computeroutput>realloc</computeroutput> + +realloc is a tricky function and there +are several different ways that DHAT could handle it. + +Imagine a malloc(100) call followed by +a realloc(200) call. This combination is +considered to add two to the total block count, and 300 bytes to the total +bytes count. (An alternative would be to only add one to the total block +count, and 200 bytes to the total bytes count, as if a single +malloc(200) call had occurred. While this +would be defensible from a semantic point of view, it is silly from an +operational point of view, because making two calls to allocator functions is +more expensive than one call, and DHAT is a profiler that aims to help with +runtime costs.) + +Furthermore, the implicit copying of the 100 bytes is added to the reads +and writes counts. Without this, the read and write counts would be +under-measured and misleading. + +However, DHAT only increases the current heap size by 100 bytes for this +combination, and does not change the current block count. (As opposed to +increasing the current heap size by 200 bytes and then decreasing it by 100 +bytes.) As a result, it can only increase the global heap peak (if indeed, +this results in a new peak) by 100 bytes. + +Finally, the allocation point assigned to the block allocated by the +malloc(100) call is retained once the block +is reallocated. Which means that all 300 bytes are attributed to that +allocation point, and no separate allocation point is created for the +realloc(200) call. This may be surprising, +but it has one large benefit. + +Imagine some code that starts with an empty buffer, and then gradually +adds data to that buffer from numerous different points in the code, +reallocating the buffer each time it gets full. (E.g. code generation in a +compiler might work this way.) With the described approach, the first heap +block and all subsequent heap blocks are attributed to the same allocation +point. While this is something of a lie -- the first allocation point isn't +actually responsible for the other allocations -- it is arguably better than +having the allocation points spread around, in a distribution +that unpredictably depends on whenever the reallocation points were +triggered. + + + DHAT Command-line Options diff --git a/dhat/tests/basic.c b/dhat/tests/basic.c index 3ac6617a40..abb7ef4364 100644 --- a/dhat/tests/basic.c +++ b/dhat/tests/basic.c @@ -15,12 +15,14 @@ int main(void) c[i + 1000] = c[i]; // read and write 1000 bytes } - char* r = realloc(m, 3000); + char* r = realloc(m, 3000); // read and write 1000 bytes (memcpy) for (int i = 0; i < 500; i++) { r[i + 2000] = 99; // write 500 bytes } - // totals: 1008 read, 1516 write - free(c); + c = realloc(c, 1000); // read and write 1000 bytes (memcpy) + + free(c); + // totals: 3008 read, 3516 write return 0; } diff --git a/dhat/tests/basic.stderr.exp b/dhat/tests/basic.stderr.exp index cc714443dd..c846d1e33a 100644 --- a/dhat/tests/basic.stderr.exp +++ b/dhat/tests/basic.stderr.exp @@ -1,5 +1,5 @@ -Total: 6,000 bytes in 3 blocks +Total: 7,000 bytes in 4 blocks At t-gmax: 5,000 bytes in 2 blocks At t-end: 3,000 bytes in 1 blocks -Reads: 1,008 bytes -Writes: 1,516 bytes +Reads: 3,008 bytes +Writes: 3,516 bytes