Most notable, the "Function summary" section, which printed one CC for each
`file:function` combination, has been replaced by two sections, "File:function
summary" and "Function:file summary".
These new sections both feature "deep CCs", which have an "outer CC" for the
file (or function), and one or more "inner CCs" for the paired functions (or
files).
Here is a file:function example, which helps show which files have a lot of
events, even if those events are spread across a lot of functions.
```
> 12,427,830 (5.4%, 26.3%) /home/njn/moz/gecko-dev/js/src/ds/LifoAlloc.h:
6,107,862 (2.7%) js::frontend::ParseNodeVerifier::visit(js::frontend::ParseNode*)
3,685,203 (1.6%) js::detail::BumpChunk::setBump(unsigned char*)
1,640,591 (0.7%) js::LifoAlloc::alloc(unsigned long)
711,008 (0.3%) js::detail::BumpChunk::assertInvariants()
```
And here is a function:file example, which shows how heavy inlining can result
in a machine code function being derived from source code from multiple files:
```
> 1,343,736 (0.6%, 35.6%) js::gc::TenuredCell::isMarkedGray() const:
651,108 (0.3%) /home/njn/moz/gecko-dev/js/src/d64/dist/include/js/HeapAPI.h
292,672 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Cell.h
254,854 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Heap.h
```
Previously these patterns were very hard to find, and it was easy to overlook a
hot piece of code because its counts were spread across multiple non-adjacent
entries. I have already found these changes very useful for profiling Rust
code.
Also, cumulative percentages on the outer CCs (e.g. the 26.3% and 35.6% in the
example) tell you what fraction of all events are covered by the entries so
far, something I've wanted for a long time.
Some other, related changes:
- Column event headers are now padded with `_`, e.g. `Ir__________`. This makes
the column/event mapping clearer.
- The "Cachegrind profile" section is now called "Metadata", which is
shorter and clearer.
- A few minor test tweaks, beyond those required for the output changes.
- I converted some doc comments to normal comments. Not standard Python, but
nicer to read, and there are no public APIs here.
- Roughly 2x speedups to `cg_annotate` and smaller improvements for `cg_diff`
and `cg_merge`, due to the following.
- Change the `Cc` class to a type alias for `list[int]`, to avoid the class
overhead (sigh).
- Process event count lines in a single split, instead of a regex
match + split.
- Add the `add_cc_to_ccs` function, which does multiple CC additions in a
single function call.
- Better handling of dicts while reading input, minimizing lookups.
- Pre-computing the missing CC string for each CcPrinter, instead of
regenerating it each time.