Callgrind: Improve self-hosting with outer callgrind tool
This adds an option to change the default handling of jumps
between functions. Usually, a jump between functions is
interpreted as call, because such jumps are typically
generated by compilers on tail recursion optimization, and
we want to present this as call to the user. Thus, such
a jump pushes a call onto callgrinds shadow stack.
The option "--pop-on-jump" changes this to pop+push the
shadow callstack: then, a jump between functions is seen
as a return to the caller and a new call.
The default behaviour is _bad_ for using callgrind with
self-hosting. Valgrinds inner loop VG_(run_innerloop)
jumps to generated code, and this code jumps back to
the inner loop. Thus, every executed BB adds 2 calls
to an ever increasing shadow call stack, leading to
memory consumption increasing with runtime :-(
So: For self-hosting valgrind with an outer callgrind,
always use option "--pop-on-jump" for the outer callgrind.