--- /dev/null
+
+This file describes in detail how Calltree accurately tracks function
+entry/exit, one of those harder-than-you'd-think things.
+
+-----------------------------------------------------------------------------
+Josef's description
+-----------------------------------------------------------------------------
+From: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
+To: Nicholas Nethercote <njn25@cam.ac.uk>
+Cc: valgrind-developers@lists.sourceforge.net
+Subject: [Valgrind-developers] Re: Tracking function entry/exit
+
+On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote:
+> Josef,
+>
+> The topic of tracking function entry/exit has come up a few times on the
+> mailing lists recently. My usual answer is that it's difficult to do
+> correctly. However, you seem to do it with Calltree. I looked at the
+> source code a bit, and it looks like you are doing some reasonably
+> complicated things to get it right, eg. unwinding the stack. How robust
+> is your approach? Can you briefly explain how it works?
+
+A note before describing the mechanism: I need to have a helper call at start
+of every BB anyway, so I use this helper to do the tracking. This of course
+has some overhead, and perhaps can be avoided, but it seems to add to the
+robustness. I have a bug fix here for reentrent entering of a signal handler
+(2 bug reports). Otherwise I have no bug reports, so I assume that the
+mechanism to be quite robust.
+
+I have a shadow call stack for every thread. For signal handlers of a thread,
+I first PUSH a separation marker on the shadow stack, and use the stack as
+normal. The marker is used for unwinding when leaving the signal handler.
+This is fine as there is no scheduling among signal handlers of one thread.
+
+Instrumentation of calltree:
+* Store at the end of each basic block the jmpkind into a tool-global, static
+variable.
+* At the start of every BB, jump to a helper function.
+
+The helper function does the following regarding function call tracking:
+- for a control transfer to another ELF object/ELF section, override jmpkind
+ with a CALL (*1)
+- for a control transfer to the 1st basic block of a function, override
+ jmpkind with a CALL (*2)
+- do unwinding if needed (i.e, POPs of the shadow call stack)
+- if jmpkind is RET and there was no unwinding/POP:
+ - if our call stack is empty, simulate a CALL lasting from beginning
+ (with Valgrind 2.1.x, this is not needed any more, as we run on
+ simulated CPU from first client instruction)
+ - otherwise this is a JMP using a RET instruction (typically used in
+ the runtime linker). Do a POP, setting previous BB address to call
+ site and override jmpkind with a CALL. By this, you get 2 function
+ calls from a calling site.
+- when jmpkind is a CALL, push new function call from previous BB to current
+ BB on shadow call stack.
+- Save current BB address to be available for call to handler in next BB.
+
+Special care is needed at thread switches and enter/leave of signal handlers,
+as we need separate shadow call stacks.
+
+Known bug: We should check for the need of unwinding when ESP is explicitly
+written to. I hope this doesn't create too much overhead.
+
+Remarks:
+(*1) Jumps between ELF objects are function calls to a shared library. This is
+ mainly done to catch the JMP from PLT code.
+(*2) This is what your function tracking skin/tool does. It is needed here
+ mainly to catch tail recursion. In general, for functions doing a
+ "return otherfunction()", GCC produces JMPs with -O2.
+
+Additional points:
+- If I need a name for a function, but there is no debug info, I use the
+ instruction address minus the load offset of the corresponding ELF object
+ (if there is one) to get a relative address for that ELF object. This
+ offset can be used with objdump later in postprocessing tools (e.g.
+ objdump). I would suggest this change even for cachegrind instead of a
+ "???".
+- I introduced the ability to specify functions to be "skipped". This means
+ that execution of these functions is attributed to the calling function.
+ The default is to skip all functions located in PLT sections. Thus, in
+ effect, costs of PLT functions are attributed to callers, and the call to
+ a shared library function starts directly with code in the other ELF
+ object.
+- As Vg 2.1.x does pointerchecking, the instrumentation can't write to
+ memory space of Valgrind any longer. Currently, my tool needs
+ "--pointercheck=no" to be able to run. Jeremy and me already agreed on
+ replacing current LD/ST with a CLD/CST (Client Load/Store) with pointer
+ check and keep original LD/ST for tool usage without pointerchecking.
+
+Looking at these things, it seems possible to do function tracking at end of a
+basic block instead of the beginning of the next BB. This way, we can perhaps
+avoid calls to helpers at every BB.
+
+From my point of view, it would be great to integrate optional function
+tracking into Valgrind core with some hooks.
+
+Josef
+
+
+-----------------------------------------------------------------------------
+Josef's clarification of Nick's summary of Josef's description
+-----------------------------------------------------------------------------
+On Monday 21 June 2004 12:15, Nicholas Nethercote wrote:
+
+> I've paraphrased your description to help me understand it better, but I'm
+> still not quite clear on some points. I looked at the code, but found it
+> hard to understand. Could you help me? I've written my questions in
+> square brackets. Here's the description.
+>
+> --------
+>
+> Data structures:
+>
+> - have a shadow call stack for every thread
+> [not sure exactly what goes on this]
+
+That's the resizable array of struct _call_entry's.
+Probably most important for call tracking is the %ESP value
+directly after a CALL, and a pointer to some struct storing information
+about the call arc or the called function.
+
+The esp value is needed to be able to robustly unwind correctly at %esp
+changes with %esp > stored esp on shadow stack.
+
+> Action at BB start -- depends on jmp_kind from previous BB:
+>
+> - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring,
+> JmpCond or JmpSyscall) and we transferred from one ELF object/section to
+> another, it must be a function call to a shared library -- treat as a
+> call. This catches jmps from PLT code.
+>
+> - If this is the first BB of a function, treat as a call. This catches
+> tail calls (which gcc uses for "return f()" with -O2).
+> [What if a function had a 'goto' back to its beginning? Would that be
+> interpreted as a call?]
+
+Yes. IMHO, there is no way to distinguish between optimized tail recursion
+using a jump and regular jumping. But as most functions need parameters on
+the stack, a normal jump will rarely jump to the first BB of a function,
+wouldn't it?
+
+> - Unwind the shadow call stack if necessary.
+> [when is "necessary"? If the real %esp > the shadow stack %esp?]
+
+Yes. Currently I do this at every BB boundary, but perhaps it should be
+checked at every %esp change. Then, OTOH, it would look strange to attribute
+instructions of one BB to different functions?
+
+> - If this is a function return and there was no shadow stack unwinding,
+> this must be a RET control transfer (typically used in the runtime
+> linker). Pop the shadow call stack, setting the previous BB address to
+> call site and override jmpkind with a CALL. By this, you get 2 function
+> calls from a calling site.
+> [I don't understand this... What is a "RET control transfer"? Why do
+> you end up with 2 function calls -- is that a bad thing?]
+
+If there is a RET instruction, this usually should unwind (i.e. leave a
+function) at least one entry of the shadow call stack. But this doesn't need
+to be the case, i.e. even after a RET, %esp could be lower or equal to the
+one on the shadow stack. E.g. suppose
+
+ PUSH addr
+ RET
+
+This is only another way of saying "JMP addr", and doesn't add/remove any
+stack frame at all.
+Now, if addr is (according to debug information) inside of another function,
+this is a JMP between functions, let's say from B to C. Suppose B was called
+from A, I generate a RETURN event to A and a CALL event from A to C in this
+case.
+
+> - If we're treating the control transfer as a call, push new function call
+> from previous BB to current BB on shadow call stack.
+> [when is this information used?]
+
+I meant: Append a struct call_entry to the shadow stack (together with the
+current %esp value). As I said before, the shadow stack is used for robust
+unwinding.
+
+> - Save current BB address to be available for call to handler in next BB.
+>
+>
+> Other actions:
+>
+> When entering a signal handler, first push a separation marker on the
+> thread's shadow stack, then use it as normal. The marker is used for
+> unwinding when leaving the signal handler. This is fine as there is no
+> scheduling among signal handlers of one thread.
+>
+> Special care is needed at thread switches and enter/leave of signal
+> handlers, as we need separate shadow call stacks.
+> [Do you mean "separate shadow call stacks for each thread"?]
+
+Yes.
+
+> What about stack switching -- does it cope with that? (Not that Valgrind
+> in general does...)
+
+No.
+If you could give me a hint how to do it, I would be pleased. The problem here
+IMHO is: How to distinguish among a stack switch and allocating a huge array
+on the stack?
+
+Josef
+