From: Nicholas Nethercote
Date: Tue, 19 Jul 2005 03:30:31 +0000 (+0000)
Subject: Added a file containing Josef's explanation of Calltree's function
X-Git-Tag: svn/VALGRIND_3_0_0~142
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=37a45c37b95ef1902d4b537af2a6daeb10c2cf46;p=thirdparty%2Fvalgrind.git

Added a file containing Josef's explanation of Calltree's function
entry/exit tracking.

git-svn-id: svn://svn.valgrind.org/valgrind/trunk@4176
---

diff --git a/coregrind/Makefile.am b/coregrind/Makefile.am
index 51307e6b7c..d58cd2d098 100644
--- a/coregrind/Makefile.am
+++ b/coregrind/Makefile.am
@@ -81,7 +81,8 @@ noinst_HEADERS = \
 	vki_unistd-x86-linux.h
 
 EXTRA_DIST = \
-	README_MODULES.txt
+	README_MODULES.txt \
+	README_TRACKING_FN_ENTRY_EXIT
 
 BUILT_SOURCES = stage2.lds
 CLEANFILES = stage2.lds
diff --git a/coregrind/README_TRACKING_FN_ENTRY_EXIT b/coregrind/README_TRACKING_FN_ENTRY_EXIT
new file mode 100644
index 0000000000..40270c8c9a
--- /dev/null
+++ b/coregrind/README_TRACKING_FN_ENTRY_EXIT
@@ -0,0 +1,205 @@
+
+This file describes in detail how Calltree accurately tracks function
+entry/exit, one of those harder-than-you'd-think things.
+
+-----------------------------------------------------------------------------
+Josef's description
+-----------------------------------------------------------------------------
+From: Josef Weidendorfer
+To: Nicholas Nethercote
+Cc: valgrind-developers@lists.sourceforge.net
+Subject: [Valgrind-developers] Re: Tracking function entry/exit
+
+On Sunday 25 January 2004 16:53, Nicholas Nethercote wrote:
+> Josef,
+>
+> The topic of tracking function entry/exit has come up a few times on the
+> mailing lists recently. My usual answer is that it's difficult to do
+> correctly. However, you seem to do it with Calltree. I looked at the
+> source code a bit, and it looks like you are doing some reasonably
+> complicated things to get it right, eg. unwinding the stack. How robust
+> is your approach?
+> Can you briefly explain how it works?
+
+A note before describing the mechanism: I need to have a helper call at start
+of every BB anyway, so I use this helper to do the tracking. This of course
+has some overhead, and perhaps can be avoided, but it seems to add to the
+robustness. I have a bug fix here for reentrant entering of a signal handler
+(2 bug reports). Otherwise I have no bug reports, so I assume the
+mechanism to be quite robust.
+
+I have a shadow call stack for every thread. For signal handlers of a thread,
+I first PUSH a separation marker on the shadow stack, and use the stack as
+normal. The marker is used for unwinding when leaving the signal handler.
+This is fine as there is no scheduling among signal handlers of one thread.
+
+Instrumentation of Calltree:
+* Store at the end of each basic block the jmpkind into a tool-global, static
+variable.
+* At the start of every BB, jump to a helper function.
+
+The helper function does the following regarding function call tracking:
+- for a control transfer to another ELF object/ELF section, override jmpkind
+  with a CALL (*1)
+- for a control transfer to the 1st basic block of a function, override
+  jmpkind with a CALL (*2)
+- do unwinding if needed (i.e., POPs of the shadow call stack)
+- if jmpkind is RET and there was no unwinding/POP:
+    - if our call stack is empty, simulate a CALL lasting from beginning
+      (with Valgrind 2.1.x, this is not needed any more, as we run on
+      simulated CPU from first client instruction)
+    - otherwise this is a JMP using a RET instruction (typically used in
+      the runtime linker). Do a POP, setting previous BB address to call
+      site and override jmpkind with a CALL. By this, you get 2 function
+      calls from a calling site.
+- when jmpkind is a CALL, push new function call from previous BB to current
+  BB on shadow call stack.
+- Save current BB address to be available for call to handler in next BB.
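Editorial sketch (not part of Josef's mail): the helper steps listed above
could look roughly like the C below. Every name here (Tracker, BBInfo,
bb_start, bb_end, the JK_* kinds) is hypothetical and is not taken from
Calltree's source. Two assumptions are made: the CALL overrides are skipped
when the previous jump was a RET, following Nick's paraphrase later in this
file, and the empty-stack case is left out because the description says it
is no longer needed with Valgrind 2.1.x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical jump kinds; the real tool distinguishes more of them. */
typedef enum { JK_BORING, JK_CALL, JK_RET } JmpKind;

typedef struct {
    uintptr_t sp;         /* stack pointer directly after the CALL */
    uintptr_t call_site;  /* BB the call came from */
    uintptr_t callee;     /* first BB executed in the called function */
} CallEntry;

#define MAX_DEPTH 64      /* fixed size only to keep the sketch short */

typedef struct {
    CallEntry entries[MAX_DEPTH];
    size_t    depth;
    uintptr_t prev_bb;       /* address of the previously executed BB */
    JmpKind   last_jmpkind;  /* stored at the end of the previous BB */
} Tracker;

/* Facts about the BB being entered; assumed to be supplied by the tool. */
typedef struct {
    uintptr_t addr;
    uintptr_t sp;             /* stack pointer on entry to this BB */
    int       is_fn_entry;    /* 1 if this is the first BB of a function */
    int       crosses_object; /* 1 if previous BB is in another ELF object/section */
} BBInfo;

/* Instrumentation at the end of each BB: remember how it was left. */
void bb_end(Tracker *t, JmpKind jk) { t->last_jmpkind = jk; }

/* Pop every entry whose saved sp lies below the current sp. */
static size_t unwind(Tracker *t, uintptr_t sp) {
    size_t popped = 0;
    while (t->depth > 0 && sp > t->entries[t->depth - 1].sp) {
        t->depth--;
        popped++;
    }
    return popped;
}

/* Helper called at the start of every BB. */
void bb_start(Tracker *t, const BBInfo *bb) {
    JmpKind   jk   = t->last_jmpkind;
    uintptr_t from = t->prev_bb;

    /* (*1) and (*2): jumps across ELF objects or to a function's first BB
       are treated as calls (assumption: not applied to RET). */
    if (jk != JK_RET && (bb->crosses_object || bb->is_fn_entry))
        jk = JK_CALL;

    size_t popped = unwind(t, bb->sp);

    /* A RET that unwound nothing is a JMP in disguise: leave the current
       function and record a call from its original call site instead. */
    if (jk == JK_RET && popped == 0 && t->depth > 0) {
        from = t->entries[t->depth - 1].call_site;
        t->depth--;
        jk = JK_CALL;
    }

    if (jk == JK_CALL) {
        assert(t->depth < MAX_DEPTH);
        t->entries[t->depth++] = (CallEntry){ bb->sp, from, bb->addr };
    }

    t->prev_bb = bb->addr;  /* saved for the helper call in the next BB */
}
```

Under these assumptions, a "PUSH addr; RET" from a function B into a
function C replaces B's entry with one for C whose call site is B's original
caller, which corresponds to the "2 function calls from a calling site"
mentioned in the list.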
+
+Special care is needed at thread switches and enter/leave of signal handlers,
+as we need separate shadow call stacks.
+
+Known bug: We should check for the need of unwinding when ESP is explicitly
+written to. I hope this doesn't create too much overhead.
+
+Remarks:
+(*1) Jumps between ELF objects are function calls to a shared library. This is
+     mainly done to catch the JMP from PLT code.
+(*2) This is what your function tracking skin/tool does. It is needed here
+     mainly to catch tail recursion. In general, for functions doing a
+     "return otherfunction()", GCC produces JMPs with -O2.
+
+Additional points:
+- If I need a name for a function, but there is no debug info, I use the
+  instruction address minus the load offset of the corresponding ELF object
+  (if there is one) to get a relative address for that ELF object. This
+  offset can be used later in postprocessing tools (e.g.
+  objdump). I would suggest this change even for cachegrind instead of a
+  "???".
+- I introduced the ability to specify functions to be "skipped". This means
+  that execution of these functions is attributed to the calling function.
+  The default is to skip all functions located in PLT sections. Thus, in
+  effect, costs of PLT functions are attributed to callers, and the call to
+  a shared library function starts directly with code in the other ELF
+  object.
+- As Vg 2.1.x does pointerchecking, the instrumentation can't write to
+  memory space of Valgrind any longer. Currently, my tool needs
+  "--pointercheck=no" to be able to run. Jeremy and I already agreed on
+  replacing current LD/ST with a CLD/CST (Client Load/Store) with pointer
+  check and keeping original LD/ST for tool usage without pointerchecking.
+
+Looking at these things, it seems possible to do function tracking at end of a
+basic block instead of the beginning of the next BB. This way, we can perhaps
+avoid calls to helpers at every BB.
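Editorial sketch (not part of Josef's mail): the fallback naming from the
first of the additional points, instruction address minus the load offset of
the ELF object, could be written as below. The function names and the
"object+0xoffset" output format are assumptions made for illustration; the
mail does not say how Calltree formats such names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Address of an instruction relative to the start of its ELF object. */
uintptr_t object_relative_addr(uintptr_t addr, uintptr_t load_offset) {
    assert(addr >= load_offset);  /* addr must lie inside the loaded object */
    return addr - load_offset;
}

/* Build a fallback name for a function that has no debug info.
   Returns the value of snprintf: the length the full name needs. */
int fallback_fn_name(char *buf, size_t len, const char *obj_name,
                     uintptr_t addr, uintptr_t load_offset) {
    return snprintf(buf, len, "%s+0x%" PRIxPTR, obj_name,
                    object_relative_addr(addr, load_offset));
}
```

A name built this way stays meaningful across runs even when the object is
loaded at a different address, and the relative address can be looked up
afterwards in a disassembly of the object.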
+
+From my point of view, it would be great to integrate optional function
+tracking into Valgrind core with some hooks.
+
+Josef
+
+
+-----------------------------------------------------------------------------
+Josef's clarification of Nick's summary of Josef's description
+-----------------------------------------------------------------------------
+On Monday 21 June 2004 12:15, Nicholas Nethercote wrote:
+
+> I've paraphrased your description to help me understand it better, but I'm
+> still not quite clear on some points. I looked at the code, but found it
+> hard to understand. Could you help me? I've written my questions in
+> square brackets. Here's the description.
+>
+> --------
+>
+> Data structures:
+>
+> - have a shadow call stack for every thread
+>   [not sure exactly what goes on this]
+
+That's the resizable array of struct _call_entry's.
+Probably most important for call tracking is the %ESP value
+directly after a CALL, and a pointer to some struct storing information
+about the call arc or the called function.
+
+The esp value is needed to be able to robustly unwind correctly at %esp
+changes with %esp > stored esp on shadow stack.
+
+> Action at BB start -- depends on jmp_kind from previous BB:
+>
+> - If jmp_kind is neither JmpCall nor JmpRet (ie. is JmpNone, JmpBoring,
+>   JmpCond or JmpSyscall) and we transferred from one ELF object/section to
+>   another, it must be a function call to a shared library -- treat as a
+>   call. This catches jmps from PLT code.
+>
+> - If this is the first BB of a function, treat as a call. This catches
+>   tail calls (which gcc uses for "return f()" with -O2).
+>   [What if a function had a 'goto' back to its beginning? Would that be
+>   interpreted as a call?]
+
+Yes. IMHO, there is no way to distinguish between optimized tail recursion
+using a jump and regular jumping. But as most functions need parameters on
+the stack, a normal jump will rarely jump to the first BB of a function,
+wouldn't it?
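Editorial sketch (not part of Josef's mail): the shadow call stack described
above, a resizable array whose entries hold the stack pointer taken directly
after the CALL plus a pointer to call information, might be laid out as
follows. The type and function names are hypothetical and do not reproduce
the real struct _call_entry. The treatment of the signal-handler separation
marker is an assumption: here, unwinding by stack pointer never crosses a
marker, and leaving the handler pops up to and including it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uintptr_t   sp;         /* stack pointer directly after the CALL */
    const void *call_info;  /* call arc or callee data, opaque here */
    int         is_marker;  /* 1 for a signal-handler separation marker */
} ShadowEntry;

typedef struct {
    ShadowEntry *entries;   /* grows on demand */
    size_t       depth;
    size_t       capacity;
} ShadowStack;

/* Append an entry, doubling the array when full. Returns 0 on success. */
int shadow_push(ShadowStack *s, uintptr_t sp, const void *info, int is_marker) {
    assert(s != NULL);
    if (s->depth == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 16;
        ShadowEntry *grown = realloc(s->entries, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        s->entries  = grown;
        s->capacity = new_cap;
    }
    s->entries[s->depth++] = (ShadowEntry){ sp, info, is_marker };
    return 0;
}

/* Pop entries while the real sp is above the saved one ("%esp > stored
   esp"), stopping at a separation marker. Returns the number popped. */
size_t shadow_unwind(ShadowStack *s, uintptr_t sp) {
    size_t popped = 0;
    while (s->depth > 0) {
        const ShadowEntry *top = &s->entries[s->depth - 1];
        if (top->is_marker || sp <= top->sp)
            break;
        s->depth--;
        popped++;
    }
    return popped;
}

/* On leaving a signal handler: drop everything down to and including
   the most recent marker. */
void shadow_leave_signal(ShadowStack *s) {
    while (s->depth > 0) {
        int was_marker = s->entries[--s->depth].is_marker;
        if (was_marker)
            break;
    }
}

void shadow_free(ShadowStack *s) {
    free(s->entries);
    s->entries  = NULL;
    s->depth    = 0;
    s->capacity = 0;
}
```

Keeping the marker as an ordinary entry means a handler running on a
different stack cannot accidentally unwind the frames it interrupted, since
the comparison against the saved stack pointers stops at the marker.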
+
+> - Unwind the shadow call stack if necessary.
+>   [when is "necessary"? If the real %esp > the shadow stack %esp?]
+
+Yes. Currently I do this at every BB boundary, but perhaps it should be
+checked at every %esp change. Then, OTOH, it would look strange to attribute
+instructions of one BB to different functions?
+
+> - If this is a function return and there was no shadow stack unwinding,
+>   this must be a RET control transfer (typically used in the runtime
+>   linker). Pop the shadow call stack, setting the previous BB address to
+>   call site and override jmpkind with a CALL. By this, you get 2 function
+>   calls from a calling site.
+>   [I don't understand this... What is a "RET control transfer"? Why do
+>   you end up with 2 function calls -- is that a bad thing?]
+
+If there is a RET instruction, this usually should unwind (i.e. leave a
+function) at least one entry of the shadow call stack. But this doesn't need
+to be the case, i.e. even after a RET, %esp could be lower or equal to the
+one on the shadow stack. E.g. suppose
+
+    PUSH addr
+    RET
+
+This is only another way of saying "JMP addr", and doesn't add/remove any
+stack frame at all.
+Now, if addr is (according to debug information) inside of another function,
+this is a JMP between functions, let's say from B to C. Suppose B was called
+from A, I generate a RETURN event to A and a CALL event from A to C in this
+case.
+
+> - If we're treating the control transfer as a call, push new function call
+>   from previous BB to current BB on shadow call stack.
+>   [when is this information used?]
+
+I meant: Append a struct call_entry to the shadow stack (together with the
+current %esp value). As I said before, the shadow stack is used for robust
+unwinding.
+
+> - Save current BB address to be available for call to handler in next BB.
+>
+>
+> Other actions:
+>
+> When entering a signal handler, first push a separation marker on the
+> thread's shadow stack, then use it as normal.
+> The marker is used for
+> unwinding when leaving the signal handler. This is fine as there is no
+> scheduling among signal handlers of one thread.
+>
+> Special care is needed at thread switches and enter/leave of signal
+> handlers, as we need separate shadow call stacks.
+> [Do you mean "separate shadow call stacks for each thread"?]
+
+Yes.
+
+> What about stack switching -- does it cope with that? (Not that Valgrind
+> in general does...)
+
+No.
+If you could give me a hint how to do it, I would be pleased. The problem here
+IMHO is: How to distinguish between a stack switch and allocating a huge array
+on the stack?
+
+Josef
+