Julian Seward [Sun, 11 Feb 2007 05:08:06 +0000 (05:08 +0000)]
Redo the dispatcher's fast-cache mechanism (VG_(tt_fast) et al.) to be
more cache-friendly. This changes the mechanism from a table of
pointers to (guest address, translated code) pairs to a table of
(guest address, pointer to translated code) pairs. The performance
improvement on memcheck ranges from zero to about 20%, with the
biggest effects seen for programs that jump around a large number
of code blocks and whose data set does not fit in L2.
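To show why the pair layout is kinder to the data cache, here is a minimal C sketch of a direct-mapped fast cache whose entries hold the guest address and the host code pointer side by side. A hit then needs one load from the table, rather than a table load followed by a dependent load through a pointer to somewhere else in memory. The names (FastCacheEntry, fast_cache_lookup, fast_cache_insert), the table size and the hash are hypothetical and are not Valgrind's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; not Valgrind's actual values. */
#define FAST_CACHE_BITS 12
#define FAST_CACHE_SIZE (1u << FAST_CACHE_BITS)
#define FAST_CACHE_MASK (FAST_CACHE_SIZE - 1)

/* Pair layout: the key and the code pointer live in the same entry,
   so checking for a hit and fetching the code touch one cache line. */
typedef struct {
    uintptr_t guest;  /* guest address this entry is valid for */
    void     *host;   /* start of the translated (host) code */
} FastCacheEntry;

static FastCacheEntry fast_cache[FAST_CACHE_SIZE];

/* Direct-mapped: the low bits of the guest address select the slot. */
static size_t fast_cache_index(uintptr_t guest) {
    return (size_t)(guest & FAST_CACHE_MASK);
}

/* Returns the translated code for `guest`, or NULL on a miss; the
   caller would then fall back to a slower full-table lookup. */
static void *fast_cache_lookup(uintptr_t guest) {
    const FastCacheEntry *e = &fast_cache[fast_cache_index(guest)];
    return e->guest == guest ? e->host : NULL;
}

/* Stores a translation, evicting whatever occupied the slot. */
static void fast_cache_insert(uintptr_t guest, void *host) {
    assert(host != NULL);
    FastCacheEntry *e = &fast_cache[fast_cache_index(guest)];
    e->guest = guest;
    e->host = host;
}
```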
Julian Seward [Thu, 8 Feb 2007 16:25:56 +0000 (16:25 +0000)]
Specialise VG_(ssort) for 4-word elements. This removes about 80% of
all calls to VG_(memcpy). Thanks to cachegrind for showing that somebody
was calling VG_(memcpy) a huge number of times, and to callgrind for
finding out who :-)
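A sketch of the idea, assuming VG_(ssort) is a Shell sort (as its name suggests) whose generic version moves elements with VG_(memcpy). When every element is exactly four machine words, each move can instead be four plain word assignments. ssort_4words, copy4, CmpFn and cmp_first_word are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* qsort-style comparator type (hypothetical). */
typedef int (*CmpFn)(const void *, const void *);

/* Copy one 4-word element with word assignments instead of memcpy. */
static void copy4(uintptr_t *dst, const uintptr_t *src) {
    dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
}

/* Shell sort over `n` elements of 4 words each, stored contiguously. */
static void ssort_4words(uintptr_t *base, size_t n, CmpFn cmp) {
    assert(base != NULL || n == 0);
    /* Knuth's 1, 4, 13, 40, ... gap sequence. */
    size_t gap = 1;
    while (gap < n / 3)
        gap = 3 * gap + 1;
    for (; gap > 0; gap /= 3) {
        /* Gapped insertion sort. */
        for (size_t i = gap; i < n; i++) {
            uintptr_t tmp[4];
            copy4(tmp, &base[4 * i]);
            size_t j = i;
            while (j >= gap && cmp(&base[4 * (j - gap)], tmp) > 0) {
                copy4(&base[4 * j], &base[4 * (j - gap)]);
                j -= gap;
            }
            copy4(&base[4 * j], tmp);
        }
    }
}

/* Example comparator: orders elements by their first word. */
static int cmp_first_word(const void *x, const void *y) {
    uintptr_t a = ((const uintptr_t *)x)[0];
    uintptr_t b = ((const uintptr_t *)y)[0];
    return (a > b) - (a < b);
}
```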
Julian Seward [Thu, 8 Feb 2007 06:47:19 +0000 (06:47 +0000)]
Add a new flag --cachegrind-log-file to cg_annotate, which tells it
exactly which profile data file to use, instead of assuming
cachegrind.out.<pid>, where <pid> is given by the --<pid> flag. The
old mechanism is still supported.
Julian Seward [Wed, 7 Feb 2007 19:55:30 +0000 (19:55 +0000)]
* Add new flag --cachegrind-out-file to specify the output file
basename to be something other than "cachegrind.out".
* Observe the core-supplied --log-file-qualifier, if specified,
in creation of output file names.
* To make the above work, move most of the work in cg_pre_clo_init
into cg_post_clo_init, so that the core has determined the log file
qualifier, if any, by the time cachegrind processes its arguments.
Julian Seward [Sat, 13 Jan 2007 22:27:51 +0000 (22:27 +0000)]
When one or more '-d' flags are given, prevent the client from closing
fd 2 (stderr), since that is where m_debuglog writes, and the
resulting disappearance of the debug log can be confusing.
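A minimal sketch of such a guard, assuming a pre-syscall hook that can fail close() before it reaches the kernel. debug_log_level, client_may_close_fd and pre_close_check are hypothetical names, and failing the call with EBADF is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: how many '-d' flags were given on the command line. */
static int debug_log_level = 0;

/* fd 2 is reserved while debug logging is on, because the debug
   log is written there. */
static bool client_may_close_fd(int fd) {
    return !(debug_log_level > 0 && fd == 2);
}

/* Returns 0 to let close(fd) proceed, or a negative errno value to
   fail the call without closing anything (EBADF is an assumption). */
static int pre_close_check(int fd) {
    if (!client_may_close_fd(fd))
        return -EBADF;
    return 0;
}
```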
Julian Seward [Fri, 12 Jan 2007 19:03:19 +0000 (19:03 +0000)]
ML_(read_callframe_info_dwarf2): deal better with CIEs that have no
augmentation (this concerns read_encoded_Addr). This "fix" is a
kludge and may be replaced in future by something cleaner. See the
extensive added comment for the whole sorry tale.
Julian Seward [Thu, 11 Jan 2007 19:42:11 +0000 (19:42 +0000)]
Non-functional change: rename a number of variables and field names
that hold various kinds of addresses during debuginfo reading, to
make the code easier to understand. See the comment at the top of
debuginfo.c.
Julian Seward [Tue, 9 Jan 2007 16:47:20 +0000 (16:47 +0000)]
ML_(generic_PRE_sys_mmap): In the case of a hinted mapping (for the
client) which aspacemgr accepts at the hint address but the kernel
declines, try again as a non-hinted mapping. Fixes ld.so mapping
failures observed on ppc32-linux, although the problem potentially
applies to all Linux targets.
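The retry logic can be sketched as follows. The address-space manager and the kernel are modelled as hypothetical callbacks (AdviseFn, KernelMapFn), and the demo stubs simply make the kernel refuse one particular address; none of this is Valgrind's actual aspacemgr interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the manager proposes an address (0 = none),
   the kernel attempts a fixed-address mapping there. */
typedef uintptr_t (*AdviseFn)(uintptr_t hint, size_t len);
typedef bool (*KernelMapFn)(uintptr_t addr, size_t len);

/* Returns the mapped address, or 0 on failure. */
static uintptr_t map_for_client(uintptr_t hint, size_t len,
                                AdviseFn advise, KernelMapFn kernel_map) {
    assert(len > 0);
    uintptr_t addr = advise(hint, len);
    if (addr == 0)
        return 0;                 /* manager found no space */
    if (kernel_map(addr, len))
        return addr;              /* normal case */
    if (hint != 0 && addr == hint) {
        /* Manager accepted the hint but the kernel declined it:
           try again as a non-hinted mapping. */
        addr = advise(0, len);
        if (addr != 0 && kernel_map(addr, len))
            return addr;
    }
    return 0;
}

/* Demo stubs: the manager accepts any hint, otherwise picks
   0x50000000; the kernel refuses 0x10000000. */
static uintptr_t demo_advise(uintptr_t hint, size_t len) {
    (void)len;
    return hint != 0 ? hint : (uintptr_t)0x50000000;
}
static bool demo_kernel_map(uintptr_t addr, size_t len) {
    (void)len;
    return addr != (uintptr_t)0x10000000;
}
```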
Julian Seward [Mon, 1 Jan 2007 22:07:58 +0000 (22:07 +0000)]
Avoid printf in the recursive routines, so that the intercept of
mempcpy, which printf calls, does not upset the carefully balanced
call-stack overflow checks that this test performs on ppc64-linux.
Julian Seward [Sun, 31 Dec 2006 00:22:30 +0000 (00:22 +0000)]
Intercept/replace glibc-2.5's __strcpy_chk function for the usual
reason: it reads memory in word-sized chunks and so produces lots
of errors on SuSE 10.2 (amd64).
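For illustration, a byte-at-a-time replacement in the spirit of this intercept: because it stops at the terminating NUL, it never reads the bytes beyond the end of src that a word-at-a-time implementation may touch. The name replacement_strcpy_chk is hypothetical. It takes __strcpy_chk's (dst, src, dstlen) arguments, but returns NULL on overflow where glibc would abort, purely to keep the sketch testable.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst one byte at a time, writing at most dstlen
   bytes. Returns dst, or NULL if src (with its NUL) does not fit. */
static char *replacement_strcpy_chk(char *dst, const char *src,
                                    size_t dstlen) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < dstlen; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return dst;   /* never reads past the terminator */
    }
    return NULL;          /* overflow: glibc would abort here */
}
```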
Julian Seward [Thu, 28 Dec 2006 20:26:08 +0000 (20:26 +0000)]
Get rid of the core-tool events pre_mutex_lock, post_mutex_lock and
post_mutex_unlock. The core can no longer detect them anyway, so
there's no point in having them.
Callgrind: Throttle calls to CLG_(run_thread) after r6413
After the change in r6413, CLG_(run_thread) is called a lot more
often, increasing the overhead of polling for a callgrind command
file (created by callgrind_control to control a callgrind run
interactively). This change makes the calls happen only every 5000
BBs, which gives a similar polling frequency to before.
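A minimal sketch of the throttling, assuming the callback receives a running total of executed blocks (as the 'thread_runstate' event in the next entry provides). The interval of 5000 comes from the log message; maybe_poll, poll_command_file and the counters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define POLL_INTERVAL_BBS 5000  /* interval taken from the log message */

static unsigned polls_done = 0;     /* hypothetical bookkeeping */
static uint64_t last_poll_bbs = 0;

/* Stand-in for the expensive check for a callgrind_control
   command file; here it only counts how often it runs. */
static void poll_command_file(void) {
    polls_done++;
}

/* Called very often with the total number of blocks run so far; does
   the expensive poll only once POLL_INTERVAL_BBS blocks have passed. */
static void maybe_poll(uint64_t bbs_done) {
    if (bbs_done - last_poll_bbs < POLL_INTERVAL_BBS)
        return;
    last_poll_bbs = bbs_done;
    poll_command_file();
}
```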
Julian Seward [Sat, 23 Dec 2006 01:21:12 +0000 (01:21 +0000)]
Change the core-tool interface 'thread_run' event to be more useful:
- Rename the event to 'thread_runstate'.
- Add arguments: also pass a boolean indicating whether the thread
is running or stopping, and a 64-bit int giving the total number of
blocks run so far, so tools can make a rough estimate of workload.
The boolean allows tools to see threads starting and stopping.
Prior to this, de-schedule events were invisible to tools.
- Call the callback (hand the event to tools) just before client
code is run, and again immediately after it stops running. This
should give correct sequencing w.r.t. posting of thread
creation/destruction events.
In order to make callgrind work without complex changes, I added a
simple impedance-matching function 'clg_thread_runstate_callback'
which hands thread-run events onwards to CLG_(thread_run).
Use this new 'thread_runstate' event with care: it will be called
before and after every translation, which means it will be called
~500k times during a startup of firefox, so the callback needs to be
fast.
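One plausible shape for such an impedance-matching adapter, sketched under assumptions: the new event is taken to carry (thread id, is-running flag, total blocks run), and the old-style function only wants to hear when a thread starts running. runstate_adapter, tool_thread_run and this ThreadId typedef are hypothetical stand-ins, not Callgrind's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned ThreadId;  /* hypothetical stand-in for the core's type */

static unsigned forwarded = 0;  /* how many events reached the old API */
static ThreadId last_tid = 0;

/* Old-style tool function: only wants the thread that is about to run. */
static void tool_thread_run(ThreadId tid) {
    last_tid = tid;
    forwarded++;
}

/* Adapter with the new event's shape. It must stay cheap, because the
   core invokes it before and after every stretch of client code. */
static void runstate_adapter(ThreadId tid, bool is_running,
                             uint64_t blocks_done) {
    (void)blocks_done;          /* unused by this adapter */
    if (is_running)
        tool_thread_run(tid);   /* forward only "starting to run" */
}
```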
Julian Seward [Sun, 17 Dec 2006 18:58:55 +0000 (18:58 +0000)]
A naming-only change: rename VG_(set_running) to VG_(acquire_BigLock)
and VG_(set_sleeping) to VG_(release_BigLock), plus some other minor
renamings in the thread-locking code to make it easier to follow.