Julian Seward [Sat, 25 Aug 2007 07:19:08 +0000 (07:19 +0000)]
Changes to m_hashtable:
Allow hashtables to dynamically resize (patch from Christoph
Bartoschek). Results in the following interface changes:
* HT_construct: no need to supply an initial table size.
Instead, supply a text string used to "name" the table, so
that debugging messages ("resizing the table") can say which
one they are resizing.
* Remove VG_(HT_get_node). This exposes the chain structure to
callers (via the next_ptr parameter), which is a problem since
callers could get some info about the chain structure which then
changes when the table is resized. Fortunately it is not used.
* Remove VG_(HT_first_match) and VG_(HT_apply_to_all_nodes) as
they are unused.
* Make the iteration mechanism more paranoid, so any adding or
deleting of nodes part way through an iteration causes VG_(HT_next)
to assert.
* Fix the comment on VG_(HT_to_array) so it no longer speaks
specifically about MC's leak detector.
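
A minimal sketch of the two ideas above: a named table that grows on
demand, and an iterator that asserts if the table is modified part way
through an iteration. The type and function names (Table, ht_construct,
ht_add, ht_lookup, ht_reset_iter, ht_next), the starting size and the
growth policy are assumptions made for illustration, not the m_hashtable
code. Deletion is omitted; it would clear iter_ok in the same way as
ht_add.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not the m_hashtable definitions. */
typedef struct Node { struct Node *next; unsigned long key; } Node;

typedef struct {
    const char *name;     /* only used in debugging messages */
    size_t n_chains;
    size_t n_elements;
    Node **chains;
    int iter_ok;          /* cleared by ht_add, set by ht_reset_iter */
    size_t iter_chain;
    Node *iter_node;
} Table;

/* No initial size parameter: the table starts small and grows on demand. */
Table *ht_construct(const char *name)
{
    Table *t = malloc(sizeof *t);
    assert(t != NULL);
    t->name = name;
    t->n_chains = 8;
    t->n_elements = 0;
    t->chains = calloc(t->n_chains, sizeof *t->chains);
    assert(t->chains != NULL);
    t->iter_ok = 0;
    t->iter_chain = 0;
    t->iter_node = NULL;
    return t;
}

/* Double the number of chains and rehash every node into the new array. */
static void ht_resize(Table *t)
{
    size_t new_n = t->n_chains * 2;
    Node **new_chains = calloc(new_n, sizeof *new_chains);
    assert(new_chains != NULL);
    fprintf(stderr, "resizing table '%s': %lu -> %lu chains\n",
            t->name, (unsigned long)t->n_chains, (unsigned long)new_n);
    for (size_t i = 0; i < t->n_chains; i++) {
        Node *n = t->chains[i];
        while (n != NULL) {
            Node *next = n->next;
            size_t c = n->key % new_n;
            n->next = new_chains[c];
            new_chains[c] = n;
            n = next;
        }
    }
    free(t->chains);
    t->chains = new_chains;
    t->n_chains = new_n;
}

void ht_add(Table *t, Node *n)
{
    t->iter_ok = 0;                       /* invalidate any iteration */
    if (t->n_elements >= t->n_chains)     /* average chain length >= 1 */
        ht_resize(t);
    size_t c = n->key % t->n_chains;
    n->next = t->chains[c];
    t->chains[c] = n;
    t->n_elements++;
}

Node *ht_lookup(const Table *t, unsigned long key)
{
    for (Node *n = t->chains[key % t->n_chains]; n != NULL; n = n->next)
        if (n->key == key) return n;
    return NULL;
}

void ht_reset_iter(Table *t)
{
    t->iter_ok = 1;
    t->iter_chain = 0;
    t->iter_node = t->chains[0];
}

/* Returns the next node, or NULL when the iteration is finished. */
Node *ht_next(Table *t)
{
    assert(t->iter_ok);   /* fails if a node was added since ht_reset_iter */
    while (t->iter_node == NULL) {
        if (t->iter_chain + 1 >= t->n_chains) return NULL;
        t->iter_chain++;
        t->iter_node = t->chains[t->iter_chain];
    }
    Node *n = t->iter_node;
    t->iter_node = n->next;
    return n;
}
```

Because a resize moves nodes between chains, any pointer into a chain
held by a caller would be stale afterwards, which is the hazard the
removal of VG_(HT_get_node) avoids.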
Julian Seward [Thu, 23 Aug 2007 10:22:44 +0000 (10:22 +0000)]
The drastic increase in the number of per-arena freelists in r6771
exposes a performance problem with doing m_mallocfree.c sanity checks
(at --sanity-level=3, at least), caused by slowness in
listNo_to_pszB_min. This commit fixes the problem by caching the
results of queries to listNo_to_pszB_min.
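
The caching idea can be sketched as follows. The size-class rule
(size_to_list), the slow inverse (list_min_size_slow) and the list count
of 16 are invented for this example and are not taken from
m_mallocfree.c; only the memoisation pattern is the point.

```c
#include <assert.h>
#include <stddef.h>

#define N_LISTS 16   /* assumption for the sketch, not Valgrind's value */

static unsigned long n_slow_calls;   /* counts uncached computations */

/* Hypothetical size-class rule: 8-byte classes, last list is a catch-all. */
static unsigned size_to_list(size_t sz)
{
    size_t n = sz / 8;
    return n < N_LISTS ? (unsigned)n : N_LISTS - 1;
}

/* Slow inverse: scan sizes upwards until one maps to the requested list. */
static size_t list_min_size_slow(unsigned list_no)
{
    size_t sz = 0;
    n_slow_calls++;
    while (size_to_list(sz) != list_no) sz++;
    return sz;
}

/* Cached version: each list's answer is computed at most once. */
size_t list_min_size(unsigned list_no)
{
    static size_t cache[N_LISTS];
    static int valid[N_LISTS];
    assert(list_no < N_LISTS);
    if (!valid[list_no]) {
        cache[list_no] = list_min_size_slow(list_no);
        valid[list_no] = 1;
    }
    return cache[list_no];
}
```

The cache is safe here because the mapping from list number to minimum
size never changes after startup.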
Julian Seward [Tue, 21 Aug 2007 10:55:26 +0000 (10:55 +0000)]
Previously, each Arena had a linked list of Superblocks, which could
make VG_(arena_free) expensive if many superblocks had to be checked
before the right one was found. This change gives the arena a
dynamically expanding sorted array of superblocks, so that finding the
superblock containing an about-to-be-freed block (findSb) is now
O(log2 n) rather than linear in the number of superblocks in the
arena. Patch from Christoph Bartoschek.
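
A sketch of the lookup, assuming the array is sorted by start address
and the superblocks do not overlap. The Superblock layout and the name
find_sb are simplified stand-ins, not the definitions used in
m_mallocfree.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified superblock descriptor. */
typedef struct {
    uintptr_t start;     /* address of the first payload byte */
    size_t    n_bytes;   /* payload size in bytes */
} Superblock;

/* Binary search over an array sorted by 'start'.  Returns the superblock
   whose range [start, start + n_bytes) contains addr, or NULL if none. */
const Superblock *find_sb(const Superblock *sbs, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;               /* search the half-open range [lo, hi) */
    assert(sbs != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const Superblock *sb = &sbs[mid];
        if (addr < sb->start)
            hi = mid;                    /* addr lies below this superblock */
        else if (addr - sb->start >= sb->n_bytes)
            lo = mid + 1;                /* addr lies above this superblock */
        else
            return sb;
    }
    return NULL;
}
```

Keeping the array sorted makes insertion of a new superblock more
expensive than pushing onto a list, but superblocks are created far less
often than blocks are freed.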
Julian Seward [Mon, 20 Aug 2007 22:57:56 +0000 (22:57 +0000)]
Some improvements for malloc/free intensive programs, inspired by
performance studies by Christoph Bartoschek:
* Increase the number of freelists per arena from 18 to 112, so as
to (drastically) cut down on the amount of freelist searching that
happens.
* Increase the size of the client and tool arenas, so as to reduce
the cost of finding arenas during freeing. This is a kludge; a
better solution would be to use binary search on superblocks, as
Christoph's patches do.
Get rid of VG_(getcwd) and replace it with a pair of functions,
VG_(record_startup_wd) which records the working directory at startup,
and VG_(get_startup_wd) which later tells you what value was recorded.
This works because all uses of VG_(getcwd) serve only to record the
directory at process start anyway. The motivation is that AIX does
not support sys_getcwd directly, so it's easier for the launcher to
ship in the required value using an environment variable. On Linux
sys_getcwd is used as before.
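
The record-once, read-later pattern might look like this in ordinary
POSIX C. The environment variable name DEMO_STARTUP_CWD, the buffer size
and the return conventions are assumptions for the sketch; the real
functions live inside Valgrind's core and do not use libc. Choosing
between the two sources at run time is also a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define STARTUP_WD_ENV "DEMO_STARTUP_CWD"   /* hypothetical variable name */

static char startup_wd[4096];
static int  startup_wd_valid = 0;

/* Call once at startup.  Returns 1 on success, 0 on failure. */
int record_startup_wd(void)
{
    const char *from_launcher = getenv(STARTUP_WD_ENV);
    assert(!startup_wd_valid);               /* record only once */
    if (from_launcher != NULL) {
        /* Value shipped in by the launcher (the AIX-style route). */
        if (strlen(from_launcher) >= sizeof startup_wd) return 0;
        strcpy(startup_wd, from_launcher);
    } else if (getcwd(startup_wd, sizeof startup_wd) == NULL) {
        return 0;                            /* direct query failed */
    }
    startup_wd_valid = 1;
    return 1;
}

/* Copies the recorded directory into buf.  Returns 1 on success, 0 if
   nothing was recorded or buf is too small. */
int get_startup_wd(char *buf, size_t size)
{
    if (!startup_wd_valid || strlen(startup_wd) >= size) return 0;
    strcpy(buf, startup_wd);
    return 1;
}
```

Later changes of directory by the process do not affect what
get_startup_wd reports, which is the behaviour the callers wanted.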
Callgrind manual: rewriting start of section about avoiding cycles
This hopefully makes the whole issue with cycles easier to understand.
And no, this does not get rid of the description of cycles, carefully
crafted by Julian ;-)
* Looks a little bit more like the Cachegrind manual
(at least in front)
* Removed the out-of-place general section about profiling
and gprof. Perhaps something like this can be put at
another place.
* Notes about Callgrind's problems with call tracing on PPC
* Include usage of callgrind_annotate, and note its lack of
cycle detection
Julian Seward [Tue, 8 May 2007 09:20:25 +0000 (09:20 +0000)]
Add branch-misprediction profiling to Cachegrind. When the (new) flag
--branch-sim=yes is specified, Cachegrind simulates a simple indirect
branch predictor and a conditional branch predictor. The latter
considers both the branch instruction's address and the behaviour of
the last few conditional branches. Return stack prediction is not
modelled.
The new counted events are: conditional branches (Bc), mispredicted
conditional branches (Bcm), indirect branches (Bi) and mispredicted
indirect branches (Bim). Postprocessing tools (cg_annotate, cg_merge)
handle the new events as you would expect. Note that branch
simulation is not enabled by default as it gives a 20%-25% slowdown,
so you need to ask for it explicitly using --branch-sim=yes.
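
To illustrate what a conditional predictor of this kind does, here is a
gshare-style sketch: a table of 2-bit saturating counters indexed by the
branch address combined with a short history of recent outcomes. The
table size, history length and hashing are arbitrary choices for the
example and should not be read as Cachegrind's actual parameters or
code. The two counters correspond in spirit to Bc and Bcm.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BITS   12
#define TABLE_SIZE   (1u << TABLE_BITS)
#define HISTORY_BITS 4

static uint8_t  counters[TABLE_SIZE];   /* 0..3; values >= 2 predict "taken" */
static uint32_t history;                /* outcomes of the last few branches */
static unsigned long n_cond;            /* conditional branches seen */
static unsigned long n_cond_mispredicted;

/* Simulate one conditional branch.  Returns 1 if it was mispredicted. */
int simulate_cond_branch(uintptr_t addr, int taken)
{
    /* Mix the instruction address with the recent-outcome history. */
    uint32_t idx = ((uint32_t)(addr >> 2)
                    ^ (history << (TABLE_BITS - HISTORY_BITS)))
                   & (TABLE_SIZE - 1);
    assert(counters[idx] <= 3);

    int predicted_taken = counters[idx] >= 2;
    int mispredicted = predicted_taken != (taken != 0);

    /* Train the saturating counter towards the actual outcome. */
    if (taken) {
        if (counters[idx] < 3) counters[idx]++;
    } else {
        if (counters[idx] > 0) counters[idx]--;
    }

    /* Shift the outcome into the history register. */
    history = ((history << 1) | (taken ? 1u : 0u)) & ((1u << HISTORY_BITS) - 1);

    n_cond++;
    n_cond_mispredicted += (unsigned long)mispredicted;
    return mispredicted;
}
```

A loop branch that is always taken is mispredicted only while the
counters and history warm up; after that every prediction is correct.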
Julian Seward [Sat, 5 May 2007 11:40:35 +0000 (11:40 +0000)]
Fix stack overflow which led to totally mysterious .bss corruption
and hence to segfaulting in vex on ppc32/64-linux in obscure
circumstances. VKI_MAX_PAGE_SIZE is 64k in recent Valgrinds.
Julian Seward [Tue, 1 May 2007 13:44:08 +0000 (13:44 +0000)]
If gcc supports -fno-stack-protector, use it. This should fix
compilation failures on distros where -fstack-protector is enabled by
default. See #144112.
When doing 'demo' translations for --profile-flags=, make at least
some attempt to discard existing translations first. Otherwise
Cachegrind (rightly) asserts on the basis that it is seeing duplicate
translation requests for the same entry point.
Fix bug 142197: don't free --toolname:foo options after they've been munged,
because tools should be able to assume that they are never freed, just like
other options.
Julian Seward [Mon, 19 Mar 2007 18:38:55 +0000 (18:38 +0000)]
Make ptrace-based launchers able to handle --help, --version etc.
Problem is that --help etc are handled by the tool exe. But a
ptrace-based launch scheme can't run "no program" if the user just
types "valgrind --help" because the launcher depends on starting the
client first and only then attaching valgrind to it using ptrace. So
instead provide a dummy do-nothing program to run when no program is
specified. m_main notices this and acts as if there really had been
no program specified.
This has no effect at all on Linux/ELF program launching.
Julian Seward [Mon, 19 Mar 2007 13:38:11 +0000 (13:38 +0000)]
Document and tidy up one of the more arcane corners of signal
handling: why PRE(sys_sigreturn) has to construct a fake syscall
return value which, when written back to the guest state, leaves it
unchanged. It's only taken me about 3 years to realise why :-)
Fixes to ppc platforms to follow.
Julian Seward [Wed, 14 Mar 2007 11:57:37 +0000 (11:57 +0000)]
Use a 64-bit counter to keep track of the total number of bytes
allocated, rather than SizeT which is word-sized. Your average C++
lardware can easily turn over more than 4G in total in a half hour run
on a 32-bit machine, in which case the counter wraps around.
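
The wraparound is easy to reproduce. In this sketch uint32_t stands in
for a word-sized SizeT on a 32-bit machine; the struct and function
names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two running totals of bytes allocated, kept side by side. */
typedef struct {
    uint32_t total32;   /* word-sized on a 32-bit machine: wraps at 4G */
    uint64_t total64;   /* 64-bit: does not wrap in any realistic run */
} AllocTotals;

void record_alloc(AllocTotals *t, size_t n_bytes)
{
    assert(t != NULL);
    t->total32 += (uint32_t)n_bytes;   /* unsigned arithmetic, modulo 2^32 */
    t->total64 += n_bytes;
}
```

After 5000 allocations of 1 MiB each, total64 holds 5242880000 while
total32 has wrapped and holds 947912704.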
Julian Seward [Mon, 12 Mar 2007 02:10:23 +0000 (02:10 +0000)]
Add a test for vex ppc64 code generation bug fixed by vex r1739
(When generating 64-bit code, ensure that any addresses used in 4 or 8
byte loads or stores of the form reg+imm have the lowest 2 bits of imm
set to zero, so that they can safely be used in ld/ldu/lda/std/stdu
instructions.)
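
The constraint comes from the DS instruction form, where the low two
bits of the 16-bit displacement field are reused as an extended opcode.
The sketch below checks the condition and encodes an ld to show where
the bits go. The helper names are hypothetical and this is not vex's
code generator.

```c
#include <assert.h>
#include <stdint.h>

/* True if imm can be used as the displacement of a DS-form load/store:
   it must fit in a signed 16-bit field and be a multiple of 4. */
int imm_ok_for_ds_form(int32_t imm)
{
    int fits_16_bits  = imm >= -32768 && imm <= 32767;
    int low_bits_zero = ((uint32_t)imm & 3u) == 0;
    return fits_16_bits && low_bits_zero;
}

/* Encode "ld rt, imm(ra)": primary opcode 58, extended opcode 0 held in
   the two bits that a misaligned imm would otherwise overwrite. */
uint32_t encode_ld(unsigned rt, unsigned ra, int32_t imm)
{
    assert(rt < 32 && ra < 32);
    assert(imm_ok_for_ds_form(imm));
    return (58u << 26) | (rt << 21) | (ra << 16)
           | ((uint32_t)imm & 0xFFFCu) | 0u /* XO = 0 selects ld */;
}
```

For example, encode_ld(3, 4, 8) yields 0xE8640008, whereas a
displacement of 6 is rejected because its low bits would change the
extended opcode.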
Julian Seward [Sun, 11 Mar 2007 13:00:34 +0000 (13:00 +0000)]
It appears glibc-2.5's getenv() function steps along environment
strings in 16-bit chunks, which can cause false errors in some cases
(sigh). So do the usual thing and replace it.
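
A replacement of this kind only needs to read the environment one byte
at a time, so it never touches memory past a string's terminator. The
sketch below takes the environment array as a parameter to stay
self-contained; the name getenv_bytewise is hypothetical and this is not
the replacement Valgrind installs.

```c
#include <assert.h>
#include <stddef.h>

/* Look up 'name' in a NULL-terminated array of "NAME=value" strings,
   comparing strictly byte by byte.  Returns a pointer to the value, or
   NULL if the name is not present. */
const char *getenv_bytewise(char *const *envp, const char *name)
{
    assert(envp != NULL && name != NULL);
    for (; *envp != NULL; envp++) {
        const char *e = *envp;
        const char *n = name;
        /* Stop at the first mismatch; a terminating NUL in the entry
           always mismatches, so we never read beyond it. */
        while (*n != '\0' && *e == *n) { e++; n++; }
        if (*n == '\0' && *e == '=')
            return e + 1;
    }
    return NULL;
}
```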