Tom Hughes [Mon, 31 Oct 2005 17:05:21 +0000 (17:05 +0000)]
Get core dumping working again - the architecture specific code that
was in the sigframe module has been moved into the coredump module
where it belongs and things fixed up to compile again.
Julian Seward [Sun, 23 Oct 2005 12:06:55 +0000 (12:06 +0000)]
Don't assume the first statement is an IMark, since it could instead
be part of a self-check. Instead, copy verbatim any IR preamble
preceding the first IMark. This stops cachegrind asserting on
self-checking translations.
Tom Hughes [Thu, 20 Oct 2005 18:38:08 +0000 (18:38 +0000)]
Don't assert if the DWARF line info reader is given so little data that
it can't even read the length of the block - just report an error as we
do if there isn't enough data for the rest of the block. Fix bug #114757.
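A minimal sketch of the kind of bounds check described above, not the
actual reader code. The function name read_block_length, the 4-byte
host-order length field and the -1 error return are all assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0 and stores the 32-bit block length on
   success, or -1 if the data is too short.  Assumes a 4-byte length
   field in host byte order. */
int read_block_length(const unsigned char *data, size_t avail,
                      uint32_t *len_out)
{
    if (avail < sizeof(uint32_t))
        return -1;   /* too short to hold the length: report, don't assert */
    memcpy(len_out, data, sizeof(uint32_t));
    if (*len_out > avail - sizeof(uint32_t))
        return -1;   /* not enough data for the rest of the block */
    return 0;
}
```

Both failure cases take the same error path, so a truncated length field
is handled no differently from a truncated block body.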
Julian Seward [Thu, 20 Oct 2005 01:57:29 +0000 (01:57 +0000)]
Increase the threshold above which new errors are not shown from 300
unique / 30000 total to 1000 unique / 100000 total. Programs are
generally bigger now than 3 years ago.
Julian Seward [Thu, 20 Oct 2005 01:37:15 +0000 (01:37 +0000)]
Remove all remaining references to pointercheck. It's sad to see it
go, but realistically we can't implement it portably, at least without
considerable performance overhead and some additional complexity.
Julian Seward [Thu, 20 Oct 2005 00:31:31 +0000 (00:31 +0000)]
In the spirit of other changes over the past month aimed at supporting
monster-sized programs better, increase the default freelist volume
from 1M to 5M. Maybe even that is too small.
Julian Seward [Wed, 19 Oct 2005 11:23:07 +0000 (11:23 +0000)]
Halve the size of the fast tt lookup cache. This improves ppc32
performance quite a bit, since the cache is emptied quite often on
ppc32, and a smaller cache is less intrusive in the real machine's L2
cache. On x86 the change doesn't seem to have much effect.
Julian Seward [Tue, 18 Oct 2005 12:04:18 +0000 (12:04 +0000)]
Change the core-tool interface so that tools are fully aware of both
the guest extents for the presented translation and also its original
un-redirected guest address. These changes are needed in particular
to make cachegrind's code cache management work properly.
Julian Seward [Tue, 18 Oct 2005 02:30:42 +0000 (02:30 +0000)]
Add extra auxiliary data structures which make it possible to quickly
find and delete all translations intersecting with small address
ranges (8 k or less, currently). This makes it possible to simulate
ppc32 icbi instructions in reasonable time, and finally makes the
ppc32 port run at a usable speed.
The scheme is based around partitioning translations into equivalence
classes based on address ranges. For deletions whose range falls
within a single class, all translations intersecting it can be found
by inspecting just that class and one other. Given that there are 256
classes, this cuts the cost, relative to scanning the entire TC, by
approximately half that factor (viz, 128), assuming the translations
are distributed evenly over the classes.
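One way to picture the partitioning is sketched below in C. This is an
illustration under assumptions, not the code from this commit: the names
range_to_class and classes_to_scan, the 8 KiB bin size, and the idea that
the "one other" class is a catch-all for ranges straddling two bins are
all guesses. Only the figure of 256 classes comes from the text above.

```c
#include <stdint.h>

#define ECLASS_SHIFT 13u                 /* assumed: 8 KiB address bins */
#define N_ECLASSES   256u                /* from the commit message */
#define ECLASS_MISC  (N_ECLASSES - 1u)   /* assumed catch-all class */

/* Map [start, start+len) to a class.  len must be nonzero and the range
   must not wrap around the top of the address space. */
unsigned range_to_class(uint64_t start, uint32_t len)
{
    uint64_t lo_bin = start >> ECLASS_SHIFT;
    uint64_t hi_bin = (start + len - 1) >> ECLASS_SHIFT;
    if (lo_bin != hi_bin)
        return ECLASS_MISC;              /* straddles two bins */
    return (unsigned)(lo_bin % (N_ECLASSES - 1u));
}

/* Classes that must be inspected to delete the given range.  Returns the
   number of entries written to out, or 0 if the range itself straddles
   bins and the caller has to fall back to a full scan. */
unsigned classes_to_scan(uint64_t start, uint32_t len, unsigned out[2])
{
    unsigned c = range_to_class(start, len);
    if (c == ECLASS_MISC)
        return 0;
    out[0] = c;            /* translations lying wholly inside the bin */
    out[1] = ECLASS_MISC;  /* translations that straddle bins */
    return 2;
}
```

Under these assumptions any translation intersecting a single-bin range
is either in that bin's class or in the catch-all class, so two classes
are enough to find it.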
The whole business is more complex and difficult than I would like.
A detailed comment will later be added.
Very thorough sanity checking has been added
(sanity_check_eclasses_in_sector). This is engaged at
--sanity-level=4 and above.
The TT hash function (HASH_TT) has been improved to reduce its
tendency to cluster TT entries in some circumstances. This has
allowed the TT maximum loading factor to be increased from 66% to 80%
and so the absolute size of the TC (in each sector) to be less than
2^16 entries. The latter change is important for the fast-deletion
changes.
A small Cachegrind cleanup: previously it was copying some things (eg.
instr_size and instr_addr) into Ir events, then later copying those into
instrInfo nodes. Now it just allocates the instrInfo nodes earlier and
copies them in directly. This is a bit more concise and easier to
understand.
Cachegrind cleanups:
- Remove some unnecessary assertions.
- Add in some new ones.
- Make things more concise and readable by factoring out things like
"cgs->events[i+1]" into things like "ev2" in flushEvents().
OSet-ified Cachegrind:
- The instrInfoTable was a VgHashTable, now it's an OSet.
- The CC table was a custom 3-level hash table, now it's an OSet. This
is easier to understand and there's no worrying about whether the hash
array sizes are big enough. It also has the nice property that the
results in the cachegrind.out.<pid> file are now sorted, so they're a bit
easier to read.
I did some testing and the performance difference appears to be negligible;
CC table and InstrInfo table lookups and traversal aren't that critical.
Overhaul the way programs are loaded at startup and exec() works. Now the
checking of programs done in these two places is combined, which avoids
duplicate code and greatly reduces the number of cases in which exec()
fails causing Valgrind to bomb out.
Also, we can now load some programs we could not previously, such as scripts
lacking a "#!" line at the start. Also, the startup failure messages for
bad programs match the shell's messages very closely.
And I added a whole bunch of regtests to test all this.
Julian Seward [Wed, 12 Oct 2005 10:09:23 +0000 (10:09 +0000)]
Redo the way cachegrind generates instrumentation code, so that it can
deal with any IR that happens to show up. This makes it work on ppc32
and should fix occasionally-reported bugs on x86/amd64 where it bombs
due to having to deal with multiple data references in a single
instruction.
The new scheme is based around the idea of a queue of memory events
which are outstanding, in the sense that no IR has yet been generated
to do the relevant helper calls. The presence of the queue --
currently 16 entries deep -- gives cachegrind more scope for combining
multiple memory references into a single helper function call. As a
result it runs 3%-5% faster than the previous version, on x86.
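The queue idea can be sketched roughly as follows. This is a simplified
model, not cachegrind's instrumentation code: the type and function names
are hypothetical, a counter stands in for the emitted helper calls, and
the only combining rule shown (an instruction event merged with the data
event that follows it) is an assumption. The depth of 16 is taken from
the text above.

```c
#include <stddef.h>

#define QUEUE_DEPTH 16   /* from the commit message */

typedef enum { EV_INSTR, EV_DATA_READ, EV_DATA_WRITE } EventKind;

typedef struct {
    EventKind events[QUEUE_DEPTH];
    size_t    used;
    unsigned  helper_calls;   /* stands in for emitted helper calls */
} EventQueue;

/* Drain the queue.  An instruction event followed directly by a data
   event is covered by one combined helper call; anything else gets a
   call of its own. */
void flush_events(EventQueue *q)
{
    size_t i = 0;
    while (i < q->used) {
        int can_pair = q->events[i] == EV_INSTR
                    && i + 1 < q->used
                    && q->events[i + 1] != EV_INSTR;
        q->helper_calls++;
        i += can_pair ? 2 : 1;
    }
    q->used = 0;
}

/* Queue an event, flushing first if the queue is already full. */
void add_event(EventQueue *q, EventKind kind)
{
    if (q->used == QUEUE_DEPTH)
        flush_events(q);
    q->events[q->used++] = kind;
}
```

Deferring the flush is what creates the opportunity to merge: an event
is only turned into a call once its successor is known.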
This commit also changes the type of the tool interface function
'tool_discard_basic_block_info' and clarifies its meaning. See
comments in include/pub_tool_tooliface.h.
Julian Seward [Tue, 11 Oct 2005 22:06:29 +0000 (22:06 +0000)]
Make sync checking work on recent x86 kernels (eg SuSE 10) which mark
many 'r' sections also as 'x' because x86 can't really distinguish
them. The change just regards 'x' and 'r' as equivalent on x86.
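A small sketch of what "regarding 'x' and 'r' as equivalent" could look
like when two permission sets are compared. The Perms type, the function
name perms_match and the treat_rx_same flag are hypothetical; the flag is
assumed to be set on x86 and clear on other platforms.

```c
#include <stdbool.h>

typedef struct { bool r, w, x; } Perms;

/* Compare two permission sets.  When treat_rx_same is true (assumed to
   be the x86 case), read and execute are folded together before the
   comparison, so "r--" and "r-x" compare equal. */
bool perms_match(Perms a, Perms b, bool treat_rx_same)
{
    if (treat_rx_same) {
        a.r = a.x = (a.r || a.x);
        b.r = b.x = (b.r || b.x);
    }
    return a.r == b.r && a.w == b.w && a.x == b.x;
}
```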
Checking on ppc32/amd64 is unchanged.
Update cache simulator for 64 bit addresses. This probably won't have
caused many inaccuracies so far because it only matters if addresses
above the 4GB line are used. Thanks to Josef W for the patch.
Julian Seward [Fri, 7 Oct 2005 12:13:21 +0000 (12:13 +0000)]
ppc32 only: improve handling of CmpORD32S, so as to avoid false
positives from ppc code of the form "cmpi %reg,0 ; branch-if-negative
.." where the top bit of %reg is defined but not all of the other bits
are (common-ish enough to cause a considerable number of false
positives if not done right).
Julian Seward [Fri, 7 Oct 2005 11:08:55 +0000 (11:08 +0000)]
Fix the handling of CmpORD32{S,U} which was completely bogus and
would have caused ppc32 to miss many uninitialised value errors.
(Change affects ppc32 only).
Julian Seward [Fri, 7 Oct 2005 09:49:53 +0000 (09:49 +0000)]
Fix a memcheck anomaly observed by Nick: lazy propagation of
undefinedness was not being done properly for scalar shifts and that
could have led to undefined-value errors being falsely reported in the
obscure case where the shift amount was undefined but the end result
of the shift was unused. This commit handles shifts more in
accordance with the maximally-lazy V-bit-testing scheme used by the
rest of memcheck.
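As a rough illustration of lazy handling for a 32-bit left shift, the
sketch below computes the V bits of the result without reporting
anything. It is not memcheck's instrumentation code: the function name is
hypothetical, and the convention that a set V bit means "undefined" is an
assumption stated here for the example.

```c
#include <stdint.h>

/* Assumed convention: a set V bit means the corresponding data bit is
   undefined.  v_value and v_amount are the V bits of the shifted value
   and of the shift amount; amount is the actual shift amount. */
uint32_t shl32_vbits(uint32_t v_value, uint32_t amount, uint32_t v_amount)
{
    /* Shift the V bits the same way as the data; bits shifted in are
       zero, i.e. defined. */
    uint32_t shifted = v_value << (amount & 31u);
    /* If any bit of the shift amount is undefined, the whole result is
       marked undefined rather than complaining at this point. */
    uint32_t smear = (v_amount != 0) ? UINT32_MAX : 0u;
    return shifted | smear;
}
```

An undefined shift amount only taints the result here. If that result is
never used in a way that matters, no error is ever raised.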
Tom Hughes [Thu, 6 Oct 2005 14:49:21 +0000 (14:49 +0000)]
When looking for a heap segment to extend look for the heap limit
address rather than the base address as the heap may have been split
into more than one segment by using mprotect on it...
Tom Hughes [Thu, 6 Oct 2005 09:00:17 +0000 (09:00 +0000)]
Fix realloc wrappers to handle the out of memory case properly - if
the call to VG_(cli_malloc) returns NULL then don't try and copy the
data or register a new block and just leave the old block in place
instead, but still return NULL to the caller.
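The intended behaviour can be sketched in plain C as below. This is not
the wrapper from the commit: the allocator is passed in as a function
pointer (a hypothetical parameter standing in for VG_(cli_malloc)) so
that the failure path can be exercised, and the old block is assumed to
be non-NULL with a known size.

```c
#include <stdlib.h>
#include <string.h>

/* Allocator that always fails, used only to demonstrate the error path. */
void *failing_alloc(size_t n) { (void)n; return NULL; }

/* Sketch of a realloc that copes with out-of-memory.  old must be a
   non-NULL block of old_size bytes obtained from malloc. */
void *realloc_sketch(void *old, size_t old_size, size_t new_size,
                     void *(*alloc_fn)(size_t))
{
    void *fresh = alloc_fn(new_size);
    if (fresh == NULL)
        return NULL;   /* no copy, no new block; old block left in place */
    memcpy(fresh, old, old_size < new_size ? old_size : new_size);
    free(old);
    return fresh;
}
```

After a failed call the caller still owns the old block and its contents
are unchanged, which matches the behaviour of the standard realloc.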
Changed some overflow-prone counters from UInt to ULong.
Changed some printf specifiers accordingly, plus some more that were
incorrect.
Also put commas in various output numbers, eg. the leak check stats.
This makes them much easier to read when they get big. One
exception is in XML number-only fields such as <leakedbytes>.
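For illustration, comma-separated output of a 64-bit counter can be
produced along these lines. The function name format_with_commas is
hypothetical and this is not the formatting code used in the commit.

```c
#include <stdio.h>
#include <string.h>

/* Write n into buf with a comma between each group of three digits.
   buf must hold at least 27 bytes: 20 digits, 6 commas and the NUL. */
void format_with_commas(unsigned long long n, char *buf)
{
    char digits[21];
    snprintf(digits, sizeof digits, "%llu", n);   /* %llu for 64-bit */
    size_t len = strlen(digits);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        /* Insert a comma when a multiple of three digits remains. */
        if (i > 0 && (len - i) % 3 == 0)
            buf[out++] = ',';
        buf[out++] = digits[i];
    }
    buf[out] = '\0';
}
```

A field meant for machine consumption, such as an XML number-only field,
would print the plain digits instead.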