Tom Hughes [Fri, 4 Nov 2005 11:31:33 +0000 (11:31 +0000)]
When unwinding the stack on x86/amd64 subtract one from the value of
ip before starting a new pass of the loop.
The reason for this is that (except for the first pass of the loop) the
value of ip is actually a return address, which is therefore after the
instruction that was executing at the time. This means that if there is
a boundary in the CFI information at that point we can wind up using the
wrong CFI data to do the next unwind if we do it based on the return
address.
This most commonly happens with a tail call where we wind up using the
data for the next function to do the unwind and getting hopelessly lost.
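A minimal sketch of the lookup adjustment described above, not the actual unwinder code. The CfiEntry type and the find_cfi and cfi_lookup_addr helpers are hypothetical names introduced only for illustration; the table is assumed to hold half-open address ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CFI record covering the addresses [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
    int       id;      /* stands in for the real unwind rules */
} CfiEntry;

/* Linear search; returns NULL if no entry covers addr. */
static const CfiEntry *find_cfi(const CfiEntry *tab, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= tab[i].start && addr < tab[i].end)
            return &tab[i];
    return NULL;
}

/* Address to use for the CFI lookup of a given frame.  Frame 0 holds
   the real ip.  Later frames hold return addresses, so step back one
   byte to land inside the call instruction itself. */
static uintptr_t cfi_lookup_addr(uintptr_t ip, int frame_no)
{
    assert(frame_no >= 0);
    return frame_no == 0 ? ip : ip - 1;
}
```

With two adjacent entries that meet at 0x1020, a return address of exactly 0x1020 resolves to the first entry (the caller) once adjusted, where an unadjusted lookup would pick the second.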
Tom Hughes [Wed, 2 Nov 2005 16:15:55 +0000 (16:15 +0000)]
Split faultstatus into the platform independent tests and those
which are x86 specific. The first three x86 specific ones should
work on amd64 as well so I have added those as amd64 tests.
Note that the x86/amd64 tests will still fail as VEX doesn't
always trigger the right sort of signal for faulting instructions
at the moment.
Tom Hughes [Wed, 2 Nov 2005 15:46:07 +0000 (15:46 +0000)]
The memcheck/tests/sigprocmask test is designed to test that we handle
the old style sigprocmask system call correctly without corrupting
memory when we copy out the new (larger) signal mask into the user
provided old (smaller) signal mask.
It therefore makes no sense to run it on amd64 or any other platform
which only has the newer rt_sigprocmask system call, and indeed it
wasn't working because we weren't passing the extra argument which
that call expects.
Tom Hughes [Wed, 2 Nov 2005 14:42:39 +0000 (14:42 +0000)]
The sloppyRXcheck logic in the sync checker was not correct - it was
simply treating R and X as equivalent but the real problem is that
mappings can appear to have X permission entirely independent of anything
else with recent x86 kernels.
If a mapping is inside the (deliberately constrained) code segment then
it will appear to have X permission regardless of whether R or X was asked
for when it was mapped, so what we really need to do is allow the kernel
to add X to any mapping but not to take it away if we were expecting it.
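The rule above can be sketched roughly as follows. This is an illustration under assumptions, not the sync checker's code: the PERM_* bit values and the perms_consistent name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits, for illustration only. */
enum { PERM_R = 4, PERM_W = 2, PERM_X = 1 };

/* Returns true if the kernel's view of a mapping is acceptable given
   what we expected.  The kernel may add X to any mapping, but it may
   not drop X if we expected it, and R is never treated as X. */
static bool perms_consistent(unsigned expected, unsigned kernel)
{
    assert((expected & ~7u) == 0 && (kernel & ~7u) == 0);
    /* Ignore an X bit that the kernel added on its own. */
    if ((kernel & PERM_X) && !(expected & PERM_X))
        kernel &= ~(unsigned)PERM_X;
    return expected == kernel;
}
```

Under this check an expected R mapping reported as RX passes, an expected RX mapping reported as R fails, and an expected R mapping reported as X alone also fails, which the old R-equals-X treatment would have accepted.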
Tom Hughes [Mon, 31 Oct 2005 17:05:21 +0000 (17:05 +0000)]
Get core dumping working again - the architecture specific code that
was in the sigframe module has been moved into the coredump module
where it belongs and things fixed up to compile again.
Julian Seward [Sun, 23 Oct 2005 12:06:55 +0000 (12:06 +0000)]
Don't assume the first statement is an IMark, since it could instead
be part of a self-check. Instead, copy verbatim any IR preamble
preceding the first IMark. This stops cachegrind asserting on
self-checking translations.
Tom Hughes [Thu, 20 Oct 2005 18:38:08 +0000 (18:38 +0000)]
Don't assert if the DWARF line info reader is given so little data that
it can't even read the length of the block - just report an error as we
do if there isn't enough data for the rest of the block. Fix bug #114757.
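A rough sketch of the two bounds checks involved, assuming the 32-bit DWARF format with a 4-byte length field in host byte order. The LineStatus enum and check_line_block function are hypothetical names, not the reader's real interface, and the 64-bit DWARF escape value is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { LINE_OK, LINE_ERR_NO_LENGTH, LINE_ERR_TRUNCATED } LineStatus;

/* Validate a line-info block before parsing it.  Both failure cases
   are reported to the caller rather than asserted on. */
static LineStatus check_line_block(const unsigned char *data, size_t avail,
                                   uint32_t *len_out)
{
    uint32_t len;
    assert(data != NULL || avail == 0);
    /* Not even enough bytes to hold the length field. */
    if (avail < sizeof len)
        return LINE_ERR_NO_LENGTH;
    memcpy(&len, data, sizeof len);
    /* Length field is readable, but the block body is cut short. */
    if (len > avail - sizeof len)
        return LINE_ERR_TRUNCATED;
    *len_out = len;
    return LINE_OK;
}
```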
Julian Seward [Thu, 20 Oct 2005 01:57:29 +0000 (01:57 +0000)]
Increase the threshold above which new errors are not shown from 300
unique / 30000 total to 1000 unique / 100000 total. Programs are
generally bigger now than 3 years ago.
Julian Seward [Thu, 20 Oct 2005 01:37:15 +0000 (01:37 +0000)]
Remove all remaining references to pointercheck. It's sad to see it
go, but realistically we can't implement it portably, at least without
considerable performance overhead and some additional complexity.
Julian Seward [Thu, 20 Oct 2005 00:31:31 +0000 (00:31 +0000)]
In the spirit of other changes over the past month aimed at supporting
monster-sized programs better, increase the default freelist volume
from 1M to 5M. Maybe even that is too small.
Julian Seward [Wed, 19 Oct 2005 11:23:07 +0000 (11:23 +0000)]
Halve the size of the fast tt lookup cache. This improves ppc32
performance quite a bit, since the cache is emptied quite often on
ppc32, and a smaller cache is less intrusive in the real machine's L2
cache. On x86 the change doesn't seem to have much effect.
Julian Seward [Tue, 18 Oct 2005 12:04:18 +0000 (12:04 +0000)]
Change the core-tool interface so that tools are fully aware of both
the guest extents for the presented translation and also its original
un-redirected guest address. These changes are needed in particular
to make cachegrind's code cache management work properly.
Julian Seward [Tue, 18 Oct 2005 02:30:42 +0000 (02:30 +0000)]
Add extra auxiliary data structures which make it possible to quickly
find and delete all translations intersecting with small address
ranges (8 k or less, currently). This makes it possible to simulate
ppc32 icbi instructions in reasonable time, and finally makes the
ppc32 port run at a usable speed.
The scheme is based around partitioning translations into equivalence
classes based on address ranges. For deletions whose range falls
within a single class, all translations intersecting it can be found
by inspecting just that class and one other. Given that there are 256
classes, this cuts the cost, relative to scanning the entire TC, by
approximately half that factor (viz, 128), assuming the translations
are distributed evenly over the classes.
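As a sketch of how such a partition might map an address range to a class: the bin width (2 KB), the names, and the use of a catch-all class for ranges that straddle a bin boundary are all assumptions made for illustration. Only the count of 256 classes comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define BIN_SHIFT   11               /* assumed bin width: 2 KB */
#define BIN_BITS    8                /* 256 ordinary classes */
#define N_BINS      (1 << BIN_BITS)
#define CLASS_MISC  N_BINS           /* assumed catch-all class */

/* Map the range [start, start+len) to a class.  A range lying wholly
   inside one bin gets that bin's class number.  A range crossing a
   bin boundary goes to the catch-all class. */
static int range_to_class(uint64_t start, uint32_t len)
{
    assert(len > 0);
    uint64_t lo = start >> BIN_SHIFT;
    uint64_t hi = (start + len - 1) >> BIN_SHIFT;
    return lo == hi ? (int)(lo & (N_BINS - 1)) : CLASS_MISC;
}
```

Under these assumptions, a deletion whose range maps to class c only needs to scan class c plus CLASS_MISC, which would be the "one other" class mentioned above.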
The whole business is more complex and difficult than I would like.
A detailed comment will later be added.
Very thorough sanity checking has been added
(sanity_check_eclasses_in_sector). This is engaged at
--sanity-level=4 and above.
The TT hash function (HASH_TT) has been improved to reduce its
tendency to cluster TT entries in some circumstances. This has
allowed the TT maximum loading factor to be increased from 66% to 80%
and so the absolute size of the TC (in each sector) to be less than
2^16 entries. The latter change is important for the fast-deletion
changes.
A small Cachegrind cleanup: previously it was copying some things (eg.
instr_size and instr_addr) into Ir events, then later copying those into
instrInfo nodes. Now it just allocates the instrInfo nodes earlier and
copies them in directly. This is a bit more concise and easier to
understand.
Cachegrind cleanups:
- Remove some unnecessary assertions.
- Add in some new ones.
- Make things more concise and readable by factoring out things like
"cgs->events[i+1]" into things like "ev2" in flushEvents().
OSet-ified Cachegrind:
- The instrInfoTable was a VgHashTable, now it's an OSet.
- The CC table was a custom 3-level hash table, now it's an OSet. This
is easier to understand and there's no worrying about whether the hash
array sizes are big enough. It also has the nice property that the
results in the cachegrind.out.<pid> file are now sorted, so they're a bit
easier to read.
I did some testing and the performance difference appears to be negligible;
CC table and InstrInfo table lookups and traversal aren't that critical.
Overhaul how programs are loaded at startup and how exec() works. Now the
checking of programs done in these two places is combined, which avoids
duplicate code and greatly reduces the number of cases in which exec()
fails causing Valgrind to bomb out.
Also, we can now load some programs we could not previously, such as scripts
lacking a "#!" line at the start. Also, the startup failure messages for
bad programs match the shell's messages very closely.
And I added a whole bunch of regtests to test all this.
Julian Seward [Wed, 12 Oct 2005 10:09:23 +0000 (10:09 +0000)]
Redo the way cachegrind generates instrumentation code, so that it can
deal with any IR that happens to show up. This makes it work on ppc32
and should fix occasionally-reported bugs on x86/amd64 where it bombs
due to having to deal with multiple data references in a single
instruction.
The new scheme is based around the idea of a queue of memory events
which are outstanding, in the sense that no IR has yet been generated
to do the relevant helper calls. The presence of the queue --
currently 16 entries deep -- gives cachegrind more scope for combining
multiple memory references into a single helper function call. As a
result it runs 3%-5% faster than the previous version, on x86.
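The queue idea can be sketched as below. This is a simplified illustration, not cachegrind's code: the EvKind and EventQueue types are hypothetical, helper calls are only counted rather than emitted as IR, and the single combining rule shown (an instruction fetch merged with the data access that follows it) is an assumption standing in for whatever combinations the real tool performs.

```c
#include <assert.h>
#include <stddef.h>

#define N_EVENTS 16                  /* queue depth, as in the text */

typedef enum { EV_IR, EV_DR, EV_DW } EvKind;

typedef struct {
    EvKind events[N_EVENTS];
    int    used;
    int    helper_calls;             /* calls that would be generated */
} EventQueue;

/* Drain the queue, merging an instruction fetch with an immediately
   following data read or write into one helper call. */
static void flush_events(EventQueue *q)
{
    int i = 0;
    while (i < q->used) {
        if (q->events[i] == EV_IR && i + 1 < q->used
            && q->events[i + 1] != EV_IR)
            i += 2;                  /* one combined call */
        else
            i += 1;                  /* one call for a lone event */
        q->helper_calls++;
    }
    q->used = 0;
}

/* Queue an event, flushing first if the queue is already full. */
static void add_event(EventQueue *q, EvKind k)
{
    assert(q->used >= 0 && q->used <= N_EVENTS);
    if (q->used == N_EVENTS)
        flush_events(q);
    q->events[q->used++] = k;
}
```

Deferring the calls until a flush is what gives the combining step something to work with: an event emitted immediately could never be merged with its successor.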
This commit also changes the type of the tool interface function
'tool_discard_basic_block_info' and clarifies its meaning. See
comments in include/pub_tool_tooliface.h.