Tom Hughes [Mon, 7 Nov 2005 15:24:38 +0000 (15:24 +0000)]
Dual architecture support - this commit is a major rework of the build
system that allows multiple copies of valgrind to be built so that we
can build both x86 and amd64 versions of the tools on amd64 machines.
The launcher is then modified to look at the program being run and
decide which tool to use to run it.
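One way a launcher could make this decision is to read the ELF header of the target program. The sketch below is only an illustration under that assumption: the log does not say how the launcher inspects the program, and select_platform and the demo headers are hypothetical names. The offsets and constants are the standard ELF ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard ELF header offsets and values. */
enum { EI_CLASS_OFF = 4, EI_DATA_OFF = 5, E_MACHINE_OFF = 18 };
enum { ELFCLASS_32 = 1, ELFCLASS_64 = 2, ELFDATA_LSB = 1 };
enum { MACHINE_386 = 3, MACHINE_X86_64 = 62 };

/* Hypothetical helper: return "x86" or "amd64" for a little-endian
   ELF header, or NULL if the header is too short or not recognised. */
static const char *select_platform(const unsigned char *hdr, size_t n)
{
    assert(hdr != NULL);
    if (n < 20)
        return NULL;                      /* need bytes 0..19 */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return NULL;                      /* not an ELF file */
    if (hdr[EI_DATA_OFF] != ELFDATA_LSB)
        return NULL;                      /* only little-endian handled */

    /* e_machine is a 16-bit little-endian field at offset 18. */
    int machine = hdr[E_MACHINE_OFF] | (hdr[E_MACHINE_OFF + 1] << 8);

    if (hdr[EI_CLASS_OFF] == ELFCLASS_32 && machine == MACHINE_386)
        return "x86";
    if (hdr[EI_CLASS_OFF] == ELFCLASS_64 && machine == MACHINE_X86_64)
        return "amd64";
    return NULL;
}

/* Minimal made-up headers for illustration only. */
static const unsigned char demo_elf32[20] = {
    0x7f, 'E', 'L', 'F', 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 3, 0
};
static const unsigned char demo_elf64[20] = {
    0x7f, 'E', 'L', 'F', 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 62, 0
};
```

The returned string would then pick which build of the tool to exec.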
Tom Hughes [Fri, 4 Nov 2005 16:46:19 +0000 (16:46 +0000)]
Allow rax/rbx/rcx/rdx (and their narrower counterparts) to be used
again but only after the high registers are exhausted or (and this is
the important point) if they are explicitly requested.
Also, don't use r8: the name is ambiguous, since we can't tell an explicit
request for it from a generic request for a byte register.
Tom Hughes [Fri, 4 Nov 2005 15:36:05 +0000 (15:36 +0000)]
Fix the amd64 version of gen_insn_test.pl to strip any b/w/d suffix
from register names when generating the clobber list as gcc won't
recognise things like r8d but will recognise r8.
This allows us to use the high number integer registers for the tests
which is something Julian asked for ages ago.
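The suffix stripping described above can be sketched as follows. The real change is in the Perl script gen_insn_test.pl; this is a C rendition of the idea only, and clobber_name is a hypothetical name. It assumes only the numbered registers (r8 to r15) carry b/w/d suffixes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a register name to the name used in the
   clobber list, dropping a trailing b/w/d from numbered registers
   such as r8d or r10w. Returns a pointer to a static buffer, so the
   result is overwritten by the next call. */
static const char *clobber_name(const char *reg)
{
    static char buf[16];
    size_t n = strlen(reg);

    assert(n < sizeof buf);
    if (n >= 3 && reg[0] == 'r' && isdigit((unsigned char)reg[1]) &&
        (reg[n - 1] == 'b' || reg[n - 1] == 'w' || reg[n - 1] == 'd'))
        n--;                              /* drop the width suffix */

    memcpy(buf, reg, n);
    buf[n] = '\0';
    return buf;
}
```

Legacy names such as rbx are left alone because their second character is not a digit.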
Tom Hughes [Fri, 4 Nov 2005 11:31:33 +0000 (11:31 +0000)]
When unwinding the stack on x86/amd64 subtract one from the value of
ip before starting a new pass of the loop.
The reason for this is that (except for the first pass of the loop) the
value of ip is actually a return address, which is therefore after the
instruction that was executing at the time. This means that if there is
a boundary in the CFI information at that point we can wind up using the
wrong CFI data to do the next unwind if we do it based on the return
address.
This most commonly happens with a tail call where we wind up using the
data for the next function to do the unwind and getting hopelessly lost.
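A minimal sketch of the effect, assuming a simple table of CFI ranges. CfiEntry, find_cfi, cfi_for_frame and demo_tab are hypothetical names, not the unwinder's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CFI table entry covering [start, start + len). */
typedef struct {
    uintptr_t start;
    uintptr_t len;
    int       id;      /* stands in for the real unwind rules */
} CfiEntry;

/* Linear search for the entry covering addr; NULL if none does. */
static const CfiEntry *find_cfi(const CfiEntry *tab, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= tab[i].start && addr - tab[i].start < tab[i].len)
            return &tab[i];
    return NULL;
}

/* For the innermost frame ip is the current instruction. For every
   outer frame it is a return address, which lies just past the call,
   so look up ip - 1 to stay inside the calling function's range. */
static const CfiEntry *cfi_for_frame(const CfiEntry *tab, size_t n,
                                     uintptr_t ip, int is_innermost)
{
    assert(ip > 0);
    return find_cfi(tab, n, is_innermost ? ip : ip - 1);
}

/* Two adjacent functions: the first ends exactly where the second starts. */
static const CfiEntry demo_tab[] = {
    { 0x1000, 0x20, 1 },
    { 0x1020, 0x30, 2 },
};
```

With a return address of 0x1020 (a call as the last instruction of the first function), looking up 0x1020 directly would select entry 2, while looking up 0x101f selects entry 1.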
Tom Hughes [Wed, 2 Nov 2005 16:15:55 +0000 (16:15 +0000)]
Split faultstatus into the platform independent tests and those
which are x86 specific. The first three x86 specific ones should
work on amd64 as well so I have added those as amd64 tests.
Note that the x86/amd64 tests will still fail as VEX doesn't
always trigger the right sort of signal for faulting instructions
at the moment.
Tom Hughes [Wed, 2 Nov 2005 15:46:07 +0000 (15:46 +0000)]
The memcheck/tests/sigprocmask test is designed to test that we handle
the old style sigprocmask system call correctly without corrupting
memory when we copy out the new (larger) signal mask into the user
provided old (smaller) signal mask.
It therefore makes no sense to run it on amd64 or any other platform
which only has the newer rt_sigprocmask system call, and indeed it
wasn't working because we weren't passing the extra argument which
that call expects.
Tom Hughes [Wed, 2 Nov 2005 14:42:39 +0000 (14:42 +0000)]
The sloppyRXcheck logic in the sync checker was not correct - it was
simply treating R and X as equivalent but the real problem is that
mappings can appear to have X permission entirely independent of anything
else with recent x86 kernels.
If a mapping is inside the (deliberately constrained) code segment then
it will appear to have X permission regardless of whether R or X was asked
for when it was mapped, so what we really need to do is allow the kernel
to add X to any mapping but not to take it away if we were expecting it.
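The rule can be sketched as a small predicate. This is an illustration, not the sync checker's code: perms_agree_sloppy_x and the PERM_* bits are hypothetical names, and requiring R and W to match exactly is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Hypothetical check: does the kernel's view of a mapping agree with
   what we expected? The kernel may add X but must not remove it.
   R and W are assumed to have to match exactly. */
static bool perms_agree_sloppy_x(unsigned expected, unsigned kernel)
{
    const unsigned x = (unsigned)PERM_X;

    assert((expected & ~7u) == 0 && (kernel & ~7u) == 0);

    /* Ignoring X, the permissions must be identical. */
    if ((expected & ~x) != (kernel & ~x))
        return false;

    /* X may be gained, but not lost if we were expecting it. */
    if ((expected & x) && !(kernel & x))
        return false;

    return true;
}
```

Treating R and X as interchangeable would wrongly accept a mapping where X was expected but only R is present; this predicate rejects it.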
Tom Hughes [Mon, 31 Oct 2005 17:05:21 +0000 (17:05 +0000)]
Get core dumping working again - the architecture specific code that
was in the sigframe module has been moved into the coredump module
where it belongs and things fixed up to compile again.
Julian Seward [Sun, 23 Oct 2005 12:06:55 +0000 (12:06 +0000)]
Don't assume the first statement is an IMark, since it could instead
be part of a self-check. Instead, copy verbatim any IR preamble
preceding the first IMark. This stops cachegrind asserting on
self-checking translations.
Tom Hughes [Thu, 20 Oct 2005 18:38:08 +0000 (18:38 +0000)]
Don't assert if the DWARF line info reader is given so little data that
it can't even read the length of the block - just report an error as we
do if there isn't enough data for the rest of the block. Fix bug #114757.
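A sketch of the two checks, assuming the 32-bit DWARF format with a 4-byte little-endian length field (the 64-bit escape value is not handled). line_block_length and the demo buffers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the length of the block that follows
   the 4-byte length field, or -1 if the data is too short, either
   for the length field itself or for the block it announces. */
static int64_t line_block_length(const unsigned char *data, size_t avail)
{
    assert(data != NULL);

    if (avail < 4)
        return -1;          /* can't even read the length: report, don't assert */

    uint32_t len = (uint32_t)data[0]
                 | ((uint32_t)data[1] << 8)
                 | ((uint32_t)data[2] << 16)
                 | ((uint32_t)data[3] << 24);

    if (avail - 4 < len)
        return -1;          /* not enough data for the rest of the block */

    return (int64_t)len;
}

/* Made-up inputs for illustration only. */
static const unsigned char demo_tiny[]  = { 0x10, 0x00 };
static const unsigned char demo_trunc[] = { 8, 0, 0, 0, 0xAA };
static const unsigned char demo_ok[]    = { 2, 0, 0, 0, 0xAA, 0xBB };
```

Both failure cases take the same error path, which matches the behaviour described above.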
Julian Seward [Thu, 20 Oct 2005 01:57:29 +0000 (01:57 +0000)]
Increase the threshold above which new errors are not shown from 300
unique / 30000 total to 1000 unique / 100000 total. Programs are
generally bigger now than 3 years ago.
Julian Seward [Thu, 20 Oct 2005 01:37:15 +0000 (01:37 +0000)]
Remove all remaining references to pointercheck. It's sad to see it
go, but realistically we can't implement it portably, at least without
considerable performance overhead and some additional complexity.
Julian Seward [Thu, 20 Oct 2005 00:31:31 +0000 (00:31 +0000)]
In the spirit of other changes over the past month aimed at supporting
monster-sized programs better, increase the default freelist volume
from 1M to 5M. Maybe even that is too small.
Julian Seward [Wed, 19 Oct 2005 11:23:07 +0000 (11:23 +0000)]
Halve the size of the fast tt lookup cache. This improves ppc32
performance quite a bit, since the cache is emptied quite often on
ppc32, and a smaller cache is less intrusive in the real machine's L2
cache. On x86 the change doesn't seem to have much effect.
Julian Seward [Tue, 18 Oct 2005 12:04:18 +0000 (12:04 +0000)]
Change the core-tool interface so that tools are fully aware of both
the guest extents for the presented translation and also its original
un-redirected guest address. These changes are needed in particular
to make cachegrind's code cache management work properly.
Julian Seward [Tue, 18 Oct 2005 02:30:42 +0000 (02:30 +0000)]
Add extra auxiliary data structures which make it possible to quickly
find and delete all translations intersecting with small address
ranges (8 k or less, currently). This makes it possible to simulate
ppc32 icbi instructions in reasonable time, and finally makes the
ppc32 port run at a usable speed.
The scheme is based around partitioning translations into equivalence
classes based on address ranges. For deletions whose range falls
within a single class, all translations intersecting it can be found
by inspecting just that class and one other. Given that there are 256
classes, this cuts the cost, relative to scanning the entire TC, by
approximately half that factor (viz, 128), assuming the translations
are distributed evenly over the classes.
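A minimal sketch of how an address range might be mapped to a class. Only the count of 256 classes comes from the description above. The 2 KiB bin size, the use of the last class as a catch-all for ranges that straddle a bin boundary, and the names range_to_eclass, ECLASS_SHIFT and ECLASS_MISC are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ECLASS_SHIFT 11               /* assumed bin size: 2 KiB */
#define ECLASS_N     256              /* number of classes, as stated above */
#define ECLASS_MISC  (ECLASS_N - 1)   /* assumed catch-all class */

/* Map the range [start, start + len) to a class number. A range that
   lies inside one bin gets a class in 0..254. A range that crosses a
   bin boundary goes to the catch-all class. */
static int range_to_eclass(uint64_t start, uint32_t len)
{
    assert(len > 0);

    uint64_t lo = start >> ECLASS_SHIFT;
    uint64_t hi = (start + len - 1) >> ECLASS_SHIFT;

    if (lo != hi)
        return ECLASS_MISC;

    return (int)(lo % (ECLASS_N - 1));
}
```

Under these assumptions, deleting a range that maps to class c means scanning only class c and the catch-all class, that is 2 of the 256 classes, which is where the factor of roughly 128 comes from.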
The whole business is more complex and difficult than I would like.
A detailed comment will later be added.
Very thorough sanity checking has been added
(sanity_check_eclasses_in_sector). This is engaged at
--sanity-level=4 and above.
The TT hash function (HASH_TT) has been improved to reduce its
tendency to cluster TT entries in some circumstances. This has
allowed the TT maximum loading factor to be increased from 66% to 80%
and so the absolute size of the TC (in each sector) to be less than
2^16 entries. The latter change is important for the fast-deletion
changes.
A small Cachegrind cleanup: previously it was copying some things (eg.
instr_size and instr_addr) into Ir events, then later copying those into
instrInfo nodes. Now it just allocates the instrInfo nodes earlier and
copies them in directly. This is a bit more concise and easier to
understand.
Cachegrind cleanups:
- Remove some unnecessary assertions.
- Add in some new ones.
- Make things more concise and readable by factoring out things like
"cgs->events[i+1]" into things like "ev2" in flushEvents().