Tom Hughes [Sun, 25 Jul 2004 15:43:00 +0000 (15:43 +0000)]
Move the decoding of SFENCE out of the SSE only part of disInstr as it
exists on Athlon's that have MMXEXT support and those don't have SSE state
so won't decode it where it was.
Fix extremely obscure bug in xadd picked up by QEMU's test suite. The
(almost useless) instruction "xadd %reg,%reg" gave the wrong answer
due to a subtlety of the order in which the destination registers are
PUTted to.
gcc sometimes generates "sbbl %reg,%reg" to convert the carry flag
into 0 or -1 in reg. This has no actual dependency on reg, but
memcheck can't see that, and so will yelp if reg contains garbage. A
simple fix is to put zero into reg before we start, zapping any
undefinedness it might otherwise contain.
Add a bunch of asserts to check the results of calls to system malloc().
Assertions are arguably not the right thing here, but the practice is
widespread and we're not planning on making asserts optional, and it's a lot
better than no checking.
2.1.2 is imminent. I've tried to find all the changes since 2.1.1 and
list them here. (Reading 4 months worth of commit logs is sooo
fascinating :-) Please let me know asap of anything I've forgotten or
been erroneous on.
Tom Hughes [Thu, 15 Jul 2004 23:13:37 +0000 (23:13 +0000)]
Implement support for the async I/O system calls in 2.6 kernels. This
requires padding of the address space around calls to io_setup in order
to constrain the kernel's choice of address for the I/O context.
Based on patch from Scott Smith <scott-kde@gelatinous.com> with various
enhancements, this fixes bug #83060.
This commit fixes things so that the client stack can be easily placed
anywhere, even below the client executable, just by changing a single
assignment to VG_(clstk_end). I haven't actually moved the stack, though.
Merged Valgrind's heap and stack. This has two main advantages:
1. It simplifies various things a bit.
2. Valgrind/tools will run out of memory later than currently in many
circumstances. This is good news esp. for Calltree.
Some things were going in V's 128MB heap, and some were going in V's 128MB map
segment. Now all these things are going into a single 256MB map segment.
stage2 has been moved down to 0xb0000000, the start of the 256MB map segment.
The .so files needed by it are placed at 0xb1000000 (that's the map_base).
This required some bootstrapping at startup for memory -- we need to allocate
memory to create the segments skip-list which lets us allocate memory...
solution was to make the first superblock allocated a special static one.
That's pretty simple and enough to get things going.
Removed vg_glibc.c which wasn't doing anything anyway.
Removed VG_(brk) and associated stuff, made all the things that were calling it
call VG_(mmap)() instead.
Removed VG_(valgrind_mmap_end) which was no longer needed.
Rejigged the startup order a bit as necessary.
Moved an important comment from ume.c to vg_main.c where it should be.
A few changes:
- removed an unnecessary VG_(unmap_range)() call in do_brk() -- the
VG_(munmap)() just before it does it anyway.
- inlined mprotect_segment() and munmap_segment() because it's more concise and
easier to understand that way.
- a couple of minor formatting changes
- added and cleaned up a couple of comments
Removed the 'place-holder' behaviour of VG_(mmap). Previously, VG_(mmap) would
add a segment mapping to the segment skip-list, and then often the caller of
VG_(mmap) would do another one for the same segment, just to change the SF_*
flags. Now VG_(mmap) gets passed the appropriate SF_* flags so it can do it
directly. This results in shorter, simpler code, and less work at runtime.
Also, strengthened checking in VG_(mmap), POST(mmap), POST(mmap2) -- now if the
result is not in the right place, it aborts rather than unmapping and
continuing. This is because if it's not in the right place, something has
gone badly wrong.
Minor Makefile.am fix (doesn't actually change behaviour, because automake's
default rules meant 'execve' was being built anyway... but the fix at least
avoids confusion).
Problem was that the malloc-replacing tools (memcheck, addrcheck, massif,
helgrind) would assert if a too-big malloc was attempted. Now they return 0 to
the client. I also cleaned up the code handling heap-block-metadata in Massif
and Addrcheck/Memcheck a little.
This exposed a nasty bug in VG_(client_alloc)() which wasn't checking if
find_map_space() was succeeding before attempting an mmap(). Before I added
the check, very big mallocs (eg 2GB) for Addrcheck were overwriting the client
space at address 0 and causing crashes.
Added a regtest to all the affected skins for this.
Completely overhauled Cachegrind's data structures. With the new
scheme, there are two main structures:
1. The CC table holds a cost centre (CC) for every distinct source code
line, as found using debug/symbol info. It's arranged by files, then
functions, then lines.
2. The instr-info-table holds certain important pieces of info about
each instruction -- instr_addr, instr_size, data_size, its line-CC.
A pointer to the instr's info is passed to the simulation functions,
which is shorter and quicker than passing the pieces individually.
This is nice and simple. Previously, there was a single data structure
(the BBCC table) which mingled the two purposes (maintaining CCs and
caching instruction info). The CC stuff was done at the level of
instructions, and there were different CC types for different kinds of
instructions, and it was pretty yucky. The two simple data structures
together are much less complex than the original single data structure.
As a result, we have the following general improvements:
- Previously, when code was unloaded all its hit/miss counts were stuck
in a single "discard" CC, and so that code would not be annotated. Now
this code is profiled and annotatable just like all other code.
- Source code size is 27% smaller. cg_main.c is now 1472 lines, down
from 2174. Some (1/3?) of this is from removing the special handling
of JIFZ and general compaction, but most is from the data structure
changes. Happily, a lot of the removed code was nasty.
- Object code size (vgskin_cachegrind.so) is 15% smaller.
- cachegrind.out.pid size is about 90+% smaller(!) Annotation time is
accordingly *much* faster. Doing cost-centres at the level of source
code lines rather than instructions makes a big difference, since
there's typically 2--3 instructions per source line. Even better,
when debug info is not present, entire functions (and even files) get
collapsed into a single "???" CC. (This behaviour is no different
to what happened before, it's just the collapsing used to occur in the
annotation script, rather than within Cachegrind.) This is a huge win
for stripped libraries.
- Memory consumption is about 10--20% less, due to fewer CCs.
- Speed is not much changed -- the changes were not in the intensive
parts, so the only likely change is a cache improvement due to using
less memory. SPEC experiments go -3 -- 10% faster, with the "average"
being unchanged or perhaps a tiny bit faster.
I've tested it reasonably thoroughly, it seems extremely similar result
as the old version, which is highly encouraging. (The results aren't
quite the same, because they are so sensitive to memory layout; even
tiny changes to Cachegrind affect the results slightly.)
Some particularly nice changes that happened:
- No longer need an instrumentation prepass; this is because CCs are not
stored grouped by BB, and they're all the same size now. (This makes
various bits of code much simpler than before).
- The actions to take when a BB translation is discarded (due to the
translation table getting full) are much easier -- just chuck all the
instr-info nodes for the BB, without touching the CCs.
- Dumping the cachegrind.out.pid file at the end is much simpler, just
because the CC data structure is much neater.
Some other, specific changes:
- Removed the JIFZ special handling, which never did what it was
intended to do and just complicated things. This changes the results
for REP-prefixed instructions very slightly, but it's not important.
- Abbreviated the FP/MMX/SSE crap by being slightly laxer with size
checking -- not an issue, since this checking was just a pale
imitation of the stricter checking done in codegen anyway.
- Removed "fi" and "fe" handling from cg_annotate, no longer needed due
to neatening of the CC-table.
- Factorised out some code a bit, so fewer monolithic slabs,
particularly in SK_(instrument)().
- Just improved formatting and compacted code in general in various
places.
- Removed the long-commented-out sanity checking code at the bottom.
Add some comments describing the various kinds of magic going on in the
skiplist implementation. Also, fix a bug which allocated way too much memory
for the list head (found by Nick).
Tom Hughes [Sun, 27 Jun 2004 12:48:53 +0000 (12:48 +0000)]
Commit the patch from bug 69508 that seeks to make more of the pthread
stack attribute related functions work properly as it seems to be a
sensible thing to improve even if it isn't enough to get the JVM running
under valgrind now.
Changed (client-heap-size : client-map-seg-size) ratio from 3:1 to 1:2.
As a result, can now mmap much more memory (eg. for Memcheck, 850MB up from
250MB, for Nulgrind 1750MB up from 700MB). The heap is smaller, but that
doesn't matter much, since programs use brk() directly only rarely, and
malloc() falls back on mmap() if brk() fails anyway.
Also changed the debug info printing for memory layout slightly.
Tom Hughes [Sat, 26 Jun 2004 11:27:52 +0000 (11:27 +0000)]
Implement an emulated soft limit for file descriptors in addition to
the current reserved area, which effectively acts as a hard limit. The
setrlimit system call now simply updates the emulated limits as best
as possible - the hard limit is not allowed to move at all and just
returns EPERM if you try and change it.
This should stop reductions in the soft limit causing assertions when
valgrind tries to allocate descriptors from the reserved area.
to be consistent with each other and other options (esp. --input-fd). Also
renamed some related variables. The old names still work, for backwards
compatibility, but they're not documented.
Tom Hughes [Sat, 19 Jun 2004 13:02:34 +0000 (13:02 +0000)]
Don't try and validate the contents of the environment passed to
the execve system call if the envp pointer is null as it causes
valgrind to die with a segmentation fault.