Cleaned up vg_include.h:
- removed various things that are no longer used
- made (module-)local some things that were global
- improved the formatting in places
Removed about 160 lines of code, and non-trivially reduced the number
of global entities.
Restructured the as_*() functions so they are simpler and there is no implicit
global state -- the state is threaded explicitly through via function arguments
and return values. ume.c now has no global variables, which is nice.
Also removed a redundant as_pad() call in stage2's main() which meant
layout_client_space() could be merged with layout_remaining_space().
Also removed a couple of no-longer-used variables and #defines.
Tom Hughes [Thu, 29 Jul 2004 22:40:07 +0000 (22:40 +0000)]
Modify the ipc system call so that only those calls which may block
are treated as blocking.
This fixes bug #86000 because shmat is no longer treated as blocking
and it is therefore no longer possible for two threads to try and use
the same address for the shared memory segment.
Tom Hughes [Thu, 29 Jul 2004 17:44:23 +0000 (17:44 +0000)]
Add support for allowing the POST function for a system call to be called
even when the system call fails, and allow the PRE function to modify the
system call flags.
Also fix nanosleep so that it only marks the returned time as defined
if the system call exited with EINTR due to be interrupted.
Fix the skiplist brokenness Nick found:
- use a simple memset to initialize the next pointer vector
- fix some previously unexercised code as a result of the above
Still haven't verified we're actually getting skipping, but it doesn't
crash with a make regtest.
Er, actually make this test meaningful. It now aborts correctly if you try to
launch stage2 directly, rather than giving an obscure error about the tool
later on.
Tom Hughes [Sun, 25 Jul 2004 15:43:00 +0000 (15:43 +0000)]
Move the decoding of SFENCE out of the SSE only part of disInstr as it
exists on Athlon's that have MMXEXT support and those don't have SSE state
so won't decode it where it was.
Fix extremely obscure bug in xadd picked up by QEMU's test suite. The
(almost useless) instruction "xadd %reg,%reg" gave the wrong answer
due to a subtlety of the order in which the destination registers are
PUTted to.
gcc sometimes generates "sbbl %reg,%reg" to convert the carry flag
into 0 or -1 in reg. This has no actual dependency on reg, but
memcheck can't see that, and so will yelp if reg contains garbage. A
simple fix is to put zero into reg before we start, zapping any
undefinedness it might otherwise contain.
Add a bunch of asserts to check the results of calls to system malloc().
Assertions are arguably not the right thing here, but the practice is
widespread and we're not planning on making asserts optional, and it's a lot
better than no checking.
2.1.2 is imminent. I've tried to find all the changes since 2.1.1 and
list them here. (Reading 4 months worth of commit logs is sooo
fascinating :-) Please let me know asap of anything I've forgotten or
been erroneous on.
Tom Hughes [Thu, 15 Jul 2004 23:13:37 +0000 (23:13 +0000)]
Implement support for the async I/O system calls in 2.6 kernels. This
requires padding of the address space around calls to io_setup in order
to constrain the kernel's choice of address for the I/O context.
Based on patch from Scott Smith <scott-kde@gelatinous.com> with various
enhancements, this fixes bug #83060.
This commit fixes things so that the client stack can be easily placed
anywhere, even below the client executable, just by changing a single
assignment to VG_(clstk_end). I haven't actually moved the stack, though.
Merged Valgrind's heap and stack. This has two main advantages:
1. It simplifies various things a bit.
2. Valgrind/tools will run out of memory later than currently in many
circumstances. This is good news esp. for Calltree.
Some things were going in V's 128MB heap, and some were going in V's 128MB map
segment. Now all these things are going into a single 256MB map segment.
stage2 has been moved down to 0xb0000000, the start of the 256MB map segment.
The .so files needed by it are placed at 0xb1000000 (that's the map_base).
This required some bootstrapping at startup for memory -- we need to allocate
memory to create the segments skip-list which lets us allocate memory...
solution was to make the first superblock allocated a special static one.
That's pretty simple and enough to get things going.
Removed vg_glibc.c which wasn't doing anything anyway.
Removed VG_(brk) and associated stuff, made all the things that were calling it
call VG_(mmap)() instead.
Removed VG_(valgrind_mmap_end) which was no longer needed.
Rejigged the startup order a bit as necessary.
Moved an important comment from ume.c to vg_main.c where it should be.
A few changes:
- removed an unnecessary VG_(unmap_range)() call in do_brk() -- the
VG_(munmap)() just before it does it anyway.
- inlined mprotect_segment() and munmap_segment() because it's more concise and
easier to understand that way.
- a couple of minor formatting changes
- added and cleaned up a couple of comments
Removed the 'place-holder' behaviour of VG_(mmap). Previously, VG_(mmap) would
add a segment mapping to the segment skip-list, and then often the caller of
VG_(mmap) would do another one for the same segment, just to change the SF_*
flags. Now VG_(mmap) gets passed the appropriate SF_* flags so it can do it
directly. This results in shorter, simpler code, and less work at runtime.
Also, strengthened checking in VG_(mmap), POST(mmap), POST(mmap2) -- now if the
result is not in the right place, it aborts rather than unmapping and
continuing. This is because if it's not in the right place, something has
gone badly wrong.
Minor Makefile.am fix (doesn't actually change behaviour, because automake's
default rules meant 'execve' was being built anyway... but the fix at least
avoids confusion).
Problem was that the malloc-replacing tools (memcheck, addrcheck, massif,
helgrind) would assert if a too-big malloc was attempted. Now they return 0 to
the client. I also cleaned up the code handling heap-block-metadata in Massif
and Addrcheck/Memcheck a little.
This exposed a nasty bug in VG_(client_alloc)() which wasn't checking if
find_map_space() was succeeding before attempting an mmap(). Before I added
the check, very big mallocs (eg 2GB) for Addrcheck were overwriting the client
space at address 0 and causing crashes.
Added a regtest to all the affected skins for this.
Completely overhauled Cachegrind's data structures. With the new
scheme, there are two main structures:
1. The CC table holds a cost centre (CC) for every distinct source code
line, as found using debug/symbol info. It's arranged by files, then
functions, then lines.
2. The instr-info-table holds certain important pieces of info about
each instruction -- instr_addr, instr_size, data_size, its line-CC.
A pointer to the instr's info is passed to the simulation functions,
which is shorter and quicker than passing the pieces individually.
This is nice and simple. Previously, there was a single data structure
(the BBCC table) which mingled the two purposes (maintaining CCs and
caching instruction info). The CC stuff was done at the level of
instructions, and there were different CC types for different kinds of
instructions, and it was pretty yucky. The two simple data structures
together are much less complex than the original single data structure.
As a result, we have the following general improvements:
- Previously, when code was unloaded all its hit/miss counts were stuck
in a single "discard" CC, and so that code would not be annotated. Now
this code is profiled and annotatable just like all other code.
- Source code size is 27% smaller. cg_main.c is now 1472 lines, down
from 2174. Some (1/3?) of this is from removing the special handling
of JIFZ and general compaction, but most is from the data structure
changes. Happily, a lot of the removed code was nasty.
- Object code size (vgskin_cachegrind.so) is 15% smaller.
- cachegrind.out.pid size is about 90+% smaller(!) Annotation time is
accordingly *much* faster. Doing cost-centres at the level of source
code lines rather than instructions makes a big difference, since
there's typically 2--3 instructions per source line. Even better,
when debug info is not present, entire functions (and even files) get
collapsed into a single "???" CC. (This behaviour is no different
to what happened before, it's just the collapsing used to occur in the
annotation script, rather than within Cachegrind.) This is a huge win
for stripped libraries.
- Memory consumption is about 10--20% less, due to fewer CCs.
- Speed is not much changed -- the changes were not in the intensive
parts, so the only likely change is a cache improvement due to using
less memory. SPEC experiments go -3 -- 10% faster, with the "average"
being unchanged or perhaps a tiny bit faster.
I've tested it reasonably thoroughly, it seems extremely similar result
as the old version, which is highly encouraging. (The results aren't
quite the same, because they are so sensitive to memory layout; even
tiny changes to Cachegrind affect the results slightly.)
Some particularly nice changes that happened:
- No longer need an instrumentation prepass; this is because CCs are not
stored grouped by BB, and they're all the same size now. (This makes
various bits of code much simpler than before).
- The actions to take when a BB translation is discarded (due to the
translation table getting full) are much easier -- just chuck all the
instr-info nodes for the BB, without touching the CCs.
- Dumping the cachegrind.out.pid file at the end is much simpler, just
because the CC data structure is much neater.
Some other, specific changes:
- Removed the JIFZ special handling, which never did what it was
intended to do and just complicated things. This changes the results
for REP-prefixed instructions very slightly, but it's not important.
- Abbreviated the FP/MMX/SSE crap by being slightly laxer with size
checking -- not an issue, since this checking was just a pale
imitation of the stricter checking done in codegen anyway.
- Removed "fi" and "fe" handling from cg_annotate, no longer needed due
to neatening of the CC-table.
- Factorised out some code a bit, so fewer monolithic slabs,
particularly in SK_(instrument)().
- Just improved formatting and compacted code in general in various
places.
- Removed the long-commented-out sanity checking code at the bottom.
Add some comments describing the various kinds of magic going on in the
skiplist implementation. Also, fix a bug which allocated way too much memory
for the list head (found by Nick).
Tom Hughes [Sun, 27 Jun 2004 12:48:53 +0000 (12:48 +0000)]
Commit the patch from bug 69508 that seeks to make more of the pthread
stack attribute related functions work properly as it seems to be a
sensible thing to improve even if it isn't enough to get the JVM running
under valgrind now.