Add a bunch of asserts to check the results of calls to system malloc().
Assertions are arguably not the right thing here, but the practice is
widespread and we're not planning on making asserts optional, and it's a lot
better than no checking.
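
As a rough illustration of the pattern described above (the wrapper name checked_malloc is hypothetical and not taken from the Valgrind sources), the result of the system malloc() can be asserted non-NULL before it is used:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the Valgrind sources: allocate nbytes
   with the system malloc() and assert that the call succeeded.
   Callers are assumed to pass nbytes > 0, because malloc(0) is allowed
   to return NULL even when nothing has gone wrong. */
void *checked_malloc(size_t nbytes)
{
    void *p;
    assert(nbytes > 0);
    p = malloc(nbytes);
    assert(p != NULL);   /* abort loudly instead of limping on with NULL */
    return p;
}
```

An assertion turns an allocation failure into an immediate abort at the call site, which is crude but makes the failure visible where it happens.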
2.1.2 is imminent. I've tried to find all the changes since 2.1.1 and
list them here. (Reading 4 months' worth of commit logs is sooo
fascinating :-) Please let me know asap of anything I've forgotten or
got wrong.
Tom Hughes [Thu, 15 Jul 2004 23:13:37 +0000 (23:13 +0000)]
Implement support for the async I/O system calls in 2.6 kernels. This
requires padding of the address space around calls to io_setup in order
to constrain the kernel's choice of address for the I/O context.
Based on patch from Scott Smith <scott-kde@gelatinous.com> with various
enhancements, this fixes bug #83060.
This commit fixes things so that the client stack can be easily placed
anywhere, even below the client executable, just by changing a single
assignment to VG_(clstk_end). I haven't actually moved the stack, though.
Merged Valgrind's heap and stack. This has two main advantages:
1. It simplifies various things a bit.
2. Valgrind/tools will run out of memory later than currently in many
circumstances. This is good news esp. for Calltree.
Some things were going in V's 128MB heap, and some were going in V's 128MB map
segment. Now all these things are going into a single 256MB map segment.
stage2 has been moved down to 0xb0000000, the start of the 256MB map segment.
The .so files needed by it are placed at 0xb1000000 (that's the map_base).
This required some bootstrapping at startup for memory -- we need to allocate
memory to create the segment skip-list, which is what lets us allocate
memory... The solution was to make the first superblock allocated a special
static one. That's pretty simple and enough to get things going.
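
The bootstrapping trick can be sketched roughly as follows. This is a toy bump allocator, not Valgrind's allocator: every name (Superblock, arena_alloc, SB_PAYLOAD) is made up for illustration, and calloc() stands in for the path that would really go through the segment skip-list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SB_PAYLOAD 4096              /* illustrative size only */

typedef struct Superblock {
    struct Superblock *next;
    size_t used;
    unsigned char payload[SB_PAYLOAD];
} Superblock;

/* The first superblock is static, so it exists before any allocation
   machinery has been set up. */
static Superblock bootstrap_sb;
static Superblock *current_sb = NULL;
static int n_dynamic_sbs = 0;

/* Stand-in for the "real" path, which would need working segment
   bookkeeping; here it is simply calloc(). */
static Superblock *new_dynamic_superblock(void)
{
    Superblock *sb = calloc(1, sizeof *sb);
    assert(sb != NULL);
    n_dynamic_sbs++;
    return sb;
}

/* Bump-allocate nbytes (rounded up to a multiple of 8). */
void *arena_alloc(size_t nbytes)
{
    void *p;
    nbytes = (nbytes + 7) & ~(size_t)7;
    assert(nbytes > 0 && nbytes <= SB_PAYLOAD);
    if (current_sb == NULL)
        current_sb = &bootstrap_sb;          /* first request: static block */
    if (current_sb->used + nbytes > SB_PAYLOAD) {
        Superblock *sb = new_dynamic_superblock();
        current_sb->next = sb;
        current_sb = sb;
    }
    p = current_sb->payload + current_sb->used;
    current_sb->used += nbytes;
    return p;
}

int arena_dynamic_superblocks(void) { return n_dynamic_sbs; }
```

Early requests are served from the static block without touching the dynamic path, which is what breaks the circular dependency.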
Removed vg_glibc.c which wasn't doing anything anyway.
Removed VG_(brk) and associated stuff, made all the things that were calling it
call VG_(mmap)() instead.
Removed VG_(valgrind_mmap_end) which was no longer needed.
Rejigged the startup order a bit as necessary.
Moved an important comment from ume.c to vg_main.c where it should be.
A few changes:
- removed an unnecessary VG_(unmap_range)() call in do_brk() -- the
VG_(munmap)() just before it does it anyway.
- inlined mprotect_segment() and munmap_segment() because it's more concise and
easier to understand that way.
- a couple of minor formatting changes
- added and cleaned up a couple of comments
Removed the 'place-holder' behaviour of VG_(mmap). Previously, VG_(mmap) would
add a segment mapping to the segment skip-list, and then often the caller of
VG_(mmap) would do another one for the same segment, just to change the SF_*
flags. Now VG_(mmap) gets passed the appropriate SF_* flags so it can do it
directly. This results in shorter, simpler code, and less work at runtime.
Also, strengthened checking in VG_(mmap), POST(mmap), POST(mmap2) -- now if the
result is not in the right place, it aborts rather than unmapping and
continuing. This is because if it's not in the right place, something has
gone badly wrong.
Minor Makefile.am fix (doesn't actually change behaviour, because automake's
default rules meant 'execve' was being built anyway... but the fix at least
avoids confusion).
Problem was that the malloc-replacing tools (memcheck, addrcheck, massif,
helgrind) would assert if a too-big malloc was attempted. Now they return 0 to
the client. I also cleaned up the code handling heap-block-metadata in Massif
and Addrcheck/Memcheck a little.
This exposed a nasty bug in VG_(client_alloc)() which wasn't checking if
find_map_space() was succeeding before attempting an mmap(). Before I added
the check, very big mallocs (eg 2GB) for Addrcheck were overwriting the client
space at address 0 and causing crashes.
Added a regtest to all the affected skins for this.
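
A toy model of the fix, under stated assumptions: toy_find_map_space() and toy_client_alloc() are hypothetical stand-ins for find_map_space() and VG_(client_alloc)(), the address range is invented, and a counter replaces the real mmap(). The point is only the check between the two steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLIENT_BASE ((uintptr_t)0x10000)   /* invented toy region */
#define CLIENT_SIZE ((size_t)0x100000)     /* 1MB, illustrative only */

static size_t client_used = 0;

/* Returns an address with room for len bytes, or 0 if there is none. */
uintptr_t toy_find_map_space(size_t len)
{
    if (len == 0 || len > CLIENT_SIZE - client_used)
        return 0;
    return CLIENT_BASE + client_used;
}

/* Returns the address of the new block, or 0 to signal failure to the
   caller (and ultimately to the client's malloc()). */
uintptr_t toy_client_alloc(size_t len)
{
    uintptr_t addr = toy_find_map_space(len);
    if (addr == 0)
        return 0;            /* the added check: never map at a bogus address */
    client_used += len;      /* stands in for the actual mmap() at addr */
    return addr;
}
```

Without the early return, a failed search would fall through and the mapping step would be attempted at address 0, which is the overwrite described above.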
Completely overhauled Cachegrind's data structures. With the new
scheme, there are two main structures:
1. The CC table holds a cost centre (CC) for every distinct source code
line, as found using debug/symbol info. It's arranged by files, then
functions, then lines.
2. The instr-info-table holds certain important pieces of info about
each instruction -- instr_addr, instr_size, data_size, its line-CC.
A pointer to the instr's info is passed to the simulation functions,
which is shorter and quicker than passing the pieces individually.
This is nice and simple. Previously, there was a single data structure
(the BBCC table) which mingled the two purposes (maintaining CCs and
caching instruction info). The CC stuff was done at the level of
instructions, and there were different CC types for different kinds of
instructions, and it was pretty yucky. The two simple data structures
together are much less complex than the original single data structure.
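
A minimal sketch of how the two structures relate, assuming invented type and function names (LineCC, InstrInfo, sim_instr_*) and invented counters; the real simulation functions do cache simulation, which is omitted here. Only the fields instr_addr, instr_size, data_size and the line-CC pointer come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cost centre per distinct source line (counters are illustrative). */
typedef struct {
    unsigned long long instrs;      /* instructions executed on this line */
    unsigned long long data_reads;  /* data reads issued from this line */
} LineCC;

/* Per-instruction info; many instructions may share one LineCC. */
typedef struct {
    uintptr_t     instr_addr;
    unsigned char instr_size;
    unsigned char data_size;
    LineCC       *line_cc;
} InstrInfo;

/* The simulation functions take one pointer instead of the separate
   pieces, and charge costs to the line's CC. */
void sim_instr_no_mem(InstrInfo *n)
{
    assert(n != NULL && n->line_cc != NULL);
    n->line_cc->instrs++;
}

void sim_instr_read(InstrInfo *n)
{
    assert(n != NULL && n->line_cc != NULL);
    n->line_cc->instrs++;
    n->line_cc->data_reads++;
}
```

Because the instr-info nodes only point at CCs, they can be thrown away without losing any accumulated counts.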
As a result, we have the following general improvements:
- Previously, when code was unloaded all its hit/miss counts were stuck
in a single "discard" CC, and so that code would not be annotated. Now
this code is profiled and annotatable just like all other code.
- Source code size is 27% smaller. cg_main.c is now 1472 lines, down
from 2174. Some (1/3?) of this is from removing the special handling
of JIFZ and general compaction, but most is from the data structure
changes. Happily, a lot of the removed code was nasty.
- Object code size (vgskin_cachegrind.so) is 15% smaller.
- cachegrind.out.pid size is about 90+% smaller(!) Annotation time is
accordingly *much* faster. Doing cost-centres at the level of source
code lines rather than instructions makes a big difference, since
there's typically 2--3 instructions per source line. Even better,
when debug info is not present, entire functions (and even files) get
collapsed into a single "???" CC. (This behaviour is no different
to what happened before, it's just the collapsing used to occur in the
annotation script, rather than within Cachegrind.) This is a huge win
for stripped libraries.
- Memory consumption is about 10--20% less, due to fewer CCs.
- Speed is not much changed -- the changes were not in the intensive
parts, so the only likely change is a cache improvement due to using
less memory. SPEC experiments range from 3% slower to 10% faster, with
the "average" being unchanged or perhaps a tiny bit faster.
I've tested it reasonably thoroughly; it gives results extremely similar
to the old version's, which is highly encouraging. (The results aren't
quite the same, because they are so sensitive to memory layout; even
tiny changes to Cachegrind affect the results slightly.)
Some particularly nice changes that happened:
- No longer need an instrumentation prepass; this is because CCs are not
stored grouped by BB, and they're all the same size now. (This makes
various bits of code much simpler than before).
- The actions to take when a BB translation is discarded (due to the
translation table getting full) are much easier -- just chuck all the
instr-info nodes for the BB, without touching the CCs.
- Dumping the cachegrind.out.pid file at the end is much simpler, just
because the CC data structure is much neater.
Some other, specific changes:
- Removed the JIFZ special handling, which never did what it was
intended to do and just complicated things. This changes the results
for REP-prefixed instructions very slightly, but it's not important.
- Abbreviated the FP/MMX/SSE crap by being slightly laxer with size
checking -- not an issue, since this checking was just a pale
imitation of the stricter checking done in codegen anyway.
- Removed "fi" and "fe" handling from cg_annotate, no longer needed due
to neatening of the CC-table.
- Factorised out some code a bit, so fewer monolithic slabs,
particularly in SK_(instrument)().
- Just improved formatting and compacted code in general in various
places.
- Removed the long-commented-out sanity checking code at the bottom.
Add some comments describing the various kinds of magic going on in the
skiplist implementation. Also, fix a bug which allocated way too much memory
for the list head (found by Nick).
Tom Hughes [Sun, 27 Jun 2004 12:48:53 +0000 (12:48 +0000)]
Commit the patch from bug 69508 that seeks to make more of the pthread
stack attribute related functions work properly as it seems to be a
sensible thing to improve even if it isn't enough to get the JVM running
under valgrind now.
Changed (client-heap-size : client-map-seg-size) ratio from 3:1 to 1:2.
As a result, can now mmap much more memory (eg. for Memcheck, 850MB up from
250MB, for Nulgrind 1750MB up from 700MB). The heap is smaller, but that
doesn't matter much, since programs use brk() directly only rarely, and
malloc() falls back on mmap() if brk() fails anyway.
Also changed the debug info printing for memory layout slightly.
Tom Hughes [Sat, 26 Jun 2004 11:27:52 +0000 (11:27 +0000)]
Implement an emulated soft limit for file descriptors in addition to
the current reserved area, which effectively acts as a hard limit. The
setrlimit system call now simply updates the emulated limits as best
as possible - the hard limit is not allowed to move at all and just
returns EPERM if you try and change it.
This should stop reductions in the soft limit causing assertions when
valgrind tries to allocate descriptors from the reserved area.
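
The emulated limits might be modelled along these lines. This is a sketch with invented names (ToyRlimit, toy_setrlimit_nofile) and invented initial values; rejecting a soft limit above the hard limit with EINVAL is an assumption borrowed from the usual setrlimit semantics, not something stated in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct {
    unsigned long cur;   /* soft limit */
    unsigned long max;   /* hard limit */
} ToyRlimit;

/* Emulated limits; the values are illustrative only. */
static ToyRlimit fd_limit = { 1024, 1024 };

/* Returns 0 on success or a negated errno value on failure. */
int toy_setrlimit_nofile(const ToyRlimit *req)
{
    assert(req != NULL);
    if (req->max != fd_limit.max)
        return -EPERM;           /* the hard limit is not allowed to move */
    if (req->cur > fd_limit.max)
        return -EINVAL;          /* assumption: soft may not exceed hard */
    fd_limit.cur = req->cur;     /* only the emulated soft limit changes */
    return 0;
}

/* Would the client be allowed to use descriptor number fd? */
int toy_fd_allowed(unsigned long fd)
{
    return fd < fd_limit.cur;
}
```

Since only the emulated soft limit changes, the reserved descriptors above the hard limit stay available no matter what the client requests.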
to be consistent with each other and other options (esp. --input-fd). Also
renamed some related variables. The old names still work, for backwards
compatibility, but they're not documented.
Tom Hughes [Sat, 19 Jun 2004 13:02:34 +0000 (13:02 +0000)]
Don't try and validate the contents of the environment passed to
the execve system call if the envp pointer is null as it causes
valgrind to die with a segmentation fault.
Introduced 4 macros to minimise boilerplate command line processing code.
Nicely cuts around 130 lines of code, spread over the core and several tools.
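
The commit message does not name the macros, so the following is only a guess at the general shape, with hypothetical names (TOY_BOOL_CLO and friends) and hypothetical options. Three kinds are shown rather than four, and each macro expects a local variable called arg holding the option being examined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macros; each one tests `arg` and sets a variable. */
#define TOY_BOOL_CLO(name, var)                                   \
    if (strcmp(arg, name "=yes") == 0) { (var) = 1; }             \
    else if (strcmp(arg, name "=no") == 0) { (var) = 0; }

#define TOY_NUM_CLO(name, var)                                    \
    if (strncmp(arg, name "=", strlen(name "=")) == 0)            \
        { (var) = atoi(arg + strlen(name "=")); }

#define TOY_STR_CLO(name, var)                                    \
    if (strncmp(arg, name "=", strlen(name "=")) == 0)            \
        { (var) = arg + strlen(name "="); }

/* Hypothetical option variables with their defaults. */
int toy_trace = 0;
int toy_depth = 4;
const char *toy_logfile = NULL;

/* Returns 1 if arg was recognised, 0 otherwise. */
int toy_process_option(const char *arg)
{
    assert(arg != NULL);
    TOY_BOOL_CLO("--trace", toy_trace)
    else TOY_NUM_CLO("--depth", toy_depth)
    else TOY_STR_CLO("--log-file", toy_logfile)
    else return 0;
    return 1;
}
```

Each option then costs one line in the processing function instead of a hand-written comparison and conversion.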
Tom Hughes [Wed, 16 Jun 2004 20:51:45 +0000 (20:51 +0000)]
Added VG_(cpuid) to replace the various bits of inline assembler used
to query the CPU characteristics as the use of four implicit registers
causes havoc when GCC tries to inline and optimise the assembler.
Fixed up various command line option scenarios:
- If no tool is specified, V now gives a short message and a list of
available tools. This was meant to happen previously, but a bug prevented
it from working properly; it gave the usage message instead.
- If a bad option is given, V now gives a short message rather than the full
--help. This makes V consistent with all other programs I looked at.
- Now returning 0 when you do 'valgrind --help' and 'valgrind --version'
as other programs do.
- Removed VG_(startup_logging)() and VG_(shutdown_logging)() as they were
empty and have been for a long time (always?).
- Added various tests for these scenarios. Had to change the regtest
script slightly to allow for malformed command lines.
Fix problem with FC2's vdso (sysinfo) page, which lives at a low,
random address. This gets unmapped as part of the client setup, and
causes syscalls to fail as a result. This patch simply disregards the
sysinfo page. It seems like a blunt fix, but I don't think anything
depends on a sysinfo page.
Tom Hughes [Sun, 13 Jun 2004 12:07:53 +0000 (12:07 +0000)]
When cancelling a thread that is waiting on a condition variable we
need to relock the associated mutex before running the cancellation
handlers.
This patch ensures that the mutex is reacquired in the above case and
also makes pthread_join and pthread_cond_wait act as cancellation points
as required by the POSIX threads standard.
Based on patch from Joseph Link <joelink@joelink.net>.
Tom Hughes [Sun, 13 Jun 2004 09:59:02 +0000 (09:59 +0000)]
Add support for separate debug files, which are just separate ELF files
containing the relevant debug sections and located using the information
in the .gnu_debuglink section of the main file along with some search
rules and checksum logic borrowed from binutils/gdb.
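
The checksum step might look roughly like this. The sketch assumes the .gnu_debuglink checksum is the standard CRC-32 (reflected polynomial 0xEDB88320), which is my understanding of the binutils/gdb convention but should be verified against their documentation; the function names are invented and the bit-by-bit loop replaces the lookup table a real implementation would probably use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32, computed one bit at a time. Passing the previous
   result as crc allows a file to be checksummed in chunks. */
uint32_t toy_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* A candidate debug file is accepted only if the checksum of its full
   contents equals the value recorded in the main file. */
int toy_debug_file_matches(uint32_t expected_crc,
                           const unsigned char *contents, size_t len)
{
    return toy_crc32(0, contents, len) == expected_crc;
}
```

The checksum guards against picking up a stale or unrelated debug file that merely has the right name on the search path.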
Tom Hughes [Sat, 12 Jun 2004 12:58:22 +0000 (12:58 +0000)]
It appears that NPTL uses a new system for dealing with cleanup
handlers when a thread is cancelled which has the side effect that
programs linked with librt fail on Fedora Core 2 due to librt having
been built against the NPTL headers instead of the old pthread headers.
This change extends valgrind's libpthread.so to handle both the old
and new style cleanup handlers in a similar way to NPTL and seems to
be sufficient to get programs linked with librt working again.
Tom Hughes [Fri, 4 Jun 2004 21:42:18 +0000 (21:42 +0000)]
There is no __accept in any libc or libpthread that I can find so
it isn't clear why we were intercepting that and only aliasing accept
to it. Switched to intercepting accept directly instead.