Julian Seward [Sun, 7 May 2006 14:37:03 +0000 (14:37 +0000)]
Increase scheduling quantum to 100k basic blocks. Nowadays even
modest cpus can run 5-10M memcheck'd bbs per second and the previous
limit of 50k gives a 100Hz switch rate, which causes cache pollution
(a known performance problem) and other context-switch overheads.
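The arithmetic behind the numbers above can be sketched as follows. This is only an illustration: switch_rate_hz and its parameter names are made up for the example and are not Valgrind identifiers.

```c
#include <assert.h>

/* Thread switches per second, given how many basic blocks run per
   second and how many basic blocks make up one scheduling quantum.
   Hypothetical helper for illustration; not part of Valgrind. */
static unsigned long switch_rate_hz(unsigned long bbs_per_sec,
                                    unsigned long quantum_bbs)
{
    assert(quantum_bbs > 0);
    return bbs_per_sec / quantum_bbs;
}
```

At 5M bbs per second, a 50k quantum gives 100 switches per second, and a 100k quantum halves that to 50.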
Julian Seward [Wed, 3 May 2006 22:13:57 +0000 (22:13 +0000)]
Vectorise copy_address_range_perms for common cases. This gives about
40% speedup on artificial programs which just do realloc() and nothing
else, and about a 3-4% speedup on starting kpresenter-1.5.0 and
loading a 16-slide presentation.
Make VG_(run_innerloop) visible for outer Valgrinds
with self hosting. Without this, the symbol has
size 0 and type NOT, and is ignored by the symbol loader.
Callgrind: Improve self-hosting with outer callgrind tool
This adds an option to change the default handling of jumps
between functions. Usually, a jump between functions is
interpreted as a call, because such jumps are typically
generated by compilers doing tail-call optimization, and
we want to present this as a call to the user. Thus, such
a jump pushes a call onto callgrind's shadow stack.
The option "--pop-on-jump" changes this to pop+push the
shadow callstack: then, a jump between functions is seen
as a return to the caller and a new call.
The default behaviour is _bad_ for using callgrind with
self-hosting. Valgrind's inner loop VG_(run_innerloop)
jumps to generated code, and this code jumps back to
the inner loop. Thus, every executed BB adds 2 calls
to an ever increasing shadow call stack, leading to
memory consumption increasing with runtime :-(
So: For self-hosting valgrind with an outer callgrind,
always use option "--pop-on-jump" for the outer callgrind.
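The growth described above can be illustrated with a toy model that tracks only the depth of a shadow call stack. All names here (ShadowStack, on_jump_default, on_jump_pop, depth_after) are hypothetical and are not taken from callgrind's sources; the model assumes exactly two inter-function jumps per executed BB, as stated in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy shadow call stack: only the depth matters for this model. */
typedef struct { size_t depth; } ShadowStack;

/* Default handling: a jump between functions is treated as a call,
   so it pushes one frame. */
static void on_jump_default(ShadowStack *s)
{
    s->depth++;
}

/* Pop-on-jump handling: the jump is treated as a return to the
   caller followed by a new call, so the depth is unchanged. */
static void on_jump_pop(ShadowStack *s)
{
    assert(s->depth > 0);
    s->depth--;   /* return to the caller */
    s->depth++;   /* new call */
}

/* Simulate n_bbs basic blocks; each one causes two jumps: inner loop
   to generated code, and generated code back to the inner loop. */
static size_t depth_after(size_t n_bbs, int pop_on_jump)
{
    ShadowStack s = { 1 };   /* one frame for the inner loop itself */
    for (size_t i = 0; i < n_bbs; i++) {
        for (int jump = 0; jump < 2; jump++) {
            if (pop_on_jump)
                on_jump_pop(&s);
            else
                on_jump_default(&s);
        }
    }
    return s.depth;
}
```

In this model the default handling adds two frames per BB, while the pop+push handling keeps the depth constant regardless of runtime.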
Fix completely bogus asm, which didn't work when compiled with gcc-4.1.0
since it trashed the regs that gcc assigned for %0 and %1 before reading
them. The local_sys_write_stderr() implementations for the 3 other
targets suffer from the same problem.
Another fix for interactive control, together with
the --base option, which allows specifying another
directory for dumps and control/result files.
With "--base=/tmp/foo", we want control/result files
in "/tmp", and not in a directory "/tmp/foo/".
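One way to get the behaviour described above is to take the directory part of the base name. The following is a minimal sketch under that assumption; control_dir is a hypothetical helper, not the function callgrind actually uses.

```c
#include <assert.h>
#include <string.h>

/* Write the directory part of 'base' into 'out'.
   "/tmp/foo" -> "/tmp", "foo" -> ".", "/foo" -> "/".
   Returns 0 on success, -1 if 'out' is too small.
   Hypothetical helper for illustration only. */
static int control_dir(const char *base, char *out, size_t out_len)
{
    assert(base != NULL && out != NULL && out_len > 0);

    const char *slash = strrchr(base, '/');
    const char *dir;
    size_t len;

    if (slash == NULL) {          /* no directory component */
        dir = ".";
        len = 1;
    } else if (slash == base) {   /* file directly under the root */
        dir = "/";
        len = 1;
    } else {                      /* everything before the last '/' */
        dir = base;
        len = (size_t)(slash - base);
    }

    if (len + 1 > out_len)
        return -1;
    memcpy(out, dir, len);
    out[len] = '\0';
    return 0;
}
```

With a base of "/tmp/foo" this yields "/tmp", so control and result files land next to the dumps rather than in a directory named after the base.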
- callgrind_control was not working, because it checks that the
"command syntax version" is at most 1 before doing anything.
But callgrind used Valgrind's version for this (3.2.0). Now we
define a separate version COMMAND_VERSION for the syntax format
of control and result files.
Strictly, such a version is not needed for interaction between
callgrind and the script callgrind_control itself, as they are
delivered in the same package. But there are also external
controlling tools (most notably KCachegrind).
- Some systems make it difficult for callgrind_control to
automatically detect running callgrind processes. To make
interactivity work, one has to provide the cwd with -w.
For commands expecting a result from callgrind, this result
was delivered in the wrong result file.
- Fix indentation in one section of Cachegrind
- In the same section, use VG_(percentify) to avoid overflow when computing
information for -v printing.
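As an illustration of the kind of overflow involved, here is a sketch of a percentage computation in 32-bit arithmetic next to a widened version. Both functions are hypothetical and are not the implementation of VG_(percentify); where exactly the Cachegrind code overflowed is not stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: n * 100 wraps around in 32-bit arithmetic once
   n exceeds UINT32_MAX / 100, giving a wrong percentage. */
static uint32_t percent_naive(uint32_t n, uint32_t m)
{
    assert(m > 0);
    return n * 100u / m;
}

/* Widened version: do the multiplication in 64 bits, so the
   intermediate product cannot wrap for any 32-bit inputs. */
static uint32_t percent_wide(uint32_t n, uint32_t m)
{
    assert(m > 0);
    return (uint32_t)((uint64_t)n * 100u / m);
}
```

For n = 3000000000 and m = 4000000000 the widened version returns 75, while the naive one does not, because the intermediate product no longer fits in 32 bits.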
Recent GCCs (3.4+ at least) optimize unused static functions out, so
making VALGRIND_PRINTF and VALGRIND_PRINTF_BACKTRACE static with
attribute unused proved to be much better than always compiling them as
exported weak functions. (Jakub Jelinek)
Add a suppression for yet another glibc string function: __strcpy_chk.
We really ought to intercept/replace this, and that would be easy, except
__strcpy_chk uses __chk_fail and I haven't figured out what that
is/does.
On x86, don't use x87 registers for 8-byte FP loads/stores; instead
use an mmx register (which is the same thing in disguise) since mmx
loads/stores are guaranteed to be the identity. This should fix
failures of this test on x86-linux.
Fold in a patch which appeared in FC5's default valgrind build, which
causes V to ignore more DWARF3 CFA expressions on amd64 and so gets
rid of complaints from the CFA reader. Why didn't Red Hat push this
patch upstream? I don't know.
Oops: when adding translations to the auxiliary transtab, don't forget to
ensure D-I cache coherence. Fixes SIGILLs in fn wrapping failures on low-end
PowerPCs.
Tweaked Lackey. Main change is that the default instrumentation is now only
added if you specify --basic-counts=yes (which is the default). So
all of the instrumentation is now controlled by a command-line option (one
of --basic-counts, --detailed-counts or --trace-mem) and so if you turn them
all off it behaves like Nulgrind. This makes it clearer what's going on and
easier for newbies to modify.
Minor scheduler tidyings:
- rename os_thread_t to ThreadOSstate
- remove unused ThreadState.syscall_result_set field
- fix some comments
- add an assertion in VG_(scheduler_init)
ppc32-linux: work around assemblers which can't do Altivec, by
emitting the relevant instruction directly. Fixes a build problem on
Debian 3.1 (ppc32).
Tom Hughes [Mon, 3 Apr 2006 16:37:30 +0000 (16:37 +0000)]
Don't use the presence of a filename to decide if a segment in the
initial /proc/self/maps is an AnonV or FileV segment, as some systems
don't report the filename. Use the device and inode numbers instead.
Fixes bug #124528.
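A minimal sketch of classifying a /proc/self/maps line by its device and inode fields rather than by its filename follows. SegKind and classify_maps_line are hypothetical names, and the exact rule used here (any nonzero device or inode number means file-backed) is an assumption, not necessarily the test Valgrind's address space manager applies.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { SEG_ANON, SEG_FILE, SEG_BAD } SegKind;

/* Classify one line of /proc/self/maps, e.g.
     "08048000-0804c000 r-xp 00000000 03:01 12345"
   The trailing filename, if present, is deliberately ignored.
   Hypothetical helper; the classification rule is an assumption. */
static SegKind classify_maps_line(const char *line)
{
    unsigned long start, end, offset, inode;
    unsigned int dev_major, dev_minor;
    char perms[5];

    assert(line != NULL);

    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu",
                   &start, &end, perms, &offset,
                   &dev_major, &dev_minor, &inode);
    if (n != 7)
        return SEG_BAD;   /* line did not match the expected layout */

    /* File-backed if the kernel reported a device or an inode. */
    if (dev_major != 0 || dev_minor != 0 || inode != 0)
        return SEG_FILE;
    return SEG_ANON;
}
```

A line with device 03:01 and inode 12345 is classified as file-backed even when no filename follows it, which is the case the fix is about.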
Another shadow memory test. This one does a huge number of loads and
stores of char/short/int/int64/double at random offsets and hence
alignments in an array. It does it in a way in which the computation
just computes the expected V bits, and hence can check whether these
seem correct.
Partial fix for the sh-mem.c failure on PPC32. This should make it work
on PPC32 now but break it on the other platforms. Julian will commit a
change to ensure the 32-bit floats are copied through the FP regs on all
platforms to make the broken ones work again.
Simple regression test for callgrind:
run a custom client request.
By doing this, I found out that callgrind.h still defined
client requests for VG 2 :-( Obviously, nobody was using
them. This is fixed, and other small things to make the
test run, too.
(and likewise for the upper-case versions for client request macros).
The old MAKE_* and CHECK_* macros still work for backwards compatibility.
This is much better, because the old names were subtly misleading. For
example:
- "readable" really meant "readable and writable".
- "writable" really meant "writable and maybe readable, depending on how
the read value is used".
- "check_writable" really meant "check writable or readable"
The new names avoid these problems.
The recently-added macro which was called MAKE_DEFINED is now
MAKE_MEM_DEFINED_IF_ADDRESSABLE.
I also corrected the spelling of "addressable" in numerous places in
memcheck.h.