Julian Seward [Sun, 21 May 2006 01:02:31 +0000 (01:02 +0000)]
A couple of IR simplification hacks for the amd64 front end, so as to
avoid false errors from memcheck. Analogous to some of the recent
bunch of commits to x86 front end.
Julian Seward [Sat, 20 May 2006 01:13:38 +0000 (01:13 +0000)]
Change the default load address on all platforms to 7/8 of a GB.
This should make V work on any address space setup in which at least
the first 1 GB of address space is usable.
Julian Seward [Sun, 14 May 2006 18:46:55 +0000 (18:46 +0000)]
Add an IR folding rule to convert Add32(x,x) into Shl32(x,1). This
fixes #118466 and it also gets rid of a bunch of false positives for
KDE 3.5.2 built by gcc-4.0.2 on x86, of the form shown below.
Use of uninitialised value of size 4
at 0x4BFC342: QIconSet::pixmap(QIconSet::Size, QIconSet::Mode,
QIconSet::State) const (qiconset.cpp:530)
by 0x4555BE7: KToolBarButton::drawButton(QPainter*)
(ktoolbarbutton.cpp:536)
by 0x4CB8A0A: QButton::paintEvent(QPaintEvent*) (qbutton.cpp:887)
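The folding rule can be sketched on a toy expression type. This is only an illustration: ToyBinop, fold_add32_same and eval are hypothetical names and do not correspond to VEX's real IR data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy IR, not VEX's real expression type. */
typedef enum { OP_ADD32, OP_SHL32 } ToyOp;

typedef struct {
    ToyOp   op;
    int     lhs_tmp;  /* id of the temporary used as first operand */
    int     rhs_tmp;  /* id of the second operand; unused for OP_SHL32 */
    uint8_t shift;    /* shift amount; only meaningful for OP_SHL32 */
} ToyBinop;

/* Fold Add32(t,t) into Shl32(t,1); leave everything else unchanged. */
static ToyBinop fold_add32_same(ToyBinop e)
{
    if (e.op == OP_ADD32 && e.lhs_tmp == e.rhs_tmp) {
        ToyBinop folded = { OP_SHL32, e.lhs_tmp, -1, 1 };
        return folded;
    }
    return e;
}

/* Evaluate a toy binop given the values of its operands. */
static uint32_t eval(ToyBinop e, uint32_t lhs, uint32_t rhs)
{
    assert(e.op == OP_ADD32 || e.op == OP_SHL32);
    return e.op == OP_ADD32 ? lhs + rhs : lhs << e.shift;
}
```

Both forms compute the same 32-bit value, so the rewrite changes only how the operation is presented to the tool that instruments it.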
Julian Seward [Sat, 13 May 2006 23:08:06 +0000 (23:08 +0000)]
Add specialisation rules to simplify the IR for 'testl .. ; js ..',
'testw .. ; js ..' and 'testb .. ; js ..'. This gets rid of a bunch of
false errors in Memcheck of the form
==2398== Conditional jump or move depends on uninitialised value(s)
==2398== at 0x6C51B61: KHTMLPart::clear() (khtml_part.cpp:1370)
==2398== by 0x6C61A72: KHTMLPart::begin(KURL const&, int, int)
(khtml_part.cpp:1881)
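As a rough sketch of what such a specialisation boils down to: 'test a,b' sets the sign flag from (a & b), and 'js' branches when the sign flag is set, so the branch condition is just the top bit of (a & b) at the operand width. The function name below is hypothetical, and the real rules operate on IR rather than on plain C values.

```c
#include <assert.h>
#include <stdint.h>

/* Condition for 'test{l,w,b} a,b ; js': the sign bit of (a & b) at the
   given operand width in bits (32, 16 or 8). Returns 1 if the jump
   would be taken, 0 otherwise. */
static int test_then_js_taken(uint32_t a, uint32_t b, unsigned width)
{
    assert(width == 8 || width == 16 || width == 32);
    return (int)(((a & b) >> (width - 1)) & 1u);
}
```

Expressed this way, the condition depends on a single bit of the operands instead of on a full flags computation.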
Julian Seward [Sun, 7 May 2006 14:37:03 +0000 (14:37 +0000)]
Increase scheduling quantum to 100k basic blocks. Nowadays even
modest cpus can run 5-10M memcheck'd bbs per second and the previous
limit of 50k gives a 100Hz switch rate, which causes cache pollution
(a known performance problem) and other context-switch overheads.
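The arithmetic behind those figures, as a small sketch (switch_rate_hz is a hypothetical helper, not part of Valgrind):

```c
#include <assert.h>

/* Thread-switch rate in Hz, given the execution speed in basic blocks
   per second and the scheduling quantum in basic blocks. */
static double switch_rate_hz(double bbs_per_second, double quantum_bbs)
{
    assert(quantum_bbs > 0.0);
    return bbs_per_second / quantum_bbs;
}
```

At 5M bbs per second, a 50k quantum gives 100 switches per second and a 100k quantum gives 50; at 10M bbs per second the rates are 200 and 100 respectively.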
Julian Seward [Wed, 3 May 2006 22:13:57 +0000 (22:13 +0000)]
Vectorise copy_address_range_perms for common cases. This gives about
40% speedup on artificial programs which just do realloc() and nothing
else, and about a 3-4% speedup on starting kpresenter-1.5.0 and
loading a 16-slide presentation.
Julian Seward [Mon, 1 May 2006 02:14:17 +0000 (02:14 +0000)]
Counterpart to r1605: in the ppc insn selector, don't use the bits in
VexArchInfo.hwcaps to distinguish ppc32 and ppc64. Instead pass
the host arch around. Plus associated plumbing.
Make VG_(run_innerloop) visible to outer Valgrinds
when self-hosting. Without this, the symbol has
size 0 and type NOT, and is ignored by the symbol loader.
Callgrind: Improve self-hosting with outer callgrind tool
This adds an option to change the default handling of jumps
between functions. Usually, a jump between functions is
interpreted as a call, because such jumps are typically
generated by compilers as a tail-call optimization, and
we want to present this as a call to the user. Thus, such
a jump pushes a call onto callgrind's shadow stack.
The option "--pop-on-jump" changes this to pop+push the
shadow callstack: then, a jump between functions is seen
as a return to the caller and a new call.
The default behaviour is _bad_ for using callgrind with
self-hosting. Valgrind's inner loop VG_(run_innerloop)
jumps to generated code, and this code jumps back to
the inner loop. Thus, every executed BB adds 2 calls
to an ever increasing shadow call stack, leading to
memory consumption increasing with runtime :-(
So: For self-hosting valgrind with an outer callgrind,
always use option "--pop-on-jump" for the outer callgrind.
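The difference between the two behaviours can be modelled with a toy shadow stack that tracks only its depth. This is a sketch under simplifying assumptions: ShadowStack and both functions are hypothetical names, and callgrind's real shadow stack records far more than a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only the depth of the shadow call stack. */
typedef struct {
    size_t depth;
    int    pop_on_jump;  /* nonzero models --pop-on-jump */
} ShadowStack;

/* Record one jump between two functions. */
static void on_jump_between_functions(ShadowStack *s)
{
    assert(s != NULL);
    if (s->pop_on_jump && s->depth > 0)
        s->depth--;      /* treat the jump as a return to the caller */
    s->depth++;          /* then push the jump target as a new call */
}

/* Model n basic blocks run under self-hosting: the inner loop jumps to
   generated code, which jumps back to the inner loop (2 jumps per BB). */
static size_t depth_after_bbs(int pop_on_jump, size_t n_bbs)
{
    ShadowStack s = { 1, pop_on_jump };  /* start inside the inner loop */
    for (size_t i = 0; i < n_bbs; i++) {
        on_jump_between_functions(&s);   /* inner loop -> generated code */
        on_jump_between_functions(&s);   /* generated code -> inner loop */
    }
    return s.depth;
}
```

In this model the default mode grows the stack by two entries per executed BB, while the pop+push mode keeps the depth constant.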
Don't use the bits in VexArchInfo.hwcaps to distinguish ppc32 and ppc64,
since that doesn't work properly. Instead pass the guest arch around
too. Small change with lots of associated plumbing.
Fix completely bogus asm, which didn't work when compiled with gcc-4.1.0
since it trashed the regs that gcc assigned for %0 and %1 before reading
them. The local_sys_write_stderr() implementations for the 3 other
targets suffer from the same problem.
Another fix for interactive control, together with
the --base option, which allows specifying a different
directory for dumps and control/result files.
With "--base=/tmp/foo", we want control/result files
in "/tmp", and not in a directory "/tmp/foo/".
- callgrind_control was not working, because it checks that the
"command syntax version" is at most 1 before doing anything.
But callgrind used Valgrind's version for this (3.2.0). Now we
define a separate version COMMAND_VERSION for the syntax format
of control and result files.
Strictly, such a version is not needed for interaction of
callgrind and the script callgrind_control itself, as they are
delivered in the same package. But there are also external
controlling tools (most notably KCachegrind).
- Some systems make it difficult for callgrind_control to
automatically detect running callgrind processes. To make
interactivity work, one has to provide the cwd with -w.
For commands expecting a result from callgrind, this result
was delivered in the wrong result file.
- Fix indentation in one section of Cachegrind
- In the same section, use VG_(percentify) to avoid overflow when computing
information for -v printing.
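To illustrate the kind of overflow at stake: a percentage computed as n * 100 / m in 32-bit arithmetic wraps around once n exceeds roughly 42.9 million. The sketch below shows one common remedy, widening before the multiply. Both function names are hypothetical, and this is not necessarily how VG_(percentify) avoids the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Naive 32-bit percentage: n * 100 wraps around once n exceeds
   UINT32_MAX / 100. */
static uint32_t percent_naive(uint32_t n, uint32_t m)
{
    assert(m != 0);
    return n * 100u / m;
}

/* Widening to 64 bits before multiplying avoids the wrap-around. */
static uint64_t percent_safe(uint32_t n, uint32_t m)
{
    assert(m != 0);
    return (uint64_t)n * 100u / m;
}
```

For small counts the two agree; for counts in the billions only the widened version gives the expected answer.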
Recent GCCs (3.4+ at least) optimize static unused functions out, so
making VALGRIND_PRINTF and VALGRIND_PRINTF_BACKTRACE static and
attribute unused proved to be much better than always compiling them as
exported weak functions. (Jakub Jelinek)
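The pattern looks roughly like the following sketch, where toy_printf is a hypothetical stand-in and not the actual definition in valgrind.h. The unused attribute (a GCC extension) silences the "defined but not used" warning in translation units that include the header without calling the function, and being static lets the compiler drop it there entirely.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a header-defined helper. */
static int toy_printf(const char *format, ...)
    __attribute__((format(printf, 1, 2), unused));

static int toy_printf(const char *format, ...)
{
    va_list ap;
    int n;
    assert(format != NULL);
    va_start(ap, format);
    n = vfprintf(stderr, format, ap);  /* returns the character count */
    va_end(ap);
    return n;
}
```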
Add a suppression for yet another glibc string function: __strcpy_chk.
We really ought to intercept/replace this, and that would be easy, except
__strcpy_chk uses __chk_fail and I haven't figured out what that
is/does.
On x86, don't use x87 registers for 8-byte FP loads/stores; instead
use an mmx register (which is the same thing in disguise) since mmx
loads/stores are guaranteed to be the identity. This should fix
failures of this test on x86-linux.
Fold in a patch which appeared in FC5's default valgrind build, which
causes V to ignore more DWARF3 CFA expressions on amd64 and so gets
rid of complaints from the CFA reader. Why didn't Red Hat push this
patch upstream? I don't know.