Julian Seward [Sun, 18 Dec 2005 02:37:50 +0000 (02:37 +0000)]
When using a custom allocator that allocates with no intervening
blocks, the <= relation is the correct one. In effect asserting <
constitutes an off-by-one error.
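To illustrate why <= is the right relation, here is a minimal sketch of an ordering check over address-sorted blocks. The Block type and blocks_are_ordered are hypothetical names, not Valgrind's; the log does not say which assertion was changed, so this only shows the general shape of the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block descriptor covering [start, start + size). */
typedef struct {
    uintptr_t start;
    size_t    size;
} Block;

/* Returns 1 if the address-sorted blocks do not overlap, else 0.
   Back-to-back blocks touch exactly (end == next start), so the
   comparison must be <=; using < would reject a valid layout. */
static int blocks_are_ordered(const Block *b, size_t n)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        uintptr_t end = b[i].start + b[i].size;  /* one past the last byte */
        if (!(end <= b[i + 1].start))
            return 0;
    }
    return 1;
}
```

With two 16-byte blocks at 0x1000 and 0x1010, the check passes under <= and would fail under <, which is the off-by-one described above.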
Julian Seward [Sat, 17 Dec 2005 20:37:36 +0000 (20:37 +0000)]
findSb: gradually rearrange the superblock list to bring frequently
accessed blocks closer to the front. This speeds up malloc/free
intensive programs because evidently those searches cause a lot of
cache misses (so cachegrind tells us). For perf/heap.c on P4
Northwood, this halves the run-time (!) from 85.8 to 42.9 seconds.
For "real" code (start/exit ktuberling) there is a small but
worthwhile performance gain, of about 2 seconds out of 95.
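The gradual rearrangement described above can be sketched as a search that swaps a hit with its predecessor. This is an assumption about the general technique, not the actual findSb code; Superblock and find_sb are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical superblock descriptor covering [base, base + n_bytes). */
typedef struct {
    uintptr_t base;
    size_t    n_bytes;
} Superblock;

/* Linear search for the superblock containing addr.  On a hit that is
   not already at the front, swap it with its predecessor, so that
   frequently found superblocks drift toward the front over time.
   Returns the entry's index after the swap, or -1 if not found. */
static int find_sb(Superblock **sbs, int n_sbs, uintptr_t addr)
{
    assert(sbs != NULL && n_sbs >= 0);
    for (int i = 0; i < n_sbs; i++) {
        Superblock *sb = sbs[i];
        if (addr >= sb->base && addr - sb->base < sb->n_bytes) {
            if (i > 0) {
                sbs[i]     = sbs[i - 1];
                sbs[i - 1] = sb;
                return i - 1;
            }
            return i;
        }
    }
    return -1;
}
```

Moving one step per hit, rather than straight to the front, keeps a single stray lookup from displacing the entries that are genuinely hot.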
Improvements to vg_perf:
- show percentage speedup over the first Valgrind when comparing multiple
Valgrinds
- don't accept --reps < 0
- avoid div-by-zero if the runtime is measured as zero
Julian Seward [Thu, 15 Dec 2005 14:07:07 +0000 (14:07 +0000)]
- Track vex r1494 (x86/amd64 change of conventions for getting
to translations and back to dispatcher, and also different arg
passing conventions to LibVEX_Translate).
- Rewrite x86 dispatcher to not increment the profiling counters
unless requested by the user. This dramatically reduces the
D1 miss rate and gives considerable performance improvement
on x86. Also, restructure and add comments to dispatch-x86-linux.S
to make it much easier to follow (imo).
Added fp regtest
- needed some hackery to get around VEX's loss of accuracy.
------------------------------
Added test for fsqrt (fp square root)
Enabled stfs(u)(x) (fp single-precision stores)
- VEX implementation not great: ends up rounding twice, losing
accuracy, but is good enough for this test's small fp argument array.
Changed fp arg setup
- no denormals (for VEX inaccuracy)
All fp tests
- don't print CR, XER flags, as VEX doesn't set them.
3 arg fp arith tests (fp 'multiply and add' etc)
- no 'special' fp vals (for VEX inaccuracy)
- zap lo byte (for VEX inaccuracy)
fctiw, fctiwz (fp convert to int)
- zap high 32bits of result (is undefined)
Changed jm_insns.c usage to use one of flags 'i|f|a' to run int|fp|av insns respectively.
Removed integer test insns for jm-vmx.vgtest - already tested in jm-int.vgtest
First attempt at some performance tracking tools. Includes a script vg_perf
(use "make perf" to run) that executes test programs and times their
slowdowns under various tools. It works a lot like the vg_regtest script.
It's a bit rough around the edges -- eg. you can't currently directly
compare two different versions of Valgrind, which would be useful -- but it
is a good start.
There are currently two test programs in perf/. More will be added as time
goes on. This stuff will be built on so that performance changes can be
tracked over time.
Fix minor Cachegrind bug that was occasionally causing misattributions of
counts when a function name was used in more than one module. This showed
up for "???" functions when profiling Valgrind itself.
Take ppc64 startup further along the road
- fixed launcher.c to recognise ppc32/64-linux platforms properly
- lots of assembly fixes to handle func descriptors, toc references, 64bit regs.
- fixed var types in vki-ppc64-linux
Now gets as far as VG_(translate), but dies from a case of invalid orig_addr.
Julian Seward [Sun, 4 Dec 2005 23:27:14 +0000 (23:27 +0000)]
Defensive hacks to detect cases where V corrupts its own heap and/or
uses memory after freeing. Check the redzones for all non-client
frees, and fill all non-client freed areas with garbage. Unroll
VG_(memset) as a precautionary measure against performance lossage.
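A minimal sketch of the two checks, written against plain malloc/free rather than Valgrind's own allocator. The names rz_alloc, rz_intact and rz_free, the redzone size and the fill bytes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RZ_BYTES  8      /* hypothetical redzone size      */
#define RZ_FILL   0xA5   /* hypothetical redzone pattern   */
#define FREE_FILL 0xDD   /* hypothetical garbage for frees */

/* Layout: [size_t n][redzone][payload: n bytes][redzone] */
static void *rz_alloc(size_t n)
{
    unsigned char *raw = malloc(sizeof(size_t) + 2 * RZ_BYTES + n);
    if (raw == NULL) return NULL;
    memcpy(raw, &n, sizeof n);
    unsigned char *payload = raw + sizeof(size_t) + RZ_BYTES;
    memset(payload - RZ_BYTES, RZ_FILL, RZ_BYTES);
    memset(payload + n, RZ_FILL, RZ_BYTES);
    return payload;
}

/* Returns 1 if both redzones still hold the fill pattern. */
static int rz_intact(const void *p)
{
    assert(p != NULL);
    const unsigned char *payload = p;
    const unsigned char *lo = payload - RZ_BYTES;
    size_t n;
    memcpy(&n, lo - sizeof(size_t), sizeof n);
    const unsigned char *hi = payload + n;
    for (size_t i = 0; i < RZ_BYTES; i++)
        if (lo[i] != RZ_FILL || hi[i] != RZ_FILL) return 0;
    return 1;
}

/* Checks the redzones, fills the payload with garbage, then frees.
   Returns 0 without freeing if corruption is detected; a real
   allocator would more likely abort at that point. */
static int rz_free(void *p)
{
    if (!rz_intact(p)) return 0;
    unsigned char *payload = p;
    unsigned char *raw = payload - RZ_BYTES - sizeof(size_t);
    size_t n;
    memcpy(&n, raw, sizeof n);
    memset(payload, FREE_FILL, n);  /* make use-after-free visible */
    free(raw);
    return 1;
}
```

Writing one byte past a block from rz_alloc(4) makes rz_intact return 0, and any stale pointer read after rz_free sees the garbage pattern instead of plausible old data.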
Julian Seward [Sun, 4 Dec 2005 15:00:06 +0000 (15:00 +0000)]
Now that the man page is built from the XML documentation masters, it
has to have the same status as the HTML/PDF/PS docs, that is, not
built by default because it depends on the ultra-fragile XML
toolchain. So make it use the same hacks, that is, build only at
'make dist' time.
Donna Robinson [Sun, 27 Nov 2005 04:10:00 +0000 (04:10 +0000)]
Post-release changes:
- removed a reference to cachegrind dot org from mc-tech-docs.xml
- in an effort to simplify future borked links, replaced all valgrind
website urls with entities so now we just have to change one string.
- new stylesheet to create the docs to 'fit' into the website
- added build rules 'make website-docs' and 'make download-docs'
to /docs/Makefile.am
Donna Robinson [Fri, 25 Nov 2005 05:36:48 +0000 (05:36 +0000)]
Due to package upgrades (docbook, passivetex), removed some
bug-patches and created some new ones in the stylesheets.
Also tweaked some files to structure the xml properly.
The FAQ and the Quick-Start are now 'articles' inside a book-wrapper,
which is as it should be.
FAQ.xml
- due to various passivetex bug fixes, the faq is now a properly
structured xml qandaset document
quick-start-guide.xml:
manual.xml
- Fixed some passivetex-workaround kludges:
legalnotice -> author
manual-core.xml:
For readability, added '<command>' to varlistentry items
since passivetex (sigh) will no longer indent the para text.
index.xml:
- loads of white-space readability tweaks here and there.
tech-docs.xml
dist-docs.xml
manual.xml
- additional entries to <bookinfo> for compatibility with
the rest of the docs.
/docs/Makefile.am
- added stuff to use the new vg-faq2txt.xsl stylesheet
/docs/lib/Makefile.am
- updated to reflect current contents of /docs/lib/
- removed refs to vg-html-single as is never ever used
/docs/lib/vg-fo.xsl
- massively updated to reflect losing old bugs and gaining new ones
/docs/lib/vg-common.xsl
- deleted as only contained two lines common to html and fo,
so not worth the bother of hauling around
/docs/lib/vg-html-chunk.xsl
- added what was in vg-common.xsl
/docs/lib/line-wrap.xsl
/docs/lib/faq2text.xsl
- two new stylesheet files for transforming FAQ.xml to FAQ.txt
Julian Seward [Fri, 25 Nov 2005 02:16:58 +0000 (02:16 +0000)]
ppc32 only: use the signal context structures in a way which also
works with 2.4 kernels. Without this, signal handling and hence
threads don't really work properly on ppc32 on kernel 2.4. Add
comments from Paul M too.
Julian Seward [Sun, 20 Nov 2005 19:08:08 +0000 (19:08 +0000)]
Fix obscure memcheck bug found by Nick. This could cause false
negatives, but only in the following unlikely circumstances: for an
8-byte store, which is handled by the slow path (due to
misalignment or incomplete addressability). In this case, the bug
caused the top 32 of the written V bits to be forced to zero
("defined"). This would not have affected the vast majority of 8-byte
stores since almost all of them would either have been handled by the
fast case or would have the top 32 V bits as zero anyway (almost
certainly both).
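The effect can be illustrated with a sketch of a byte-at-a-time slow-path store. The log does not say how the top 32 V bits came to be zeroed; the 32-bit narrowing below is only one assumed way such a symptom could arise, and both function names and the one-shadow-byte-per-data-byte layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow layout: one V byte per data byte, little-endian,
   where a 0 bit means "defined" and a 1 bit means "undefined". */
static void store_vbits_slow(uint8_t *shadow, uint64_t vbits, size_t n_bytes)
{
    assert(shadow != NULL && n_bytes <= 8);
    for (size_t i = 0; i < n_bytes; i++)
        shadow[i] = (uint8_t)(vbits >> (8 * i));
}

/* Faulty variant (illustrative only): the V bits pass through a
   32-bit value, so bytes 4..7 are always written as 0 ("defined"). */
static void store_vbits_slow_buggy(uint8_t *shadow, uint64_t vbits,
                                   size_t n_bytes)
{
    assert(shadow != NULL && n_bytes <= 8);
    uint32_t narrowed = (uint32_t)vbits;   /* top 32 V bits lost here */
    for (size_t i = 0; i < n_bytes; i++)
        shadow[i] = (uint8_t)((uint64_t)narrowed >> (8 * i));
}
```

Storing a fully undefined 8-byte value through the faulty variant leaves the upper four shadow bytes marked as defined, which is how a later use of that undefined data could go unreported.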