Julian Seward [Fri, 23 Dec 2005 02:29:58 +0000 (02:29 +0000)]
Deal with function pointer vs function entry craziness on ppc64-linux.
Memcheck is done, but any tool which generates IR helper calls will
need to be similarly adulterated.
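For readers unfamiliar with the issue: under the 64-bit PowerPC ELF (v1) ABI a C function pointer holds the address of a function descriptor, not of the first instruction. The sketch below is illustrative only and is not the Memcheck change itself; the names FnDescr and fn_entry are hypothetical.

```c
#include <stdint.h>

/* Function descriptor layout in the 64-bit PowerPC ELF (v1) ABI:
   three doublewords. FnDescr is a hypothetical name for this sketch. */
typedef struct {
    uint64_t entry;  /* address of the function's first instruction */
    uint64_t toc;    /* TOC base (r2) value the function expects    */
    uint64_t env;    /* environment pointer; unused by C            */
} FnDescr;

/* Given a ppc64 "function pointer" (really the descriptor's address),
   return the code entry point that a helper call must target. */
uint64_t fn_entry(const void *fnptr)
{
    const FnDescr *d = (const FnDescr *)fnptr;
    return d->entry;
}
```

A tool that emits IR helper calls would need to pass something like fn_entry(helper) rather than the raw pointer value when running on ppc64-linux.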
Julian Seward [Thu, 22 Dec 2005 19:28:37 +0000 (19:28 +0000)]
Make async-style syscalls work on ppc64, by using rt_sigprocmask
instead of sigprocmask.
In the process, discover that error handling for
ML_(do_syscall_for_client_WRK) on all platforms has always been
broken, in the sense that the sigprocmasks (which are important) could
silently fail. This commit fixes that up too (only on ppc64-linux at
the moment, so all other platforms are probably broken now).
do_syscall_for_client_WRK() needed a bigger stack to avoid the linkage area.
Always use dot_prefix for label calls.
No longer wrap assembly with
.section ".text"
...
.previous
- ppc64 doesn't like it: it seems we can't 'stack' more than one section to pop off with .previous.
Tom Hughes [Mon, 19 Dec 2005 12:48:03 +0000 (12:48 +0000)]
Check that noinst_PROGRAMS and noinst_LIBRARIES are not empty strings
before trying to run a for loop over them as some versions of bash can't
cope with being asked to loop over an empty list.
Julian Seward [Sun, 18 Dec 2005 02:37:50 +0000 (02:37 +0000)]
When using a custom allocator that allocates with no intervening
blocks, the <= relation is the correct one. In effect asserting <
constitutes an off-by-one error.
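A minimal sketch of the boundary case, assuming the assertion compares the end of one block with the start of the next (the commit does not show the exact expression, and blocks_do_not_overlap is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Block A occupies [a_start, a_start + a_size). Block B starts at b_start.
   If the allocator packs blocks with nothing in between, A's end equals
   B's start, so the valid non-overlap condition is <=, not <. */
bool blocks_do_not_overlap(uintptr_t a_start, size_t a_size, uintptr_t b_start)
{
    uintptr_t a_end = a_start + a_size;  /* one past A's last byte */
    return a_end <= b_start;
}
```

With a_start = 0x1000 and a_size = 16, a block starting at 0x1010 is directly adjacent; asserting < would reject that legal layout.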
Julian Seward [Sat, 17 Dec 2005 20:37:36 +0000 (20:37 +0000)]
findSb: gradually rearrange the superblock list to bring frequently
accessed blocks closer to the front. This speeds up malloc/free
intensive programs because evidently those searches cause a lot of
cache misses (so cachegrind tells us). For perf/heap.c on P4
Northwood, this halves the run-time (!) from 85.8 to 42.9 seconds.
For "real" code (start/exit ktuberling) there is a small but
worthwhile performance gain, of about 2 seconds out of 95.
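One way to get this kind of gradual rearrangement is the classic "transpose" heuristic: on each hit, swap the found entry with its predecessor. The sketch below assumes that heuristic and an array-based list; the real findSb may differ, and the Superblock layout and find_sb name here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified superblock record for this sketch. */
typedef struct {
    uintptr_t start;    /* first address covered     */
    size_t    n_bytes;  /* number of bytes covered   */
} Superblock;

/* Linear search for the superblock containing addr. On a hit at index
   i > 0, swap it with its predecessor so that frequently hit entries
   drift towards the front over repeated searches. Returns the index at
   which the entry ends up, or -1 if no superblock contains addr. */
int find_sb(Superblock *sbs, int n, uintptr_t addr)
{
    for (int i = 0; i < n; i++) {
        if (addr >= sbs[i].start && addr - sbs[i].start < sbs[i].n_bytes) {
            if (i > 0) {
                Superblock tmp = sbs[i - 1];
                sbs[i - 1] = sbs[i];
                sbs[i] = tmp;
                return i - 1;
            }
            return 0;
        }
    }
    return -1;
}
```

Moving one step per hit, rather than straight to the front, keeps a single stray lookup from displacing entries that are genuinely hot.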
Fix switchback.c to reflect changes to call of LibVEX_Translate()
Fix test_ppc_jm1.c to reflect direct linking
- main -> __main etc
- vex_printf -> vexxx_printf etc
Fixed up front and backend for 32bit mul,div,cmp,shift in mode64
Backend:
- separated shifts from other alu ops
- gave {shift, mul, div, cmp} ops a bool to indicate 32|64bit insn
- fixed and implemented more mode64 cases
Also improved some IR by moving immediates to the right arg of binop; the backend assumes this.
All integer ppc32 insns now pass switchback tests in 64bit mode.
(ppc64-only insns not yet fully tested)
Improvements to vg_perf:
- show percentage speedup over the first Valgrind when comparing multiple
Valgrinds
- don't accept --reps < 0
- avoid div-by-zero if the runtime is measured as zero
Julian Seward [Thu, 15 Dec 2005 14:07:07 +0000 (14:07 +0000)]
- Track vex r1494 (x86/amd64 change of conventions for getting
to translations and back to dispatcher, and also different arg
passing conventions to LibVEX_Translate).
- Rewrite x86 dispatcher to not increment the profiling counters
unless requested by the user. This dramatically reduces the
D1 miss rate and gives considerable performance improvement
on x86. Also, restructure and add comments to dispatch-x86-linux.S
to make it much easier to follow (imo).
Julian Seward [Thu, 15 Dec 2005 14:02:34 +0000 (14:02 +0000)]
- x86 back end: change code generation convention, so that instead of
dispatchers CALLing generated code which later RETs, dispatchers
jump to generated code and it jumps back to the dispatcher. This
removes two memory references per translation run and by itself
gives a measurable performance improvement on P4. As a result,
there is new plumbing so that the caller of LibVEX_Translate can
supply the address of the dispatcher to jump back to.
This probably breaks all other targets. Do not update.
- Administrative cleanup: LibVEX_Translate has an excessive
number of arguments. Remove them all and instead add a struct
by which the arguments are supplied. Add further comments
about the meaning of some fields.
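The general shape of the cleanup, sketched with made-up names: TranslateArgs, its fields, and translate() below are hypothetical stand-ins, not the actual LibVEX declarations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument bundle; field names are illustrative only. */
typedef struct {
    const uint8_t *guest_bytes;    /* code to translate                   */
    uint64_t       guest_addr;     /* guest address of that code          */
    uint8_t       *host_buf;       /* where generated code is written     */
    size_t         host_buf_size;  /* capacity of host_buf in bytes       */
    const void    *dispatch;       /* address generated code jumps back to */
} TranslateArgs;

typedef enum { TRANS_OK, TRANS_BAD_ARGS, TRANS_OUTPUT_FULL } TranslateResult;

/* One pointer-to-struct parameter replaces a long positional list, so
   adding a field (such as the dispatcher address) does not change the
   function's signature. */
TranslateResult translate(const TranslateArgs *args)
{
    if (!args || !args->guest_bytes || !args->host_buf || !args->dispatch)
        return TRANS_BAD_ARGS;
    if (args->host_buf_size == 0)
        return TRANS_OUTPUT_FULL;
    /* A real translator would emit host code into host_buf here. */
    return TRANS_OK;
}
```

Callers fill in the struct by field name, which also gives each argument an obvious place for a comment describing its meaning.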
Added fp regtest
- needed some hackery to get around VEX's loss of accuracy.
------------------------------
Added test for fsqrt (fp square root)
Enabled stfs(u)(x) (fp single-precision stores)
- VEX implementation not great: ends up rounding twice, losing
accuracy, but is good enough for this test's small fp argument array.
Changed fp arg setup
- no denormals (for VEX inaccuracy)
All fp tests
- don't print CR, XER flags, as VEX doesn't set them.
3 arg fp arith tests (fp 'multiply and add' etc)
- no 'special' fp vals (for VEX inaccuracy)
- zap lo byte (for VEX inaccuracy)
fctiw, fctiwz (fp convert to int)
- zap high 32bits of result (is undefined)
Changed jm_insns.c usage to use one of flags 'i|f|a' to run int|fp|av insns respectively.
Removed integer test insns for jm-vmx.vgtest - already tested in jm-int.vgtest
Switchbacker updates
- no longer using home-grown linker - simply compiling and linking switchback.c with test_xxx.c
- updated to handle ppc64 (along with its weirdo function descriptors...)
- have to be careful not to use exported functions from libvex_arch_linux.a, hence vex_printf -> vexxx_printf in test_xxx.c
First attempt at some performance tracking tools. Includes a script vg_perf
(use "make perf" to run) that executes test programs and times their
slowdowns under various tools. It works a lot like the vg_regtest script.
It's a bit rough around the edges -- e.g. you can't currently directly
compare two different versions of Valgrind, which would be useful -- but it
is a good start.
There are currently two test programs in perf/. More will be added as time
goes on. This stuff will be built on so that performance changes can be
tracked over time.