vg_symtab2.c:
Discovered sometimes a SLINE stabs entry is the last one (which broke an
assertion). In such a case, we must guess the line's instruction address
range -- I've guessed 4, arbitrarily.
vg_cachegen.in, vg_cachesim_{I1,D1,L2}.c:
Discovered a bad bug in the cache simulation: when determining if a
reference straddles two memory blocks, to find the end of the range I was
adding 'size' to the base address, rather than 'size - 1'. This was
causing way too many straddled references, which would inflate the miss
counts.
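For illustration, a minimal sketch of the off-by-one, not the actual
vg_cachesim code. The helper names and the power-of-two line_size
parameter are assumptions made for the example. The last byte touched by
an access is at addr + size - 1, so that is the address whose block must
be compared with the block of the first byte:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the access [addr, addr + size - 1]
   touches two blocks of line_size bytes, 0 otherwise.
   line_size is assumed to be a power of two. */
static int straddles_block(uintptr_t addr, unsigned size, unsigned line_size)
{
    assert(size > 0);
    assert(line_size > 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t first = addr / line_size;
    uintptr_t last  = (addr + size - 1) / line_size;  /* last byte touched */
    return first != last;
}

/* The buggy variant: addr + size is one past the last byte, so an access
   that ends exactly on a block boundary is wrongly counted as straddling. */
static int straddles_block_buggy(uintptr_t addr, unsigned size,
                                 unsigned line_size)
{
    return addr / line_size != (addr + size) / line_size;
}
```

With 64-byte blocks, a 4-byte access at address 60 covers bytes 60..63 and
stays in one block; the buggy variant reports it as a straddle.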
vg_improve() -- the ucode optimiser: consistently apply the
no-deferred-updates of %ESP rule, regardless of end use of the ucode.
This seems more consistent, and was exposed following examination of
code causing an assertion failure in the cache profiler. Added an
assertion to check this too, and was surprised I hadn't had an
assertion there in the first place.
New files:
- vg_cachesim.c
- vg_cachesim_{I1,D1,L2}.c
- vg_annotate.in
- vg_cachegen.in
Changes to existing files:
- valgrind/valgrind.in, added option:
--cachesim=no|yes [no]
- Makefile/Makefile.am:
* added vg_cachesim.c to valgrind_so_SOURCES var
* added vg_cachesim_I1.c, vg_cachesim_D1.c, vg_cachesim_L2.c to
noinst_HEADERS var
* added vg_annotate, vg_cachegen to 'bin_SCRIPTS' var, and added empty
targets for them
- vg_main.c:
* added two offsets for cache sim functions (put in positions 17a,17b)
* added option handling (detection of --cachesim=yes, which turns off
--instrument);
* added calls to cachesim initialisation/finalisation functions
- vg_mylibc: added some system call wrappers (for chmod, open_write, etc) for
file writing
- vg_symtab2.c:
* allow it to read symbols if either of --instrument or --cachesim is
used
* made vg_symtab2.c:vg_what_{line,fn}_is_this extern, renaming it as
VG_(what_line_is_this) (and added to vg_include.h)
* completely rewrote the read loop in vg_read_lib_symbols, fixing
several bugs. Much better now, although probably not perfect. It's
also relatively fragile -- I'm using the "die immediately if anything
unexpected happens" approach.
- vg_to_ucode.c:
* in VG_(disBB), patching in x86 instruction size into extra4b field of
JMP instructions at the end of basic blocks if --cachesim=yes.
Shifted things around to do this; also had to fiddle around with
single-step stuff to get this to work, by not sticking extra JMPs on
the end of the single-instruction block if there was already one
there (to avoid breaking an assertion in vg_cachesim.c). Did a
similar thing to avoid an extra JMP on huge basic blocks that are
split.
- vg_translate.c:
* if --cachesim=yes call the cachesim instrumentation phase
* made some functions extern and renamed:
allocCodeBlock() --> VG_(allocCodeBlock)()
freeCodeBlock() --> VG_(freeCodeBlock)()
copyUInstr() --> VG_(copyUInstr)()
(added to vg_include.h too)
- vg_include.h: declared
* cachesim offsets
* exports of vg_cachesim.c
* added four new profiling events (increasing VGP_M_CCS to 24 -- I kept
the spare ones)
* added comment about UInstr.extra4b field being used for instr size in
JMPs for cache simulation
- docs/manual.html:
* Added --cachesim option to section 2.5.
* Added cache profiling stuff as section 7.
Fix really stupid error in computation of timeout point in nonblocking
poll(). After this change, Mozilla-0.9.2.1 and Galeon 0.11.3 finally
behave reasonably on my box.
Fix a subtle (?) bug in sched_do_syscall to do with read/write calls for
which the client has already got the fd in nonblocking mode. In such
cases, do not wait for an IO completion -- since the client presumably
handles that somehow.
handle_signal_return: when a waiting read/write syscall is interrupted
by a signal which has been set to non-SA_RESTART, clean up the waiting_fds
table correctly.
Mess around with aliases to make the exported T/D/W syms look like those
of the real libpthread.so. This is a Good Thing, despite the fact it
temporarily breaks some threaded programs.
Fix many holes and bugs in an attempt to get my libpthread.so to export
the same set of symbols as the real one, which I now realise is crucial
for it to work at all.
Try and give at least some minimal binding for all functions exported
by the real libpthread.so. In the process fix a bunch of stuff, including
adding thread-specific h_errno and resolver state storage. This fixes
licq crashing at startup.
VG_(oursignalhandler): when catching a fatal signal, don't longjmp
back to the scheduler if the signal is already pending. There's
something very suspicious about all this, though.
Once VG_(maybe_add_context) starts ignoring errors, ignore them
right up front, in the VG_(record_*_error) functions. This is an
attempt to avoid excessive performance problems with programs which
have excessive numbers of errors.
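A rough sketch of the idea, under assumptions: ERROR_LIMIT, the counters
and both function names below are hypothetical stand-ins, not the real
VG_(maybe_add_context) or VG_(record_*_error) code. Once the slow path
has decided to ignore further errors, the recording entry point checks a
flag and returns before doing any work:

```c
#include <assert.h>

#define ERROR_LIMIT 3          /* hypothetical threshold */

static int n_errors_seen;      /* errors that reached the slow path */
static int ignoring_errors;    /* set once the limit has been reached */
static int n_slow_path_calls;  /* how often the expensive code ran */

/* Stand-in for the expensive context-recording step. */
static void add_context(int kind)
{
    assert(!ignoring_errors);  /* callers must check the flag first */
    (void)kind;
    n_slow_path_calls++;
    if (++n_errors_seen >= ERROR_LIMIT)
        ignoring_errors = 1;
}

/* Stand-in for a record-error entry point: bail out right up front. */
static void record_error(int kind)
{
    if (ignoring_errors)
        return;                /* cheap early exit, no context built */
    add_context(kind);
}
```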
- Fast-track pthread_mutex_trylock(), even though programs which use
it extensively are probably badly designed -- they are polling.
- VG_(deliver_signals): return a Bool indicating if any signals
really were delivered. Used only to try and reduce excessive
frequency of system sanity checks.
Detect, print warning, and "correctly" handle implausible requests
such as malloc(negative-argument). You'd be amazed at the stupidity
of some of the programs people run on valgrind.
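One way such a check could look, as a sketch only: checked_malloc is a
hypothetical name, and returning NULL after the warning is an assumption
about what "correctly" means here. A negative argument reaches malloc as
a huge unsigned value, so anything above PTRDIFF_MAX is treated as
implausible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: warn about and reject sizes that only make
   sense as a negative number converted to size_t. */
static void *checked_malloc(size_t n)
{
    /* The comparison below assumes size_t and ptrdiff_t have the
       same width, which holds on common platforms. */
    assert(sizeof(size_t) == sizeof(ptrdiff_t));
    if (n > (size_t)PTRDIFF_MAX) {
        fprintf(stderr, "warning: silly arg (%zu) to malloc()\n", n);
        return NULL;           /* assumed behaviour, see note above */
    }
    return malloc(n);
}
```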
Fast-track pthread_mutex_{lock,unlock} in the scheduler. This reduces
their cost by about a factor of 20, which fixes the performance probs
observed with Opera.
Fix total b0rkage of signal handling caused by stupidly giving the
same value to VG_USERREQ__READ_MILLISECOND_TIMER and
VG_USERREQ__SIGNAL_RETURNS. Duh.
Various upgrades, with the effect that mozilla now runs, although
it has tremendous performance problems.
* Implement pthread_key_{create,delete} and pthread_{set,get}specific.
* Implement pthread_cond_timedwait. A nuisance.
* New timer infrastructure, based on the RDTSC instruction. This
allows fast, accurate time measurement without swamping the host with
gettimeofday() syscalls.
There's something definitely screwy about the scheduler, making opera
run slowly and mozilla run unbelievably slowly. To be investigated.
- don't check if the compiler supports const. No compiler we care
about doesn't support it
- re-add -Werror
- move setting of CFLAGS to the Makefile instead of the configure
script, so that the custom flags we use don't screw up configure checks
Make the GDB-attach stuff thread-aware, and work (at least partially)
when running multithreaded. Can still cause crashes (assertion failures)
when GDB exits. I think that's due to my use of libc's system()
call; should roll my own.
Sigh. Remove -Werror because it causes the ./configure test for working
const qualifier to fail. This causes config.h to #define const to nothing,
which causes the whole compilation to fail.
What's the right way to fix this? I really like having -Werror.
VG_(record_free_error) / VG_(record_freemismatch_error) are called
by the scheduler, not by generated code. So pass in the relevant
ThreadState*; don't get it from VG_(get_current_tid)().
Continue trying to extract myself from the pthread_mutex_* swamp.
Fall back to a compromise position, which makes my mutex implementation
initialiser- and structure-compatible with LinuxThreads, and ditto the
upcoming condition var implementation. In particular this means that
((ThreadId)0) is an invalid thread ID, so vg_threads[0] is never used,
and vg_threads[1] specially denotes the "main" thread.
Remove the scheme of having a linked list of threads waiting on
each mutex. It is too difficult to get the right semantics for
when a signal is delivered to a thread blocked in pthread_mutex_lock().
Instead, use the old scheme of each thread stating, with its .waited_on_mx
field, which mutex it is waiting for. This makes pthread_mutex_unlock()
less efficient, but at least it all works.
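As a sketch of why unlock gets slower under this scheme: with no
per-mutex waiter list, unlock has to scan the whole thread table for a
thread whose .waited_on_mx points at the mutex. Everything below except
the .waited_on_mx field name is hypothetical (the struct layouts,
N_THREADS, the hand-over policy), and slot 0 is left unused only to
mirror the "thread ID 0 is invalid" rule:

```c
#include <assert.h>
#include <stddef.h>

#define N_THREADS 8                 /* hypothetical table size */

typedef struct { int locked; int owner; } Mutex;   /* hypothetical */

typedef struct {
    int    in_use;
    Mutex *waited_on_mx;            /* NULL if not blocked on a mutex */
} Thread;

static Thread threads[N_THREADS];   /* slot 0 is never used */

/* Unlock mx. If some thread is waiting on it, hand the mutex to that
   thread and return its tid; otherwise mark it unlocked and return 0.
   The scan is O(number of threads). */
static int mutex_unlock(Mutex *mx)
{
    assert(mx != NULL && mx->locked);
    for (int tid = 1; tid < N_THREADS; tid++) {
        if (threads[tid].in_use && threads[tid].waited_on_mx == mx) {
            threads[tid].waited_on_mx = NULL;   /* no longer blocked */
            mx->owner = tid;                    /* stays locked */
            return tid;
        }
    }
    mx->locked = 0;
    mx->owner  = 0;
    return 0;
}
```

A signal arriving at a blocked thread only has to clear that thread's
own field; there is no shared list to repair.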
Show backtraces for all threads in vg_assert, VG_(panic) and
VG_(unimplemented). In future this will not be enabled by default due
to the danger of recursion of assertion failures.
Change --trace-pthread= flag to accept none|some|all, for finer level
of pthread event tracing. And allow this info to be passed across to
the client, where vg_libpthread.c uses it to also control verbosity.
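A minimal sketch of parsing the three-valued flag. The enum, its values
and the function name are hypothetical and do not come from the Valgrind
sources; only the flag spelling and the none|some|all values are taken
from the entry above:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace levels; TRACE_BAD marks an unrecognised argument. */
typedef enum {
    TRACE_BAD  = -1,
    TRACE_NONE = 0,
    TRACE_SOME = 1,
    TRACE_ALL  = 2
} TraceLevel;

/* Parse "--trace-pthread=none|some|all". */
static TraceLevel parse_trace_pthread(const char *arg)
{
    static const char prefix[] = "--trace-pthread=";
    const size_t plen = sizeof(prefix) - 1;   /* exclude trailing NUL */

    assert(arg != NULL);
    if (strncmp(arg, prefix, plen) != 0)
        return TRACE_BAD;

    const char *val = arg + plen;
    if (strcmp(val, "none") == 0) return TRACE_NONE;
    if (strcmp(val, "some") == 0) return TRACE_SOME;
    if (strcmp(val, "all")  == 0) return TRACE_ALL;
    return TRACE_BAD;
}
```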
Add more pthread wrappers in a failed attempt to get Opera 6.0TP2
to run. Now it creates some threads but segfaults. Also add
wrapper for syscall __NR_mremap; it is way wrong, but finding
a decent description of what mremap() really does is nearly
impossible.
Handle VG_USERREQ__PTHREAD_GET_THREADID and VG_USERREQ__RUNNING_ON_VALGRIND
cheaply, with the trivial-client-request mechanism. The latter is called
once per pthread call, even simple ones like pthread_mutex_[un]lock.
Get rid of the --client-perms= flag. Valgrind now depends critically
on the client-request subsystem, and disabling it is no longer a
sensible thing to do.
Also: in the manual, mention flags --trace-sched= and --trace-pthread=.
Turns out these insns are also available as Grp8 extensions, with
literal bit-offset values. Nuisance. I've #if 0'd out the old code
which implements them since I am too lazy to fix them properly, and I
can't find any cases of their use anyway. I'll wait until someone
yelps.
Add fairly comprehensive test case for bt/bts/btc/btr, mem and reg
targets, although size-L (4-byte) only. In any event the jitter
doesn't handle the size 2 case and has never been asked to, AFAIK.
Correctly implement x86 bt/btc/bts/btr insn. Previous impl was wrong:
* Didn't handle correctly operands in memory, where arbitrary signed
bit offsets are allowed. Prior impl will trash the client's stack
and give the wrong answer.
* Was done by a helper function and therefore could give spurious
value errors.
Now the address computations are done in-line.
Old implementation is there, but unused and scheduled for demolition.
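To illustrate the memory-operand case, here is a sketch in plain C, not
the ucode the jitter emits. bt_mem is a hypothetical name, and
byte-granular addressing is an assumption that gives the same bit as a
wider access on little-endian x86. With a signed bit offset, the byte
address is base + floor(bitoff / 8) and the bit within that byte is
bitoff mod 8, so negative offsets reach bytes below the base address:

```c
#include <assert.h>
#include <stdint.h>

/* Return the bit at signed bit offset 'bitoff' relative to 'base'. */
static int bt_mem(const uint8_t *base, int32_t bitoff)
{
    /* C's / and % truncate toward zero, so adjust to get floor
       division and a remainder in 0..7 for negative offsets. */
    int32_t byte_off = bitoff / 8;
    int32_t bit      = bitoff % 8;
    if (bit < 0) {
        bit      += 8;
        byte_off -= 1;
    }
    assert(bit >= 0 && bit < 8);
    return (base[byte_off] >> bit) & 1;
}
```

For example, offset -1 selects bit 7 of the byte just below base, which
an implementation that ignores the sign of the offset gets wrong.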