Reduce polling delays in poll(), select(), pause() and in the scheduler
idle loop. This reduces some strange non-CPU-bound delays under certain
circumstances.
When the client tries to __NR_close() our logfile, claim the close
succeeded, which is a lie since we just ignore it -- otherwise the
log disappears at that point.
Fixed two CPUID auto-cache detection problems reported by Guillaume Laurent:
- Some Intel cases were missing, giving spurious warnings such as:
--18114-- warning: Unknown Intel cache config value (0x50), ignoring
- The 0x40 case was wrong... its meaning depends on whether you have a P6
core ("no L2 cache present") or a P4 core ("no L3 cache present").
Damn wretched Intel CPUID format.
I was unwittingly assuming P6 cores which meant that P4 cores reporting
no L3 got this bogus warning:
--18114-- warning: L2 cache not installed, ignore L2 results.
So I now don't do anything for that case, and detect a missing L2 cache
by checking whether it is set by any of the other entries.
Turns out neither was affecting the results, but better to get rid of them
anyway.
Guillaume tested the changes for me so hopefully they work.
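A minimal sketch of the approach described above, not the actual Cachegrind code. The names CacheInfo, scan_descriptors and l2_missing are hypothetical, and only a handful of L2 descriptor values are listed (sizes as I recall them from Intel's descriptor table, so check them against the manual). The point is that 0x40 is ignored and a missing L2 is inferred from no other descriptor having set it:

```c
#include <stddef.h>

/* Hypothetical result record; -1 means "no descriptor reported an L2". */
typedef struct {
    int l2_size_kb;
} CacheInfo;

/* Scan CPUID descriptor bytes. Illustrative subset of descriptors only. */
void scan_descriptors(const unsigned char *desc, size_t n, CacheInfo *out)
{
    size_t i;
    out->l2_size_kb = -1;
    for (i = 0; i < n; i++) {
        switch (desc[i]) {
        case 0x40:
            /* Ambiguous: "no L2" on P6, "no L3" on P4. Do nothing. */
            break;
        case 0x41: out->l2_size_kb = 128;  break;
        case 0x42: out->l2_size_kb = 256;  break;
        case 0x43: out->l2_size_kb = 512;  break;
        case 0x44: out->l2_size_kb = 1024; break;
        case 0x45: out->l2_size_kb = 2048; break;
        default:
            /* Other descriptors (TLBs, L1, ...) are omitted from this sketch. */
            break;
        }
    }
}

/* L2 counts as missing only if no descriptor set it. */
int l2_missing(const CacheInfo *c)
{
    return c->l2_size_kb < 0;
}
```

With this shape, a P4 reporting 0x40 alongside a real L2 descriptor no longer trips the "L2 cache not installed" path.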
Include %defattr(-,root,root) in valgrind.spec.in so that the
ownership of the files is correct even if a non-root user builds the
RPM package. (Matthias Andree <matthias.andree@stud.uni-dortmund.de>)
valgrind's strcmp() implementation (to clients) treats char as signed
whereas the libc implementation it replaces treats char as unsigned.
Fix! God knows how anything much ever worked before now.
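For illustration, a comparison loop that treats char as unsigned, as the C standard requires of strcmp(). This is a sketch only, not the replacement function itself, and the name strcmp_unsigned is made up:

```c
/* Compare two NUL-terminated strings byte by byte as unsigned char. */
int strcmp_unsigned(const char *s1, const char *s2)
{
    for (;;) {
        unsigned char c1 = (unsigned char)*s1++;
        unsigned char c2 = (unsigned char)*s2++;
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == 0)
            return 0;   /* both strings ended together */
    }
}
```

The difference only shows up for bytes >= 0x80: with signed char, "\x80" sorts before "\x7f"; with unsigned char it sorts after.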
In order to handle FPU instructions with data size of 28 and 108 bytes,
implemented a hack: such instructions have their data_size reduced to 16
bytes for cache simulation purposes, to avoid assertion failures coming from
transfers that involve more than two cache lines. Should occur rarely in
practice.
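A rough sketch of why the clamp helps, assuming a simple line-count calculation. Both function names are hypothetical and this is not the simulator's code. With lines of 16 bytes or more, a 16-byte access can touch at most two lines, whereas 28 or 108 bytes can touch more:

```c
#include <assert.h>

/* Number of cache lines touched by `size` bytes starting at `addr`. */
unsigned lines_touched(unsigned long addr, unsigned size, unsigned line_size)
{
    unsigned long first, last;
    assert(size > 0 && line_size > 0);
    first = addr / line_size;
    last  = (addr + size - 1) / line_size;
    return (unsigned)(last - first + 1);
}

/* The hack: treat the 28- and 108-byte FPU transfers as 16 bytes. */
unsigned clamp_fpu_data_size(unsigned size)
{
    return (size == 28 || size == 108) ? 16 : size;
}
```

For example, 108 bytes at address 0 with 32-byte lines touches four lines; after clamping, the simulated access touches one or two.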
Fixed a big problem with Cachegrind. I was assuming that any instruction that
both read and wrote memory must be doing it to the same address, and was thus
modifying it (eg. 'incl'). But some instructions can read and write different
addresses (eg. pushl %eax, (%ebx)).
Also, it wasn't handling 'rep'-prefixed instructions correctly. The way they
were instrumented meant that an I-cache access was simulated for every
repetition they do, which is most probably not accurate; only one I-cache
access should be simulated.
Fixed both of these. Some largeish changes required, unfortunately:
- Added 'iddCC' type, the cost-centre for instructions that read and write
different addresses. Correspondingly added READ_WRITE_CC as a CC_type.
- Have to do correspondingly more complicated things to detect what
CC_type an x86 instruction is.
- To handle 'rep' prefixes, now do the I-cache access for such instructions
before the JIFZ UInstr, so only 1 I-cache access is simulated. D-cache
accesses are still done in the same place, so they occur once per
repetition.
- Changed the cache simulation log functions; gone from two to five, we now
have:
This means fewer spill slots (only 2, I think) have the compact call form,
which is unfortunate. Although it's not a problem in the ERASER branch in
which the helpers aren't hard-wired the way they are in this branch.
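The 'rep' accounting above can be modelled roughly as follows. This is a toy counter model, not the instrumentation itself; Counts and simulate_rep are hypothetical names. The I-cache access is charged once, before the repetition loop, and the D-cache access once per repetition:

```c
/* Toy access counters for one simulated instruction stream. */
typedef struct {
    unsigned long i_accesses;
    unsigned long d_accesses;
} Counts;

/* Model a rep-prefixed instruction that runs `reps` repetitions. */
void simulate_rep(Counts *c, unsigned long reps)
{
    unsigned long i;
    c->i_accesses += 1;          /* once, ahead of the loop test */
    for (i = 0; i < reps; i++)
        c->d_accesses += 1;      /* once per repetition */
}
```

Note that in this model the single I-cache access is charged even when the repeat count is zero, since it happens before the loop test.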
Problem was that when an exe segment was unloaded, cachesim_notify_munmap() was
being called after symbols were unloaded. But it needs the symbols to do the
lookup required to remove the BBCCs. It was only working some of the time
for exe segments that didn't have any symbols(!)
Fix: now invalidate translations first, unload symbols second. This required
adding VG_(is_munmap_exe)() to determine if an unloaded segment is executable.
Julian Seward [Sun, 25 Aug 2002 20:07:16 +0000 (20:07 +0000)]
1. The license is actually in the file COPYING, not LICENSE as all
the sources claim. Automake seems to have some hard-wired notion
that the license file must be called COPYING, so we have to
rename in all the source files :-(
2. Change the license for valgrind.h ONLY to a BSD-style license
so people can include it in their code. The entire rest of
the system remains under the GPL.
Julian Seward [Tue, 20 Aug 2002 18:18:54 +0000 (18:18 +0000)]
Merge rev 1.16.4.8 from ERASER into VALGRIND_1_0_BRANCH:
Added Cyrille Chepelov's patch for identifying cache params of Duron stepping
A0 which has a bug that causes CPUID to misreport L2 cache size. Untested, I
can only assume it works as I don't have such a machine to try with.
Julian Seward [Tue, 6 Aug 2002 09:06:18 +0000 (09:06 +0000)]
Merge rev 1.91:
Simulate resolver-specific state as per the real libpthread.so, wherein
the root thread (tid 1) always uses _res as exported from libc.so as its
state. This fixes the name lookup problems in KAtlantik.
Only run __libc_freeres() when valgrinding. It may do invalid free()s
which cause the low-level memory manager to crash. When valgrinding
that's all protected, but not when cachegrinding etc.
Some jokers apparently like setting the CPU's AC (Alignment Check) flag
for god-knows-why reasons. This causes VG_{READ,WRITE}_MISALIGNED_WORD
to give bus errors. Redefine them to do the obvious byte-by-byte loads/
stores. Fortunately they are not performance critical.
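A sketch of the byte-by-byte form, written as functions rather than macros. The names are hypothetical stand-ins for VG_{READ,WRITE}_MISALIGNED_WORD, and little-endian byte order is assumed, as on x86. Since only single bytes are accessed, no alignment check can fire:

```c
#include <stdint.h>

/* Read a 32-bit little-endian word from a possibly misaligned address. */
uint32_t read_misaligned_word(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Write a 32-bit little-endian word to a possibly misaligned address. */
void write_misaligned_word(void *p, uint32_t w)
{
    uint8_t *b = (uint8_t *)p;
    b[0] = (uint8_t)(w & 0xFF);
    b[1] = (uint8_t)((w >> 8) & 0xFF);
    b[2] = (uint8_t)((w >> 16) & 0xFF);
    b[3] = (uint8_t)((w >> 24) & 0xFF);
}
```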
Assume PUTF modifies %EFLAGS in a completely arbitrary manner, and so
be completely pessimistic if it is encountered during the redundant-flag-
save/restore-elimination pass. This fixes the following mysterious
failure:
At request of Ulrich Drepper, call __libc_freeres() after final __NR_exit
so as to free memory allocated by glibc. This reduces the leaks reported
in glibc, but causes a stack of read/write-after-free errors which have
to be suppressed :-(
The Intel P4 manual suggests inserting a pause instruction in
spin-wait loops as a hint to what the code is doing. In other
respects it acts just like a nop. Pause (0xF3 0x90) currently
causes valgrind to panic. The patch below keeps things running.
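The patch itself is not reproduced in this log. Purely as an illustration of the idea, a decoder could recognise the two-byte encoding and then handle it exactly as a nop. The function name match_pause is hypothetical and this is not the real decoder:

```c
#include <stddef.h>

/* Return the number of bytes consumed if `code` starts with
   PAUSE (0xF3 0x90), or 0 if it does not. The caller would then
   emit whatever it emits for a plain nop. */
size_t match_pause(const unsigned char *code, size_t len)
{
    if (len >= 2 && code[0] == 0xF3 && code[1] == 0x90)
        return 2;
    return 0;
}
```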
vg_signals.c: vg_oursignalhandler(): don't longjmp() on fatal signal if
the scheduler's jmp_buf is not valid. This might avoid at least some
of the following:
vg_scheduler.c:479 (run_thread_for_a_while): Assertion `trc == 0'
failed.
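The guard can be sketched like this. All names are hypothetical, and what the real handler does when the jmp_buf is invalid is not stated above; this sketch merely counts the event and returns:

```c
#include <setjmp.h>
#include <signal.h>

jmp_buf sched_jmpbuf;                          /* scheduler's jump target */
volatile sig_atomic_t sched_jmpbuf_valid = 0;  /* set only while it is live */
volatile sig_atomic_t ignored_signals = 0;     /* assumption: just count */

/* Fatal-signal handler: only jump back if the target is known good. */
void fatal_signal_handler(int signo)
{
    (void)signo;
    if (sched_jmpbuf_valid)
        longjmp(sched_jmpbuf, 1);
    ignored_signals++;   /* no valid target: do not longjmp */
}
```

The scheduler side would set sched_jmpbuf_valid to 1 right after setjmp() and clear it again before leaving the region where the jump target is live.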
Julian Seward [Sun, 30 Jun 2002 12:44:54 +0000 (12:44 +0000)]
Implement --weird-hacks=truncate-writes to limit the size of write syscalls
to 4096, to possibly avoid deadlocks under very rare circumstances.
Is fully documented and commented.
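In outline, the truncation amounts to clamping the byte count before the syscall is made. A minimal sketch with hypothetical names (only the 4096 limit comes from the text above):

```c
#include <stddef.h>

#define TRUNCATE_WRITES_LIMIT 4096   /* limit named in the commit message */

/* Return the byte count to actually pass to the write syscall. */
size_t clamp_write_count(size_t count, int truncate_writes_enabled)
{
    if (truncate_writes_enabled && count > TRUNCATE_WRITES_LIMIT)
        return TRUNCATE_WRITES_LIMIT;
    return count;
}
```

This relies on the client coping with short writes, which write() is allowed to produce anyway.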
Julian Seward [Sun, 30 Jun 2002 10:57:30 +0000 (10:57 +0000)]
cleanup_after_thread_exited: also clean up the waiting_fds table on thread
disappearance. This fixes an assertion failure to do with thread nukage
on fork():
vg_scheduler.c:936 (poll_for_ready_fds):
Assertion `vgPlain_is_valid_tid(tid)' failed.