Julian Seward [Fri, 11 Mar 2011 19:10:48 +0000 (19:10 +0000)]
NB: this is a temporary fix, until such time as bug 243935 is fully
resolved.
Add a client request, ANNOTATE_HAPPENS_BEFORE_FORGET_ALL, to notify
Helgrind that it can forget about any h-b edges previously associated
with the specified tag, and release associated resources.
Julian Seward [Fri, 11 Mar 2011 18:38:12 +0000 (18:38 +0000)]
Change the semantics of ANNOTATE_HAPPENS_BEFORE from 'overwrite' to
'add' behaviour, w.r.t. any h-b edges associated with the
synchronisation object prior to the call. This brings the behaviour
into line with DRD and TSan, and is required for correct annotation of
thread safe reference counting. It fixes #243935 -- at least, the
original bug as discussed in comments 0 and 2.
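The reference-counting case mentioned above can be sketched as follows.
This is only an illustration, not code from the Valgrind tree: Obj,
obj_new, obj_ref and obj_unref are hypothetical names, and the no-op
macro fallbacks exist only so the sketch builds without the Valgrind
headers (in real use the macros come from <valgrind/helgrind.h>).
Every thread that drops a reference posts an h-b edge on the object,
which is why 'add' rather than 'overwrite' semantics are needed; the
thread that drops the last reference collects all of them before
freeing.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* In real use: #include <valgrind/helgrind.h>.
   The no-op fallbacks below are an assumption of this sketch, present
   only so that it compiles without the Valgrind headers installed. */
#ifndef ANNOTATE_HAPPENS_BEFORE
#define ANNOTATE_HAPPENS_BEFORE(obj)            ((void)(obj))
#define ANNOTATE_HAPPENS_AFTER(obj)             ((void)(obj))
#define ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(obj) ((void)(obj))
#endif

/* Hypothetical reference-counted object. */
typedef struct {
    atomic_int refs;
    int payload;
} Obj;

static Obj *obj_new(int payload)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    atomic_init(&o->refs, 1);
    o->payload = payload;
    return o;
}

static void obj_ref(Obj *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/* Drops one reference. Returns 1 if this call freed the object. */
static int obj_unref(Obj *o)
{
    /* Each releasing thread adds its own h-b edge on the tag 'o'. */
    ANNOTATE_HAPPENS_BEFORE(o);
    if (atomic_fetch_sub(&o->refs, 1) == 1) {
        /* Last reference: acquire every edge posted on 'o' so far. */
        ANNOTATE_HAPPENS_AFTER(o);
        /* Let the tool release resources tied to this tag. */
        ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(o);
        free(o);
        return 1;
    }
    return 0;
}
```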
Julian Seward [Thu, 10 Mar 2011 17:40:22 +0000 (17:40 +0000)]
Cleanup: get rid of univ_tsets as it is no longer needed.
Also, fix bug in del_LockN (segfault when the deleted lock is
the last in the list) exposed by r11620. (Prior to r11620,
the last lock in the list was never deleted.)
Julian Seward [Thu, 10 Mar 2011 15:01:14 +0000 (15:01 +0000)]
Bring avg translation size statement closer to reality, for
amd64-linux with --smc-check=all. 350 would be better, but H already
soaks up so much space that a low-side of 320 seems prudent.
Bart Van Assche [Wed, 9 Mar 2011 17:53:28 +0000 (17:53 +0000)]
DRD: Report an error if an invalid argument is passed to pthread_detach(). Do not assume that pthread_detach() returns an error code if its argument is invalid. Should fix #267968.
Julian Seward [Mon, 7 Mar 2011 16:05:35 +0000 (16:05 +0000)]
Add a port to IBM z/Architecture (s390x) running Linux -- Valgrind
side components. (Florian Krohm <britzel@acm.org> and Christian
Borntraeger <borntraeger@de.ibm.com>). Fixes #243404.
For calls (structure jCC), Callgrind maintains for the source
both the BBCC (counter array for the source context of the call, which
includes the BB of the source call position), as well as a jump
number in the source BB to reconstruct the guest instruction address
of the call. In setup_bbcc, this jump number is stored in <passed>, and
used when creating a new jCC on a call.
The value of <passed> got out of sync when we simulated a real jump
between different functions as a return/call pair: the call source was
reset for the popped jCC, but <passed> was not.
Julian Seward [Mon, 28 Feb 2011 09:03:44 +0000 (09:03 +0000)]
Don't construct the LAOG at all when --track-lockorders=no (as opposed
to previous behaviour, in which it was constructed but any resulting
errors were not shown, hence wasting CPU and memory.) Partial fix
for #255353. (Philippe Waroquiers, philippe.waroquiers@skynet.be)
Julian Seward [Sun, 27 Feb 2011 23:53:32 +0000 (23:53 +0000)]
Back out r11568 (Add a new constructor for empty XArrays,
VG_(newSizedXA)) since r11571 removes the only use of the
functionality that r11568 introduces.
Julian Seward [Sun, 27 Feb 2011 23:04:12 +0000 (23:04 +0000)]
Change the representation of VTSs. Instead of using an XArray of
ScalarTSs, have the ScalarTS array as a trailing array directly on the
VTS structure. This reduces the number of malloc'd blocks per VTS
from 3 to 1, since an XArray always requires 2 malloc'd blocks. At
least for tc19_shadowmem this reduces the total amount of heap
turnover in Arena 'tool' by a factor of 3, and modestly improves
performance whilst modestly reducing overall memory use.
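A minimal sketch of the trailing-array layout, assuming a C99 flexible
array member. The type and function names (ScalarTS_sketch, VTS_sketch,
vts_sketch_new, vts_sketch_push) are hypothetical and do not mirror the
real Helgrind definitions; the point is only that header and elements
live in a single malloc'd block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scalar timestamp. */
typedef struct {
    uint64_t bits;
} ScalarTS_sketch;

/* Header and element array share one heap block. */
typedef struct {
    size_t n_used;            /* elements currently stored */
    size_t n_cap;             /* elements the block can hold */
    ScalarTS_sketch ts[];     /* C99 flexible array member */
} VTS_sketch;

/* One malloc covers both the header and 'cap' elements.
   (Overflow of the size computation is not checked in this sketch.) */
static VTS_sketch *vts_sketch_new(size_t cap)
{
    VTS_sketch *v = malloc(sizeof *v + cap * sizeof v->ts[0]);
    if (v == NULL)
        return NULL;
    v->n_used = 0;
    v->n_cap = cap;
    return v;
}

/* Appends one element. Returns 0 on success, -1 if the block is full. */
static int vts_sketch_push(VTS_sketch *v, uint64_t bits)
{
    if (v->n_used == v->n_cap)
        return -1;
    v->ts[v->n_used++].bits = bits;
    return 0;
}
```

A single free(v) releases the whole structure, which is where the
saving in malloc'd blocks comes from.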
Julian Seward [Thu, 24 Feb 2011 15:25:24 +0000 (15:25 +0000)]
Scalability fix for Helgrind: reduce the size of ScalarTS (scalar
timestamps) from 16 to 8 bytes. This halves the size of vector
timestamps and reduces the amount of memory needed to run programs
that have many threads and/or many synchronisation events.
The tradeoff is that Helgrind must abort the run if the program
creates more than 2^20 (1.05e+6) threads or performs more than 2^44
(1.76e+13) synchronisation events. Neither of these seems like a
significant limitation in practice: reaching the limit of 2^44 synch
events would take, at a minimum, several CPU months on a very fast
machine.
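The 20-bit/44-bit split implied by those limits can be illustrated by
packing both fields into one 64-bit word. This is a sketch under that
assumption; PackedTS and its helpers are hypothetical names, and the
real ScalarTS may be laid out differently (for example with bitfields).

```c
#include <stdint.h>

#define THR_BITS 20u   /* room for 2^20 thread ids */
#define TYM_BITS 44u   /* room for 2^44 timestamps */
#define THR_MAX  ((UINT64_C(1) << THR_BITS) - 1)
#define TYM_MAX  ((UINT64_C(1) << TYM_BITS) - 1)

/* Hypothetical 8-byte scalar timestamp: thread id in the top 20 bits,
   timestamp in the low 44 bits. */
typedef struct {
    uint64_t bits;
} PackedTS;

/* Returns 0 on success, -1 if either field exceeds its bit budget
   (the situation in which the tool would have to abort the run). */
static int packed_ts_make(uint64_t thr, uint64_t tym, PackedTS *out)
{
    if (thr > THR_MAX || tym > TYM_MAX)
        return -1;
    out->bits = (thr << TYM_BITS) | tym;
    return 0;
}

static uint64_t packed_ts_thr(PackedTS p)
{
    return p.bits >> TYM_BITS;
}

static uint64_t packed_ts_tym(PackedTS p)
{
    return p.bits & TYM_MAX;
}
```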
Julian Seward [Wed, 23 Feb 2011 13:30:53 +0000 (13:30 +0000)]
A scalability fix for Helgrind for running large workloads. When
creating new vector timestamps (VTSs) via tick and join operations,
preallocate the underlying XArray of ScalarTSs (scalar timestamps) at
the likely final size, using new function VG_(newSizedXA) introduced
in r11568. This reduces overall heap turnover (in VG_AR_TOOL) by a
factor of several. Together with revs 11567 and 11568, it mitigates
the worst-case performance falloff in long runs that involve lots of
threads and lots of synchronisation events (a.k.a Vector timestamps).
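For illustration, a join of two vector timestamps with the result
array allocated once up front might look like the sketch below. It
assumes each timestamp is an array of (thread, time) entries sorted by
thread id, and it sizes the result at na + nb entries, which is an
upper bound rather than whatever estimate Helgrind actually uses.
Entry and vts_join_sketch are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical vector-timestamp entry. */
typedef struct {
    uint32_t thr;   /* thread id */
    uint64_t tym;   /* logical time for that thread */
} Entry;

/* Pointwise-maximum join of a[0..na) and b[0..nb), both sorted by thr
   with no duplicate thr values. The result is allocated once at its
   maximum possible size, so it never has to be resized while filling.
   Returns a malloc'd array (caller frees) and its length in *nout. */
static Entry *vts_join_sketch(const Entry *a, size_t na,
                              const Entry *b, size_t nb, size_t *nout)
{
    size_t cap = na + nb;
    Entry *out = malloc((cap ? cap : 1) * sizeof *out);
    size_t i = 0, j = 0, k = 0;

    if (out == NULL)
        return NULL;
    while (i < na && j < nb) {
        if (a[i].thr < b[j].thr) {
            out[k++] = a[i++];
        } else if (a[i].thr > b[j].thr) {
            out[k++] = b[j++];
        } else {
            /* Same thread in both: keep the later time. */
            out[k] = a[i];
            if (b[j].tym > a[i].tym)
                out[k].tym = b[j].tym;
            k++; i++; j++;
        }
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    *nout = k;
    return out;
}
```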
Julian Seward [Wed, 23 Feb 2011 13:22:24 +0000 (13:22 +0000)]
Add a new constructor for empty XArrays, VG_(newSizedXA). This is
identical to VG_(newXA) but allows passing in a size hint. In the
case where the likely final size of the XArray is known at creation
time, this allows avoiding the repeated (implicit) resizing and
copying of the array as elements are added, which can save a vast
amount of dynamic memory allocation turnover.
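The effect of a size hint can be seen in a generic growable array.
This is a sketch of the idea only; IntVec and its functions are
hypothetical and are not the XArray API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
typedef struct {
    int *data;
    size_t used;
    size_t cap;
    unsigned n_reallocs;   /* growth steps performed so far */
} IntVec;

/* 'hint' is the expected final element count; 0 means "unknown". */
static int intvec_init(IntVec *v, size_t hint)
{
    v->data = NULL;
    v->used = 0;
    v->cap = 0;
    v->n_reallocs = 0;
    if (hint > 0) {
        v->data = malloc(hint * sizeof *v->data);
        if (v->data == NULL)
            return -1;
        v->cap = hint;
    }
    return 0;
}

/* Appends x, doubling the capacity whenever the array is full. */
static int intvec_push(IntVec *v, int x)
{
    if (v->used == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = ncap;
        v->n_reallocs++;
    }
    v->data[v->used++] = x;
    return 0;
}

static void intvec_free(IntVec *v)
{
    free(v->data);
    v->data = NULL;
    v->used = v->cap = 0;
}
```

Filling such an array with 1000 elements triggers several growth steps
when the hint is 0, and none when the hint is 1000.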
Julian Seward [Wed, 23 Feb 2011 13:18:56 +0000 (13:18 +0000)]
Fix a scalability problem observed whilst running Helgrind on a large
workload: when scanning a freelist of a given size for a big-enough
block (to allocate), don't scan all the way around the list. Instead
give up after 100 blocks and try the freelist above. The pathological
case (as observed) is that the freelist contains tens of thousands of
blocks, but all are too small for the current request, hence they are
all visited pointlessly. If the new heuristic is used, the freelist
start point is moved along by one block, so that future searches
eventually inspect the entire freelist, just very slowly.
Also, some improvements to stats gathering, and rename of some
existing stats fields in struct Arena.
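The bounded freelist scan described above could be sketched like this.
It assumes a circular singly-linked freelist and takes the scan limit
as a parameter (the commit uses 100); FreeBlock, FreeList and
freelist_find are hypothetical names and not the allocator's real
data structures.

```c
#include <stddef.h>

/* Hypothetical free block on a circular singly-linked list. */
typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *next;
} FreeBlock;

typedef struct {
    FreeBlock *start;   /* where the next search begins */
} FreeList;

/* Looks for a block of at least 'want' bytes, inspecting at most
   'max_scan' blocks. On giving up early, the start point moves along
   by one block so that repeated searches eventually cover the whole
   list. Returns NULL if nothing suitable was seen; the caller would
   then try the freelist for the next larger size class. */
static FreeBlock *freelist_find(FreeList *fl, size_t want,
                                unsigned max_scan)
{
    FreeBlock *b = fl->start;
    unsigned n = 0;

    if (b == NULL)
        return NULL;
    do {
        if (b->size >= want)
            return b;
        b = b->next;
        n++;
    } while (b != fl->start && n < max_scan);

    if (b != fl->start) {
        /* Scan limit hit before completing a full lap. */
        fl->start = fl->start->next;
    }
    return NULL;
}
```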
Julian Seward [Fri, 11 Feb 2011 16:47:03 +0000 (16:47 +0000)]
Make ld.so:index redir mandatory for glibc-2.12 and later, on x86-linux.
Also, improve the failure message a bit, so as to tell people what package
they need to install, in at least some cases.
Bart Van Assche [Thu, 10 Feb 2011 21:03:47 +0000 (21:03 +0000)]
DRD: don't inline pthread intercepts because in combination with the current fragile implementation of the CALL_FN_* macros inlining intercepts can easily trigger stack alignment errors on Darwin.
Julian Seward [Wed, 9 Feb 2011 12:47:23 +0000 (12:47 +0000)]
_pre_mem_asciiz handlers in both tools: don't segfault if passed an
obviously invalid address. Fixes #255009. Investigation & initial
patch by Philippe Waroquiers (philippe.waroquiers@skynet.be)
When unwinding needs to be done because the stack pointer is reset
(e.g. by a longjmp), it makes no sense to interpret the control
flow change as a call; it should be seen as a return.
This indirectly fixes bug 246152. Unwinding potentially changes the
exec state, which is unique for threads, but also for signal handlers.
E.g. this is true for a longjmp out of a signal handler. Exec state
changes modify members of struct CLG_(current_state), such as
CLG_(current_state).bbcc and CLG_(current_state).jmps_passed, which
are backed in CLG_(setup_bbcc)() by last_bbcc and passed, respectively.
On an exec state change, these local vars go out of sync, and lead
to invalid data passed to CLG_(push_call_stack)() for handling a call,
which triggered data corruption, and the symptoms seen in bug 246152.
Since in the given situation there is no longer a call, there is no
call into CLG_(push_call_stack)(), and the corruption (or, since the
last commit, the failed assertion) is not triggered any more.
Better a failed assertion than silent data corruption
This is part 1 of the fix to bug 246152, and makes the bug
reproducible as a failed assertion also on Ubuntu 10.10 on 64-bit
machines. However, the test needs to be compiled as 32-bit (-m32).