This patch reduces the size of all tools by about 2MB of text
(depending on the arch).
This has the following advantages:
1. somewhat faster build/link time (very probably negligible)
2. somewhat faster tool startup (probably negligible for most users,
but regression tests benefit from it)
3. a memory gain of about 10MB
The Valgrind tools assume that host and guest are the same,
so there is no need to drag in the full set of archs when
linking a tool.
The VEX library is nicely split into arch-independent and arch-dependent
objects. Only main_main.c drags in the various arch-specific files.
So main_main.c (the main entry point of the VEX library) is compiled
only for the current guest/host arch.
The disadvantage is that the VEX lib can no longer be used
with different host and guest archs, even though VEX itself supports that
(i.e. it does not assume that host and guest are the same).
So, to still allow a VEX user to use the VEX lib in a multi-arch setup,
main_main.c is compiled twice:
1. in 'single arch mode', going into libvex-<arch>-<os>
2. in 'multi arch mode', going into a new lib libvexmultiarch-<arch>-<os>
A VEX user can choose at link time to use the multi-arch main_main
by linking with both libs (the multi-arch one first).
Here is a small (nonsensical, crashing) standalone usage of the VEX lib,
first linked in single arch, then linked in multi-arch:
In a future commit, some regtests will be added to validate that the two libs
work properly (and that no arch-specific symbol is missing when
linking multi-arch).
Further reduction of the size of the sector TTE tables
For the default memcheck configuration (32 bits), this patch
decreases the size by 13.6 MB, i.e. from 89945856 to 76317696 bytes.
Note that the type EClassNo is introduced only for readability
purposes (and to avoid some casts). It does not change the size
of the TTEntry.
The TTEntry size is reduced by using unions and/or 1-bit Bool fields.
No performance impact was detected (outer callgrind/inner memcheck bz2
on x86 even shows a small improvement).
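A minimal sketch of the two techniques, assuming made-up layouts (TTEntryBefore, TTEntryAfter, guest_addr, next_free, in_use and has_chain are hypothetical and do not match the real TTEntry): fields that are never live at the same time share storage through a union, and yes/no flags become 1-bit Bool bitfields. On x86-64 with GCC the first layout takes 24 bytes and the second 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only; not Valgrind's TTEntry. */

/* Before: every field has its own storage and each flag takes a byte. */
typedef struct {
   uint64_t guest_addr;  /* meaningful only while the entry is in use */
   uint64_t next_free;   /* meaningful only while the entry is free   */
   bool     in_use;
   bool     has_chain;
} TTEntryBefore;

/* After: fields never live at the same time share storage via a union,
   and the flags are packed into 1-bit bitfields. */
typedef struct {
   union {
      uint64_t guest_addr;  /* valid when in_use is true  */
      uint64_t next_free;   /* valid when in_use is false */
   } u;
   bool in_use    : 1;
   bool has_chain : 1;
} TTEntryAfter;
```

The union is only safe because in_use tells which member is currently valid; reading the other member would give meaningless data.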
Julian Seward [Mon, 30 Mar 2015 08:50:27 +0000 (08:50 +0000)]
Add IR-level support for a 16-bit floating-point type (Ity_F16) and add
four new IROps that use it:
Iop_F16toF64, Iop_F64toF16, Iop_F16toF32, Iop_F32toF16.
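For reference, a sketch of what F16toF32 computes, assuming IEEE 754 binary16 (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits). Every binary16 value is exactly representable in binary32, so widening needs no rounding, whereas narrowing (F32toF16) has to round. This is plain C with a hypothetical function name, not VEX's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern to a float (exact). */
float f16_to_f32(uint16_t h)
{
   uint32_t sign = (uint32_t)(h >> 15) << 31;
   uint32_t exp  = (h >> 10) & 0x1Fu;
   uint32_t mant = h & 0x3FFu;
   uint32_t bits;

   if (exp == 0x1Fu) {
      /* Infinity or NaN: all-ones exponent, fraction payload copied. */
      bits = sign | 0x7F800000u | (mant << 13);
   } else if (exp != 0) {
      /* Normal: rebias the exponent from 15 to 127 (+112). */
      bits = sign | ((exp + 112u) << 23) | (mant << 13);
   } else if (mant == 0) {
      bits = sign;                      /* +0.0 or -0.0 */
   } else {
      /* Subnormal (mant * 2^-24): shift until the implicit bit appears,
         lowering the exponent once per shift. */
      exp = 113u;
      while ((mant & 0x400u) == 0) {
         mant <<= 1;
         exp--;
      }
      mant &= 0x3FFu;                   /* drop the now-implicit bit */
      bits = sign | (exp << 23) | (mant << 13);
   }

   float f;
   memcpy(&f, &bits, sizeof f);         /* reinterpret bits as a float */
   return f;
}
```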
Revision 14976 caused a regression: a stack trace produced when the
stack has not yet been extended to cover SP contains only one
element, because the stack limits are taken to be the limits of
the resvn segment.
This patch fixes that by taking the Resvn/SmUpper segment into
account to compute the limits properly.
It also adds a new regtest that fails with the trunk
(only one function in the stack trace)
and succeeds with this patch (the 2 expected functions).
The hint given by the Valgrind gdbserver when enabling host visibility
wrongly gave the file load address
instead of the start address of the text segment.
As a result, GDB showed wrong symbols for an address
(typically, symbols slightly before the address being printed).
This patch ensures the hint uses the text start address.
Helgrind optimisation:
* do VTS pruning only if new threads were declared
very dead since the last pruning round.
* when pruning, use only the list of threads newly declared very dead:
this decreases the cost of the binary search
in VTS__substract.
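A sketch of the pruning idea, assuming a VTS is an array of (thread, timestamp) pairs and the threads newly declared very dead are kept in a sorted array (ScalarTS, vts_prune and the field names are hypothetical; this is not libhb's VTS__substract). Returning early when nothing new died covers the first point; searching only the short list of newly dead threads keeps each binary search cheap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical VTS element: one (thread, clock) pair. */
typedef struct { uint32_t thr; uint64_t tym; } ScalarTS;

static int cmp_u32(const void *a, const void *b)
{
   uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
   return (x > y) - (x < y);
}

/* Remove from vts[0..n) every entry whose thread is in the sorted array
   newly_dead[0..n_dead). Compacts in place and returns the new length. */
size_t vts_prune(ScalarTS *vts, size_t n,
                 const uint32_t *newly_dead, size_t n_dead)
{
   if (n_dead == 0)
      return n;                 /* nothing died since last round: skip */

   size_t out = 0;
   for (size_t i = 0; i < n; i++) {
      int dead = bsearch(&vts[i].thr, newly_dead, n_dead,
                         sizeof newly_dead[0], cmp_u32) != NULL;
      if (!dead)
         vts[out++] = vts[i];   /* keep entries of live threads */
   }
   return out;
}
```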
Florian Krohm [Fri, 27 Mar 2015 08:47:22 +0000 (08:47 +0000)]
Change the minimum allowable value of aspacem_minAddr to
VKI_PAGE_SIZE. This follows from the requirement that
the address be page aligned and > 0.
Improve --stats=yes:
* give the average number of IPs per execontext
* use the newly introduced %f format for the m_transtab.c ratio
and for the average number of execontexts per list
Florian Krohm [Thu, 26 Mar 2015 21:55:00 +0000 (21:55 +0000)]
Add function VG_(am_is_valid_for_aspacem_minAddr) so that the parser
for command line options does not need to know what addresses are valid
for aspacem_minAddr.
That information should be hidden in the address space manager.
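A sketch of the kind of check such a function can hide from the option parser, assuming only the two requirements stated for aspacem_minAddr (page aligned and > 0), which together make one page the smallest valid value. The name is_valid_min_addr and the fixed 4096-byte page are assumptions; the real VG_(am_is_valid_for_aspacem_minAddr) may check more.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumption: stand-in for VKI_PAGE_SIZE */

/* Hypothetical: valid iff non-zero and page aligned, so the smallest
   acceptable value is exactly one page. */
bool is_valid_min_addr(uintptr_t addr)
{
   return addr != 0 && addr % PAGE_SIZE == 0;
}
```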
Have the very detailed gdbsrv debuglog (e.g. exchange of packets
between GDB and V gdbsrv, fetching/setting registers, ...) done
at debuglog level 3 instead of 1.
This allows gdbsrv commands to be run at debuglog level 2
without too much trace output.
Julian Seward [Thu, 26 Mar 2015 07:18:32 +0000 (07:18 +0000)]
Bug 345215 - Performance improvements for the register allocator
The basic idea is to change the representation of registers (HReg) so
as to give Real registers a unique integer index starting from 0, with
the registers available for allocation numbered consecutively from zero
upwards. This allows the register allocator to index into its primary
data structure -- a table tracking the status of each available
register -- using a normal array index instead of having to search
sequentially through the table, as now.
It also allows an efficient bitmap-based representation for "set of
Real registers", which is important for the NCODE work.
There are various other perf improvements, most notably in calling
getRegUsage once rather than twice per instruction.
The cost of register allocation is reduced to roughly 65% of what it
previously was. This translates into speedups ranging from close to zero for
compute-intensive code up to around 7% for JIT-intensive
situations, e.g. "time perl tests/vg_regtest memcheck/tests/amd64".
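A sketch of why dense register indices help, assuming at most 64 allocatable registers (RReg, RRegSet, rreg_state and the helpers are hypothetical, not VEX's HReg encoding): the per-register state table is indexed directly instead of searched, and a set of real registers becomes a single 64-bit bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: each allocatable real register has a dense index 0, 1, ... */
typedef struct { uint32_t ix; } RReg;

#define N_RREGS 64   /* assumption: a register set fits in one uint64_t */

/* Allocator state per real register, indexed directly by register
   number: O(1) access instead of a sequential search. */
typedef struct { bool in_use; int vreg; } RRegState;
static RRegState rreg_state[N_RREGS];

static void rreg_bind(RReg r, int vreg)
{
   rreg_state[r.ix].in_use = true;
   rreg_state[r.ix].vreg   = vreg;
}

/* A set of real registers: bit i set <=> register i is in the set. */
typedef uint64_t RRegSet;

static inline RRegSet rset_add(RRegSet s, RReg r) { return s | (UINT64_C(1) << r.ix); }
static inline RRegSet rset_del(RRegSet s, RReg r) { return s & ~(UINT64_C(1) << r.ix); }
static inline bool    rset_has(RRegSet s, RReg r) { return (s >> r.ix) & 1u; }
```

Set union and intersection then become single | and & operations on the bitmaps.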
This patch further reduces the memory used by TT/TC (by about 15MB
for 32-bit memcheck with the default number of sectors).
Memory is reduced by using UShort typedefs for sector numbers and TTE numbers.
Note that for TTE numbers, we had a mixture of UShort, UInt and Int used
depending on the place (a TTE number was in any case constrained to be a UShort).
The bss memory/startup space is also reduced by allocating the htt on demand
(like tt and tc), using mmap the first time a sector is initialised.
Changes:
* pub_core_transtab.h:
  * 2 typedefs to identify a sector and a tt entry (both types are UShort)
  * add 2 #defines giving 'invalid values' for these types
  * change the interface to use these types rather than UInt
* m_transtab.c:
  * use these 2 new types wherever relevant rather than UInt or UShort
  * replace the use of -1 by INV_SNO or INV_TTE
  * remove the now-useless typecasts from Int/UInt to UShort for tte
* schedule.c: use the new types
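A sketch of the typedefs and the on-demand htt allocation. SECno, TTEno, INV_SNO and INV_TTE are the names from this log, but the 0xFFFF values, the Sector struct and sector_init_htt are assumptions for illustration. Because the table is only mapped the first time a sector is initialised, a sector that is never used costs no bss.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef uint16_t UShort;
typedef UShort SECno;              /* sector number */
typedef UShort TTEno;              /* tt entry number within a sector */

#define INV_SNO ((SECno)0xFFFF)    /* assumed "no sector" value */
#define INV_TTE ((TTEno)0xFFFF)    /* assumed "no entry" value  */

/* Hypothetical sector: htt stays NULL until first initialisation. */
typedef struct {
   TTEno *htt;
   size_t n_htt;
} Sector;

/* Map the hash table on first use and mark every slot empty.
   Returns 0 on success (or if already mapped), -1 if mmap fails. */
int sector_init_htt(Sector *s, size_t n_entries)
{
   if (s->htt != NULL)
      return 0;
   void *p = mmap(NULL, n_entries * sizeof(TTEno), PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (p == MAP_FAILED)
      return -1;
   s->htt   = p;
   s->n_htt = n_entries;
   for (size_t i = 0; i < n_entries; i++)
      s->htt[i] = INV_TTE;         /* instead of the old -1 */
   return 0;
}
```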
Florian Krohm [Mon, 23 Mar 2015 17:13:04 +0000 (17:13 +0000)]
Add VG_(am_is_bogus_client_stack_pointer)(Addr).
The function is used in VG_(client_syscall) to avoid extending the stack
when it is clear that the current value of the stack pointer does not
point into a segment that looks like a stack segment.
See the comments in the code there.
As a side effect of this we can now revert r15018, which increased
the stack size of the alternate stack in memcheck/tests/sigaltstack.c.
The reason is that the belief at the time, "alternate stack is too small",
was not correct. What actually happened was that VG_(client_syscall) called
VG_(extend_stack) without need (the syscall was tgkill) and the new stack
pointer happened to be in a file segment.
In other words: the current stack pointer was still within the alternate
stack, i.e. the alternate stack was (barely) large enough.
Tom Hughes [Sun, 22 Mar 2015 11:01:58 +0000 (11:01 +0000)]
Include the platform name in the unhandled system call message
We often get bug reports for an unhandled system call that don't
make clear which platform is in use, which makes it impossible
to know which system call it is.
343902 --vgdb=yes doesn't break when --xml=yes is used
This change ensures that gdbserver is also called when --xml=yes.
When gdbserver is set to yes, we have to temporarily reset
xml output to no, as gdbserver output (e.g. printing the last error)
has to be printed to gdb.
Florian Krohm [Sat, 21 Mar 2015 10:58:37 +0000 (10:58 +0000)]
Change the GCC demangler to not use VLAs. The rationale is that these VLAs
are allocated on the stack and can become quite large, in particular
when the client is a C++ application using the Boost library.
In combination with the demangler's recursive nature this can quickly lead
to exhaustion of valgrind's per-thread stack (which cannot be dynamically
grown). Additionally, due to the large VLAs (I've seen a 32k array) we
could run out of stack space without issuing a prior warning and instead
just segfault.
Therefore this patch allocates these arrays on the heap and frees them
later. Basically this is a respin of Joseph's r10385.
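A sketch of the transformation on a contrived helper (reverse_prefix is hypothetical, and standard malloc/free stand in for whatever allocator the demangler actually uses): the stack VLA becomes a heap buffer, which turns a silent stack overflow into a checkable allocation failure, at the price of freeing it on every path.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: write the first n bytes of src, reversed and
   NUL-terminated, into out. Returns 0 on success, -1 if out is too
   small or the temporary buffer cannot be allocated. */
int reverse_prefix(const char *src, size_t n, char *out, size_t out_size)
{
   if (out_size < n + 1)
      return -1;

   /* Previously (VLA on the stack, size driven by the input):
        char tmp[n + 1];                                        */
   char *tmp = malloc(n + 1);
   if (tmp == NULL)
      return -1;                 /* failure is visible, not a segfault */

   for (size_t i = 0; i < n; i++)
      tmp[i] = src[n - 1 - i];
   tmp[n] = '\0';
   memcpy(out, tmp, n + 1);

   free(tmp);                    /* heap buffer must be released */
   return 0;
}
```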
Change TT/TC hashing data structure (decreases memory by 50MB for memcheck 32 bits)
This patch changes the way the transtab entries hash table is done.
Currently, the hash table is an open hash table considered full at 65%.
This means that, on average, about 1 entry in 3 is unused
(all the hash table memory will be 'active' for big applications,
as the active entries are normally reasonably distributed over the hash table).
The size of a transtab entry is significant (about 150 bytes).
To avoid having 35% of the entries unused, the translation table
is split in 2:
a hash table, containing indexes pointing at the transtab entries,
and the transtab entries themselves.
With this technique, we add a small hash table,
but we save 35% of the translation table.
Performance measurements have shown no degradation,
and some platforms have better performance. It is not too clear why
(probably this helps platforms with small caches?).
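A minimal sketch of the split, assuming linear probing and tiny made-up sizes (Sector, TTEntry, sector_add and sector_find are hypothetical): the big entries are filled densely, and only the array of 16-bit indices is sized for the load factor. With roughly 150-byte entries, leaving 35% of slots empty wastes about 52 bytes per slot in the old layout, but only about 0.7 bytes per slot (35% of 2 bytes) in the index table.

```c
#include <stddef.h>
#include <stdint.h>

#define N_TTES    8        /* big entries, filled densely            */
#define N_HTT     13       /* index slots; > N_TTES so probing stops */
#define HTT_EMPTY 0xFFFF

typedef struct { uint64_t guest_addr; char payload[150]; } TTEntry;

typedef struct {
   TTEntry  tt[N_TTES];    /* no empty big entries */
   uint16_t htt[N_HTT];    /* open-addressed slots holding tt indices */
   uint16_t n_used;
} Sector;

static size_t hash_addr(uint64_t a) { return (size_t)(a % N_HTT); }

void sector_init(Sector *s)
{
   s->n_used = 0;
   for (size_t i = 0; i < N_HTT; i++)
      s->htt[i] = HTT_EMPTY;
}

/* Append an entry and index it. Returns its tt index, or -1 if full. */
int sector_add(Sector *s, uint64_t guest_addr)
{
   if (s->n_used == N_TTES)
      return -1;
   uint16_t tte = s->n_used++;
   s->tt[tte].guest_addr = guest_addr;
   size_t h = hash_addr(guest_addr);
   while (s->htt[h] != HTT_EMPTY)
      h = (h + 1) % N_HTT;              /* linear probing */
   s->htt[h] = tte;
   return tte;
}

/* Returns the tt index holding guest_addr, or -1 if absent. */
int sector_find(const Sector *s, uint64_t guest_addr)
{
   size_t h = hash_addr(guest_addr);
   while (s->htt[h] != HTT_EMPTY) {
      if (s->tt[s->htt[h]].guest_addr == guest_addr)
         return s->htt[h];
      h = (h + 1) % N_HTT;
   }
   return -1;
}
```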
Florian Krohm [Mon, 16 Mar 2015 22:03:42 +0000 (22:03 +0000)]
Increase the size of the alternate stack. It was too small.
This was found by accident and there is no known way to detect
an overflow of an alternate stack in the general case.
New option --avg-transtab-entry-size=<number> can be used to tune
the size of the translation table sectors, either to save memory
or to avoid too many retranslations.
Fix the following errors detected by the makefile check:
memcheck/tests/Makefile.am:1: error: wrap8.stderr.exp-ppc64 is missing in EXTRA_DIST
memcheck/tests/Makefile.am:1: error: wrap8.stdout.exp-ppc64 is missing in EXTRA_DIST
memcheck/tests/Makefile.am:1: error: wrap8.stdout.exp2 is in EXTRA_DIST but doesn't exist
memcheck/tests/Makefile.am:1: error: wrap8.stderr.exp2 is in EXTRA_DIST but doesn't exist
Florian Krohm [Sat, 14 Mar 2015 10:15:23 +0000 (10:15 +0000)]
Organise the expected output files for the wrap8 testcase.
There is special behaviour on ppc64 only. Let the filenames
reflect that. At the same time, update the ppc-specific
output to what it actually is. The important thing here is that the
stack overflow is detected. Everything else is effectively a
don't-care. Should line numbers and such differ in the future,
they should be filtered out.
Florian Krohm [Sat, 14 Mar 2015 09:44:04 +0000 (09:44 +0000)]
Update the ppc64 specific results to what they are.
The difference between these expected results and those of other
platforms is:
- Location 0x........ is 2 bytes inside local var "budget"
- declared at varinfo6.c:3115, in frame #2 of thread 1
+ Address 0x........ is on thread 1's stack
+ in frame #2, created by BZ2_blockSort (varinfo6.c:3107)
Should the stderr output of this testcase in the future
match the generic output (varinfo6.stderr.exp) then this is
another incarnation of
https://bugs.kde.org/show_bug.cgi?id=345121
Florian Krohm [Fri, 13 Mar 2015 13:50:08 +0000 (13:50 +0000)]
Sort locks by their guestaddr to make the error output independent
of the dynamically allocated Lock addresses.
This restores helgrind/tests/locked_vs_unlocked2.stderr.exp
from r14931.
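A sketch of the idea, assuming a hypothetical Lock record with a guestaddr field: sorting the Lock pointers by guest address with qsort makes the reporting order independent of where the Lock structs happened to be allocated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lock record: guestaddr is the client-visible address,
   while the struct itself lives at a run-dependent heap address. */
typedef struct { uintptr_t guestaddr; int kind; } Lock;

/* qsort comparator over an array of Lock pointers. */
static int cmp_lock_by_guestaddr(const void *a, const void *b)
{
   const Lock *la = *(const Lock *const *)a;
   const Lock *lb = *(const Lock *const *)b;
   return (la->guestaddr > lb->guestaddr) - (la->guestaddr < lb->guestaddr);
}

/* Deterministic order: depends only on guest addresses. */
void sort_locks(Lock **locks, size_t n)
{
   qsort(locks, n, sizeof locks[0], cmp_lock_by_guestaddr);
}
```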
While regtesting the patch I've observed intermittent failures
of helgrind/tests/hg05_race2 like so:
--- ../../helgrind/tests/hg05_race2.stderr.exp (revision 15001)
+++ ../../helgrind/tests/hg05_race2.stderr.exp (working copy)
@@ -26,8 +26,7 @@
at 0x........: th (hg05_race2.c:17)
by 0x........: mythread_wrapper (hg_intercepts.c:...)
...
- Location 0x........ is 0 bytes inside foo.poot[5].plop[11],
- declared at hg05_race2.c:24, in frame #x of thread x
+ Address 0x........ is on thread #x's stack
@@ -42,8 +41,7 @@
at 0x........: th (hg05_race2.c:17)
by 0x........: mythread_wrapper (hg_intercepts.c:...)
...
- Location 0x........ is 0 bytes inside foo.poot[5].plop[11],
- declared at hg05_race2.c:24, in frame #x of thread x
+ Address 0x........ is on thread #x's stack
Florian Krohm [Fri, 13 Mar 2015 12:46:49 +0000 (12:46 +0000)]
r2974 moved the inline definition of LibVEX_Alloc from libvex.h
to main_util.c because it caused linker problems with ICC.
See comments in BZ #339542.
This change re-enables inlining of that function by adding it
(renamed as LibVEX_Alloc_inline) to main_util.h.
500+ callsites changed accordingly.
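A sketch of the pattern, assuming a simple bump-pointer arena (Arena, arena_alloc_inline and arena_alloc_slow are hypothetical, not VEX's allocator): the fast path is a static inline function that would live in a header so each call site inlines it, while the rare slow path stays an ordinary out-of-line function.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *curr; uint8_t *limit; } Arena;

/* Out-of-line slow path; this sketch just reports exhaustion. */
void *arena_alloc_slow(Arena *a, size_t n)
{
   (void)a;
   (void)n;
   return NULL;
}

/* Would live in a header: small enough to inline at every call site. */
static inline void *arena_alloc_inline(Arena *a, size_t n)
{
   size_t rounded = (n + 7) & ~(size_t)7;   /* keep 8-byte alignment */
   if ((size_t)(a->limit - a->curr) < rounded)
      return arena_alloc_slow(a, n);        /* rare: real call */
   void *p = a->curr;
   a->curr += rounded;                      /* common: pointer bump */
   return p;
}
```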
m_transtab.c statistics/tracing
* merge the identical debug and clo_stat traces
* add to the stats the number of sectors recycled
* add the average translation size in each recycled sector
and in the final statistics
(no functional change)
Implement command line option --valgrind-stacksize=<number>
This allows memory usage to be decreased when using many threads,
if Valgrind does not need a big stack.
If needed (e.g. for demangling big C++ symbols), the V stack size
can be increased.
Florian Krohm [Thu, 12 Mar 2015 18:56:21 +0000 (18:56 +0000)]
Fix two bugs:
(1) In r14664 VG_(get_fnname_if_entry) was changed to always
return a function name, even if that function was *not* an
entry. That broke callgrind and was also confusing because
it contradicts what "get_fnname_if_entry" suggests.
(2) In r14189 a function call was removed because it was considered
redundant, which it was not.
Both bugs were hunted down by Joseph Weidendorfer.
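A sketch of the contract that (1) restores, assuming a hypothetical symbol table (Sym, find_sym and fnname_if_entry are made up): the lookup succeeds only when the address is exactly a function's start, so an address inside a function body yields no name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t start; size_t size; const char *name; } Sym;

/* The function whose range [start, start + size) contains addr, if any. */
static const Sym *find_sym(const Sym *syms, size_t n, uintptr_t addr)
{
   for (size_t i = 0; i < n; i++)
      if (addr >= syms[i].start && addr - syms[i].start < syms[i].size)
         return &syms[i];
   return NULL;
}

/* True (and *name set) only if addr is the entry of a function. */
bool fnname_if_entry(const Sym *syms, size_t n, uintptr_t addr,
                     const char **name)
{
   const Sym *s = find_sym(syms, n, addr);
   if (s == NULL || s->start != addr)
      return false;            /* not an entry: report nothing */
   *name = s->name;
   return true;
}
```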
Florian Krohm [Thu, 12 Mar 2015 11:01:12 +0000 (11:01 +0000)]
Fix build problems. The code has been bitrotting for some time.
Note that, while the file compiles and links, not all IROps are handled,
so there may be runtime problems.
Fixes BZ #345079. Patch by Ivo Raisr (ivosh@ivosh.net).
Florian Krohm [Tue, 10 Mar 2015 20:46:58 +0000 (20:46 +0000)]
Issue a warning if a function has more than 5 million bytes of
code. Functions exceeding that size have previously been observed in the
field. Assert at 100x that amount.
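A sketch of the two thresholds (FN_SIZE_WARN, FN_SIZE_ASSERT and check_fn_size are hypothetical names): a warning above 5 million bytes, and an assertion above 100 times that, i.e. 500 million bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FN_SIZE_WARN   5000000UL              /* warn above this */
#define FN_SIZE_ASSERT (100UL * FN_SIZE_WARN) /* assert above this */

/* Returns 1 if a warning was printed, 0 otherwise; sizes beyond the
   hard limit trip the assertion. */
int check_fn_size(const char *name, size_t size)
{
   assert(size <= FN_SIZE_ASSERT);
   if (size > FN_SIZE_WARN) {
      fprintf(stderr, "warning: function %s has %zu bytes of code\n",
              name, size);
      return 1;
   }
   return 0;
}
```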
Florian Krohm [Tue, 10 Mar 2015 16:13:59 +0000 (16:13 +0000)]
Add support for building with -fsanitize=undefined.
- add configure option --enable-ubsan
- add __ubsan helpers (by Julian)
This requires gcc 4.9.2 or later. Not all platforms are supported, though.
With this change and VEX r3099 regression tests pass on amd64
with a valgrind compiled with -fsanitize=undefined.
Florian Krohm [Sat, 7 Mar 2015 23:01:14 +0000 (23:01 +0000)]
New function VG_(am_mmap_client_heap) which swallows
VG_(am_set_segment_isCH_if_SkAnonC).
Rename VG_(am_set_segment_hasT_if_client_segment) to
VG_(am_set_segment_hasT) passing in an address (because that function
cannot possibly take a pointer to a *const* segment). Also assert that
the segment containing the address is a client segment. Everything else
is a bug.
Update NEWS to indicate that
335907 segfault when running wine's ddrawex/tests/surface.c under valgrind
is assumed to be fixed by a previous change in 3.10
and/or by the commit for 343173.
Rhys Kidd [Sat, 7 Mar 2015 14:57:39 +0000 (14:57 +0000)]
n-i-bz: Replace non-POSIX bzero with a proper memset, at least for internal-only Darwin functionality. Picked up by cppcheck. No regressions within the test suite.
Rhys Kidd [Sat, 7 Mar 2015 08:36:20 +0000 (08:36 +0000)]
Fix unhandled syscall: unix:348 (__pthread_chdir) and unhandled syscall: unix:349 (__pthread_fchdir) on OS X
bz#344512
- Support these two undocumented syscalls.
- New regression test case added.
Rhys Kidd [Sat, 7 Mar 2015 05:22:12 +0000 (05:22 +0000)]
Fix stack traces missing penultimate frame
bz#344560
- Also fixes memcheck/tests/badpoll test on OS X
- The problem occurs because the guest stack seen in a system call pre or post
function happens not to have a correct topmost stack frame, as Darwin system
call stubs do not start with the usual function prologue.
- New regression test case added.
- Thanks to Greg Banks for research, patch and test case.
Julian Seward [Thu, 5 Mar 2015 00:52:07 +0000 (00:52 +0000)]
Minor changes in an attempt to improve performance and reduce
the amount of file-reading resulting from DiImage-cache misses.
CACHE_N_ENTRIES:
Increase the DiImage cache size from 256KB to 8MB to deal with
drastically worse locality when reading inline info. The 256KB
setting dates from before inline info was read.
is_in_CEnt: remove a conditional branch from the hot path (of |get|,
effectively)
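A sketch of one standard way to drop such a branch, assuming a cache entry covers the byte range [off, off + used) (CEnt and its fields here are assumptions, and the actual change may differ): with unsigned arithmetic, off - ent->off wraps to a huge value when off is below the start, so a single comparison checks both bounds. The sketch assumes off + used does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry holding image bytes [off, off + used). */
typedef struct { uint64_t off; uint64_t used; } CEnt;

/* Two comparisons joined by &&, which may compile to a branch. */
static inline bool is_in_CEnt_2cmp(const CEnt *ent, uint64_t off)
{
   return off >= ent->off && off < ent->off + ent->used;
}

/* One comparison: if off < ent->off the subtraction wraps around to a
   value >= ent->used, so the lower bound is checked implicitly. */
static inline bool is_in_CEnt_1cmp(const CEnt *ent, uint64_t off)
{
   return off - ent->off < ent->used;
}
```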