Carl Love [Wed, 1 Jul 2015 21:29:12 +0000 (21:29 +0000)]
Backing out patch 1 and 2 from Bugzilla 349790.
The new script (tests/check_ppc64_auxv_cap) in the first patch was
written for the bash shell. I was told by fkrohm that there was an
issue with bash some time ago and the decision was made to use sh instead.
sh maps to bash on a lot of systems but on some it maps to dash. The
script is not compatible with dash.
In retesting the second patch with a fresh svn pull, I found that I
forgot to do the svn add for the new script file, which causes the
regression test to fail with the second patch applied.
So, I have decided it will be best to just back out patch 1 and 2 for now.
I will fix the script and do this again.
Carl Love [Wed, 1 Jul 2015 19:44:13 +0000 (19:44 +0000)]
Patch 2 of 6
Update all vgtest files to reference the new capability check helper.
This includes a few adjustments to ensure the test is checking for
the proper capability (e.g. htm versus isa_2_07).
Patch 1 valgrind commit id 15388.
The bugzilla for this commit is 349790
Patch submitted by Will Schmidt <will_schmidt@vnet.ibm.com>
Reviewed and tested by Carl Love <cel@ibm.com>
Carl Love [Wed, 1 Jul 2015 18:48:48 +0000 (18:48 +0000)]
Patch 1 of 6
Rework the aux vector hwcap capability checking utilities.
This is meant to consolidate a number of existing _cap
checking scripts, and allow a better way of checking for
additional capabilities.
The bugzilla for this commit is 349790
Patch submitted by Will Schmidt <will_schmidt@vnet.ibm.com>
Reviewed and tested by Carl Love <cel@ibm.com>
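The core of such a capability check is reading the hwcap bits from the aux vector. The actual helper in this patch is a test script, so the following is only a minimal C sketch of the same idea. It assumes glibc's getauxval() and the PPC_FEATURE2_* bit values from the Linux powerpc <asm/cputable.h>, which are redefined locally as a fallback so it also builds on non-ppc hosts. hwcap2_has() and read_hwcap2() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 */
#endif

/* Assumed bit values from the Linux powerpc uapi <asm/cputable.h>;
   only used when the system headers do not already provide them. */
#ifndef PPC_FEATURE2_ARCH_2_07
#define PPC_FEATURE2_ARCH_2_07 0x80000000UL
#endif
#ifndef PPC_FEATURE2_HTM
#define PPC_FEATURE2_HTM       0x40000000UL
#endif

/* Hypothetical helper: true if every bit in mask is set in hwcap2. */
static bool hwcap2_has(unsigned long hwcap2, unsigned long mask)
{
    return (hwcap2 & mask) == mask;
}

/* Read AT_HWCAP2 from the aux vector; 0 where it is not available. */
static unsigned long read_hwcap2(void)
{
#if defined(__linux__) && defined(AT_HWCAP2)
    return getauxval(AT_HWCAP2);
#else
    return 0;
#endif
}
```

Checking HTM and ISA 2.07 as separate bits mirrors the point made in patch 2: a test for transactional memory should ask for the htm bit, not just for isa_2_07.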
Bart Van Assche [Sun, 28 Jun 2015 16:55:45 +0000 (16:55 +0000)]
xen: Implement the xsm_op hypercall
More recent Xen toolstacks use this for the SID_TO_CONTEXT operation
only, even when XSM is not in use.
XSM is actually an abstraction layer, of which the only current
implementation is FLASK. So this blindly assumes that the backend is
FLASK. Should another XSM backend be invented then we will have to
sort out detecting the correct one.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15384
Bart Van Assche [Sun, 28 Jun 2015 16:48:22 +0000 (16:48 +0000)]
xen: Basic syswrap infrastructure for XEN_sched_op hypercalls
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15381
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15373
Bart Van Assche [Sun, 28 Jun 2015 16:37:54 +0000 (16:37 +0000)]
xen: syswrap XEN_DOMCTL_[gs]et_vcpu_msrs
The XEN_DOMCTL_[gs]et_vcpu_msrs subops work similarly to the other get/set pairs,
taking a vcpu, buffer and size. A query with a buffer of NULL is a request
for the maximum size.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15370
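The NULL-buffer size-query convention affects what a syscall wrapper has to check. The following C sketch uses a mock hypercall to illustrate it. struct msr_entry, mock_get_vcpu_msrs() and msr_bytes_written() are hypothetical; the real Xen structure layout and error codes differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
struct msr_entry { uint32_t index; uint64_t value; };

#define MOCK_NR_MSRS 3u
static const struct msr_entry mock_msrs[MOCK_NR_MSRS] = {
    { 0x174, 0x10 }, { 0x175, 0x20 }, { 0x176, 0x30 },
};

/* Mock "get" subop: a NULL buffer is a size query that only writes back
   the maximum count; otherwise copy the entries if *count is big enough.
   Returns 0 on success, -1 if the supplied buffer is too small. */
static int mock_get_vcpu_msrs(struct msr_entry *buf, uint32_t *count)
{
    if (buf == NULL) {
        *count = MOCK_NR_MSRS;
        return 0;
    }
    if (*count < MOCK_NR_MSRS)
        return -1;
    memcpy(buf, mock_msrs, sizeof mock_msrs);
    *count = MOCK_NR_MSRS;
    return 0;
}

/* What a wrapper would mark as written after the call: nothing for a
   size query, count entries otherwise. */
static size_t msr_bytes_written(const struct msr_entry *buf, uint32_t count)
{
    return buf == NULL ? 0 : (size_t)count * sizeof(struct msr_entry);
}
```

The caller first queries with NULL to learn the size, allocates, then calls again with the buffer. A wrapper must therefore not treat the NULL case as an invalid pointer.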
Bart Van Assche [Sun, 28 Jun 2015 16:31:54 +0000 (16:31 +0000)]
xen: Add support for new sysctl and domctl interface versions
The change causing the sysctl bump is not in an implemented subop yet, so no
change is required. The change causing the domctl bump is in an implemented
subop, but has also been reverted in favor of a different way of performing
the same actions. Therefore, there is no net difference.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15365
Bart Van Assche [Sun, 28 Jun 2015 16:30:36 +0000 (16:30 +0000)]
xen: refactor the various "version not supported" messages into a single helper
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@15364
Florian Krohm [Tue, 23 Jun 2015 20:31:52 +0000 (20:31 +0000)]
Beef up configury for the undefined behaviour sanitiser.
If the compiler supports -fno-sanitize=alignment use it.
Otherwise, there will be complaints about misaligned
memory accesses. This is needed for GCC 5.1.
If that flag is not supported, simply pass in -fsanitize=undefined
and assume that it won't check for alignment violations (which
is true for GCC 4.9).
Filter 'New thread' lines
gdb 7.9 reports new threads at a different moment than previous versions.
Filter these 'New thread' lines so as not to depend on this
gdb behaviour.
324181 mmap does not handle MAP_32BIT (handle it now, rather than fail it)
324181 was previously closed with a solution to always make
MAP_32BIT fail. This is technically correct/according to the doc,
but is not very usable.
This patch ensures that a MAP_32BIT mmap is successful, as long as
aspacemgr gives a range in the first 2GB
(so, compared to a native run, MAP_32BIT will fail much more quickly,
as aspacemgr does not reserve the address space below 2GB on 64-bit platforms).
Far from perfect, but this is better than nothing.
Added a regression test that tests successful 32-bit mmaps until
the 2GB limit is reached.
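A test of this kind repeatedly maps memory with MAP_32BIT until the kernel (or aspacemgr) refuses, checking that every mapping lies below 2GB. The following is a minimal C sketch of that loop, not the regression test added here. MAP_32BIT exists only on Linux x86-64, so the helper returns NULL elsewhere. map_low() and fill_low_2gb() are hypothetical names, and the 1MB chunk size is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define TWO_GB 0x80000000UL

/* Anonymous MAP_32BIT mapping; NULL on failure or if unsupported. */
static void *map_low(size_t len)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)len;
    return NULL;
#endif
}

/* Map 1MB chunks until MAP_32BIT fails or max_chunks is reached,
   asserting that each chunk ends at or below 2GB. The mappings are
   intentionally kept, so that the low address space fills up. */
static size_t fill_low_2gb(size_t max_chunks)
{
    const size_t chunk = 1UL << 20;
    size_t n = 0;
    while (n < max_chunks) {
        void *p = map_low(chunk);
        if (p == NULL)
            break;
        assert((uintptr_t)p + chunk <= TWO_GB);
        n++;
    }
    return n;
}
```

Under Valgrind, the loop is expected to stop earlier than natively, for the reason given above.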
Florian Krohm [Tue, 9 Jun 2015 21:44:58 +0000 (21:44 +0000)]
Followup to r15323. Cannot use AC_GCC_WARNING_SUBST to detect
whether -Wformat-security is supported. Special handling is needed.
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2 accepts -Wformat-security
without -Wformat being present on the command line. Other GCC
versions will issue a warning if -Wformat is missing. r15323
adds -Werror to AC_GCC_WARNING_SUBST and therefore turns that
warning into an error, with the consequence that
-Wformat-security appears to be unsupported -- a false conclusion.
Rhys Kidd [Sat, 6 Jun 2015 04:18:49 +0000 (04:18 +0000)]
Resolve the remaining clang warning on OS X. It should now be possible to build Valgrind on modern OS X without any warnings (note: this does not hold for the regression test suite).
Rhys Kidd [Sat, 6 Jun 2015 03:57:34 +0000 (03:57 +0000)]
Resolve clang warning on OS X: m_stacktrace.c:542:7: warning: implicit declaration of function 'vgPlain_is_in_syscall' is invalid in C99 [-Wimplicit-function-declaration]
Florian Krohm [Fri, 5 Jun 2015 21:19:06 +0000 (21:19 +0000)]
clang, as opposed to gcc, does not terminate with a non-zero return code
in case an unrecognised command line option is encountered. configure.ac,
however, assumed just that, which led to compile-time warnings later on.
Add -Werror to the configure bits to make clang behave like gcc in this
regard. Fixes BZ #348565.
Florian Krohm [Fri, 5 Jun 2015 17:09:57 +0000 (17:09 +0000)]
Simplify configury and eliminate AC_GCC_WARNING_COND which was only used
in one place and can be replaced with AC_GCC_WARNING_SUBST_NEW. Adjust
perf/Makefile.am.
Julian Seward [Fri, 5 Jun 2015 13:33:46 +0000 (13:33 +0000)]
arm32-linux only: add handwritten assembly helpers for
MC_(helperc_LOADV32le), MC_(helperc_LOADV16le) and
MC_(helperc_LOADV8). This improves performance by around 5% to 7% in
the best case, for run-of-the-mill integer code.
On platforms that have an accessible redzone below the SP, the unwind logic
should be able to access the redzone.
So, when computing fp_min, subtract the redzone.
Currently, only amd64 and ppc64 have a non-zero redzone.
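The following is a minimal C sketch of the fp_min adjustment. The redzone sizes are assumptions taken from the ABIs (128 bytes for the amd64 System V ABI, 288 bytes for ppc64), and compute_fp_min() and addr_ok() are hypothetical names rather than Valgrind's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed redzone sizes; other targets are taken to have none. */
#if defined(__x86_64__)
#define STACK_REDZONE_SZB 128
#elif defined(__powerpc64__)
#define STACK_REDZONE_SZB 288
#else
#define STACK_REDZONE_SZB 0
#endif

/* Lowest address the unwinder may read: the SP minus the redzone. */
static uintptr_t compute_fp_min(uintptr_t sp, size_t redzone)
{
    return sp - redzone;
}

/* An address used while unwinding is acceptable if it lies
   within [fp_min, fp_max]. */
static int addr_ok(uintptr_t a, uintptr_t fp_min, uintptr_t fp_max)
{
    return a >= fp_min && a <= fp_max;
}
```

Without the subtraction, a value saved by a leaf function just below the SP (inside the redzone) would be rejected by the bounds check.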
Mark Wielaard [Wed, 3 Jun 2015 09:52:00 +0000 (09:52 +0000)]
Run memcheck/tests/demangle with -q.
The interesting part is the demangled backtrace in the error message.
Suppress the memory allocation/blocks summary which can differ slightly
depending on the underlying arch/libs.
Mark Wielaard [Tue, 2 Jun 2015 20:23:06 +0000 (20:23 +0000)]
GCC 5.1 is too smart. Disable Identical Code Folding for preload libs.
We want to disable Identical Code Folding for the tools' preload shared
objects to get better backtraces. For GCC 5.1, -fipa-icf is enabled by
default at -O2.
The optimization reduces code size and may disturb
unwind stacks by replacing a function with an equivalent
one that has a different name.
Add a configure check to see if GCC supports -fno-ipa-icf.
If it does then add the flag to AM_CFLAGS_PSO_BASE.
Without this GCC will notice some of the preload replacement functions
in vg_replace_strmem are identical and fold them all into one picking
a random (existing) function name. This causes backtraces showing
completely unexpected function names.
Rhys Kidd [Tue, 2 Jun 2015 10:30:15 +0000 (10:30 +0000)]
Darwin11.supp should include a suppression for the known uninitialised read in pthread_rwlock_init(), as required to pass the memcheck/tests/darwin/pth-supp test. bz#196528.
Slightly improve unwind-intensive workloads on x86.
e.g. perf/memrw is improved by 2% to 3% with this patch.
The unwinding code on x86 tries to unwind using
either the %ebp chain or CFI unwinding.
If these 2 techniques fail, then it tries to unwind
using FPO (PDB) debug info.
However, unless running wine or similar, there will never be
such FPO/PDB info.
The function VG_(use_FPO_info) is thus called needlessly
at each 'end of stack'. This function scans all the loaded DebugInfo
looking for one that has some FPO info, only to find nothing.
With this patch, the unwind code on x86 will only call VG_(use_FPO_info) if
some FPO/PDB info was loaded.
The fact that FPO/PDB info was loaded is cached and updated similarly to
the CFI cache: each time new debug info is loaded, the cached value is refreshed
using the debuginfo generation.
The patch also changes the name of VG_(CF_info_generation)
to VG_(debuginfo_generation), as this generation is changed for
any kind of load or unload of debug info, not only for CFI-based debug
info.
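The generation-based cache can be sketched in a few lines of C. This is an illustration under assumptions, not the patch itself. debuginfo_generation, any_fpo_loaded, load_debuginfo(), scan_for_fpo() and fpo_info_present() are hypothetical stand-ins, and the expensive walk over all loaded DebugInfo is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical core state: a counter bumped on every debug info
   load/unload, and whether any loaded object carries FPO info. */
static unsigned debuginfo_generation = 0;
static bool any_fpo_loaded = false;
static unsigned fpo_scans = 0;   /* counts the expensive scans */

static void load_debuginfo(bool has_fpo)
{
    if (has_fpo)
        any_fpo_loaded = true;
    debuginfo_generation++;
}

/* Stand-in for walking every loaded DebugInfo looking for FPO info. */
static bool scan_for_fpo(void)
{
    fpo_scans++;
    return any_fpo_loaded;
}

/* Cached answer, recomputed only when the generation has changed.
   The unwinder would call this before trying the FPO path, e.g.
   if (fpo_info_present()) { ... use FPO info ... } */
static bool fpo_info_present(void)
{
    static unsigned cached_gen = (unsigned)-1;   /* forces first scan */
    static bool cached = false;
    if (cached_gen != debuginfo_generation) {
        cached = scan_for_fpo();
        cached_gen = debuginfo_generation;
    }
    return cached;
}
```

In the common case (no FPO info), each 'end of stack' now costs one integer comparison instead of a scan over all loaded debug info.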
Some platforms such as x86 and amd64 have efficient unaligned access.
On these platforms, implement read_/write_<type> by doing a direct
access, rather than calling a function that will read or write
'byte per byte'.
For platforms that do not have efficient unaligned access,
or that do not support unaligned access at all, call the functions
readUAS_/writeUAS_<type>, which work as before.
Currently, direct access is activated only for x86 and amd64.
It is unclear which other platforms support (efficient) unaligned access.
On unwind intensive code (such as perf/memrw on amd64), this patch
gives up to 5% improvement.
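The read side of this split can be sketched in C as follows. The sketch assumes x86/amd64 as the only "direct access" targets, as in the text, and uses a fixed-size memcpy() as a portable way to express the direct access. Compilers typically turn that memcpy into a single load on x86, which avoids the undefined behaviour of dereferencing a misaligned pointer in ISO C. read_u32() and read_u32_bytewise() are hypothetical names, not Valgrind's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable fallback: assemble the value byte by byte, in host byte
   order, so no unaligned load is ever issued. */
static uint32_t read_u32_bytewise(const unsigned char *p)
{
    uint32_t v;
    unsigned char *d = (unsigned char *)&v;
    for (size_t i = 0; i < sizeof v; i++)
        d[i] = p[i];
    return v;
}

static uint32_t read_u32(const unsigned char *p)
{
#if defined(__i386__) || defined(__x86_64__)
    /* Direct access: a fixed-size memcpy normally compiles to one
       (possibly unaligned) load on x86. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    return read_u32_bytewise(p);
#endif
}
```

Both paths return the same value for any address. Only the cost differs, which is why the choice can be made per platform at compile time.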
This patch decreases significantly the memory needed for OldRef and
slightly increases the performance. It also moderately improves
the number of cases where helgrind can provide the stack trace of the old
access (when using the same amount of memory for the OldRef entries).
The patch also provides a new helgrind monitor command to show
the recorded accesses for an address+len, and adds an optional argument
lock_address to the monitor command 'info locks', to show the info
about just this lock.
Currently, OldRefs are maintained in a sparse WA that points to N
entries, as specified by --conflict-cache-size=N.
For each entry (associated with an address), we have the last 5 accesses.
Old entries are recycled in an exact LRU order.
But inside an entry, we could have a recent access, and 4 very
old accesses that are kept 'alive' by a single thread repeatedly
accessing the address shared with the 4 other old entries.
The attached patch replaces the sparse WA that maintains the OldRefs
with a hash table.
Each OldRef now also maintains only one single access for an address.
As an OldRef now maintains only one access, all the entries are now
strictly in LRU order.
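The new organisation (one access per entry, entries found via a hash table and recycled in strict LRU order) can be sketched as follows. This is a simplified model, not Helgrind's code. Ref, RefTable and the table_* functions are hypothetical. The sketch also uses a fixed bucket array sized at twice the capacity, instead of growing the chain array as the hash table fills.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Ref {
    uintptr_t addr;
    int thr;                   /* thread that made the access */
    struct Ref *hnext;         /* hash chain */
    struct Ref *prev, *next;   /* LRU list, head = most recent */
} Ref;

typedef struct {
    Ref **buckets; size_t nbuckets;
    Ref *pool; size_t cap, used;
    Ref *head, *tail;
} RefTable;

static void *xcalloc(size_t n, size_t sz)
{
    void *p = calloc(n, sz);
    if (p == NULL) abort();
    return p;
}

static size_t hash_addr(const RefTable *t, uintptr_t a)
{
    return (a >> 3) % t->nbuckets;
}

static void lru_unlink(RefTable *t, Ref *r)
{
    if (r->prev) r->prev->next = r->next; else t->head = r->next;
    if (r->next) r->next->prev = r->prev; else t->tail = r->prev;
}

static void lru_push_front(RefTable *t, Ref *r)
{
    r->prev = NULL; r->next = t->head;
    if (t->head) t->head->prev = r; else t->tail = r;
    t->head = r;
}

static void hash_remove(RefTable *t, Ref *r)
{
    Ref **pp = &t->buckets[hash_addr(t, r->addr)];
    while (*pp != r) pp = &(*pp)->hnext;
    *pp = r->hnext;
}

static RefTable *table_new(size_t cap)
{
    assert(cap > 0);
    RefTable *t = xcalloc(1, sizeof *t);
    t->cap = cap;
    t->nbuckets = cap * 2;
    t->buckets = xcalloc(t->nbuckets, sizeof *t->buckets);
    t->pool = xcalloc(cap, sizeof *t->pool);
    return t;
}

static Ref *table_find(RefTable *t, uintptr_t a)
{
    for (Ref *r = t->buckets[hash_addr(t, a)]; r; r = r->hnext)
        if (r->addr == a) return r;
    return NULL;
}

/* Record an access; when full, recycle the least recently used entry. */
static void table_record(RefTable *t, uintptr_t a, int thr)
{
    Ref *r = table_find(t, a);
    if (r) {
        lru_unlink(t, r);
    } else {
        if (t->used < t->cap) {
            r = &t->pool[t->used++];
        } else {
            r = t->tail;
            lru_unlink(t, r);
            hash_remove(t, r);
        }
        r->addr = a;
        size_t h = hash_addr(t, a);
        r->hnext = t->buckets[h];
        t->buckets[h] = r;
    }
    r->thr = thr;
    lru_push_front(t, r);
}

static void table_free(RefTable *t)
{
    free(t->buckets); free(t->pool); free(t);
}
```

Because each entry holds a single access, refreshing an entry moves exactly that access to the front of the LRU list. A frequently touched address therefore can no longer keep stale accesses alive.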
Memory used for OldRef
-----------------------
For the trunk, an OldRef has a size of 72 bytes (on 32 bits archs)
maintaining up to 5 accesses to the same address.
On 64 bits arch, an OldRef is 104 bytes.
With the patch, an OldRef has a size of 32 bytes (on 32 bits archs)
or 56 bytes (on 64 bits archs).
So, for one single access, the new code needs (on 32 bits)
32 bytes, while the trunk needs only 14.4 bytes.
However, that is the worst case, assuming that the 5 entries in the
accs array are all used.
Looking at 2 big apps (one of them being Firefox), we see that
very few OldRef entries have all 5 entries occupied.
On a Firefox startup, of the 5x1,000,000 access slots, only
1,406,939 are used.
So, on average, the trunk in reality uses around 52 bytes per access.
The default value for --conflict-cache-size has been doubled to 2000000.
This ensures that the memory used for the OldRef is more or less the
same as the trunk (104 MB for OldRef entries).
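The sizes quoted above can be checked with simple arithmetic. This calculation assumes a previous default of 10^6 OldRefs (implied by the default being "doubled" to 2,000,000) and takes the 104 MB figure to be the 64-bit trunk. MB here means 10^6 bytes.

```latex
\begin{aligned}
\text{trunk, 32-bit, per access (5 slots used)} &= 72\,\text{B} / 5 = 14.4\,\text{B}\\
\text{trunk, 64-bit, } 10^6 \text{ OldRefs} &= 10^6 \times 104\,\text{B} = 104\,\text{MB}\\
\text{patch, 64-bit, } 2 \times 10^6 \text{ OldRefs} &= 2 \times 10^6 \times 56\,\text{B} = 112\,\text{MB}\\
\text{patch, 32-bit, } 2 \times 10^6 \text{ OldRefs} &= 2 \times 10^6 \times 32\,\text{B} = 64\,\text{MB}
\end{aligned}
```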
Memory used for sparseWA versus hashtable
-----------------------------------------
Looking at 2 big apps (one of them being Firefox), we see that
there are big variations in the size of the WA: it can go in a few
seconds from 10MB to 250MB, or decrease back to 10MB.
This all depends on where the last N accesses were done: if well localised,
the WA will be small.
If the last N accesses were distributed over a big address space,
then the WA will be big: the last level of WA (the biggest memory consumer)
uses slightly more than 1KB (2KB on 64 bits) for each '256 bytes' memory
zone where there is an oldref. So, in the worst case, on 32 bits, we
need > 1_000_000_000 bytes of sparseWA memory to keep 1_000_000 OldRefs.
The hash table has between 1 and 2 Words of overhead per OldRef
(as the chain array is roughly doubled each time the hash table is full).
So, unless the OldRefs are extremely localised, the overhead of the
hash table will be significantly less.
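The comparison above can be worked through for 10^6 OldRefs, assuming the worst case where each OldRef falls in a different 256-byte zone. This uses only the per-zone and per-OldRef figures given above, with 4-byte Words on 32 bits and 8-byte Words on 64 bits.

```latex
\begin{aligned}
\text{sparseWA, 32-bit, worst case} &> 10^6 \times 1\,\text{KB} \approx 10^9\,\text{B}\\
\text{hash table overhead, 32-bit} &= 10^6 \times (1 \text{ to } 2) \times 4\,\text{B} = 4 \text{ to } 8\,\text{MB}\\
\text{hash table overhead, 64-bit} &= 10^6 \times (1 \text{ to } 2) \times 8\,\text{B} = 8 \text{ to } 16\,\text{MB}
\end{aligned}
```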
With the patch, the core arena total alloc is: 5299535/1201448632 totalloc-blocks/bytes
The trunk is 6693111/3959050280 totalloc-blocks/bytes
(so, around 1.20 GB versus 3.95 GB).
This big difference is due to the fact that the sparseWA repeatedly
allocates and then frees Level0 or LevelN when the OldRefs in the region covered
by the Level0/N have all been recycled.
In terms of CPU
---------------
With the patch, on amd64, a Firefox startup seems slightly faster (around 1%).
The peak memory mmapped/used decreases by 200 MB.
For a LibreOffice test, the memory decreases by 230 MB. CPU also decreases
slightly (1%).
In terms of correctness
-----------------------
The trunk could potentially fail to show the most recent access
to the memory of a race: the first OldRef entry matching the raced-upon
address was used, while there could be a more recent access in a following
OldRef entry. In other words, the trunk was only guaranteed to find the
most recent access within one OldRef, but not across the several OldRefs that
could cover the raced-upon address.
So, assuming it is important to show the most recent access, this patch
ensures we really show the most recent access, even in the presence of overlapping
accesses.
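The difference between the two lookup policies can be shown in C. In this sketch, a linear scan over a small array stands in for Helgrind's real lookup structure. The Access record, its stamp field and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one access covering [addr, addr+len). */
typedef struct {
    uintptr_t addr;
    size_t len;
    unsigned long stamp;   /* larger = more recent */
} Access;

static int covers(const Access *a, uintptr_t x)
{
    return x >= a->addr && x - a->addr < a->len;
}

/* Old policy: the first entry covering x wins, whatever its age. */
static const Access *first_match(const Access *v, size_t n, uintptr_t x)
{
    for (size_t i = 0; i < n; i++)
        if (covers(&v[i], x))
            return &v[i];
    return NULL;
}

/* New policy: look at every covering entry and keep the newest. */
static const Access *most_recent(const Access *v, size_t n, uintptr_t x)
{
    const Access *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (covers(&v[i], x) && (best == NULL || v[i].stamp > best->stamp))
            best = &v[i];
    return best;
}
```

With overlapping accesses such as an 8-byte write at 0x1000 followed later by a 4-byte write at 0x1004, a race on 0x1005 should report the later 4-byte write. first_match() would report the older 8-byte one.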