Fix 203877 and 301229: increase the maximum allowed alignment for memalign() and posix_memalign() to 16 MB
Note that VG_(arena_memalign) is not used by core or tools for the moment.
We have a single maximum for both the V core/tools and the client.
Enhanced memcheck/tests/memalign2.c to test 4 MB and 16 MB alignments.
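A minimal sketch of the kind of check such a test can make, assuming a POSIX system. The helper name aligned_alloc_ok is hypothetical and this is not the code in memalign2.c; it only shows requesting a large alignment and verifying the returned pointer.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if posix_memalign() hands back a block
   whose address is a multiple of `align`, 0 if the allocation fails or
   the pointer is misaligned. */
int aligned_alloc_ok(size_t align)
{
    void *p = NULL;
    if (posix_memalign(&p, align, 64) != 0)
        return 0;
    int ok = ((uintptr_t)p % align) == 0;
    free(p);
    return ok;
}
```

Calling it with 4 * 1024 * 1024 and 16 * 1024 * 1024 exercises the two alignments mentioned above.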
Julian Seward [Fri, 15 Jun 2012 16:20:23 +0000 (16:20 +0000)]
Add a hack (disabled by default) that attempts to unwind the stack on
ARM by simply scanning up and looking for words that look like they
might be return addresses. Last-ditch hack for when the CFI trail
goes cold.
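The scanning idea can be sketched as follows. This is an illustration under assumptions, not the code added by the commit: TextRange and scan_stack_for_return_addrs are hypothetical names, and "looks like a return address" is reduced here to "falls inside a known text range".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one executable text range. */
typedef struct { uintptr_t start; uintptr_t end; } TextRange;

/* Scan n_words stack words upwards from sp and collect up to max_frames
   values that fall inside [text.start, text.end). Returns the number of
   candidate return addresses stored in frames[]. */
size_t scan_stack_for_return_addrs(const uintptr_t *sp, size_t n_words,
                                   TextRange text,
                                   uintptr_t *frames, size_t max_frames)
{
    size_t found = 0;
    for (size_t i = 0; i < n_words && found < max_frames; i++) {
        uintptr_t w = sp[i];
        if (w >= text.start && w < text.end)
            frames[found++] = w;
    }
    return found;
}
```

Such a scan can report stale or coincidental values as frames, which is consistent with the message treating it as a last-ditch measure that is disabled by default.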
Have the inner Valgrind register the interim_stack asap.
This is needed to have the outer valgrind able to do stack
traces of the inner Valgrind before the main thread runs.
Without this, the outer crashes (segv) when doing a stack trace.
In mtV.txt, an ugly kludge was described to avoid this crash.
This is the clean solution replacing the kludge.
Fix assert in gdbserver for watchpoints watching the same address
GDB can create watchpoints watching the same address.
This was causing assertion failures.
To handle this, the hash table (keyed by watched address) is replaced
by an xarray of address/length/kind.
Fully identical watches are ignored (a duplicate is not inserted, and
deleting a watch that is already deleted causes no problem).
gdbserver_tests/mcwatchpoint enhanced to test duplicated watchpoints
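A sketch of the data structure described above, using a plain growable array in place of Valgrind's xarray. All names (Watch, WatchList, watch_insert, watch_remove, WatchKind) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { WATCH_WRITE, WATCH_READ, WATCH_ACCESS } WatchKind;
typedef struct { uintptr_t addr; size_t len; WatchKind kind; } Watch;
typedef struct { Watch *items; size_t used; size_t cap; } WatchList;

/* Index of the fully identical watch, or -1 if there is none. */
long watch_find(const WatchList *wl, uintptr_t addr, size_t len, WatchKind kind)
{
    for (size_t i = 0; i < wl->used; i++)
        if (wl->items[i].addr == addr && wl->items[i].len == len
            && wl->items[i].kind == kind)
            return (long)i;
    return -1;
}

/* Returns 1 if inserted, 0 if an identical watch already exists,
   -1 if memory could not be allocated. */
int watch_insert(WatchList *wl, uintptr_t addr, size_t len, WatchKind kind)
{
    if (watch_find(wl, addr, len, kind) >= 0)
        return 0;                       /* duplicate: ignore */
    if (wl->used == wl->cap) {
        size_t ncap = wl->cap ? wl->cap * 2 : 4;
        Watch *n = realloc(wl->items, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        wl->items = n;
        wl->cap = ncap;
    }
    wl->items[wl->used++] = (Watch){ addr, len, kind };
    return 1;
}

/* Returns 1 if removed, 0 if no such watch exists (not an error). */
int watch_remove(WatchList *wl, uintptr_t addr, size_t len, WatchKind kind)
{
    long i = watch_find(wl, addr, len, kind);
    if (i < 0)
        return 0;
    wl->items[i] = wl->items[--wl->used];  /* order is not preserved */
    return 1;
}
```

Two watches on the same address with different kinds coexist, which a table keyed only by address cannot represent.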
Julian Seward [Wed, 13 Jun 2012 11:10:20 +0000 (11:10 +0000)]
Implement even more instructions generated by "gcc-4.7.0 -mavx -O3".
This is the first point at which coverage for -O3 generated code could
be construed as "somewhat usable".
CDSG needs quad word (16 byte) aligned data structures. Since the stack
on s390 has only 8 byte alignment, gcc can't guarantee 16 byte alignment
for local variables. For a global variable, gcc can do that.
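For illustration, a global with an explicit 16 byte alignment request might look like this. Quad, cdsg_operand and is_quadword_aligned are hypothetical names, and the aligned attribute is the GCC/Clang spelling; this is not the test's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16 byte operand, declared at file scope so the compiler
   and linker can honour the alignment request. */
typedef struct { uint64_t hi; uint64_t lo; } Quad;
Quad cdsg_operand __attribute__((aligned(16)));

/* Returns 1 if p is aligned on a quad word (16 byte) boundary. */
int is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```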
Julian Seward [Tue, 5 Jun 2012 07:12:15 +0000 (07:12 +0000)]
Add macros I_REPLACE_SONAME_FNNAME_Z{U,Z} for general end-user use.
The I_WRAP_SONAME_FNNAME_Z{U,Z} equivalents have been present for
years. Seems inconsistent for the REPLACE versions to be missing.
Julian Seward [Sun, 3 Jun 2012 22:40:07 +0000 (22:40 +0000)]
m_machine: add new function VG_(machine_get_size_of_largest_guest_register)
cachegrind: use the new function to abort startup if the minimum line
size is smaller than the size of the largest guest register.
Partially derived from a patch by Josef Weidendorfer.
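The startup check can be pictured roughly as below. This is a sketch with hypothetical names (line_sizes_ok and its parameters), not cachegrind's code; in the real tool the register size would come from VG_(machine_get_size_of_largest_guest_register).

```c
#include <assert.h>

/* Returns 1 if the smallest of the n cache line sizes is at least as
   large as the largest guest register, 0 otherwise (in which case the
   caller would refuse to start). */
int line_sizes_ok(const int *line_sizes, int n, int largest_reg_size)
{
    int min = line_sizes[0];
    for (int i = 1; i < n; i++)
        if (line_sizes[i] < min)
            min = line_sizes[i];
    return min >= largest_reg_size;
}
```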
Julian Seward [Sat, 2 Jun 2012 23:48:06 +0000 (23:48 +0000)]
POWER Processor decimal FP support, part 5. (Valgrind side). Bug #299694.
(Carl Love, carll@us.ibm.com and Maynard Johnson, maynardj@us.ibm.com)
This patch adds support for Power Decimal Floating Point (DFP). This
is the fifth patch set in the series of five to add the DFP
instruction support to Valgrind. Adds support for the ddedpd,
ddedpdq, denbcd, denbcdq, dtstsf, and dtstsfq instructions.
Julian Seward [Sat, 2 Jun 2012 23:47:02 +0000 (23:47 +0000)]
POWER Processor decimal FP support, part 5 (VEX side). Bug #299694.
(Carl Love, carll@us.ibm.com and Maynard Johnson, maynardj@us.ibm.com)
This patch adds support for Power Decimal Floating Point (DFP). This
is the fifth patch set in the series of five to add the DFP
instruction support to Valgrind. Adds support for the ddedpd,
ddedpdq, denbcd, denbcdq, dtstsf, and dtstsfq instructions.
Florian Krohm [Sat, 2 Jun 2012 20:29:22 +0000 (20:29 +0000)]
Put the Triop member into a separate struct (IRTriop) and link to that
from IRExpr. Reduces size of IRExpr from 40 bytes to 32 bytes on LP64
and from 20 bytes to 16 bytes on ILP32.
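The effect of moving the largest union member out of line can be shown with simplified stand-in types. FatExpr, SlimExpr and Triop below are hypothetical and much smaller than the real IRExpr; they only demonstrate the technique.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the three-operand variant is stored inline, so it determines
   the size of the whole union. */
typedef struct {
    int tag;
    union {
        struct { int op; void *arg1; void *arg2; void *arg3; } triop;
        struct { int op; void *arg1; void *arg2; } binop;
    } u;
} FatExpr;

/* After: the three-operand details live in their own struct and the
   expression only holds a pointer to them. */
typedef struct { int op; void *arg1; void *arg2; void *arg3; } Triop;

typedef struct {
    int tag;
    union {
        struct { Triop *details; } triop;
        struct { int op; void *arg1; void *arg2; } binop;
    } u;
} SlimExpr;
```

With a typical LP64 ABI these stand-ins come out at 40 and 32 bytes, but the exact figures depend on the ABI's padding rules. The cost is one extra allocation and one extra pointer dereference for each three-operand expression.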
Julian Seward [Fri, 1 Jun 2012 16:09:50 +0000 (16:09 +0000)]
Enhance the guest state effects notation on IRDirty calls, so as to be
able to describe accesses to arrays of non-consecutive guest state
sections. This is needed to describe the behaviour of FXSAVE and
FXRSTOR in an environment where we also support AVX.
The IRDirty struct has got smaller (112 bytes vs 136 before, for a 64
bit target) whilst holding more information.
The new facility is then used to describe said FXSAVE and FXRSTOR on
amd64. For x86 there is no change since we don't model AVX state for
x86.
Florian Krohm [Thu, 31 May 2012 15:48:13 +0000 (15:48 +0000)]
Reduce size of an IRStmt from 40 bytes to 32 bytes on LP64
by allocating the details of a PutI statement into a struct
of its own and link to that (as is being done for Dirty and CAS).
Florian Krohm [Thu, 31 May 2012 15:46:18 +0000 (15:46 +0000)]
Reduce size of an IRStmt from 40 bytes to 32 bytes on LP64
by allocating the details of a PutI statement into a struct
of its own and link to that (as is being done for Dirty and CAS).
Fix MacOS passsigalrm.c compilation error due to SIGRTMIN not existing on MacOS
The test will very probably fail on MacOS (as gdb output will contain SIGUSR1
rather than SIGRTMIN), but at least it should compile.
(not tested on MacOS; just tested that it still works on linux)
fix the warning introduced by fixing SETTLS clone flag PRE_READ logic
on amd64, vki_modify_ldt_t was defined as void (not very clear why).
sizeof (void) cannot be taken (or more precisely can be taken,
but nobody knows what that means and what gcc does).
So, uncommented the (supposedly) correct definition of the type.
Note that I checked the definition on debian 6.0, kernel 2.6.32
and the structure is still ok.
Still need to look at the other platforms not properly
handling the *SETTID and the SETTLS flags in clone PRE_READ
logic and/or not defining the type vki_modify_ldt_t
Julian Seward [Sun, 27 May 2012 16:18:13 +0000 (16:18 +0000)]
Remove, or (where it might later come in handy) comment out artefacts
for 256 bit (AVX) code generation on amd64. Although that was the
plan at first, it turns out to be infeasible to generate 256 bit
instructions for the IR created by Memcheck's instrumentation of 256
bit Ity_V256 IR. This is because it would require 256 bit integer
SIMD operations, and AVX as currently available only provides 256 bit
operations for floating point. So, fall back to generating 256 bit IR
into 128-bit XMM register pairs, and using the existing SSE facilities
in the back end. This change only affects the amd64 back end -- it
does not affect IR, which remains unchanged, and capable of
representing 256 bit vector operations wherever needed.
Julian Seward [Sun, 27 May 2012 13:52:54 +0000 (13:52 +0000)]
Add more test cases for VCMPSS, and reenable disabled tests for VCMPSD
and VEXTRACTF128, now that the implementation has been fixed. Current
status is that all so-far implemented AVX instructions are tested by this
file, and none have any detectable failures.
Fix false positive in sys_clone on amd64 when optional args are not given (e.g. child_tidptr)
rev 10493 fixed bug 117564 in syswrap-x86-linux.c.
This commit fixes the same problem in syswrap-amd64-linux.c.
The problem makes memcheck/tests/linux/stack_switch fail (at least on gcc20)
with unexpected
==802== Syscall param clone(child_tidptr) contains uninitialised byte(s)
The problem originates from always checking the 3 optional args in the PRE_READ logic,
while these should be checked only if the corresponding flags are set.
syswrap-{arm,ppc32,ppc64}-linux.c seem to have the same problem
(but no visible effect) : VKI_CLONE_PARENT_SETTID,VKI_CLONE_CHILD_SETTID
and VKI_CLONE_SETTLS not properly handled in the PRE part.
syswrap-s390x-linux.c seems to have the VKI_CLONE_SETTLS part wrong,
but VKI_CLONE_PARENT_SETTID and VKI_CLONE_CHILD_SETTID correct.
Committing a fix just for amd64 for now.
We should probably make some common code in syswrap-generic.c
to regroup all similar platforms.
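The flag-dependent checking described above can be sketched as a small decision function. The DEMO_CLONE_* constants mirror the Linux values of the corresponding CLONE_* flags; the ARG_* bits and the function name are hypothetical and are not part of Valgrind's syswrap code.

```c
#include <assert.h>

/* Same bit values as Linux's CLONE_SETTLS, CLONE_PARENT_SETTID and
   CLONE_CHILD_SETTID, redefined here so the sketch is self-contained. */
#define DEMO_CLONE_SETTLS         0x00080000UL
#define DEMO_CLONE_PARENT_SETTID  0x00100000UL
#define DEMO_CLONE_CHILD_SETTID   0x01000000UL

/* Hypothetical bitmask naming the optional clone arguments. */
enum { ARG_PARENT_TIDPTR = 1, ARG_TLS = 2, ARG_CHILD_TIDPTR = 4 };

/* Returns the set of optional arguments that a PRE handler should
   check: an argument is examined only if its flag is set. */
unsigned clone_optional_args_to_check(unsigned long flags)
{
    unsigned mask = 0;
    if (flags & DEMO_CLONE_PARENT_SETTID)
        mask |= ARG_PARENT_TIDPTR;
    if (flags & DEMO_CLONE_SETTLS)
        mask |= ARG_TLS;
    if (flags & DEMO_CLONE_CHILD_SETTID)
        mask |= ARG_CHILD_TIDPTR;
    return mask;
}
```

With no flags set the function selects nothing, so an uninitialised child_tidptr that the kernel never reads is not reported.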