Petar Jovanovic [Wed, 27 Feb 2013 23:17:33 +0000 (23:17 +0000)]
mips: adding MIPS64LE support to Valgrind
Necessary changes to Valgrind to support MIPS64LE on Linux.
Minor cleanup/style changes embedded in the patch as well.
The change corresponds to r2687 in VEX.
Patch written by Dejan Jevtic and Petar Jovanovic.
More information about this issue:
https://bugs.kde.org/show_bug.cgi?id=313267
Petar Jovanovic [Wed, 27 Feb 2013 22:57:17 +0000 (22:57 +0000)]
mips: adding MIPS64LE support to VEX
Necessary changes to VEX to support MIPS64LE on Linux.
Minor cleanup/style changes embedded in the patch as well.
Patch written by Dejan Jevtic and Petar Jovanovic.
More information about this issue:
https://bugs.kde.org/show_bug.cgi?id=313267
Fix vassert_fail producing random output for an empty format
vsnprintf does not do any addition to the buffer for an empty
format. So, buf was not null terminated.
This e.g. causes an assert_fail to output random characters
after the failed expression.
Fix by ensuring the buffer of vsnprintf is always null terminated
to start with.
Assertion
valgrind: m_transtab.c:674 (find_TTEntry_from_hcode):
Assertion '(UChar*)sec->tt[tteNo].tcptr <= (UChar*)hcode' failed.
failure (encountered on some platforms while running gdbsrv tests).
The problem is related to invalidated entries and the host_extents
mapping between hostcode and the translation table entry.
The problem: when an entry is invalidated, the translation table
entry is changed to status Deleted. However, the host extent array
element is not cleaned up.
If a search for a host code address (find_TTEntry_from_hcode)
finds this entry, the translation table entry in Deleted status
is considered as a 'not found', which ensures that the invalidated
entry is not used (e.g. for chaining).
This is all ok.
However, it might be that this Deleted entry is re-used
(see function VG_(add_to_transtab), searching for a Empty
or Deleted entry.
If the Deleted entry is re-used, then a search for the
dead host code can give a result pointing to the re-used
entry. That is clearly wrong.
Note that it is unclear if this bug can only be triggered
while using gdbsrv or if this bug can be triggered with
just the "normal" invalidation logic of translation.
gdbsrv being a heavy "user" of invalidation, it might
be it helps to trigger the code. Alternatively, as gdbsrv
invalidation is special (e.g. invalidation of some entries
is done during translation of other entries), it might be
the bug is specific to gdbsrv.
In any case, to avoid the bug:
searching for an host code address must not only
ignore Deleted entries, but must also ignore an entry
found via a host_extent element which is for a Deleted
entry that was re-used afterwards (pointed to by a
newer host_extent element).
Multiple solutions are possible for fixing the bug:
Sol1: cleanup the host_extents array when an entry is deleted.
The cleanup is however deemed costly:
Each invalidate operation must do a search in the host_extents.
The host_extents array must then be "compacted" to remove
the "dead" host extent element from the array.
The compact operation can be avoided if instead of removing
the element, one marks instead the element as "dead"
e.g. by using one bit of UInt len for that:
UInt len : 31;
Bool dead : 1;
This avoids the compact, but still incurrs the cost of
search and modify the host_extent for each entry invalidated.
Invalidating entries seems to be a critical operation
(e.g. specific ECLASS related data structures have been
done to allow fast deletion).
=> it is deemed that a solution not incurring cost during
invaliation is preferrable.
* Sol 2: detect in find_TTEntry_from_hcode
that the host_extent element is re-used, and handle it similarly
to an host_extents which points at a Deleted entry.
This detection is possible as if an entry is re-used after
having been deleted, this implies that its host code will be
after the end of the host code of the deleted entry
(as host code of a sector is not re-used).
The attached patch implements this solution.
* Sol 3: avoid re-using an entry : the entry would then stay
in Deleted state. This is deemed not ok as it would
imply that invalidation of entries will cause a sector to
become full faster.
The patch:
* adds a new function
Bool HostExtent__is_dead (const HostExtent* hx, const Sector* sec)
telling if the host extent hx from sector sec is a dead entry.
* this function is used in find_TTEntry_from_hcode so that
dead host extents are not resulting in host code to be found.
* adds a regression test which caused the assert failure before
(bug was found/reported/isolated in a small test case by Dejan Jevtic).
* To check the logic of HostExtent__is_dead, m_transtab.c sanity check is
completed to verify that the nr of entries in use in a sector is equal
to the nr of non dead entries in the host extent array.
* adds/improves traces in m_transtab.c (enabled at compile
time using #define DEBUG_TRANSTAB).
Some already existing 'if (0)' conditions are replaced
by if (DEBUG_TRANSTAB)
Regression tested on
f12/x86
debian6/amd64 (also with export EXTRA_REGTEST_OPTS=--sanity-level=4)
Petar Jovanovic [Fri, 15 Feb 2013 03:12:17 +0000 (03:12 +0000)]
Add Valgrind's implementation of memmove to avoid link issue
One of the recent changes, r2682 (Make HReg a struct), caused a build
break on several x86_64 and MIPS build bots/platforms that used older
gcc versions. The issue was that compilers generated calls to memmove,
and since it was built with -nodefaultlibs, the entry could not be
resolved. The fix wraps VG_(memmove) in memmove().
Florian Krohm [Thu, 14 Feb 2013 14:28:22 +0000 (14:28 +0000)]
s390: Testcases and vbit-tester changes for the following
DFP insns:
- extract basied exponent
- insert biased exponent
- quantize
- reround to significance
Patch by Maran Pakkirisamy (maranp@linux.vnet.ibm.com).
Part of fixing BZ #307113.
Florian Krohm [Thu, 14 Feb 2013 14:27:12 +0000 (14:27 +0000)]
s390: Support the following DFP insns:
- extract basied exponent
- insert biased exponent
- quantize
- reround to significance
Patch by Maran Pakkirisamy (maranp@linux.vnet.ibm.com).
Part of fixing BZ #307113.
Florian Krohm [Mon, 11 Feb 2013 00:47:35 +0000 (00:47 +0000)]
Make HReg a struct. In the past there were several occurences where
a HReg was assigned to an integer. This worked by accident because the
bits representing the register number (which was meant to be accessed)
happened to be in the right place.
Two new functions: hregIsInvalid and sameHReg.
The HReg struct just wraps the integer that was previously used to
represent a register without changing the encoding.
Florian Krohm [Mon, 11 Feb 2013 00:03:27 +0000 (00:03 +0000)]
s390: Be consistent with emulation warnings about unsupported
rounding modes in absence of the floating-point extension facility.
For some insns we would vassert for others we'd give a warning.
Now we always issue an emulation warning.
Florian Krohm [Fri, 8 Feb 2013 23:34:31 +0000 (23:34 +0000)]
s390: Add testcase for these DFP insns: CDGTRA, CXGTR, CGXTR, and CGDTR (VEX r2680).
Update vbit tester. Patch by Maran Pakkirisamy (maranp@linux.vnet.ibm.com).
Part of fixing BZ #307113.
Florian Krohm [Fri, 8 Feb 2013 20:22:03 +0000 (20:22 +0000)]
s390: Change get_dfp_rounding_mode to map IR rounding modes to
S390_DEP_ROUND_.. values in the range [8;15]. See comments in code.
Patch by Maran Pakkirisamy (maranp@linux.vnet.ibm.com).
Florian Krohm [Sat, 2 Feb 2013 22:58:25 +0000 (22:58 +0000)]
s390: It is not necessary to save/restore the link register when
making a helper call. The link register needs to be saved when
switching between valgrind and client code and the dispatcher code
already does that. Julian suggested this change when he merged the
COMEM branch.
This saves between 6% and 13% of insns on the perf bucket.
Runtime difference is within noise margin.
Florian Krohm [Sat, 2 Feb 2013 00:16:58 +0000 (00:16 +0000)]
s390: Change insn selection to recognize memcpy-like statements.
Add S390_INCN_MEMCPY and generate MVC for that later on. Saves between
0.1 - 1.5% of insns. Observed runtime differences on the perf bucket were
within noise margin.
Bypass warning reported by gcc
gcc reports a warning:
m_stacktrace.c:183: warning: ‘xip_verified’ may be used uninitialized in this function
This warning is a false positive:
xip_verified is assigned in the following branch:
if (UNLIKELY(xip_verif >= CFUNWIND)) {
if (xip_verif == CFUNWIND) {
...
} else {
<<<< here xip_verified is initialised >>>>
}
}
xip_verified is then used only if xip_verif > CFUNWIND.
Assign a rubish value to xip_verified to silence gcc.
(??? there are GCC pragmas that can be used to
disable a warning only on a specific line e.g.
something like:
#pragma GCC diagnostic ignored "-Wuninitialized"
Addr xip_verified; // xip for which we have calculated fpverif_uregs
#pragma GCC diagnostic warning "-Wuninitialized"
instead of
Addr xip_verified = 0; // xip for which we have calculated fpverif_uregs
// 0 assigned to silence false positive -Wuninitialized warning
but the #pragma technique seems not used currently.
better handle and better document the case of multi-locks cycles
In case a lock order violation is detected in a multi lock cycle,
then the current code cannot produce the set of locks and the
stack traces involved in the cycle.
However, it is still possible to produce the stack trace of
the new lock and the other lock between which a cycle was discovered.
Also, add a comment in the code clarifying why the set of locks
establishing the required order cannot (currently) be produced.
* other platforms (e.g. amd64) are first trying to unwind
with cfi info, then with the fp chain.
* fp unwind when code is compiled without frame pointer can
fail and give incomplete stack traces (often terminating
with a random program counter, causing a huge amount of
recorded stack traces).
This patch improves unwinding on x86 by:
* first time an IP is unwound, do the unwind both with
CFI technique and with fp technique.
If results are identical, IP is inserted in a cache of
'fp unwindable' IP
* following unwind of the same IP are then done directly
either with fp unwind or with cfi, depending on the
cached result of the check done during first unwind.
The cache is needed so as to avoid as much as possible cfi unwind,
as this is significantly slower than fp unwind.
Carl Love [Wed, 30 Jan 2013 18:39:57 +0000 (18:39 +0000)]
The Coverity tool was run against the Valgrind source code and identified a
problem in VEX/priv/guest_ppc_toIR.c saying the variable 'insn_suffix' was
assigned but not used. The function _do_vsx_fp_roundToInt() has an
HChar * parameter named 'insn_suffix', and the intention of this function was
to set the insn_suffix appropriately for the passed opcode so that the caller
could use that suffix as needed (some callers needed, and others didn't).
However, since the parameter type is a simple pointer, passed by value,
insn_suffix was only modified locally, and the caller did not see the new
value. Since most of the callers of _do_vsx_fp_roundToInt() ignore the
insn_suffix, I have removed that from the parameter list and moved the code
for ascertaining the appropriate suffix into a new function called
_get_vsx_rdpi_suffix().
This patch is for Bugzilla 314099
The patch was written by Maynard Johnson.
The patch does not add any additional regtest errors. The vbit tester
was also run. No issues were found.
The patch was reviewed, tested and committed by Carl Love
Julian Seward [Tue, 29 Jan 2013 22:14:01 +0000 (22:14 +0000)]
test_reservation(), test_double_pair_instrs(): Fix broken inline assembly
causing segfaults with gcc-4.7. The inline assembly still isn't right,
but it's better than it was before.
Julian Seward [Sat, 26 Jan 2013 11:47:55 +0000 (11:47 +0000)]
Infrastructure cleanup: change type of the condition field of
IRExpr_Mux0X from Ity_I8 to Ity_I1. This makes more sense, makes it
consistent with condition fields in IRStmt_Dirty and IRStmt_Exit, and
avoids some pointless 1Uto8 casting of the condition, in many cases.
Fixes for s390 are from Florian.
Also, make a small extension to ir_opt.c, that allows the constant
folder to look backwards through arbitrary expressions even in flat
IR. This makes it possible to do arbitrary tree folding in ir_opt,
which is where it belongs. Use this to implement the folding rule
CmpNE32(1Uto32(b), 0) ==> b.
Julian Seward [Fri, 25 Jan 2013 09:46:43 +0000 (09:46 +0000)]
Annotate ARMNImm_to_Imm64 with fallthrough markers following
verification against the table in host_arm_defs.h, "Neon Immediate
operand". A particularly nasty piece of code.
Implement the gdbsrv monitor command v.do expensive_sanity_check_general
(useful to check the sanity of valgrind on request and/or from GDB,
when an error is reported by the tool).
Also re-order the NEWS entries to put the internals things after
the user level new functions.
Carl Love [Tue, 22 Jan 2013 20:26:34 +0000 (20:26 +0000)]
Fix implementation of the DFP integer operands.
The implementation of integer operands doesn't really match the documentation
for the Iop. Take for example Iop_ExtractExpD64. It is documented as
D64 -> I64 but the implementation of the UNARY is defined as
UNARY(Ity_D64, Ity_D64). The result is an integer that is stored in an integer
format in a floating point register. On the IBM s390 however, the architecture
stores the integer value in a general purpose register (GPR) not a floating
point register. This issue exists with the implementation of 11 Iops where the
PPC implementation has either a source or destination whose value is an integer
but the value is stored in a floating point register in an integer format. After
reviewing the PPC implementation with the s390 developer, it was agreed the
cleanest way to fix this is to change the PPC implementation. The BINOP will be
changed to be consistent with the Iop description. This means the PPC
instruction implementation of the PPC instruction in guest_ppc_toIR.c will need
to reinterpret integer source operands as integers which will move the value
from a floating point register to an integer register before calling binop().
The underlying PPC implementation of the unop() for the specific Iop will also
need to change to move the value from the integer register back to the floating
point register so the native instruction can be issued with the integer value
in a floating point register. It was decided that making the changed in PPC,
rather then having the s390 reinterpret integers as DFP and then move the value
back to an integer register, was preferable as it makes the implementation of
the unop(), binops(), triop() consistent with the definition of the Iop.
This patch also includes the needed changes for the vbit tester. The Iop
definitions in memcheck/tests/vbit-test/util.c had to be updated to be consitent
with the changes in the Iops as documented below. Also, the function mkLazy3()
in memcheck/mc_translate.c had to be updated to handle the I32 x I8 x I64 -> I64
and I32 x I8 x I128 -> I128 cases.
The specific list of changes are as follows:
Iop name in pub/libvex_ir.h
documented type
type of UNARY/BINARY/TERNARY in priv/ir_defs.c
-------------------------------------------------------
Carl Love [Tue, 22 Jan 2013 20:25:31 +0000 (20:25 +0000)]
Fix implementation of the DFP integer operands.
The implementation of integer operands doesn't really match the documentation
for the Iop. Take for example Iop_ExtractExpD64. It is documented as
D64 -> I64 but the implementation of the UNARY is defined as
UNARY(Ity_D64, Ity_D64). The result is an integer that is stored in an integer
format in a floating point register. On the IBM s390 however, the architecture
stores the integer value in a general purpose register (GPR) not a floating
point register. This issue exists with the implementation of 11 Iops where the
PPC implementation has either a source or destination whose value is an integer
but the value is stored in a floating point register in an integer format. After
reviewing the PPC implementation with the s390 developer, it was agreed the
cleanest way to fix this is to change the PPC implementation. The BINOP will be
changed to be consistent with the Iop description. This means the PPC
instruction implementation of the PPC instruction in guest_ppc_toIR.c will need
to reinterpret integer source operands as integers which will move the value
from a floating point register to an integer register before calling binop().
The underlying PPC implementation of the unop() for the specific Iop will also
need to change to move the value from the integer register back to the floating
point register so the native instruction can be issued with the integer value
in a floating point register. It was decided that making the changed in PPC,
rather then having the s390 reinterpret integers as DFP and then move the value
back to an integer register, was preferable as it makes the implementation of
the unop(), binops(), triop() consistent with the definition of the Iop.
This patch also includes the needed changes for the vbit tester. The Iop
definitions in memcheck/tests/vbit-test/util.c had to be updated to be consitent
with the changes in the Iops as documented below. Also, the function mkLazy3()
in memcheck/mc_translate.c had to be updated to handle the I32 x I8 x I64 -> I64
and I32 x I8 x I128 -> I128 cases.
The specific list of changes are as follows:
Iop name in pub/libvex_ir.h
documented type
type of UNARY/BINARY/TERNARY in priv/ir_defs.c
-------------------------------------------------------
Carl Love [Mon, 21 Jan 2013 18:12:31 +0000 (18:12 +0000)]
The 32-bit DFP value is stored in a 64-bit register in
ppc. The D32 to D64 and D64 to D32 definitions for the
Iop type was specified in VEX/priv/ir_defs.c, function
typeOfPrimop() as:
case Iop_D32toD64:
UNARY(Ity_64, Ity_D64);
case Iop_D64toD32:
BINARY(ity_RMode, Ity_D64, Ity_D64);
since the values resided in a 64-bit register. As part of the s390 DFP support
the definitions were changed to:
case Iop_D32toD64:
UNARY(Ity_32, Ity_D64);
case Iop_D64toD32:
BINARY(ity_RMode, Ity_D64, Ity_D32);
to reflect what they really should be. However, this broke the ppc
implementation. Valgrind would fail and report a mismatch on the types as the
ppc code was using a D64 instead of a D32.
This patch adds support for fetching and storing the Dfp32 operand as a 32-bit
value. The support includes adding the functions iselDfp32Expr() and
iselDfp32Expr_wrk() and additional code to support the DFP32 bit iops.
Florian Krohm [Mon, 21 Jan 2013 13:46:57 +0000 (13:46 +0000)]
xen: Add a missing break to the handling of XEN_DOMCTL_max_vcpus
found by Coverity's checker.
Also fix another missing break XEN_SYSCTL_numainfo found by via a
by-eye check. This one is at the end of the switch so it is benign.
Patch by Ian Campbell <ian.campbell@citrix.com>.
Florian Krohm [Mon, 21 Jan 2013 01:27:22 +0000 (01:27 +0000)]
In mc_translate a NULL guard expression is sometimes passed around
functions to indicate a "true" expression. That caused some confusion
and led people to believe believe, that IRDirty::guard could be NULL.
It cannot.
This confusion was indirectly spotted by coverity's checker who figured
out that IRDirty::guard was sometimes unconditionally dereferenced
and sometimes compared against NULL.
Cleaning this up...
Petar Jovanovic [Mon, 21 Jan 2013 01:01:13 +0000 (01:01 +0000)]
mips: fix link_tool_exe_linux issue for different mips architectures
One issue has been reported on the mailing list by Ilya Smelykh, and the second
issue has been found in development for MIPS64.
The change modifies the way we detect target-arch by reading host_cpu from
config.log rather than asking the toolchain.
Petar Jovanovic [Sun, 20 Jan 2013 18:27:39 +0000 (18:27 +0000)]
mips: additional test case for fix in VEX r2648
Test program that triggers different corner cases related to position of
branch instruction and max size of a translation block (60 instructions).
Fix for these issues has been submitted in r2648.
Petar Jovanovic [Sun, 20 Jan 2013 18:16:45 +0000 (18:16 +0000)]
mips: fix for mips-disassembler when branch is at block_size-2 position
Check if the last instruction in the block is a branch or jump instruction
should happen only if the disassembler was not already stopped.
Incorrect conditional led to a boundary case in which jumps/branches were not
executed when placed on "max_insns - 2" position in the block.
none/tests/mips32/block_size test will be added to Valgrind to describe the case
and check for regressions in future.
Implement --merge-recursive-frames + provide VALGRIND_MONITOR_COMMAND client req.
In a big applications, some recursive algorithms have created
hundreds of thousands of stacktraces, taking a lot of memory.
Option --merge-recursive-frames=<number> tells Valgrind to
detect and merge (collapse) recursive calls when recording stack traces.
The value is changeable using the monitor command
'v.set merge-recursive-frames'.
Also, this provides a new client request: VALGRIND_MONITOR_COMMAND
allowing to execute a gdbsrv monitor command from the client
program.
Florian Krohm [Sun, 20 Jan 2013 03:51:04 +0000 (03:51 +0000)]
Improve the tree builder in IR optimisation. Allow load expressions to be
moved past Put/I statements and dirty helpers, when it is safe to do so.
It is safe, when the statement does not require exact memory exceptions.
New functions stmt_modifies_guest_state and dirty_helper_puts have been
added to determine the side effect on the guest state.
This optimisation enables the use of memory-to-memory insns on
architectures that have those.
Implement a more efficient allocation of small blocks which are never freed.
This generalises the "perm_malloc" function which was in ms_main.c
The new VG_(perm_malloc) is used in ms_main.c
and for execontext : when there are a lot of execontext, this
can save significant memory.