Carl Love [Fri, 5 Jun 2015 18:52:57 +0000 (18:52 +0000)]
Oops, missed a change in the previous patch. Forgot to remove the format
specifier.
The dcbt and dcbtst instructions provide a non-zero hint that describes
a block or data stream to which a program may perform a Store access,
or indicates the expected use. The field bits[25:21] (bits 6:10 in
the IBM numbering) in the instruction provide the hint.
Valgrind checks that these bits are non-zero. Unfortunately, the test was
being applied to other instructions such as the dcbf instruction causing
it to fail when the field was equal to zero. This patch removes the check
that was being incorrectly applied to all of the instructions.
Carl Love [Fri, 5 Jun 2015 17:58:23 +0000 (17:58 +0000)]
The dcbt and dcbtst instructions provide a non-zero hint that describes
a block or data stream to which a program may perform a Store access,
or indicates the expected use. The field bits[25:21] (bits 6:10 in
the IBM numbering) in the instruction provide the hint.
Valgrind checks that these bits are non-zero. Unfortunately, the test was
being applied to other instructions such as the dcbf instruction causing
it to fail when the field was equal to zero. This patch removes the check
that was being incorrectly applied to all of the instructions.
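The field numbering above can be sketched as follows. This is only an illustration, not the VEX decoder: the helper names are hypothetical, and the extended opcode values (dcbf 86, dcbtst 246, dcbt 278, all under primary opcode 31) are quoted from memory of the Power ISA and should be checked against it.

```c
#include <stdint.h>

/* Extract bits[hi:lo] of a 32-bit instruction word, using LSB-0
   numbering (bit 0 is the least significant bit).  Assumes the field
   is narrower than 32 bits. */
uint32_t field_lsb0(uint32_t insn, unsigned hi, unsigned lo)
{
    return (insn >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Same extraction using IBM (MSB-0) numbering, where bit 0 is the
   most significant bit.  IBM bit i corresponds to LSB-0 bit 31 - i. */
uint32_t field_ibm(uint32_t insn, unsigned first, unsigned last)
{
    return field_lsb0(insn, 31 - first, 31 - last);
}

/* The hint field: bits[25:21], i.e. IBM bits 6:10. */
uint32_t hint_field(uint32_t insn)
{
    return field_lsb0(insn, 25, 21);
}

/* Assumed extended opcodes (bits[10:1]) under primary opcode 31. */
enum { XO_DCBF = 86, XO_DCBTST = 246, XO_DCBT = 278 };

/* Hypothetical predicate: a check on the hint field is only
   meaningful for dcbt and dcbtst, not for e.g. dcbf. */
int hint_check_applies(uint32_t insn)
{
    uint32_t opc1 = field_lsb0(insn, 31, 26);
    uint32_t xo   = field_lsb0(insn, 10, 1);
    return opc1 == 31 && (xo == XO_DCBT || xo == XO_DCBTST);
}
```

With these helpers, a dcbf encoding whose field is zero is simply not subject to the test, which is the situation the patch addresses.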
Carl Love [Wed, 29 Apr 2015 20:37:29 +0000 (20:37 +0000)]
Improve the error messages for the PPC platform to be more clear when Valgrind detects that
the underlying hardware doesn't have the needed capability. A number of the checks for DFP
support were going to "decode_failure" instead of "decode_noDFP". These issues are also fixed.
Remove VexGuestTILEGXStateAlignment as the guest state size of any architecture
must satisfy the LibVEX_GUEST_STATE_ALIGN requirement. So use that instead.
Carl Love [Wed, 22 Apr 2015 16:15:41 +0000 (16:15 +0000)]
Add support for the TEXASRU register. This register contains the upper
32 bits of the transactional memory instruction summary information.
Note that the valgrind implementation of transactional memory instructions
is limited. Currently, the TEXASRU register will always read as 0. The
lower 64 bits of the transaction information, in the TEXASR register,
will contain the failure information as set up by Valgrind.
This commit contains the changes needed to support the TEXASRU register on
PPC64.
This support requires changing the value of MAX_REG_WRITE_SIZE in
memcheck/mc_main.c from 1696 to 1712. The change is made in the corresponding
valgrind commit.
Carl Love [Fri, 17 Apr 2015 23:42:40 +0000 (23:42 +0000)]
Add support for the lbarx, lharx, stbcx and sthcx instructions.
The instructions are part of the ISA 2.06 but were not implemented
in all versions of hardware. The four instructions are all supported
in ISA 2.07. The instructions were put under the ISA 2.07 category
of supported instructions in this patch.
Carl Love [Thu, 16 Apr 2015 17:09:09 +0000 (17:09 +0000)]
The following regression test failures occur on PPC64 little endian only.
The regression test none/tests/jm_vec/isa_2_07 has failures on the lxsiwax and
lxsiwzx instructions. They are loads and the results are correct for
big endian but not little endian. The little endian result matches the
expected big endian result.
The regression test none/tests/test_isa_2_07_part2 has a failure with the
vbpermq instruction. The little endian result matches the expected result for
big endian. The upper and lower 64 bits of the result are not swapped correctly
for little endian.
amd64 front and back ends: track the change of type of Iop_Sqrt32Fx4
and Iop_Sqrt64Fx2 as introduced in r3120, in which they acquired a
rounding-mode argument.
arm64: implement FSQRT 2d_2d, 4s_4s, 2s_2s
AFAICS this completes the AArch64 SIMD implementation, except for the
crypto instructions.
This changes the type of Iop_Sqrt64x2 and Iop_Sqrt32x4 so as to take a
rounding mode argument. This will (temporarily, of course) break all
of the other targets that implement vector fsqrt.
This patch reduces the size of all tools by about 2MB of text
(depending on the arch).
This has as advantages:
1. somewhat faster build/link time (very probably negligible)
2. somewhat faster tool startup (probably negligible for most users,
but regression tests are helped by this)
3. a gain in memory of about 10MB
The valgrind tools are making the assumption that host and guest
are the same. So, no need to drag the full set of archs when
linking a tool.
The VEX library is nicely split in arch independent and arch dependent
objects. Only main_main.c is dragging the various arch specific files.
So, main_main.c (the main entry point of the VEX library) is compiled
only for the current guest/host arch.
The disadvantage of the above is that the VEX lib cannot be used
anymore with host and guest different, while VEX is able to do that
(i.e. does not make the assumption that host and guest are the same).
So, to still allow a VEX user to use the VEX lib in a multi arch setup,
main_main.c is compiled twice:
1. in 'single arch mode', going in the libvex-<arch>-<os>
2. in 'multi arch mode', going in a new lib libvexmultiarch-<arch>-<os>
A VEX user can choose at link time to link with the main_main
that is multi-arch, by linking with both libs (the multi arch being
the first one).
Here is a small (rubbish crashing) standalone usage of the VEX lib,
first linked in single arch, then linked in multi-arch:
In a next commit, some regtests will be added to validate that the two libs
are working properly (and that no arch specific symbol is missing when
linking multi-arch)
Julian Seward [Mon, 30 Mar 2015 08:50:27 +0000 (08:50 +0000)]
Add IR level support for 16 bit floating point types (Ity_F16) and add
four new IROps that use it:
Iop_F16toF64, Iop_F64toF16, Iop_F16toF32, Iop_F32toF16.
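For reference, the value computed by a widening conversion such as Iop_F16toF32 can be sketched in portable C. This is a software illustration of the IEEE 754 binary16 to binary32 mapping under the usual encoding (1 sign bit, 5 exponent bits, 10 fraction bits), not the code VEX generates; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert the bit pattern of an IEEE 754 binary16 value to a float.
   Widening is exact, so no rounding mode is involved. */
float f16_bits_to_f32(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (frac == 0) {
            bits = sign;                        /* +0 or -0 */
        } else {
            /* Subnormal: shift the fraction up until the implicit
               leading 1 appears, adjusting the exponent to match. */
            unsigned shift = 0;
            while ((frac & 0x400u) == 0) {
                frac <<= 1;
                shift++;
            }
            frac &= 0x3FFu;
            bits = sign | ((127u - 15u + 1u - shift) << 23) | (frac << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (frac << 13);   /* Inf or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + (127u - 15u)) << 23) | (frac << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The narrowing direction (F32toF16, F64toF16) is the one that needs a rounding mode, since not every float is representable in 16 bits.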
Julian Seward [Thu, 26 Mar 2015 07:18:32 +0000 (07:18 +0000)]
Bug 345215 - Performance improvements for the register allocator
The basic idea is to change the representation of registers (HReg) so
as to give Real registers a unique integer index starting from 0, with
the registers available for allocation numbered consecutively from zero
upwards. This allows the register allocator to index into its primary
data structure -- a table tracking the status of each available
register -- using normal array indexing instead of having to search
sequentially through the table, as now.
It also allows an efficient bitmap-based representation for "set of
Real registers", which is important for the NCODE work.
There are various other perf improvements, most notably in calling
getRegUsage once rather than twice per instruction.
Cost of register allocation is reduced to around 65% ish of what it
previously was. This translates into speedups ranging from close to zero
for compute intensive code up to around 7% for JITing intensive
situations, eg "time perl tests/vg_regtest memcheck/tests/amd64".
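The two ideas described above (a status table indexed directly by register number, and a bitmap for "set of Real registers") can be sketched as below. All type and function names here are hypothetical and are not the ones used in VEX; the sketch also assumes at most 64 real registers so that one 64-bit word is enough.

```c
#include <stdint.h>
#include <stdbool.h>

#define N_RREGS 32   /* assumed number of allocatable real registers */

/* Set of real registers as a bitmap: bit i set <=> register i present. */
typedef struct { uint64_t bits; } RRegSet;

void rregset_init(RRegSet* s)                 { s->bits = 0; }
void rregset_add(RRegSet* s, unsigned idx)    { s->bits |=  (UINT64_C(1) << idx); }
void rregset_del(RRegSet* s, unsigned idx)    { s->bits &= ~(UINT64_C(1) << idx); }
bool rregset_has(const RRegSet* s, unsigned idx)
{
    return ((s->bits >> idx) & 1u) != 0;
}

/* Per-register allocator state, one entry per real register. */
typedef struct {
    bool in_use;
    int  vreg;       /* virtual register bound here, or -1 */
} RRegState;

/* Because each real register carries its own index, binding is a
   direct array access rather than a sequential search of the table. */
void bind_rreg(RRegState table[N_RREGS], unsigned idx, int vreg)
{
    table[idx].in_use = true;
    table[idx].vreg   = vreg;
}

void free_rreg(RRegState table[N_RREGS], unsigned idx)
{
    table[idx].in_use = false;
    table[idx].vreg   = -1;
}
```

Set union and intersection on such a bitmap are single OR and AND operations, which is where the efficiency of the representation comes from.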
Florian Krohm [Fri, 13 Mar 2015 12:46:49 +0000 (12:46 +0000)]
r2974 moved the inline definition of LibVEX_Alloc from libvex.h
to main_util.c because it caused linker problems with ICC.
See comments in BZ #339542.
This change re-enables inlining of that function by adding it
(renamed as LibVEX_Alloc_inline) to main_util.h.
500+ callsites changed accordingly.
Florian Krohm [Thu, 12 Mar 2015 11:01:12 +0000 (11:01 +0000)]
Fix build problems. The code has been bitrotting for some time.
Note that while the file compiles and links, not all IROps are handled.
So there may be runtime problems.
Fixes BZ #345079. Patch by Ivo Raisr (ivosh@ivosh.net).
Julian Seward [Wed, 4 Mar 2015 12:35:54 +0000 (12:35 +0000)]
Fix problems due to generating Neon instructions on non-Neon capable
hosts:
* iselNeon64Expr, iselNeonExpr: assert that the host is actually
Neon-capable.
* iselIntExpr_R_wrk, existing cases for Iop_GetElem8x8,
Iop_GetElem16x4, Iop_GetElem32x2, Iop_GetElem8x16, Iop_GetElem16x8,
Iop_GetElem32x4:
Limit these to cases where the host is Neon capable, else we wind up
generating code which can't run on the host.
* iselIntExpr_R_wrk: add alternative implementation for
Iop_GetElem32x2 for non-Neon capable hosts.
Julian Seward [Fri, 27 Feb 2015 13:33:56 +0000 (13:33 +0000)]
Add machinery to try and transform A ^ ((A ^ B) & M)
into (A & ~M) | (B & M).
The former is MSVC's optimised idiom for bitfield assignment, the
latter is GCC's idiom. The former causes Memcheck problems because it
doesn't understand that (in this complex case) XORing an undefined
value with itself produces a defined result.
Believed to be working but currently disabled. To re-enable, change
if (0) to if (1) at line 6651. Fixes, to some extent, and when
enabled, bug 344382.
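The equivalence of the two bitfield-assignment idioms can be checked directly. For each bit, where M is 1 the XOR form gives A ^ A ^ B = B, and where M is 0 it gives A, which is exactly (A & ~M) | (B & M). The sketch below is a standalone check with hypothetical function names, not part of iropt.

```c
#include <stdint.h>

/* XOR-based idiom: copy the bits of b selected by mask m into a. */
uint32_t assign_xor_form(uint32_t a, uint32_t b, uint32_t m)
{
    return a ^ ((a ^ b) & m);
}

/* AND/OR idiom: keep a outside the mask, take b inside the mask. */
uint32_t assign_andor_form(uint32_t a, uint32_t b, uint32_t m)
{
    return (a & ~m) | (b & m);
}

/* Exhaustively compare both forms over all 8-bit inputs.  Since the
   operations are bitwise, agreement on every bit pattern of one byte
   implies agreement at any width.  Returns 1 if they always agree. */
int forms_agree_8bit(void)
{
    for (uint32_t a = 0; a < 256; a++)
        for (uint32_t b = 0; b < 256; b++)
            for (uint32_t m = 0; m < 256; m++)
                if ((assign_xor_form(a, b, m) & 0xFFu) !=
                    (assign_andor_form(a, b, m) & 0xFFu))
                    return 0;
    return 1;
}
```

In the AND/OR form no undefined bit of A is ever combined with itself, so a definedness tracker that works operation by operation does not need to reason about A ^ A cancelling out.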
Julian Seward [Fri, 27 Feb 2015 13:22:48 +0000 (13:22 +0000)]
Enhance the CSE pass so it can common up loads from memory. Disabled
by default since this is a somewhat dodgy proposition in the presence
of spinloops and racy accesses.
Julian Seward [Fri, 27 Feb 2015 13:06:43 +0000 (13:06 +0000)]
Tidy up of CSE. Create functions irExpr_to_TmpOrConst,
tmpOrConst_to_IRExpr and subst_AvailExpr_TmpOrConst and use them
instead of in-line code. No functional change.
Julian Seward [Sun, 8 Feb 2015 18:24:38 +0000 (18:24 +0000)]
Implement all remaining FP multiply style instructions:
FMULX d_d_d, s_s_s
FMLA d_d_d[], s_s_s[]
FMLS d_d_d[], s_s_s[]
FMUL d_d_d[], s_s_s[]
FMULX d_d_d[], s_s_s[]
FMULX 2d_2d_2d, 4s_4s_4s, 2s_2s_2s
FMULX 2d_2d_d[], 4s_4s_s[], 2s_2s_s[]
The FMULX variants are currently handled the same as FMUL. This is a
kludge that will have to be fixed at some point.
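To show why treating FMULX as FMUL is only an approximation: as far as the ARMv8 architecture describes it, FMULX differs from FMUL in one case, zero times infinity, where it returns 2.0 with the sign being the XOR of the operand signs rather than a NaN. A minimal reference model, with hypothetical function names and ignoring rounding-mode and exception-flag details:

```c
#include <math.h>

/* Plain IEEE multiply, which is what FMUL computes. */
double fmul_ref(double a, double b)
{
    return a * b;
}

/* Reference model of FMULX: identical to FMUL except that
   (+/-0) * (+/-Inf) yields +/-2.0 instead of NaN. */
double fmulx_ref(double a, double b)
{
    int zero_times_inf = (a == 0.0 && isinf(b)) || (isinf(a) && b == 0.0);
    if (zero_times_inf) {
        /* signbit() returns nonzero for negative values, so
           normalise to 0/1 before comparing. */
        int negative = (signbit(a) != 0) != (signbit(b) != 0);
        return negative ? -2.0 : 2.0;
    }
    return a * b;
}
```

For all other inputs the two functions agree, so the kludge is only visible to code that relies on the zero-times-infinity case.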
Julian Seward [Thu, 5 Feb 2015 12:53:20 +0000 (12:53 +0000)]
Make a very minor change to the LibVEX_Translate interface (sub-arg of
needs_self_check) which allows VEX's user to selectively override, on
a per-translation basis, the default precise-exception control setting
that is specified in VexControl::iropt_register_updates. Fix up
plumbing inside iropt so as to use passed-in values rather than the
default one.
Julian Seward [Tue, 27 Jan 2015 23:35:58 +0000 (23:35 +0000)]
Change AMD64Instr_CMov64 so that the source can only be a register
instead of register-or-memory (an AMD64RM). This avoids duplicating
conditional load functionality introduced in r3075 via
AMD64Instr_CLoad and in practice has no effect on the quality of the
generated code.
Julian Seward [Tue, 27 Jan 2015 23:17:02 +0000 (23:17 +0000)]
AMD64 front end: translate AVX2 PMASKMOV load instructions (vector
conditional loads) using IR conditional load statements IRLoadG rather
than the previous rather ingenious hack.
AMD64 back end:
* Add instruction selection etc for 32- and 64-bit conditional loads (IRLoadG)
* Handle dirty helper calls that return a value and that are conditional. These
result from Memcheck's instrumentation of IRLoadGs.
No functional change. This is a cleanup as part of supporting AVX2
PMASKMOV loads and stores by using the existing IR facilities for
conditional loads and stores.
The toUInt() should only be used if we are running in 32-bit mode. The lines
were changed to only convert the pointer to 32-bit if running in 32-bit mode.
There is no bugzilla for this issue. It was noticed by Florian Krohm.
Fix assert
vex: priv/guest_generic_bb_to_IR.c:224 (bb_to_IR): Assertion `vex_control.guest_max_insns < 100' failed.
caused by giving --vex-guest-max-insns=100
100 should be allowed as described by --help-debug:
--vex-guest-max-insns=<1..100> [50]
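The off-by-one can be illustrated with a tiny range check. This is a sketch of the intended inclusive bound under the documented range 1..100, with a hypothetical function name; it is not the actual assertion in guest_generic_bb_to_IR.c.

```c
/* Documented range for --vex-guest-max-insns is 1..100 inclusive. */
enum { GUEST_MAX_INSNS_LO = 1, GUEST_MAX_INSNS_HI = 100 };

/* Returns 1 if n is an acceptable setting.  Using '<' instead of
   '<=' for the upper bound would wrongly reject n == 100. */
int guest_max_insns_ok(int n)
{
    return n >= GUEST_MAX_INSNS_LO && n <= GUEST_MAX_INSNS_HI;
}
```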
Florian Krohm [Sun, 4 Jan 2015 17:20:19 +0000 (17:20 +0000)]
Change remaining use of Addr64 in the VEX API to Addr. This reduces
the size of VexGuestExtent to 20 bytes on a 32-bit platform.
Change prototypes of x86g_dirtyhelper_loadF80le and
x86g_dirtyhelper_storeF80le to give the address in the parameter
list type Addr. Likewise for amd64g_dirtyhelper_loadF80le and
amd64g_dirtyhelper_storeF80le.
Update switchback.c - but not tested.
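The 20-byte figure can be reproduced with a small size calculation. The layout below (three base addresses, three 16-bit lengths and a 16-bit count) is an assumption about the structure made for illustration, and the type names are hypothetical; a 32-bit Addr is modelled with uint32_t.

```c
#include <stdint.h>

/* Assumed layout with 64-bit addresses (the Addr64 version). */
typedef struct {
    uint64_t base[3];
    uint16_t len[3];
    uint16_t n_used;
} Extents64;

/* Same layout with a 32-bit address type, as Addr would be on a
   32-bit platform: 3*4 + 3*2 + 2 = 20 bytes, no padding needed. */
typedef struct {
    uint32_t base[3];
    uint16_t len[3];
    uint16_t n_used;
} Extents32;

unsigned long size_extents64(void) { return (unsigned long)sizeof(Extents64); }
unsigned long size_extents32(void) { return (unsigned long)sizeof(Extents32); }
```

Under this assumed layout the Addr64 version takes 3*8 + 3*2 + 2 = 32 bytes on common ABIs, so the change saves 12 bytes per structure on a 32-bit platform.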
Florian Krohm [Wed, 31 Dec 2014 12:09:38 +0000 (12:09 +0000)]
It has long been assumed that host and guest architectures
are the same - even though the initial design goal was likely
different allowing a cross-valgrind of sorts. But as Julian
put it:
But it's been 12+ years and I've never once heard any mention of
such a thing. So perhaps it's time to give up on that one.
Now let's take advantage of this decision and tighten up the VEX
API using Addr instead of Addr64. As a first step move the definition
of Addr into VEX proper and change the chase_into_ok callback
accordingly.
Florian Krohm [Mon, 29 Dec 2014 22:18:58 +0000 (22:18 +0000)]
As a library, VEX should not export the offsetof and vg_alignof
macros. The latter isn't even used by VEX.
Move them to pub_tool_basics.h.
offsetof also goes to VEX's private header main_util.h.
On amd64, we handle GS similarly to FS, i.e. we consider it to be constant.
Note that FS is not always 0 on linux. Rather, it appears to be constant
in all threads, and is zero in the main thread.
As the values for FS and/or GS differ between platforms (linux or darwin),
FS_CONST and GS_CONST are used.
Note that we cannot easily test that the value of GS or FS is the
expected one, as the value might not be set at the beginning of execution
but only after prctl has been executed.
So, we just hope that GS and FS are effectively constant.
Some trials setting GS to values other than the expected
constant value on linux caused a SEGV.
So, it looks like this is all effectively protected.
In summary: we were counting somewhat on luck for FS,
we now similarly count on luck for GS.
Florian Krohm [Mon, 15 Dec 2014 21:55:16 +0000 (21:55 +0000)]
Remove quote.txt and newline.txt as they are no longer needed.
Once upon a time those files were used to construct a
header file vex_svnversion.h but that was more hassle than it
was worth and eventually it got nuked.
With this change, the user experience will be somewhat better, e.g.:
VEX: Support for AVX2 requires AVX capabilities
Found: amd64-cx16-rdtscp-sse3-avx2
Cannot continue. Good-bye
Specifically, the patch decouples showing hwcaps and deciding their validity.
show_hwcaps_<ARCH> reports the hwcaps it finds. It never returns NULL.
check_hwcaps checks the hwcaps for feasibility and does not return in case
VEX cannot deal with them.
The function are_valid_hwcaps no longer exists.
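The split between reporting and validating can be sketched as follows. Everything here is hypothetical (the capability bits, the function names and the returned strings are invented for illustration and are not VEX's); only the shape follows the description: one function that always returns a printable description, and one that does not return when the capabilities are infeasible.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical capability bits, for illustration only. */
enum { HWCAP_AVX = 1 << 0, HWCAP_AVX2 = 1 << 1 };

/* Describe whatever capabilities were found.  Never returns NULL,
   even for combinations that cannot be supported. */
const char* show_hwcaps(unsigned caps)
{
    switch (caps & (HWCAP_AVX | HWCAP_AVX2)) {
        case 0:          return "amd64";
        case HWCAP_AVX:  return "amd64-avx";
        case HWCAP_AVX2: return "amd64-avx2";
        default:         return "amd64-avx-avx2";
    }
}

/* Return a reason string if the combination is infeasible, or NULL
   if it is fine.  Kept separate so it can be tested in isolation. */
const char* hwcaps_problem(unsigned caps)
{
    if ((caps & HWCAP_AVX2) && !(caps & HWCAP_AVX))
        return "Support for AVX2 requires AVX capabilities";
    return NULL;
}

/* Validate the capabilities; does not return if they are infeasible. */
void check_hwcaps(unsigned caps)
{
    const char* why = hwcaps_problem(caps);
    if (why != NULL) {
        fprintf(stderr, "VEX: %s\n     Found: %s\nCannot continue. Good-bye\n",
                why, show_hwcaps(caps));
        exit(1);
    }
}
```

Because the description is always available, the failure message can state both what was required and what was actually found.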
Florian Krohm [Wed, 10 Dec 2014 16:08:09 +0000 (16:08 +0000)]
New function vfatal which should be used for user messages
to indicate a situation that can legitimately occur but that
we cannot handle today. The function does not return.