Julian Seward [Wed, 4 Mar 2015 12:35:54 +0000 (12:35 +0000)]
Fix problems due to generating Neon instructions on non-Neon capable
hosts:
* iselNeon64Expr, iselNeonExpr: assert that the host is actually
Neon-capable.
* iselIntExpr_R_wrk, existing cases for Iop_GetElem8x8,
Iop_GetElem16x4, Iop_GetElem32x2, Iop_GetElem8x16, Iop_GetElem16x8,
Iop_GetElem32x4:
Limit these to cases where the host is Neon capable, else we wind up
generating code which can't run on the host.
* iselIntExpr_R_wrk: add alternative implementation for
Iop_GetElem32x2 for non-Neon capable hosts.
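As a rough illustration of what a non-Neon fallback for Iop_GetElem32x2
has to compute, here is a plain C sketch of the lane extraction. It is
not the instruction selector code from this commit; the helper name
get_elem32x2 is hypothetical, and lane 0 is assumed to be the least
significant 32 bits of the 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract 32-bit lane `lane` (0 or 1) from a 64-bit
   vector value using only an integer shift, i.e. without SIMD support.
   Assumption: lane 0 is the least significant half. */
uint32_t get_elem32x2(uint64_t v64, unsigned lane)
{
   assert(lane < 2);
   return (uint32_t)(v64 >> (32 * lane));
}
```

For example, get_elem32x2(0x1122334455667788ULL, 1) yields the upper
half of the value under the lane-numbering assumption above.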
Julian Seward [Fri, 27 Feb 2015 13:33:56 +0000 (13:33 +0000)]
Add machinery to try and transform A ^ ((A ^ B) & M)
into (A & ~M) | (B & M).
The former is MSVC's optimised idiom for bitfield assignment, the
latter is GCC's idiom. The former causes Memcheck problems because it
doesn't understand that (in this complex case) XORing an undefined
value with itself produces a defined result.
Believed to be working but currently disabled. To re-enable, change
if (0) to if (1) at line 6651. Fixes, to some extent, and when
enabled, bug 344382.
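The two idioms are bitwise equivalent: where a mask bit is 1 the XOR
form yields A ^ A ^ B = B, and where it is 0 it yields A, which is
exactly the AND/OR form (A & ~M) | (B & M). A minimal standalone check,
with hypothetical function names that are not part of VEX:

```c
#include <assert.h>
#include <stdint.h>

/* XOR form of a masked bitfield merge: A ^ ((A ^ B) & M). */
uint32_t merge_xor(uint32_t a, uint32_t b, uint32_t m)
{
   return a ^ ((a ^ b) & m);
}

/* AND/OR form of the same merge: (A & ~M) | (B & M). */
uint32_t merge_andor(uint32_t a, uint32_t b, uint32_t m)
{
   return (a & ~m) | (b & m);
}

/* Both forms act on each bit independently, so agreement on all 4-bit
   inputs covers every per-bit combination. Returns 1 if they agree. */
int merge_forms_agree(void)
{
   for (uint32_t a = 0; a < 16; a++)
      for (uint32_t b = 0; b < 16; b++)
         for (uint32_t m = 0; m < 16; m++)
            if (merge_xor(a, b, m) != merge_andor(a, b, m))
               return 0;
   return 1;
}
```

The AND/OR form never combines A with itself, so a definedness tracker
does not need to reason about an undefined value cancelling out.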
Julian Seward [Fri, 27 Feb 2015 13:22:48 +0000 (13:22 +0000)]
Enhance the CSE pass so it can common up loads from memory. Disabled
by default since this is a somewhat dodgy proposition in the presence
of spinloops and racy accesses.
Julian Seward [Fri, 27 Feb 2015 13:06:43 +0000 (13:06 +0000)]
Tidy up of CSE. Create functions irExpr_to_TmpOrConst,
tmpOrConst_to_IRExpr and subst_AvailExpr_TmpOrConst and use them
instead of in-line code. No functional change.
Julian Seward [Sun, 8 Feb 2015 18:24:38 +0000 (18:24 +0000)]
Implement all remaining FP multiply style instructions:
FMULX d_d_d, s_s_s
FMLA d_d_d[], s_s_s[]
FMLS d_d_d[], s_s_s[]
FMUL d_d_d[], s_s_s[]
FMULX d_d_d[], s_s_s[]
FMULX 2d_2d_2d, 4s_4s_4s, 2s_2s_2s
FMULX 2d_2d_d[], 4s_4s_s[], 2s_2s_s[]
The FMULX variants are currently handled the same as FMUL. This is a
kludge that will have to be fixed at some point.
Julian Seward [Thu, 5 Feb 2015 12:53:20 +0000 (12:53 +0000)]
Make a very minor change to the LibVEX_Translate interface (sub-arg of
needs_self_check) which allows VEX's user to selectively override, on
a per-translation basis, the default precise-exception control setting
that is specified in VexControl::iropt_register_updates. Fix up
plumbing inside iropt so as to use passed-in values rather than the
default one.
Julian Seward [Tue, 27 Jan 2015 23:35:58 +0000 (23:35 +0000)]
Change AMD64Instr_CMov64 so that the source can only be a register
instead of register-or-memory (an AMD64RM). This avoids duplicating
conditional load functionality introduced in r3075 via
AMD64Instr_CLoad and in practice has no effect on the quality of the
generated code.
Julian Seward [Tue, 27 Jan 2015 23:17:02 +0000 (23:17 +0000)]
AMD64 front end: translate AVX2 PMASKMOV load instructions (vector
conditional loads) using IR conditional load statements IRLoadG rather
than the previous rather ingenious hack.
AMD64 back end:
* Add instruction selection etc for 32- and 64-bit conditional loads (IRLoadG)
* Handle dirty helper calls that return a value and that are conditional. These
result from Memcheck's instrumentation of IRLoadGs.
No functional change. This is a cleanup as part of supporting AVX2
PMASKMOV loads and stores by using the existing IR facilities for
conditional loads and stores.
The toUInt() should only be used if we are running in 32-bit mode. The lines
were changed to only convert the pointer to 32-bit if running in 32-bit mode.
There is no bugzilla for this issue. It was noticed by Florian Krohm.
Fix assert
vex: priv/guest_generic_bb_to_IR.c:224 (bb_to_IR): Assertion `vex_control.guest_max_insns < 100' failed.
caused by giving --vex-guest-max-insns=100
100 should be allowed as described by --help-debug:
--vex-guest-max-insns=<1..100> [50]
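The off-by-one can be illustrated with a standalone range check. The
function name is hypothetical and this is not the code in
guest_generic_bb_to_IR.c; it only shows that the documented range
<1..100> is inclusive, so the upper bound has to be tested with <=
rather than <.

```c
#include <assert.h>

/* Hypothetical stand-in for the range check: returns 1 if n lies in the
   documented inclusive range 1..100, else 0. */
int guest_max_insns_in_range(int n)
{
   return n >= 1 && n <= 100;
}
```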
Florian Krohm [Sun, 4 Jan 2015 17:20:19 +0000 (17:20 +0000)]
Change remaining use of Addr64 in the VEX API to Addr. This reduces
the size of VexGuestExtent to 20 bytes on a 32-bit platform.
Change prototypes of x86g_dirtyhelper_loadF80le and
x86g_dirtyhelper_storeF80le so that the address parameter has
type Addr. Likewise for amd64g_dirtyhelper_loadF80le and
amd64g_dirtyhelper_storeF80le.
Update switchback.c - but not tested.
Florian Krohm [Wed, 31 Dec 2014 12:09:38 +0000 (12:09 +0000)]
It has long been assumed that host and guest architectures
are the same, even though the initial design goal was likely
different, allowing a cross-valgrind of sorts. But as Julian
put it:
But it's been 12+ years and I've never once heard any mention of
such a thing. So perhaps it's time to give up on that one.
Now let's take advantage of this decision and tighten up the VEX
API using Addr instead of Addr64. As a first step move the definition
of Addr into VEX proper and change the chase_into_ok callback
accordingly.
Florian Krohm [Mon, 29 Dec 2014 22:18:58 +0000 (22:18 +0000)]
As a library, VEX should not export the offsetof and vg_alignof
macros. The latter isn't even used by VEX.
Move them to pub_tool_basics.h.
offsetof also goes to VEX's private header main_util.h.
On amd64, we handle GS similarly to FS, i.e. we consider it constant.
Note that FS is not always 0 on Linux. Rather, it appears to be constant
in all threads, and is zero in the main thread.
As the values for FS and/or GS differ between platforms (Linux or Darwin),
FS_CONST and GS_CONST are used.
Note that we cannot easily test that the value of GS or FS is the
expected one, as the value might not be set at the beginning of execution
but only set after prctl has been executed.
So, we just hope that GS and FS are effectively constant.
Some attempts to set GS to values other than the expected
constant value on Linux caused a SEGV.
So, it looks like this is all effectively protected.
In summary: we were counting somewhat on luck for FS;
we now similarly count on luck for GS.
Florian Krohm [Mon, 15 Dec 2014 21:55:16 +0000 (21:55 +0000)]
Remove quote.txt and newline.txt as they are no longer needed.
Once upon a time those files were used to construct a
header file vex_svnversion.h but that was more hassle than it
was worth and eventually it got nuked.
With this change, the user experience will be somewhat better, e.g.:
VEX: Support for AVX2 requires AVX capabilities
Found: amd64-cx16-rdtscp-sse3-avx2
Cannot continue. Good-bye
Specifically, the patch decouples showing hwcaps and deciding their validity.
show_hwcaps_<ARCH> reports the hwcaps it finds. It never returns NULL.
check_hwcaps checks the hwcaps for feasibility and does not return in case
VEX cannot deal with them.
The function are_valid_hwcaps no longer exists.
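The split between describing and validating capabilities can be
sketched as follows. All names and capability bits here are
hypothetical and chosen only for illustration; in particular, the real
check_hwcaps does not return when the capabilities are unusable,
whereas this sketch returns a reason string so that it can be tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capability bits, for illustration only. */
enum { CAP_SSE3 = 1u << 0, CAP_AVX = 1u << 1, CAP_AVX2 = 1u << 2 };

/* Describe whatever capabilities are found. Makes no validity judgement
   and always returns a string (buf), never NULL. */
const char *show_caps(unsigned caps, char *buf, size_t len)
{
   snprintf(buf, len, "amd64%s%s%s",
            (caps & CAP_SSE3) ? "-sse3" : "",
            (caps & CAP_AVX)  ? "-avx"  : "",
            (caps & CAP_AVX2) ? "-avx2" : "");
   return buf;
}

/* Check feasibility separately. Returns NULL if the combination is
   usable, else a message explaining why it is not. */
const char *check_caps(unsigned caps)
{
   if ((caps & CAP_AVX2) && !(caps & CAP_AVX))
      return "Support for AVX2 requires AVX capabilities";
   return NULL;
}
```

Because the description step never fails, an error path can always
print what was found alongside the reason it was rejected.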
Florian Krohm [Wed, 10 Dec 2014 16:08:09 +0000 (16:08 +0000)]
New function vfatal which should be used for user messages
to indicate a situation that can legitimately occur but that
we cannot handle today. The function does not return.
Florian Krohm [Mon, 8 Dec 2014 14:01:33 +0000 (14:01 +0000)]
The long displacement facility is now required. There were a
few spots in the code where this was assumed implicitly.
Ugly fixes were possible, but requiring this facility is not
unreasonable as it has been around since 2003. So let's just
do this.
Florian Krohm [Fri, 5 Dec 2014 18:55:39 +0000 (18:55 +0000)]
Encountering a PFPO insn in a client program while running on a host
that does not have that insn now causes an emulation error.
Previously, it caused a failing assertion which was incorrect.
Florian Krohm [Sat, 22 Nov 2014 20:10:21 +0000 (20:10 +0000)]
Add function s390_isel_amode_b12_b20 to compile an expression into an
amode that is either S390_AMODE_B12 or S390_AMODE_B20. This is needed
for compare-and-swap insns. As we're currently not generating amodes
using an index register, there was never a problem.
This change future-proofs the code.
Also add a few more asserts for amodes in the s390_insns supporting
translation chaining.
Fixes BZ #269360.
Florian Krohm [Thu, 20 Nov 2014 15:08:56 +0000 (15:08 +0000)]
This change was triggered by BZ #247974 which suggested including
VEX/test_main.* in the tarball. We don't want to do that because those
files are really just scaffolding for developers to play with and not
meant for general consumption (and are also bitrotting ATM). Therefore,
this patch moves them to the "useful" subdirectory and adds a crude
Makefile there to build the executable.
Makefile-gcc updated accordingly.
Julian Seward [Tue, 11 Nov 2014 12:49:21 +0000 (12:49 +0000)]
Add a nasty temporary kludge to CPUID that allows 64-bit MacOSX 10.10
(Yosemite) to run, until such time as XSAVE and XRSTOR are implemented.
Detailed in the comments. All other targets should be unaffected.
Florian Krohm [Sat, 11 Oct 2014 14:48:38 +0000 (14:48 +0000)]
Merge the memory allocation bits from libvex.h into main_util.c.
This is to avoid linkage problems due to unresolved symbols for
some compilers. See also valgrind r14600 and BZ #339542.
Carl Love [Thu, 9 Oct 2014 21:08:25 +0000 (21:08 +0000)]
This patch makes the needed changes to lxvw4x for Little Endian.
The data was being loaded in the Big Endian data order for most
cases. The code in host_ppc_isel.c was changed to do a right
shift and to permute the hi and lo registers in the other order
to ensure the data was always loaded in BE order. The lxvw4x
emulation in guest_ppc_toIR.c was changed to permute the data from
the BE order to LE order when running on an LE system.