amd64 front and back ends: track the change of type of Iop_Sqrt32Fx4
and Iop_Sqrt64Fx2 as introduced in r3120, in which they acquired a
rounding-mode argument.
arm64: implement FSQRT 2d_2d, 4s_4s, 2s_2s
AFAICS this completes the AArch64 SIMD implementation, except for the
crypto instructions.
This changes the type of Iop_Sqrt64x2 and Iop_Sqrt32x4 so as to take a
rounding mode argument. This will (temporarily, of course) break all
of the other targets that implement vector fsqrt.
This patch reduces the size of all tools by about 2MB of text
(depending on the arch).
This has the following advantages:
1. somewhat faster build/link time (very probably negligible)
2. somewhat faster tool startup (probably negligible for most users,
but regression tests are helped by this)
3. a gain in memory of about 10MB
The Valgrind tools assume that host and guest are the same, so there
is no need to drag in the full set of archs when linking a tool.
The VEX library is nicely split in arch independent and arch dependent
objects. Only main_main.c is dragging the various arch specific files.
So, main_main.c (the main entry point of the VEX library) is compiled
only for the current guest/host arch.
The disadvantage of the above is that the VEX lib can no longer be
used with a host and guest that differ, even though VEX itself is able
to do that (i.e. it does not assume that host and guest are the same).
So, to still allow a VEX user to use the VEX lib in a multi arch setup,
main_main.c is compiled twice:
1. in 'single arch mode', going in the libvex-<arch>-<os>
2. in 'multi arch mode', going in a new lib libvexmultiarch-<arch>-<os>
A VEX user can choose at link time to link with the main_main
that is multi-arch, by linking with both libs (the multi arch being
the first one).
Here is a small (rubbish crashing) standalone usage of the VEX lib,
first linked in single arch, then linked in multi-arch:
In a later commit, some regtests will be added to validate that the two libs
are working properly (and that no arch specific symbol is missing when
linking multi-arch).
Julian Seward [Mon, 30 Mar 2015 08:50:27 +0000 (08:50 +0000)]
Add IR level support for 16 bit floating point types (Ity_F16) and add
four new IROps that use it:
Iop_F16toF64, Iop_F64toF16, Iop_F16toF32, Iop_F32toF16.
Julian Seward [Thu, 26 Mar 2015 07:18:32 +0000 (07:18 +0000)]
Bug 345215 - Performance improvements for the register allocator
The basic idea is to change the representation of registers (HReg) so
as to give Real registers a unique integer index starting from 0, with
the registers available for allocation numbered consecutively from zero
upwards. This allows the register allocator to index into its primary
data structure -- a table tracking the status of each available
register -- using a normal array index instead of having to search
sequentially through the table, as it does now.
It also allows an efficient bitmap-based representation for "set of
Real registers", which is important for the NCODE work.
There are various other perf improvements, most notably in calling
getRegUsage once rather than twice per instruction.
Cost of register allocation is reduced to around 65% of what it
previously was. This translates into speedups ranging from close to
zero for compute-intensive code up to around 7% for JIT-intensive
situations, e.g. "time perl tests/vg_regtest memcheck/tests/amd64".
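The two data-structure ideas above can be sketched roughly as follows. This is an illustration only: every name (RRegTable, RRegSet, N_RREGS and the rrt_/rrs_ helpers) is hypothetical and not taken from the VEX sources, and the cap of 64 registers is an assumption made so that a set fits in one machine word.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the VEX HReg interface. */
#define N_RREGS 64u   /* assumed cap, so that a set fits in one 64-bit word */

typedef enum { RR_Free, RR_Bound } RRegState;

/* Status table: entry i describes the real register with index i. */
typedef struct { RRegState state[N_RREGS]; } RRegTable;

/* Set of real registers: bit i is set iff register i is a member. */
typedef struct { uint64_t bits; } RRegSet;

static void rrt_init(RRegTable* t) {
   for (unsigned i = 0; i < N_RREGS; i++) t->state[i] = RR_Free;
}

/* O(1) lookup: plain array indexing, no sequential search. */
static RRegState rrt_get(const RRegTable* t, unsigned ix) {
   assert(ix < N_RREGS);
   return t->state[ix];
}

static void rrt_set(RRegTable* t, unsigned ix, RRegState st) {
   assert(ix < N_RREGS);
   t->state[ix] = st;
}

static RRegSet rrs_empty(void) { RRegSet s = { 0 }; return s; }

static RRegSet rrs_add(RRegSet s, unsigned ix) {
   assert(ix < N_RREGS);
   s.bits |= UINT64_C(1) << ix;
   return s;
}

static int rrs_has(RRegSet s, unsigned ix) {
   assert(ix < N_RREGS);
   return (int)((s.bits >> ix) & 1u);
}

/* Set union is a single OR, regardless of how many members there are. */
static RRegSet rrs_union(RRegSet a, RRegSet b) {
   RRegSet r = { a.bits | b.bits };
   return r;
}
```

Because the indices are dense and start at zero, both the table lookup and the set operations avoid any per-register search.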
Florian Krohm [Fri, 13 Mar 2015 12:46:49 +0000 (12:46 +0000)]
r2974 moved the inline definition of LibVEX_Alloc from libvex.h
to main_util.c because it caused linker problems with ICC.
See comments in BZ #339542.
This change re-enables inlining of that function by adding it
(renamed as LibVEX_Alloc_inline) to main_util.h.
500+ callsites changed accordingly.
Florian Krohm [Thu, 12 Mar 2015 11:01:12 +0000 (11:01 +0000)]
Fix build problems. The code has been bitrotting for some time.
Note that while the file compiles and links, not all IROps are handled.
So there may be runtime problems.
Fixes BZ #345079. Patch by Ivo Raisr (ivosh@ivosh.net).
Julian Seward [Wed, 4 Mar 2015 12:35:54 +0000 (12:35 +0000)]
Fix problems due to generating Neon instructions on non-Neon capable
hosts:
* iselNeon64Expr, iselNeonExpr: assert that the host is actually
Neon-capable.
* iselIntExpr_R_wrk, existing cases for Iop_GetElem8x8,
Iop_GetElem16x4, Iop_GetElem32x2, Iop_GetElem8x16, Iop_GetElem16x8,
Iop_GetElem32x4:
Limit these to cases where the host is Neon capable, else we wind up
generating code which can't run on the host.
* iselIntExpr_R_wrk: add alternative implementation for
Iop_GetElem32x2 for non-Neon capable hosts.
Julian Seward [Fri, 27 Feb 2015 13:33:56 +0000 (13:33 +0000)]
Add machinery to try and transform A ^ ((A ^ B) & M)
into (A & ~M) | (B & M).
The former is MSVC's optimised idiom for bitfield assignment, the
latter is GCC's idiom. The former causes Memcheck problems because it
doesn't understand that (in this complex case) XORing an undefined
value with itself produces a defined result.
Believed to be working but currently disabled. To re-enable, change
if (0) to if (1) at line 6651. Fixes, to some extent, and when
enabled, bug 344382.
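As a sanity check on the rewrite, the XOR idiom and the mask-and-merge form (A & ~M) | (B & M) can be compared directly. The sketch below is not the iropt code; the function names are hypothetical. Since every operator involved works bit by bit, an exhaustive check over a small width is enough to cover all cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; this is not code from ir_opt.c. */

/* XOR-based bitfield assignment idiom. */
static uint32_t xor_idiom(uint32_t a, uint32_t b, uint32_t m) {
   return a ^ ((a ^ b) & m);
}

/* Mask-and-merge idiom: keep a where m is 0, take b where m is 1. */
static uint32_t merge_idiom(uint32_t a, uint32_t b, uint32_t m) {
   return (a & ~m) | (b & m);
}

/* Returns 1 if the two idioms agree on every 4-bit (a, b, m) triple.
   All operators are bitwise, so this covers every per-bit case. */
static int idioms_agree(void) {
   for (uint32_t a = 0; a < 16; a++)
      for (uint32_t b = 0; b < 16; b++)
         for (uint32_t m = 0; m < 16; m++)
            if (xor_idiom(a, b, m) != merge_idiom(a, b, m))
               return 0;
   return 1;
}

/* Convenience wrapper that aborts if the identity ever fails. */
static void check_idioms(void) {
   assert(idioms_agree());
}
```

In the merge form no operand is XORed with itself, which is the property that matters for definedness tracking.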
Julian Seward [Fri, 27 Feb 2015 13:22:48 +0000 (13:22 +0000)]
Enhance the CSE pass so it can common up loads from memory. Disabled
by default since this is a somewhat dodgy proposition in the presence
of spinloops and racy accesses.
Julian Seward [Fri, 27 Feb 2015 13:06:43 +0000 (13:06 +0000)]
Tidy up of CSE. Create functions irExpr_to_TmpOrConst,
tmpOrConst_to_IRExpr and subst_AvailExpr_TmpOrConst and use them
instead of in-line code. No functional change.
Julian Seward [Sun, 8 Feb 2015 18:24:38 +0000 (18:24 +0000)]
Implement all remaining FP multiply style instructions:
FMULX d_d_d, s_s_s
FMLA d_d_d[], s_s_s[]
FMLS d_d_d[], s_s_s[]
FMUL d_d_d[], s_s_s[]
FMULX d_d_d[], s_s_s[]
FMULX 2d_2d_2d, 4s_4s_4s, 2s_2s_2s
FMULX 2d_2d_d[], 4s_4s_s[], 2s_2s_s[]
The FMULX variants are currently handled the same as FMUL. This is a
kludge that will have to be fixed at some point.
Julian Seward [Thu, 5 Feb 2015 12:53:20 +0000 (12:53 +0000)]
Make a very minor change to the LibVEX_Translate interface (sub-arg of
needs_self_check) which allows VEX's user to selectively override, on
a per-translation basis, the default precise-exception control setting
that is specified in VexControl::iropt_register_updates. Fix up
plumbing inside iropt so as to use the passed-in values rather than the
default one.
Julian Seward [Tue, 27 Jan 2015 23:35:58 +0000 (23:35 +0000)]
Change AMD64Instr_CMov64 so that the source can only be a register
instead of register-or-memory (an AMD64RM). This avoids duplicating
conditional load functionality introduced in r3075 via
AMD64Instr_CLoad and in practice has no effect on the quality of the
generated code.
Julian Seward [Tue, 27 Jan 2015 23:17:02 +0000 (23:17 +0000)]
AMD64 front end: translate AVX2 PMASKMOV load instructions (vector
conditional loads) using IR conditional load statements IRLoadG rather
than the previous rather ingenious hack.
AMD64 back end:
* Add instruction selection etc for 32- and 64-bit conditional loads (IRLoadG)
* Handle dirty helper calls that return a value and that are conditional. These
result from Memcheck's instrumentation of IRLoadGs.
No functional change. This is a cleanup as part of supporting AVX2
PMASKMOV loads and stores by using the existing IR facilities for
conditional loads and stores.
toUInt() should only be used if we are running in 32-bit mode. The lines
were changed to convert the pointer to 32 bits only when running in 32-bit mode.
There is no bugzilla for this issue. It was noticed by Florian Krohm.
Fix assert
vex: priv/guest_generic_bb_to_IR.c:224 (bb_to_IR): Assertion `vex_control.guest_max_insns < 100' failed.
caused by giving --vex-guest-max-insns=100
100 should be allowed as described by --help-debug:
--vex-guest-max-insns=<1..100> [50]
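The off-by-one can be illustrated with an inclusive range check. This is only a sketch: the helper names and the GUEST_MAX_INSNS_LIMIT macro are hypothetical, and the real assertion lives in bb_to_IR and tests vex_control.guest_max_insns.

```c
#include <assert.h>

/* Hypothetical constant: the upper bound documented by --help-debug. */
#define GUEST_MAX_INSNS_LIMIT 100

/* Inclusive range check, so that 1..100 are all accepted.
   The failing assertion used '< 100', which rejected the documented
   maximum. */
static int guest_max_insns_ok(int n) {
   return n >= 1 && n <= GUEST_MAX_INSNS_LIMIT;
}

/* Hypothetical wrapper showing how the check would be asserted. */
static void check_guest_max_insns(int n) {
   assert(guest_max_insns_ok(n));
}
```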
Florian Krohm [Sun, 4 Jan 2015 17:20:19 +0000 (17:20 +0000)]
Change remaining uses of Addr64 in the VEX API to Addr. This reduces
the size of VexGuestExtent to 20 bytes on a 32-bit platform.
Change the prototypes of x86g_dirtyhelper_loadF80le and
x86g_dirtyhelper_storeF80le so that the address parameter has
type Addr. Likewise for amd64g_dirtyhelper_loadF80le and
amd64g_dirtyhelper_storeF80le.
Update switchback.c - but not tested.
Florian Krohm [Wed, 31 Dec 2014 12:09:38 +0000 (12:09 +0000)]
It has long been assumed that host and guest architectures
are the same, even though the initial design goal was likely
different, allowing a cross-valgrind of sorts. But as Julian
put it:
But it's been 12+ years and I've never once heard any mention of
such a thing. So perhaps it's time to give up on that one.
Now let's take advantage of this decision and tighten up the VEX
API using Addr instead of Addr64. As a first step move the definition
of Addr into VEX proper and change the chase_into_ok callback
accordingly.
Florian Krohm [Mon, 29 Dec 2014 22:18:58 +0000 (22:18 +0000)]
As a library, VEX should not export the offsetof and vg_alignof
macros. The latter isn't even used by VEX.
Move them to pub_tool_basics.h.
offsetof also goes to VEX's private header main_util.h.
On amd64, we handle GS similarly to FS, i.e. we consider it to be constant.
Note that FS is not always 0 on Linux. Rather, it appears to be constant
in all threads, and is zero in the main thread.
As the values for FS and/or GS differ between platforms (Linux or Darwin),
FS_CONST and GS_CONST are used.
Note that we cannot easily test that the value of GS or FS is the
expected one, as the value might not be set at the beginning of execution
but only after prctl has been executed.
So, we just hope that GS and FS are effectively constant.
Some attempts to set GS to values other than the expected
constant value on Linux caused a SEGV.
So, it looks like this is all effectively protected.
In summary: we were counting somewhat on luck for FS;
we now similarly count on luck for GS.
Florian Krohm [Mon, 15 Dec 2014 21:55:16 +0000 (21:55 +0000)]
Remove quote.txt and newline.txt as they are no longer needed.
Once upon a time those files were used to construct a
header file vex_svnversion.h but that was more hassle than it
was worth and eventually it got nuked.
With this change, the user experience will be somewhat better, e.g.:
VEX: Support for AVX2 requires AVX capabilities
Found: amd64-cx16-rdtscp-sse3-avx2
Cannot continue. Good-bye
Specifically, the patch decouples showing hwcaps and deciding their validity.
show_hwcaps_<ARCH> reports the hwcaps it finds. It never returns NULL.
check_hwcaps checks the hwcaps for feasibility and does not return in case
VEX cannot deal with them.
The function are_valid_hwcaps no longer exists.
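A rough sketch of this split between describing and validating is given below. It is not the VEX implementation: the capability bits, the reduced set of names, and the caps_problem helper are all assumptions made for illustration, and only the AVX2-without-AVX case from the message above is modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capability bits; not the real VEX hwcaps values. */
enum { CAP_AVX = 1 << 0, CAP_AVX2 = 1 << 1 };

/* Describe whatever bits are present. Never returns NULL, even for
   combinations that cannot be supported. */
static const char* show_caps(unsigned caps) {
   switch (caps & (CAP_AVX | CAP_AVX2)) {
      case 0:        return "amd64";
      case CAP_AVX:  return "amd64-avx";
      case CAP_AVX2: return "amd64-avx2";
      default:       return "amd64-avx-avx2";
   }
}

/* Hypothetical helper: NULL if the combination is usable, otherwise
   the reason why it is not. */
static const char* caps_problem(unsigned caps) {
   if ((caps & CAP_AVX2) && !(caps & CAP_AVX))
      return "Support for AVX2 requires AVX capabilities";
   return NULL;
}

/* Validate the combination; does not return if it is unusable. */
static void check_caps(unsigned caps) {
   const char* why = caps_problem(caps);
   if (why == NULL)
      return;
   fprintf(stderr, "VEX: %s\n     Found: %s\nCannot continue. Good-bye\n",
           why, show_caps(caps));
   exit(1);
}
```

Keeping the describing function total means the error path can always print what was found, including for the combinations it is about to reject.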
Florian Krohm [Wed, 10 Dec 2014 16:08:09 +0000 (16:08 +0000)]
New function vfatal which should be used for user messages
to indicate a situation that can legitimately occur but that
we cannot handle today. The function does not return.
Florian Krohm [Mon, 8 Dec 2014 14:01:33 +0000 (14:01 +0000)]
The long displacement facility is now required. There were a
few spots in the code where this was assumed implicitly.
Ugly fixes were possible, but requiring this facility is not
unreasonable as it has been around since 2003. So let's just
do this.
Florian Krohm [Fri, 5 Dec 2014 18:55:39 +0000 (18:55 +0000)]
Encountering a PFPO insn in a client program while running on a host
that does not have that insn now causes an emulation error.
Previously, it caused a failing assertion which was incorrect.
Florian Krohm [Sat, 22 Nov 2014 20:10:21 +0000 (20:10 +0000)]
Add function s390_isel_amode_b12_b20 to compile an expression into an
amode that is either S390_AMODE_B12 or S390_AMODE_B20. This is needed
for compare-and-swap insns. As we're currently not generating amodes
using an index register, there was never a problem.
This change future-proofs the code.
Also add a few more asserts for amodes in the s390_insns supporting
translation chaining.
Fixes BZ #269360.
Florian Krohm [Thu, 20 Nov 2014 15:08:56 +0000 (15:08 +0000)]
This change was triggered by BZ #247974, which suggested including
VEX/test_main.* in the tarball. We don't want to do that because those
files are really just scaffolding for developers to play with and not
meant for general consumption (and are also bitrotting ATM). Therefore,
this patch moves them to the "useful" subdirectory and adds a crude
Makefile there to build the executable.
Makefile-gcc updated accordingly.