Carl Love [Tue, 3 Nov 2015 17:44:55 +0000 (17:44 +0000)]
Add ISA 2.07 vbit test support
The ISA 2.07 support adds new Iops as well as support for some existing
Iops. None of these Iops have been enabled in the vbit tester. This commit
adds the needed support to the files VEX/priv/ir_inject and VEX/pub/libvex.h.
These changes add support for additional immediate operands.
There are additional changes to the memcheck files to complete the ISA 2.07
support.
Florian Krohm [Fri, 16 Oct 2015 17:26:22 +0000 (17:26 +0000)]
Give typeOfPrimop external linkage. This allows us to simplify
memcheck/tests/vbit-test, which used to have local copies of certain
functions from ir_defs.c.
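An illustrative sketch of the shape of the change (placeholder typedefs; the real signature should be checked in ir_defs.h):

    /* Illustration only: placeholder types standing in for VEX's IROp and
       IRType.  The point is simply that the function is now declared in a
       header (external linkage), so vbit-test can call the ir_defs.c copy
       instead of keeping its own. */
    typedef int IROp;
    typedef int IRType;

    extern void typeOfPrimop ( IROp op,
                               /*OUT*/ IRType* t_dst,
                               /*OUT*/ IRType* t_arg1, /*OUT*/ IRType* t_arg2,
                               /*OUT*/ IRType* t_arg3, /*OUT*/ IRType* t_arg4 );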
Mark Wielaard [Thu, 1 Oct 2015 12:31:19 +0000 (12:31 +0000)]
Don't advertise RDRAND in cpuid for Core-i7-4910-like avx2 machine.
Bug#353370. In amd64g_dirtyhelper_CPUID_avx2 we set the RDRAND bit
but we don't implement support for RDRAND. Turn the bit off so programs
don't try to use RDRAND when running under valgrind.
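A minimal standalone sketch of the idea (not the actual helper code); RDRAND is CPUID leaf 1, ECX bit 30:

    #include <stdint.h>

    /* Sketch only: mask the RDRAND feature bit (CPUID.01H:ECX, bit 30) out
       of the value the guest sees, since RDRAND itself is not implemented. */
    static uint32_t filter_cpuid_leaf1_ecx ( uint32_t ecx )
    {
       return ecx & ~(1u << 30);
    }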
Carl Love [Mon, 21 Sep 2015 21:46:46 +0000 (21:46 +0000)]
Fix: Add support for the PowerPC Program Priority Register
Commit r3189 had a typo: in LibVEX_GuestPPC64_initialise() the value of
vex_state->guest_PSPB was initialized to 0x0, while the intention was
for it to be initialized to 0x100. This commit fixes the typo.
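That is, roughly (surrounding code omitted):

    /* in LibVEX_GuestPPC64_initialise(): was mistakenly 0x0 */
    vex_state->guest_PSPB = 0x100;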
The original commit message:
Added the Program Priority Register (PPR), with support for reading and
writing it via the mfspr and mtspr instructions as well as via the special
no-op forms of the OR instruction. The setting of the PPR depends on the
value in the Problem State Priority Boost (PSPB) register. Basic support
for this register was added; not all of the PSPB register functionality
was added.
Carl Love [Wed, 16 Sep 2015 22:26:59 +0000 (22:26 +0000)]
Add support for the PowerPC Program Priority Register
Added the Program Priority Register (PPR), with support for reading and
writing it via the mfspr and mtspr instructions as well as via the special
no-op forms of the OR instruction. The setting of the PPR depends on the
value in the Problem State Priority Boost (PSPB) register. Basic support
for this register was added; not all of the PSPB register functionality
was added.
s390: Add support for fixbr(a) instructions.
New IROp Iop_RoundF128toInt.
Patch by Andreas Arnez <arnez@linux.vnet.ibm.com>.
Part of fixing BZ #350290.
Petar Jovanovic [Mon, 13 Jul 2015 00:04:28 +0000 (00:04 +0000)]
mips: emit addiu instead of addi
Remove the incorrect emission of addi where addiu is the correct choice:
addi traps on signed overflow, whereas addiu does not, which is what is
wanted here. Attention was drawn to this part of the code by BZ #338924;
this patch fixes the reported issue as well.
Add some functions for misaligned load/store support, and use them
in the x86 and amd64 chainer/unchainer. This makes it possible to
run at least some programs when built with gcc 5.1, with ubsan misaligned
checking enabled.
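A minimal sketch of what such helpers can look like; the names are placeholders, not the ones added to VEX. Going through memcpy keeps the compiler (and ubsan) from ever seeing a dereference of a misaligned pointer:

    #include <stdint.h>
    #include <string.h>

    /* placeholder names; illustrative only */
    static inline uint64_t read_misaligned_u64 ( const void* p )
    {
       uint64_t w;
       memcpy(&w, p, sizeof w);   /* no misaligned dereference to flag */
       return w;
    }

    static inline void write_misaligned_u64 ( void* p, uint64_t w )
    {
       memcpy(p, &w, sizeof w);
    }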
Petar Jovanovic [Wed, 24 Jun 2015 18:47:39 +0000 (18:47 +0000)]
mips64: do not use 64-bit loads for lwl/lwr instructions
As reported in BZ #346562, lwl/lwr were implemented incorrectly using
64-bit loads. This led to incorrect "invalid read of size 8"
warnings. This patch fixes that and also does some reformatting to make the
code more readable.
Original version of the patch proposed by Crestez Dan Leonard.
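As an illustration of the corrected behaviour (this is a C model, not the IR the front end emits), a big-endian lwl never needs to touch more than the 4-byte aligned word containing the effective address, so tools see a read of at most size 4:

    #include <stdint.h>

    /* Model of big-endian lwl: merge the bytes from 'ea' up to the end of
       its aligned word into the high end of rt, reading at most 4 bytes. */
    static uint32_t model_lwl_be ( const uint8_t* mem, uint32_t ea, uint32_t rt )
    {
       uint32_t n = 4 - (ea & 3u);                  /* bytes actually read    */
       for (uint32_t i = 0; i < n; i++) {
          uint32_t sh = 8u * (3u - i);              /* fill from the MSB down */
          rt = (rt & ~(0xFFu << sh)) | ((uint32_t)mem[ea + i] << sh);
       }
       return rt;
    }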
An SSE2-only CPU was reported to the guest as an SSE3 CPU.
The guest code might then select functions that use invalid
instructions, e.g. giving:
vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0xF
==13094== valgrind: Unrecognised instruction at address 0x496d4d3.
==13094== at 0x496D4D3: __mempcpy_ssse3 (memcpy-ssse3.S:771)
==13094== by 0x125E0B: ??? (in /bin/dash)
as the host hwcaps do not include SSE3, while the guest believes
SSE3 can be used.
So, change CPUID to report SSE3 only if the host hardware has SSE3,
and otherwise SSE1 or lower.
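A standalone sketch of the idea (the real change is in the CPUID dirty helper); SSE3 is CPUID leaf 1, ECX bit 0:

    #include <stdint.h>

    /* Sketch only: report the SSE3/PNI bit (CPUID.01H:ECX, bit 0) to the
       guest only when the host hardware really has SSE3. */
    static uint32_t guest_leaf1_ecx ( uint32_t host_ecx, int host_has_sse3 )
    {
       if (!host_has_sse3)
          host_ecx &= ~1u;
       return host_ecx;
    }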
Carl Love [Fri, 5 Jun 2015 18:52:57 +0000 (18:52 +0000)]
Oops, missed a change in the previous patch. Forgot to remove the format
specifier.
The dcbt and dcbtst instructions provide a non-zero hint that describes
a block or data stream to which a program may perform a store access,
or indicates the expected use. The field in bits[25:21] (bits 6:10 in
the IBM numbering) of the instruction provides the hint.
Valgrind checks that these bits are non-zero. Unfortunately, the test was
being applied to other instructions such as the dcbf instruction, causing
it to fail when the field was equal to zero. This patch removes the check
that was being incorrectly applied to all of the instructions.
Carl Love [Fri, 5 Jun 2015 17:58:23 +0000 (17:58 +0000)]
The dcbt and dcbtst instructions provide a non-zero hint that describes
a block or data stream to which a program may perform a store access,
or indicates the expected use. The field in bits[25:21] (bits 6:10 in
the IBM numbering) of the instruction provides the hint.
Valgrind checks that these bits are non-zero. Unfortunately, the test was
being applied to other instructions such as the dcbf instruction, causing
it to fail when the field was equal to zero. This patch removes the check
that was being incorrectly applied to all of the instructions.
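A standalone sketch of the corrected logic, assuming the standard Power ISA encodings (dcbt = 31/278, dcbtst = 31/246, dcbf = 31/86); the non-zero test on bits 25:21 should only be made for dcbt and dcbtst:

    #include <stdint.h>

    /* The hint field occupies bits 25:21 of the instruction word. */
    static uint32_t hint_field ( uint32_t instr )
    {
       return (instr >> 21) & 0x1Fu;
    }

    /* The non-zero check applies only to dcbt (XO 278) and dcbtst (XO 246);
       other opcode-31 instructions such as dcbf (XO 86) must be left alone. */
    static int hint_check_applies ( uint32_t ext_opcode )
    {
       return ext_opcode == 278 || ext_opcode == 246;
    }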
Carl Love [Wed, 29 Apr 2015 20:37:29 +0000 (20:37 +0000)]
Improve the error messages for the PPC platform to be clearer when Valgrind detects that
the underlying hardware doesn't have the needed capability. A number of the checks for DFP
support were going to "decode_failure" instead of "decode_noDFP"; these issues are also fixed.
Remove VexGuestTILEGXStateAlignment as the guest state size of any architecture
must satisfy the LibVEX_GUEST_STATE_ALIGN requirement. So use that instead.
Carl Love [Wed, 22 Apr 2015 16:15:41 +0000 (16:15 +0000)]
Add support for the TEXASRU register. This register contains summary
information about transactional memory instructions; it holds the upper
32 bits of the transaction information. Note that the Valgrind
implementation of transactional memory instructions is limited. Currently, the
contents of the TEXASRU register will always return 0. The lower 64 bits of
the transaction information, in the TEXASR register, will contain the failure
information as set up by Valgrind.
This commit contains the changes needed to support the TEXASRU register on
PPC64.
This support requires changing the value of MAX_REG_WRITE_SIZE in
memcheck/mc_main.c from 1696 to 1712. The change is made in the corresponding
valgrind commit.
Carl Love [Fri, 17 Apr 2015 23:42:40 +0000 (23:42 +0000)]
Add support for the lbarx, lharx, stbcx and sthcx instructions.
The instructions are part of ISA 2.06 but were not implemented
in all versions of the hardware. All four instructions are supported
in ISA 2.07, so they were put under the ISA 2.07 category
of supported instructions in this patch.
Carl Love [Thu, 16 Apr 2015 17:09:09 +0000 (17:09 +0000)]
The following regression test failures occur on PPC64 little endian only.
The regression test none/tests/jm_vec/isa_2_07 has failures on the lxsiwax and
lxsiwzx instructions. They are loads, and the results are correct for
big endian but not for little endian. The little-endian result matches the
expected big-endian result.
The regression test none/tests/test_isa_2_07_part2 has a failure with the
vbpermq instruction. The little-endian result matches the expected result for
big endian. The upper and lower 64 bits of the result are not swapped correctly
for little endian.
amd64 front and back ends: track the change of type of Iop_Sqrt32Fx4
and Iop_Sqrt64Fx2 as introduced in r3120, in which they acquired a
rounding-mode argument.
arm64: implement FSQRT 2d_2d, 4s_4s, 2s_2s
AFAICS this completes the AArch64 SIMD implementation, except for the
crypto instructions.
This changes the type of Iop_Sqrt64Fx2 and Iop_Sqrt32Fx4 so that they take a
rounding-mode argument. This will (temporarily, of course) break all
of the other targets that implement vector fsqrt.
This patch reduces the size of all tools by about 2MB of text
(depending on the arch).
This has as advantages:
1. somewhat faster build/link time (very probably negligible)
2. somewhat faster tool startup (probably negligible for most users,
but regression tests are helped by this)
3. a gain in memory of about 10MB
The Valgrind tools assume that host and guest
are the same, so there is no need to drag in the full set of archs when
linking a tool.
The VEX library is nicely split into arch-independent and arch-dependent
objects. Only main_main.c drags in the various arch-specific files.
So, main_main.c (the main entry point of the VEX library) is compiled
only for the current guest/host arch.
The disadvantage of the above is that the VEX lib can no longer be used
with host and guest different, even though VEX itself is able to do that
(i.e. it does not assume that host and guest are the same).
So, to still allow a VEX user to use the VEX lib in a multi-arch setup,
main_main.c is compiled twice:
1. in 'single arch mode', going into libvex-<arch>-<os>
2. in 'multi arch mode', going into a new lib, libvexmultiarch-<arch>-<os>
A VEX user can choose at link time to link with the multi-arch main_main
by linking with both libs (the multi-arch one coming first).
Here is a small (rubbish, crashing) standalone usage of the VEX lib,
first linked single-arch, then linked multi-arch:
In a following commit, some regtests will be added to validate that the two libs
are working properly (and that no arch-specific symbol is missing when
linking multi-arch).
Julian Seward [Mon, 30 Mar 2015 08:50:27 +0000 (08:50 +0000)]
Add IR-level support for a 16-bit floating point type (Ity_F16) and add
four new IROps that use it:
Iop_F16toF64, Iop_F64toF16, Iop_F16toF32, Iop_F32toF16.
Julian Seward [Thu, 26 Mar 2015 07:18:32 +0000 (07:18 +0000)]
Bug 345215 - Performance improvements for the register allocator
The basic idea is to change the representation of registers (HReg) so
as to give Real registers a unique integer index starting from 0, with
the registers available for allocation numbered consecutively from zero
upwards. This allows the register allocator to index into its primary
data structure -- a table tracking the status of each available
register -- using a normal array index, instead of having to search
sequentially through the table as before.
It also allows an efficient bitmap-based representation for "set of
Real registers", which is important for the NCODE work.
There are various other perf improvements, most notably in calling
getRegUsage once rather than twice per instruction.
The cost of register allocation is reduced to around 65% of what it
previously was. This translates into speedups ranging from close to zero for
compute-intensive code up to around 7% for JITing-intensive
situations, e.g. "time perl tests/vg_regtest memcheck/tests/amd64".
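A toy sketch of the data-structure change (all names invented here): with Real registers densely numbered from zero, the allocator's per-register state becomes a directly indexed array, and a register set becomes a bitmap:

    /* Toy illustration only. */
    typedef struct {
       int bound_vreg;                   /* which virtual reg occupies it, or -1 */
    } RealRegState;

    enum { N_REAL_REGS = 32 };

    typedef struct {
       RealRegState rregs[N_REAL_REGS];  /* indexed by the register's index  */
       unsigned     rregs_in_use;        /* bitmap: "set of Real registers"  */
    } AllocatorState;

    static void mark_in_use ( AllocatorState* st, unsigned rreg_index, int vreg )
    {
       st->rregs[rreg_index].bound_vreg = vreg;    /* no sequential search */
       st->rregs_in_use |= (1u << rreg_index);
    }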
Florian Krohm [Fri, 13 Mar 2015 12:46:49 +0000 (12:46 +0000)]
r2974 moved the inline definition of LibVEX_Alloc from libvex.h
to main_util.c because it caused linker problems with ICC.
See comments in BZ #339542.
This change re-enables inlining of that function by adding it
(renamed as LibVEX_Alloc_inline) to main_util.h.
500+ callsites changed accordingly.
Florian Krohm [Thu, 12 Mar 2015 11:01:12 +0000 (11:01 +0000)]
Fix build problems. The code has been bitrotting for some time.
Note that while the file compiles and links, not all IROps are handled.
So there may be runtime problems.
Fixes BZ #345079. Patch by Ivo Raisr (ivosh@ivosh.net).
Julian Seward [Wed, 4 Mar 2015 12:35:54 +0000 (12:35 +0000)]
Fix problems due to generating Neon instructions on non-Neon capable
hosts:
* iselNeon64Expr, iselNeonExpr: assert that the host is actually
Neon-capable.
* iselIntExpr_R_wrk, existing cases for Iop_GetElem8x8,
Iop_GetElem16x4, Iop_GetElem32x2, Iop_GetElem8x16, Iop_GetElem16x8,
Iop_GetElem32x4:
Limit these to cases where the host is Neon capable, else we wind up
generating code which can't run on the host.
* iselIntExpr_R_wrk: add alternative implementation for
Iop_GetElem32x2 for non-Neon capable hosts.
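A minimal standalone sketch of the added guard (the flag name is a stand-in for the real VEX hwcaps bit):

    #include <assert.h>

    #define HWCAP_NEON_SKETCH (1u << 0)   /* stand-in for the real hwcaps bit */

    /* Selecting Neon code on a host without Neon would emit instructions the
       host cannot execute, so fail loudly up front instead. */
    static void require_neon ( unsigned host_hwcaps )
    {
       assert(host_hwcaps & HWCAP_NEON_SKETCH);
    }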
Julian Seward [Fri, 27 Feb 2015 13:33:56 +0000 (13:33 +0000)]
Add machinery to try and transform A ^ ((A ^ B) & M)
into (A & ~M) | (B & M).
The former is MSVC's optimised idiom for bitfield assignment, the
latter is GCC's idiom. The former causes Memcheck problems because it
doesn't understand that (in this complex case) XORing an undefined
value with itself produces a defined result.
Believed to be working but currently disabled. To re-enable, change
if (0) to if (1) at line 6651. Fixes, to some extent, and when
enabled, bug 344382.
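A quick standalone check that the rewrite is sound, i.e. that the two forms agree bit for bit:

    #include <assert.h>
    #include <stdint.h>

    int main ( void )
    {
       /* exhaustive over 3-bit values; the identity is bitwise, so this
          covers the general case */
       for (uint32_t a = 0; a < 8; a++)
          for (uint32_t b = 0; b < 8; b++)
             for (uint32_t m = 0; m < 8; m++)
                assert((a ^ ((a ^ b) & m)) == ((a & ~m) | (b & m)));
       return 0;
    }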
Julian Seward [Fri, 27 Feb 2015 13:22:48 +0000 (13:22 +0000)]
Enhance the CSE pass so it can common up loads from memory. Disabled
by default since this is a somewhat dodgy proposition in the presence
of spinloops and racy accesses.
Julian Seward [Fri, 27 Feb 2015 13:06:43 +0000 (13:06 +0000)]
Tidy up of CSE. Create functions irExpr_to_TmpOrConst,
tmpOrConst_to_IRExpr and subst_AvailExpr_TmpOrConst and use them
instead of in-line code. No functional change.