Julian Seward [Fri, 1 Jun 2012 16:09:50 +0000 (16:09 +0000)]
Enhance the guest state effects notation on IRDirty calls, so as to be
able to describe accesses to arrays of non-consecutive guest state
sections. This is needed to describe the behaviour of FXSAVE and
FXRSTOR in an environment where we also support AVX.
The IRDirty struct has become smaller (112 bytes vs 136 before, for a
64-bit target) whilst holding more information.
The new facility is then used to describe said FXSAVE and FXRSTOR on
amd64. For x86 there is no change since we don't model AVX state for
x86.
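The idea of describing an array of non-consecutive guest state sections can be sketched as below. This is a minimal illustration, not the actual VEX declaration: the field names are modelled loosely on IRDirty's fxState descriptors, and the helper function is invented for demonstration.

```c
#include <stdbool.h>

/* Illustrative sketch of a guest-state effect descriptor that can
   describe an array of non-consecutive sections: the section at
   'offset' of length 'size' is repeated 'nRepeats' further times,
   each copy 'repeatLen' bytes after the previous one.  A plain
   single section is the nRepeats == 0 case, so the old notation is
   subsumed by the new one. */
typedef struct {
   int offset;     /* offset of the first section in the guest state */
   int size;       /* length of each section, in bytes */
   int nRepeats;   /* number of additional copies (0 = single section) */
   int repeatLen;  /* stride between consecutive copies, in bytes */
} GuestEffect;

/* Does this descriptor cover the guest-state byte at 'off'? */
bool effect_covers(const GuestEffect* e, int off)
{
   for (int i = 0; i <= e->nRepeats; i++) {
      int base = e->offset + i * e->repeatLen;
      if (off >= base && off < base + e->size)
         return true;
   }
   return false;
}
```

With such a descriptor, an insn like FXSAVE that touches every other guest register slot in some region can be described by one entry instead of one entry per slot.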
Florian Krohm [Thu, 31 May 2012 15:46:18 +0000 (15:46 +0000)]
Reduce size of an IRStmt from 40 bytes to 32 bytes on LP64
by allocating the details of a PutI statement into a struct
of its own and linking to that (as is being done for Dirty and CAS).
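The shrinking trick can be sketched as follows; the type and field names here are hypothetical, not the real IRStmt layout. Since the statement is a tagged union, its size is governed by the widest member; moving a wide member's fields out-of-line and keeping only a pointer narrows the whole union.

```c
#include <stddef.h>

/* Out-of-line details for a PutI-style statement (fields illustrative). */
typedef struct { void* descr; int ix; int bias; void* data; } PutIDetails;

/* The statement itself now carries only a pointer to the details, so
   the PutI arm of the union is a single pointer wide. */
typedef struct {
   int tag;
   union {
      struct { void* details; } PutI;          /* one pointer */
      struct { int offset; void* data; } Put;  /* other arms ...   */
   } Ist;
} IRStmtSketch;
```

The cost is one extra indirection when reading a PutI, paid for by a smaller allocation for every statement of any kind.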
Julian Seward [Sun, 27 May 2012 16:18:13 +0000 (16:18 +0000)]
Remove, or (where it might later come in handy) comment out, artefacts
for 256-bit (AVX) code generation on amd64. Although that was the
plan at first, it turns out to be infeasible to generate 256-bit
instructions for the IR created by Memcheck's instrumentation of
256-bit Ity_V256 IR. This is because it would require 256-bit integer
SIMD operations, and AVX as currently available only provides 256-bit
operations for floating point. So, fall back to generating 256-bit IR
into 128-bit XMM register pairs, and using the existing SSE facilities
in the back end. This change only affects the amd64 back end -- it
does not affect IR, which remains unchanged, and capable of
representing 256-bit vector operations wherever needed.
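The fallback scheme can be illustrated with a lane-wise operation, where the split is trivial: a 256-bit value is held as two 128-bit halves, and the 256-bit IR op lowers to two independent 128-bit (SSE-class) ops. The types and function names below are made up for the sketch.

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit words, and a 256-bit value as a pair
   of 128-bit halves -- i.e. an XMM register pair. */
typedef struct { uint64_t w[2]; } V128;
typedef struct { V128 lo, hi; } V256;

static V128 and_v128(V128 a, V128 b)
{
   V128 r = { { a.w[0] & b.w[0], a.w[1] & b.w[1] } };
   return r;
}

/* A 256-bit AND has no cross-lane dependencies, so it is simply two
   independent 128-bit ANDs, one per half. */
V256 and_v256(V256 a, V256 b)
{
   V256 r = { and_v128(a.lo, b.lo), and_v128(a.hi, b.hi) };
   return r;
}
```

Memcheck's instrumentation generates exactly this kind of integer/bitwise op on V256 values, which is why 256-bit *integer* SIMD support in the back end would have been required for the original plan.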
Julian Seward [Thu, 24 May 2012 06:17:14 +0000 (06:17 +0000)]
Fix incorrect uses of disAMode in some SSE4 instructions that have an
immediate byte as a subopcode. Fixes #294260. (Patrick J. LoPresti,
lopresti@gmail.com)
Julian Seward [Tue, 22 May 2012 23:12:13 +0000 (23:12 +0000)]
Implement
VMOVQ xmm1, r64 = VEX.128.66.0F.W1 7E /r (reg case only)
If this is documented in the Intel manuals, I can't find it.
GNU binutils and GDB seem to have heard of it, though.
Julian Seward [Mon, 21 May 2012 15:45:34 +0000 (15:45 +0000)]
Ensure s390x guest state size is 32-byte aligned, as per increase in
alignment requirements resulting from r12569/r2330.
(Christian Borntraeger <borntraeger@de.ibm.com>)
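A size constraint like this can be enforced at compile time. The guest-state struct below is entirely made up for illustration; the negative-array trick is a common C idiom for build-time assertions and stands in for whatever check VEX actually uses.

```c
#include <stddef.h>

/* Hypothetical s390x-style guest state; the padding array is sized
   so the whole struct is a multiple of 32 bytes (260 -> 288). */
typedef struct {
   unsigned long long gprs[16];   /* 128 bytes */
   unsigned long long fprs[16];   /* 128 bytes */
   unsigned int       cc;         /*   4 bytes */
   unsigned char      padding[28];
} GuestStateSketch;

/* Compile-time check: if the size is not a multiple of 32, the array
   type below has negative size and compilation fails. */
typedef char guest_state_is_32B_multiple
   [ (sizeof(GuestStateSketch) % 32 == 0) ? 1 : -1 ];
```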
Florian Krohm [Sat, 12 May 2012 15:26:44 +0000 (15:26 +0000)]
Eliminate helper s390_calculate_icc. Rewrite and factor the code to use
s390_calculate_cond instead. The benefit is that the latter has comprehensive
spec_helpers whereas the former did not.
Florian Krohm [Sat, 12 May 2012 03:44:49 +0000 (03:44 +0000)]
Back out VEX r2326. It was not working correctly. The guard condition
has to be evaluated after argument evaluation. Add clarifying comments
in libvex_ir.h
Florian Krohm [Wed, 9 May 2012 13:31:09 +0000 (13:31 +0000)]
Improve insn selection for helper calls. Attempt to evaluate arguments
directly into the real register that is mandated by the ABI instead of
evaluating them in a virtual register and then moving the result.
Observed savings in insns between 0.5% and 1.4%.
Probably an overrated optimization given current helper functions,
which rarely take more than one argument.
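The saving comes from eliding one move per argument when the ABI-mandated register happens to be usable as the evaluation target. A toy model of the decision, with an invented insn-counting buffer rather than real insn selection:

```c
#include <stdbool.h>

/* Counts emitted insns; stands in for the real code buffer. */
typedef struct { int n_insns; } CodeBuf;

static void emit_load_into(CodeBuf* cb, int reg) { (void)reg; cb->n_insns++; }
static void emit_move(CodeBuf* cb, int dst, int src)
   { (void)dst; (void)src; cb->n_insns++; }

/* Evaluate an argument destined for ABI register 'abi_reg'.  If that
   register can serve as the evaluation target, load straight into it
   (1 insn); otherwise fall back to a fresh virtual register plus a
   move (2 insns).  Returns the register finally holding the value. */
int eval_arg(CodeBuf* cb, int abi_reg, bool abi_reg_usable)
{
   if (abi_reg_usable) {
      emit_load_into(cb, abi_reg);
      return abi_reg;
   }
   int virt = 100;                /* fresh virtual register, illustrative */
   emit_load_into(cb, virt);
   emit_move(cb, abi_reg, virt);
   return abi_reg;
}
```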
Florian Krohm [Sun, 6 May 2012 03:34:55 +0000 (03:34 +0000)]
Add the counter pseudo register to the list of guest registers to
be tracked during insn selection. Saves 0.2% or so of insns, depending
on how often insns with implicit loops like MVC are used.
Florian Krohm [Sat, 5 May 2012 00:01:16 +0000 (00:01 +0000)]
Add NC and OC to the list of insns that get special treatment under EX.
Refactor code such that s390_irgen_xonc can be reused, thereby avoiding
code duplication.
Add support for POWER Decimal Floating Point (DFP) test class,
test group and test exponent instructions dtstdc, dtstdcq, dtstdg,
dtstdgq, dtstex and dtstexq. Bug #298862. (Carl Love,
carll@us.ibm.com)
For each backend, unify the sets of IRJumpKinds handled for Ist_Exit
and iselNext, so as to avoid potential failures caused by branch sense
switching at the IR level.
tchain optimisation for s390 (VEX bits)
Loading a 64-bit immediate into a register requires 4 insns on a
z900 machine, the oldest model supported. Depending on hardware
capabilities, newer machines can do the same using 2 insns.
Naturally, we want to take advantage of that.
However, currently, in disp_cp_chain_me_to_slowEP/fastEP we assume that
the length of loading a 64-bit immediate is a compile time constant:
S390_TCHAIN_LOAD64_LEN
For what we want to do this constant needs to be a runtime constant.
So in this patch we move this address arithmetic out of the dispatch
code. The general idea being that the value in %r1 does not need to
be adjusted to recover the place to patch. Upon reaching
disp_cp_chain_me_to_slowEP/fastEP %r1 contains the correct address.
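The "runtime constant" aspect can be sketched as a capability-driven length computation. The function and flag names are invented, and the concrete lengths are an assumption of this sketch (four 4-byte halfword-insert insns on a base machine versus two 6-byte extended-immediate insns on newer ones); the point is only that the patchable length is derived from host capabilities at run time rather than baked in as S390_TCHAIN_LOAD64_LEN.

```c
#include <stdbool.h>

/* Hypothetical: byte length of the code sequence that loads a 64-bit
   immediate into a register, as a function of host capabilities.
   Base z900-class machine: four 4-byte insns (e.g. IIHH/IIHL/IILH/IILL).
   With the extended-immediate facility: two 6-byte insns (e.g. IIHF/IILF). */
int load64_code_len(bool have_extended_imm)
{
   return have_extended_imm ? 2 * 6 : 4 * 4;
}
```

Code that patches a previously emitted load-immediate must then ask the same question the emitter asked, instead of relying on a compile-time constant.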
We incorrectly stored the archinfo_host argument of iselSB_S390 into
a global variable, not realising that it points to a stack-allocated
variable. This caused the s390_archinfo_host->hwcaps member to change
its value randomly over time. It could have caused invalid code to be
generated. Curious that it did not surface.
More fixes:
- A few dummy_put_IA's were missing, causing asserts to fire.
Mostly for the "load/store conditional" kind of insns
- EX needed some finishing touches
- Assignments to irsb->next are forbidden. We had a few in the "special
opcodes" section. Now fixed, I hope.
With this patch most regressions run through. I see 3 failures in the
"none" tool bucket and a few more in the memcheck bucket.
Fix s390_tchain_patch_load64; some bytes were mixed up.
Fix unchainXDirect_S390; it modified the place_to_unchain address
before patching the code there.
Add some convenience functions for insn verification in
chain/unchain machinery.
Avoid magic constants.
Extend CSE to cover CSEing of clean helper calls. This gives a
significant performance improvement in the baseline simulator (20%) on
some pieces of ARM code.
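The key property that makes this legal is cleanness: a clean helper call has no side effects, so two calls with the same helper and identical arguments must yield the same value, and the second can simply reuse the first one's temporary. A minimal value-numbering sketch (all names and the flat table are illustrative, not VEX's actual CSE machinery):

```c
#include <string.h>

#define MAX_ARGS  4
#define MAX_SEEN  64

/* One previously seen clean helper call. */
typedef struct {
   const char* helper;          /* helper identity (by name, for simplicity) */
   int         args[MAX_ARGS];  /* argument temps/constants, abstracted as ints */
   int         n_args;
   int         result_tmp;      /* temp holding the earlier result */
} SeenCall;

static SeenCall seen[MAX_SEEN];
static int n_seen = 0;

/* If an identical earlier call exists, return its result temp so the
   caller can substitute it; otherwise record this call under
   'result_tmp' and return -1. */
int cse_helper_call(const char* helper, const int* args, int n_args,
                    int result_tmp)
{
   for (int i = 0; i < n_seen; i++) {
      if (strcmp(seen[i].helper, helper) == 0
          && seen[i].n_args == n_args
          && memcmp(seen[i].args, args, n_args * sizeof(int)) == 0)
         return seen[i].result_tmp;   /* hit: reuse the earlier temp */
   }
   if (n_seen < MAX_SEEN) {
      seen[n_seen].helper = helper;
      memcpy(seen[n_seen].args, args, n_args * sizeof(int));
      seen[n_seen].n_args = n_args;
      seen[n_seen].result_tmp = result_tmp;
      n_seen++;
   }
   return -1;
}
```

Dirty helper calls cannot be handled this way, since they may read or write guest state or memory between the two occurrences.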
POWER Processor decimal floating point instruction support: part 2
(bug #297497) (Carl Love, carll@us.ibm.com) (VEX side)
This commit adds the second set of patches to add decimal floating
point (DFP) support for POWER to Valgrind. Bugzilla 295221 contains
the first set of patches, which added POWER support for the DFP
32, 64 and 128-bit sizes. The first set of patches also added support
for the 64 and 128-bit DFP arithmetic instructions and user test code
for the new DFP instructions. The second set of patches, submitted
in this bugzilla, adds support for the DFP shift instructions and
format conversion instructions. Specifically, the list of Power
instructions is: dctdp, drsp, dctfix, dcffix, dctqpq, dctfixq,
drdpq, dcffixq, dscri, dscriq, dscli, dscliq.
Improve the behaviour of 64-to/from-80 bit QNaN conversions, so that
the QNaN produced is "canonical". SNaN conversions are unchanged
(because I don't have a definition of what a canonical SNaN is)
although there are some comment updates. Fixes Mozilla bug #738117.
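The bit-level distinction driving this is: a 64-bit IEEE754 value is a NaN when all 11 exponent bits are set and the mantissa is non-zero, and it is quiet when the top mantissa bit is also set. A sketch of canonicalising the QNaN output of such a conversion; the particular canonical pattern chosen here (all-ones exponent, top mantissa bit set, everything else zero) is an assumption of this sketch, and SNaNs pass through unchanged, mirroring the commit.

```c
#include <stdint.h>
#include <stdbool.h>

static bool is_nan64(uint64_t bits)
{
   return (bits & 0x7FF0000000000000ULL) == 0x7FF0000000000000ULL
          && (bits & 0x000FFFFFFFFFFFFFULL) != 0;
}

static bool is_quiet64(uint64_t bits)
{
   return (bits & 0x0008000000000000ULL) != 0;
}

/* Replace any QNaN by the (assumed) canonical QNaN; leave SNaNs and
   ordinary numbers alone. */
uint64_t canonicalise_qnan64(uint64_t bits)
{
   if (is_nan64(bits) && is_quiet64(bits))
      return 0x7FF8000000000000ULL;
   return bits;
}
```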
Initial support for POWER Processor decimal floating point instruction
support -- VEX side changes. See #295221.
This patch adds the DFP 64-bit and 128-bit support, support for the
new IEEE rounding modes, and the Add, Subtract, Multiply and Divide
instructions in both 64-bit and 128-bit forms to Valgrind.
Carl Love (carll@us.ibm.com) and Maynard Johnson (maynardj@us.ibm.com)
Florian Krohm [Tue, 27 Mar 2012 03:09:49 +0000 (03:09 +0000)]
Consolidate guest state offset computation. There is only one way to
do it; no need to precompute the offsets and have them named in three
different ways. Get rid of the libvex_guest_offsets.h dependency.
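The "one way" is computing each offset directly from the guest-state struct. A sketch with an invented struct and macro name, in place of a second, precomputed copy of the same numbers in a generated header:

```c
#include <stddef.h>

/* Hypothetical fragment of an amd64-style guest state. */
typedef struct {
   unsigned long long rax;
   unsigned long long rbx;
   unsigned long long rcx;
} GuestAMD64Sketch;

/* Every consumer derives the offset the same way, so the numbers can
   never drift out of sync with the struct layout. */
#define GUEST_OFFSET(field)  ((int) offsetof(GuestAMD64Sketch, field))
```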
Julian Seward [Mon, 26 Mar 2012 09:44:39 +0000 (09:44 +0000)]
gcc seems to have taken to generating "orl $0xFFFFFFFF, %reg32" to get
-1 (32-bit) into a register. [Is this wise? Does the processor know
that this generates no dependency on the previous value of the
register?] Teach the constant folder about such cases, therefore.
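The rule being added is absorption: Or32(x, 0xFFFFFFFF) is 0xFFFFFFFF no matter what x is, so the expression folds to a constant even when the other operand is unknown. A sketch with an invented expression representation (the real folder works on IRExprs):

```c
#include <stdint.h>
#include <stdbool.h>

/* An expression is either a known 32-bit constant or unknown. */
typedef struct { bool is_const; uint32_t value; } Expr32;

Expr32 fold_or32(Expr32 a, Expr32 b)
{
   Expr32 r = { false, 0 };
   /* Absorption: OR with all-ones yields all-ones regardless of the
      other operand, known or not. */
   if ((a.is_const && a.value == 0xFFFFFFFFu)
       || (b.is_const && b.value == 0xFFFFFFFFu)) {
      r.is_const = true;
      r.value = 0xFFFFFFFFu;
      return r;
   }
   /* Ordinary constant folding when both operands are known. */
   if (a.is_const && b.is_const) {
      r.is_const = true;
      r.value = a.value | b.value;
   }
   return r;
}
```

This is what lets the "orl $0xFFFFFFFF, %reg32" idiom collapse to the constant -1 even though the register's previous contents are unknown to the optimiser.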