tchain optimisation for s390 (VEX bits)
Loading a 64-bit immediate into a register requires 4 insns on a
z900 machine, the oldest model supported. Depending on hardware
capabilities, newer machines can do the same using 2 insns.
Naturally, we want to take advantage of that.
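To make the instruction counts concrete, here is a minimal sketch in C of how a 64-bit immediate splits into per-insn pieces. It assumes the four-insn form carries 16-bit pieces (in the style of IIHH/IIHL/IILH/IILL) and the two-insn form carries 32-bit pieces (in the style of IIHF/IILF); the log does not name the insns used. The helper name split_imm64 is hypothetical and not taken from VEX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of VEX: split a 64-bit immediate into the
   pieces that a sequence of insert-immediate insns would carry.
   chunk_bits is 16 (four insns) or 32 (two insns).
   Pieces are written most-significant first; returns the insn count. */
static size_t split_imm64(uint64_t imm, unsigned chunk_bits, uint32_t *out)
{
   assert(out != NULL);
   assert(chunk_bits == 16 || chunk_bits == 32);

   size_t   n    = 64 / chunk_bits;
   uint64_t mask = (chunk_bits == 32) ? 0xFFFFFFFFULL : 0xFFFFULL;

   for (size_t i = 0; i < n; i++) {
      unsigned shift = 64 - chunk_bits * (unsigned)(i + 1);
      out[i] = (uint32_t)((imm >> shift) & mask);
   }
   return n;
}
```

Because the number of pieces depends on chunk_bits, the length of the emitted sequence is only known once the host's capabilities are known.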
However, currently, in disp_cp_chain_me_to_slowEP/fastEP we assume that
the length of loading a 64-bit immediate is a compile time constant:
S390_TCHAIN_LOAD64_LEN
For what we want to do, this length must be a value determined at run time.
So in this patch we move this address arithmetic out of the dispatch
code. The general idea is that the value in %r1 does not need to
be adjusted to recover the place to patch. Upon reaching
disp_cp_chain_me_to_slowEP/fastEP, %r1 contains the correct address.
We incorrectly stored the archinfo_host argument of iselSB_S390 into
a global variable not realising it points to a stack-allocated variable.
This caused s390_archinfo_host->hwcaps member to change its value
randomly over time. It could have caused invalid code to be generated.
Curious that it did not surface.
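One way to avoid this class of bug is to copy the data rather than keep the caller's pointer. The sketch below illustrates that idea only; the log does not say how the fix was actually done, and ArchInfo, remember_archinfo and saved_hwcaps are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the host architecture description. */
typedef struct {
   uint32_t hwcaps;
} ArchInfo;

/* Keep a private copy instead of a pointer to the caller's variable.
   A stored pointer would dangle once the caller's stack frame is gone. */
static ArchInfo saved_archinfo;

static void remember_archinfo(const ArchInfo *caller_owned)
{
   saved_archinfo = *caller_owned;   /* copy by value */
}

static uint32_t saved_hwcaps(void)
{
   return saved_archinfo.hwcaps;
}
```

With a copy, later changes to (or reuse of) the caller's stack slot no longer affect the saved hwcaps value.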
More fixes:
- A few dummy_put_IA's were missing, causing asserts to fire.
Mostly for the "load/store conditional" kind of insns
- EX needed some finishing touches
- Assignments to irsb->next are forbidden. We had a few in the "special
opcodes" section. Now fixed, I hope.
With this patch most regression tests run through. I see 3 failures in the
none bucket and a few more in the memcheck bucket.
Fix s390_tchain_patch_load64; some bytes were mixed up.
Fix unchainXDirect_S390; it modified the place_to_unchain address
before patching the code there.
Add some convenience functions for insn verification in
chain/unchain machinery.
Avoid magic constants.
Extend CSE to cover CSEing of clean helper calls. This gives a
significant performance improvement in the baseline simulator (20%) on
some pieces of ARM code.
POWER Processor decimal floating point instruction support: part 2
(bug #297497) (Carl Love, carll@us.ibm.com) (VEX side)
This commit adds the second set of patches to add decimal floating
point (DFP) support for POWER to Valgrind. Bugzilla 295221 contains
the first set of patches, which add POWER support for the DFP
32, 64 and 128-bit sizes. The first set of patches also added support
for the 64 and 128-bit DFP arithmetic instructions and user test code
for the new DFP instructions. The second set of patches, submitted
in this bugzilla, includes support for the DFP shift
instructions and format conversion instructions. Specifically, the
list of Power instructions is: dctdp, drsp, dctfix, dcffix, dctqpq,
dctfixq, drdpq, dcffixq, dscri, dscriq, dscli, dscliq.
Improve the behaviour of 64-to/from-80 bit QNaN conversions, so that
the QNaN produced is "canonical". SNaN conversions are unchanged
(because I don't have a definition of what a canonical SNaN is)
although there are some comment updates. Fixes Mozilla bug #738117.
Initial support for POWER Processor decimal floating point instruction
support -- VEX side changes. See #295221.
This patch adds the DFP 64-bit and 128-bit support, support for the
new IEEE rounding modes and the Add, Subtract, Multiply and Divide
instructions for both 64-bit and 128-bit instructions to Valgrind.
Carl Love (carll@us.ibm.com) and Maynard Johnson (maynardj@us.ibm.com)
Florian Krohm [Tue, 27 Mar 2012 03:09:49 +0000 (03:09 +0000)]
Consolidate guest state offset computation. There is only
one way. No need to precompute them and have them named in
three different ways.... Get rid of libvex_guest_offsets.h
dependency.
Julian Seward [Mon, 26 Mar 2012 09:44:39 +0000 (09:44 +0000)]
gcc seems to have taken to generating "orl $0xFFFFFFFF, %reg32" to get
-1 (32-bit) into a register. [Is this wise? Does the processor know
that this generates no dependency on the previous value of the
register?] Teach the constant folder about such cases, therefore.
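The folding rule relies on x | 0xFFFFFFFF being 0xFFFFFFFF for every x. A minimal sketch of such a rule follows, using a simplified, hypothetical Operand type rather than VEX's IRExpr; fold_or32 is likewise an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified operand: either a known constant or an
   unknown run-time value. This is not VEX's IRExpr. */
typedef struct {
   bool     is_const;
   uint32_t value;      /* meaningful only when is_const is true */
} Operand;

/* Try to fold Or32(a, b). Returns true and sets *result when the outcome
   is known at translation time. */
static bool fold_or32(Operand a, Operand b, uint32_t *result)
{
   if (a.is_const && b.is_const) {
      *result = a.value | b.value;
      return true;
   }
   /* x | 0xFFFFFFFF == 0xFFFFFFFF regardless of x. */
   if ((a.is_const && a.value == 0xFFFFFFFFu) ||
       (b.is_const && b.value == 0xFFFFFFFFu)) {
      *result = 0xFFFFFFFFu;
      return true;
   }
   return false;
}
```

Folding to a constant also removes the apparent dependency on the register's previous value from the IR.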
Florian Krohm [Mon, 20 Feb 2012 15:01:14 +0000 (15:01 +0000)]
Improve code generation on s390x for assignment of constant
values to guest registers. Motivated by the observation that
piecing together a 64-bit value requires 4 insns on z900 and 2 insns
on newer models. Specifically:
(1) Assigning 0 can be done by using XC
(2) Assigning a value that differs by a small amount from the
value previously assigned can be done using AGSI
(Happens a lot for guest IA updates).
(3) If the new value differs from the previous one only
in the lower word it is sufficient to assign the lower word.
(4) If the new value equals the old value the assignment is redundant
and can be eliminated. This happens surprisingly often.
This buys us somewhere between 5% and 11.8% of insns (as measured
on the perf bucket).
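The four cases above can be sketched as a small decision function. This is an illustration, not the VEX code: the names are hypothetical, the order of the checks is an assumption, and the "small amount" is assumed here to be the range of a signed 8-bit immediate. It also assumes the previously assigned value is known.

```c
#include <stdint.h>

/* Hypothetical names; the order of the checks is an assumption. */
typedef enum {
   ASSIGN_ELIDE,      /* (4) new value equals old value           */
   ASSIGN_ZERO,       /* (1) clear the field, e.g. with XC        */
   ASSIGN_ADD_SMALL,  /* (2) add a small delta, e.g. with AGSI    */
   ASSIGN_LOW_WORD,   /* (3) only the lower 32 bits changed       */
   ASSIGN_FULL        /* fall back to loading the full immediate  */
} AssignKind;

static AssignKind pick_assignment(uint64_t old_val, uint64_t new_val)
{
   if (new_val == old_val) return ASSIGN_ELIDE;
   if (new_val == 0)       return ASSIGN_ZERO;

   /* Assumed "small" range: a signed 8-bit immediate. */
   int64_t delta = (int64_t)(new_val - old_val);
   if (delta >= -128 && delta <= 127) return ASSIGN_ADD_SMALL;

   /* Upper words agree, so writing the lower word is enough. */
   if ((new_val >> 32) == (old_val >> 32)) return ASSIGN_LOW_WORD;

   return ASSIGN_FULL;
}
```

For example, a guest IA update from 0x1000 to 0x1004 would fall into the add-small case under these assumptions.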
Julian Seward [Thu, 16 Feb 2012 14:18:56 +0000 (14:18 +0000)]
Adds 16 and 32 bit fnsave/frstor, and 0x66 prefix on fldl, to guest
amd64.
The Oracle/Sun HotSpot Java virtual machine uses fnsave and frstor,
which valgrind supports for x86 but not amd64. Even more interesting,
HotSpot uses the 0x66 size prefix on these instructions, and on
fldl. This patch adds the 16- and 32-bit versions of fnsave/frstor to
the amd64 guest, and tolerates the 0x66 size prefix on fldl (but only
on these three fpu instructions, even though the AMD docs say all
other fpu instructions (except fnstenv and fldenv) *ignore* 0x66).
Julian Seward [Thu, 16 Feb 2012 12:36:47 +0000 (12:36 +0000)]
Broadens the range of INT imm8 values that raise SIGSEGV, allowing Jikes RVM
to work.
Jikes RVM uses INT 0x3F through 0x49, assuming that they result in a
SIGSEGV. The x86 guest currently does this only for INT 0x40 through
0x43. The attached patch extends the range to 0x3F through 0x4F,
covering all existing Jikes RVM INTs and leaving room for it to add a
few more before it runs into this problem again.
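The extended range amounts to a simple bounds check on the immediate. A minimal sketch, with a hypothetical function name that is not taken from the x86 guest code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: is INT imm8 in the range that is treated as
   raising SIGSEGV after this patch (0x3F through 0x4F inclusive)? */
static bool int_imm8_raises_sigsegv(uint8_t imm8)
{
   return imm8 >= 0x3F && imm8 <= 0x4F;
}
```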
Florian Krohm [Mon, 13 Feb 2012 00:06:29 +0000 (00:06 +0000)]
This patch is a follow-up to r2244 which fixed bugzilla #287260 on
some platforms but not on all that we test.
The issue was that cprop_BB did not see that in Add32(t2,t3) the
driving expressions for t2 and t3 were the same. Therefore, the
Add was not replaced with a shift (which is necessary for proper
memcheck operation).
So in this patch:
(1) In cprop_BB, when setting up the "env", record *any* assignment
to a temporary (and not just those that are subject to copy
propagation).
(2) Pass this env down to fold_Expr and then sameIRExprs.
(3) Replace sameIRTemps with sameIRExprs and enhance it. Upon
encountering an RdTmp, check "env" and recurse into the
expression assigned to the temporary.
As an aside, the functions sameIcoU32s and sameIRTempsOrIcoU32s
are replaced with sameIRExprs.
(4) Add some machinery to monitor frequency and effectiveness of
sameIRExprs (can be enabled by setting STATS_IROPT).
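The idea in steps (1) to (3) can be sketched as an equality check that looks through temporaries using the recorded assignments. The miniature IR below is hypothetical and far smaller than VEX's IRExpr; same_exprs, Expr and N_TEMPS are illustrative names, and the depth limit is an assumption added to keep the recursion bounded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature IR; VEX's real IRExpr has many more forms. */
typedef enum { EX_CONST, EX_RDTMP } ExprTag;

typedef struct Expr {
   ExprTag  tag;
   uint32_t value;   /* EX_CONST: the constant; EX_RDTMP: temp number */
} Expr;

#define N_TEMPS 8

/* env[t] is the expression assigned to temporary t, or NULL if unknown.
   depth bounds the recursion so a malformed env cannot loop forever. */
static bool same_exprs(const Expr *a, const Expr *b,
                       const Expr *env[], int depth)
{
   if (depth <= 0) return false;           /* give up conservatively */

   /* Same temporary: trivially equal. */
   if (a->tag == EX_RDTMP && b->tag == EX_RDTMP && a->value == b->value)
      return true;

   /* Look through a temporary to the expression assigned to it. */
   if (a->tag == EX_RDTMP && a->value < N_TEMPS && env[a->value] != NULL)
      return same_exprs(env[a->value], b, env, depth - 1);
   if (b->tag == EX_RDTMP && b->value < N_TEMPS && env[b->value] != NULL)
      return same_exprs(a, env[b->value], env, depth - 1);

   if (a->tag != b->tag) return false;
   /* Unresolved distinct temporaries are not known to be equal. */
   if (a->tag == EX_RDTMP) return false;
   return a->value == b->value;
}
```

With t2 and t3 both assigned from t1, same_exprs reports them equal, which is the condition under which Add32(t2,t3) can be replaced with a shift.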
Julian Seward [Fri, 20 Jan 2012 13:07:24 +0000 (13:07 +0000)]
Merge, from AVX branch, everything up to and including r2242
(revs 2212 - 2242 inclusive). In summary, brings the new decoding
framework into the trunk.
Florian Krohm [Mon, 16 Jan 2012 17:25:55 +0000 (17:25 +0000)]
Remove broken support for TS insn in s390 port. The
atomicity was not modelled.
The insn is not issued (gcc) or used (glibc, libdfp)
and is discouraged in the Principles of Operation.
No point spending time on it. Fixes #270796
Florian Krohm [Sun, 15 Jan 2012 21:01:16 +0000 (21:01 +0000)]
Add support for the s390's TROO insn. These are the VEX bits.
New hardware capability: VEX_HWCAPS_S390X_ETF2.
Patch by Divya Vyas (divyvyas@linux.vnet.ibm.com).
Partial fix of #273114
Florian Krohm [Thu, 20 Oct 2011 21:15:55 +0000 (21:15 +0000)]
Fix timerfd-syscall testcase on s390x.
This was caused by an interaction of resteering and the infamous
EX insn. This sequence
   j someplace
   ex ....
with the unconditional jump being subject to resteering caused madness.
Such a sequence is found in glibc's syscall.S with the effect that all
system calls > 255 would have run into the same problem as timerfd_*.
Patch by Christian Borntraeger (borntraeger@de.ibm.com).
Support ARM and Thumb "CLREX" instructions since Dalvik generates
them. Mucho hassle for something that is used considerably less often
than once in a blue moon.
Fix an obscure type error in printing of Neon instructions, that
could cause assertion failures under some circumstances. (How come
none of the static checkers etc picked this up before now?)