More fixes:
- A few dummy_put_IA's were missing, causing asserts to fire.
Mostly for the "load/store conditional" kind of insns
- EX needed some finishing touches
- Assignments to irsb->next are forbidden. We had a few in the "special
opcodes" section. Now fixed, I hope.
With this patch most regressions run through. I see 3 failures in none
and a few more in the memcheck bucket.
Fix s390_tchain_patch_load64; some bytes were mixed up.
Fix unchainXDirect_S390; modified place_to_unchain address
before patching the code there.
Add some convenience functions for insn verification in
chain/unchain machinery.
Avoid magic constants.
Initial support for POWER Processor decimal floating point
instructions -- VEX side changes. See #295221.
This patch adds the DFP 64-bit and 128-bit support, support for the
new IEEE rounding modes and the Add, Subtract, Multiply and Divide
instructions for both 64-bit and 128-bit instructions to Valgrind.
Carl Love (carll@us.ibm.com) and Maynard Johnson (maynardj@us.ibm.com)
Florian Krohm [Tue, 27 Mar 2012 03:09:49 +0000 (03:09 +0000)]
Consolidate guest state offset computation. There is only
one way. No need to precompute them and have them named in
three different ways.... Get rid of libvex_guest_offsets.h
dependency.
Julian Seward [Mon, 26 Mar 2012 09:44:39 +0000 (09:44 +0000)]
gcc seems to have taken to generating "orl $0xFFFFFFFF, %reg32" to get
-1 (32-bit) into a register. [Is this wise? Does the processor know
that this generates no dependency on the previous value of the
register?] Teach the constant folder about such cases, therefore.
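A minimal sketch of the folding rule described above, not the actual VEX
constant folder. The Operand type and the function name fold_or32 are
hypothetical and exist only for this illustration. The point is that
Or32 with an all-ones constant yields all ones regardless of the other
operand, so the result is known even when that operand is not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified operand: either a known 32-bit constant
   or a value that is unknown at translation time. */
typedef struct {
    bool     is_const;
    uint32_t value;      /* meaningful only if is_const is true */
} Operand;

/* Try to fold Or32(a, b). Returns true and stores the folded constant
   in *out when the result is known at translation time. */
static bool fold_or32(Operand a, Operand b, uint32_t *out)
{
    /* Both operands known: ordinary constant folding. */
    if (a.is_const && b.is_const) {
        *out = a.value | b.value;
        return true;
    }
    /* One operand is all ones: the result is all ones, whatever the
       other operand holds (the "orl $0xFFFFFFFF, %reg32" case). */
    if ((a.is_const && a.value == 0xFFFFFFFFu) ||
        (b.is_const && b.value == 0xFFFFFFFFu)) {
        *out = 0xFFFFFFFFu;
        return true;
    }
    return false;        /* nothing to fold */
}
```

Once the Or32 is folded to a constant, the result no longer depends on
the previous register contents.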
Florian Krohm [Mon, 20 Feb 2012 15:01:14 +0000 (15:01 +0000)]
Improve code generation on s390x for assignment of constant
values to guest registers. Motivated by the observation that
piecing together a 64-bit value requires 4 insns on z900 and 2 insns
on newer models. Specifically:
(1) Assigning 0 can be done by using XC
(2) Assigning a value that differs by a small amount from the
value previously assigned can be done using AGSI
(Happens a lot for guest IA updates).
(3) If the new value differs from the previous one only
in the lower word it is sufficient to assign the lower word.
(4) If the new value equals the old value the assignment is redundant
and can be eliminated. This happens surprisingly often.
This buys us somewhere between 5% and 11.8% of insns (as measured
on the perf bucket).
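The four cases above can be sketched as a small classifier. This is an
illustration only, not the s390 backend code: the enum, the function
name and the order of the checks are assumptions, and it assumes the
previously assigned value is known. The delta range of -128..127 is
also an assumption, based on AGSI taking a signed 8-bit immediate; the
commit message only says "a small amount".

```c
#include <stdint.h>

/* Hypothetical names for the strategies listed in the commit message. */
typedef enum {
    ASSIGN_REDUNDANT,    /* (4) new value equals old value: emit nothing */
    ASSIGN_ZERO_XC,      /* (1) new value is 0: clear with XC            */
    ASSIGN_DELTA_AGSI,   /* (2) small difference: add it with AGSI       */
    ASSIGN_LOW_WORD,     /* (3) only the lower 32 bits differ            */
    ASSIGN_FULL          /* fallback: piece together all 64 bits         */
} AssignKind;

/* Pick a strategy for overwriting old_val with new_val. */
static AssignKind classify_assignment(uint64_t old_val, uint64_t new_val)
{
    /* Unsigned subtraction wraps, so a negative delta shows up as a
       value close to 2^64. */
    uint64_t delta = new_val - old_val;

    if (new_val == old_val)
        return ASSIGN_REDUNDANT;
    if (new_val == 0)
        return ASSIGN_ZERO_XC;
    /* Assumed range: delta in -128..127. */
    if (delta <= 127u || delta >= (uint64_t)-128)
        return ASSIGN_DELTA_AGSI;
    if ((new_val >> 32) == (old_val >> 32))
        return ASSIGN_LOW_WORD;
    return ASSIGN_FULL;
}
```

For example, advancing the guest IA from 0x1000 to 0x1004 would be
classified as ASSIGN_DELTA_AGSI in this sketch.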
Julian Seward [Thu, 16 Feb 2012 14:18:56 +0000 (14:18 +0000)]
Adds 16 and 32 bit fnsave/frstor, and 0x66 prefix on fldl, to guest
amd64.
The Oracle/Sun HotSpot Java virtual machine uses fnsave and frstor,
which valgrind supports for x86 but not amd64. Even more interesting,
HotSpot uses the 0x66 size prefix on these instructions, and on
fldl. This patch adds the 16- and 32-bit versions of fnsave/frstor to
the amd64 guest, and tolerates the 0x66 size prefix on fldl (but only
on these three fpu instructions, even though the AMD docs say all
other fpu instructions (except fnstenv and fldenv) *ignore* 0x66).
Julian Seward [Thu, 16 Feb 2012 12:36:47 +0000 (12:36 +0000)]
Broadens the range of INT imm8 values that raise SIGSEGV, allowing
Jikes RVM to work.
Jikes RVM uses INT 0x3F through 0x49, assuming that they result in a
SIGSEGV. The x86 guest currently does this only for INT 0x40 through
0x43. The attached patch extends the range to 0x3F through 0x4F,
covering all existing Jikes RVM INTs and leaving room for it to add a
few more before it runs into this problem again.
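The change in range can be sketched as two predicates. The function
names are hypothetical and this is not the x86 guest decoder itself;
only the bounds are taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range before the patch: INT 0x40 through 0x43. */
static bool int_imm8_sigsegv_old(uint8_t imm8)
{
    return imm8 >= 0x40 && imm8 <= 0x43;
}

/* Range after the patch: INT 0x3F through 0x4F, which covers the
   0x3F..0x49 used by Jikes RVM and leaves some headroom. */
static bool int_imm8_sigsegv_new(uint8_t imm8)
{
    return imm8 >= 0x3F && imm8 <= 0x4F;
}
```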
Florian Krohm [Mon, 13 Feb 2012 00:06:29 +0000 (00:06 +0000)]
This patch is a follow-up to r2244 which fixed bugzilla #287260 on
some platforms but not on all that we test.
The issue was that cprop_BB did not see that in Add32(t2,t3) the
driving expressions for t2 and t3 were the same. Therefore, the
Add was not replaced with a shift (which is necessary for proper
memcheck operation).
So in this patch:
(1) In cprop_BB, when setting up the "env", record *any* assignment
to a temporary (and not just those that are subject to copy
propagation).
(2) Pass this env down to fold_Expr and then sameIRExprs.
(3) Replace sameIRTemps with sameIRExprs and enhance it. Upon
encountering an RdTmp, check "env" and recurse into the
expression assigned to the temporary.
As an aside, the functions sameIcoU32s and sameIRTempsOrIcoU32s
are replaced with sameIRExprs.
(4) Add some machinery to monitor frequency and effectiveness of
sameIRExprs (can be enabled by setting STATS_IROPT).
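Steps (1) to (3) can be sketched as follows. This is a much simplified
illustration, not the iropt code: the Expr type, its tags and the
function same_expr are hypothetical, and "env" is modelled as a plain
array indexed by temporary number, with NULL for temporaries whose
defining expression was not recorded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EX_CONST, EX_RDTMP, EX_BINOP } ExprTag;

/* Hypothetical, cut-down expression tree. */
typedef struct Expr {
    ExprTag            tag;
    uint32_t           con;    /* EX_CONST: the constant            */
    int                tmp;    /* EX_RDTMP: temporary being read    */
    int                op;     /* EX_BINOP: operator id             */
    const struct Expr *arg1;   /* EX_BINOP: first operand           */
    const struct Expr *arg2;   /* EX_BINOP: second operand          */
} Expr;

/* Structural equality that looks through temporaries via env. */
static bool same_expr(const Expr *const *env, const Expr *a, const Expr *b)
{
    if (a->tag == EX_RDTMP && b->tag == EX_RDTMP) {
        if (a->tmp == b->tmp)
            return true;
        /* Different temporaries: recurse into the expressions that
           were assigned to them, if both are recorded in env. */
        if (env[a->tmp] != NULL && env[b->tmp] != NULL)
            return same_expr(env, env[a->tmp], env[b->tmp]);
        return false;
    }
    if (a->tag != b->tag)
        return false;
    switch (a->tag) {
    case EX_CONST:
        return a->con == b->con;
    case EX_BINOP:
        return a->op == b->op
            && same_expr(env, a->arg1, b->arg1)
            && same_expr(env, a->arg2, b->arg2);
    default:
        return false;
    }
}
```

With t2 and t3 both defined by the same operation on the same operands,
same_expr reports them equal, which is the condition under which
Add32(t2,t3) can be treated like an add of a value to itself and
replaced with a shift.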
Julian Seward [Fri, 20 Jan 2012 13:07:24 +0000 (13:07 +0000)]
Merge, from AVX branch, everything up to and including r2242
(revs 2212 - 2242 inclusive). In summary, brings the new decoding
framework into the trunk.
Florian Krohm [Mon, 16 Jan 2012 17:25:55 +0000 (17:25 +0000)]
Remove broken support for TS insn in s390 port. The
atomicity was not modelled.
The insn is not issued (gcc) or used (glibc, libdfp)
and is discouraged in the Principles of Operation.
No point spending time on it. Fixes #270796
Florian Krohm [Sun, 15 Jan 2012 21:01:16 +0000 (21:01 +0000)]
Add support for the s390's TROO insn. These are the VEX bits.
New hardware capability: VEX_HWCAPS_S390X_ETF2.
Patch by Divya Vyas (divyvyas@linux.vnet.ibm.com).
Partial fix of #273114
Florian Krohm [Thu, 20 Oct 2011 21:15:55 +0000 (21:15 +0000)]
Fix timerfd-syscall testcase on s390x.
This was caused by an interaction of resteering and the infamous
EX insn. This sequence
j someplace
ex ....
with the unconditional jump being subject to resteering caused madness.
Such a sequence is found in glibc's syscall.S with the effect that all
system calls > 255 would have run into the same problem as timerfd_*.
Patch by Christian Borntraeger (borntraeger@de.ibm.com).
Support ARM and Thumb "CLREX" instructions since Dalvik generates
them. Mucho hassle for something that is used considerably less often
than once in a blue moon.
Fix an obscure type error in printing of Neon instructions, that
could cause assertion failures under some circumstances. (How come
none of the static checkers etc picked this up before now?)
Add support for IBM Power ISA 2.06 -- stage 3.
The purpose of this bug is to add support for the third and final subset of the
new instructions in IBM Power ISA 2.06 (i.e., IBM POWER7 processor).
(VEX changes. Bug 279994 comment 1).
(Maynard Johnson, maynardj@us.ibm.com)
Tom Hughes [Thu, 11 Aug 2011 14:43:12 +0000 (14:43 +0000)]
Support FEMMS in x86 mode as we already do for amd64. Fix for #204574.
Note, from #124499 where this was discussed for amd64, that FEMMS is
a 3DNow instruction that has identical behaviour to EMMS and is only
supported on AMD processors for backwards compatibility.
Florian Krohm [Mon, 8 Aug 2011 18:22:58 +0000 (18:22 +0000)]
Handle the invalid opcode 0000.
This is sometimes used by applications on purpose.
Although never executed, we might still decode it because
of chasing unconditional goto/calls.
Florian Krohm [Mon, 1 Aug 2011 22:07:51 +0000 (22:07 +0000)]
For a special opcode the address of the next insn was
not computed correctly. It would point to an insn in
the middle of the pattern that identifies a special opcode.
That didn't hurt much but was confusing. Now fixed.
Fix an assert.
This occurred when we were chasing a branch insn (thereby setting the
disassembly result to Dis_ResteerU and the continueAt field to something
non-zero) and later changing the result kind to Dis_StopHere (because
the next insn is an EX insn). The continueAt field remained non-zero
in that case, causing an assert down the road.
This should fix the failing test memcheck/tests/linux/timerfd-syscall
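The invariant behind that assert can be sketched like this. The struct
and function names are hypothetical stand-ins for the disassembly
result described above, not the real VEX definitions, and the actual
fix may be structured differently. The idea is that a result which
stops here must not carry a leftover continuation address.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DIS_STOP_HERE, DIS_CONTINUE, DIS_RESTEER_U } WhatNext;

/* Hypothetical, simplified disassembly result. */
typedef struct {
    WhatNext what_next;
    uint64_t continue_at;   /* meaningful only when resteering */
} DisResultSketch;

/* Downgrade a result to "stop here", clearing the stale address that
   an earlier resteering decision may have left behind. */
static void dres_stop_here(DisResultSketch *dres)
{
    dres->what_next   = DIS_STOP_HERE;
    dres->continue_at = 0;
}

/* The consistency condition a later assert would check. */
static bool dres_is_consistent(const DisResultSketch *dres)
{
    if (dres->what_next == DIS_RESTEER_U)
        return dres->continue_at != 0;
    return dres->continue_at == 0;
}
```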
And likewise for CmpNEZ operations.
This revision adds tree patterns to optimise some of those
comparisons.
This is particularly beneficial for s390x where moving the
condition code into a GPR is an expensive operation. With this
optimisation a reduction of up to 8% in generated code was observed.