Florian Krohm [Mon, 13 Feb 2012 00:06:29 +0000 (00:06 +0000)]
This patch is a follow-up to r2244 which fixed bugzilla #287260 on
some platforms but not on all that we test.
The issue was that cprop_BB did not see that in Add32(t2,t3) the
driving expressions for t2 and t3 were the same. Therefore, the
Add was not replaced with a shift (which is necessary for proper
memcheck operation).
So in this patch:
(1) In cprop_BB, when setting up the "env", record *any* assignment
to a temporary (and not just those that are subject to copy
propagation).
(2) Pass this env down to fold_Expr and then sameIRExprs.
(3) Replace sameIRTemps with sameIRExprs and enhance it. Upon
encountering an RdTmp, check "env" and recurse into the
expression assigned to the temporary.
As an aside, the functions sameIcoU32s and sameIRTempsOrIcoU32s
are replaced with sameIRExprs.
(4) Add some machinery to monitor frequency and effectiveness of
sameIRExprs (can be enabled by setting STATS_IROPT).
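The idea in steps (1) to (3) can be sketched as follows. This is a toy model, not the VEX code: the Expr type, the same_exprs function, the fixed-size env array and the "fuel" recursion limit are all assumptions made for illustration. It shows only how looking through a temporary via "env" lets two different temporaries compare equal when their defining expressions match.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression type (hypothetical; VEX's IRExpr is much richer). */
typedef enum { EX_CONST, EX_RDTMP, EX_UNOP } ExTag;

typedef struct Expr {
   ExTag tag;
   unsigned val;            /* EX_CONST: value; EX_RDTMP: temp number;
                               EX_UNOP: opcode */
   const struct Expr* arg;  /* EX_UNOP only */
} Expr;

#define N_TMPS 8

/* env[t] is the expression assigned to temp t, or NULL if unknown.
   Returns 1 only if the two expressions are provably the same;
   "don't know" is reported as 0. */
int same_exprs(const Expr* env[], const Expr* e1, const Expr* e2, int fuel)
{
   if (fuel <= 0)
      return 0;   /* recursion budget exhausted: give up */

   /* The same temporary read twice is trivially the same value. */
   if (e1->tag == EX_RDTMP && e2->tag == EX_RDTMP && e1->val == e2->val)
      return 1;

   /* Otherwise look through a temporary to its defining expression. */
   if (e1->tag == EX_RDTMP && e1->val < N_TMPS && env[e1->val] != NULL)
      return same_exprs(env, env[e1->val], e2, fuel - 1);
   if (e2->tag == EX_RDTMP && e2->val < N_TMPS && env[e2->val] != NULL)
      return same_exprs(env, e1, env[e2->val], fuel - 1);

   if (e1->tag != e2->tag)
      return 0;

   switch (e1->tag) {
      case EX_CONST:
         return e1->val == e2->val;
      case EX_UNOP:
         return e1->val == e2->val
                && same_exprs(env, e1->arg, e2->arg, fuel - 1);
      default:
         return 0;   /* distinct temps with no known definition */
   }
}
```

With t2 and t3 both bound in env to the same unary operation on t1, same_exprs reports them as equal, which is the situation the Add32(t2,t3) case needs to detect.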
Julian Seward [Fri, 20 Jan 2012 13:07:24 +0000 (13:07 +0000)]
Merge, from AVX branch, everything up to and including r2242
(revs 2212 - 2242 inclusive). In summary, brings the new decoding
framework into the trunk.
Florian Krohm [Mon, 16 Jan 2012 17:25:55 +0000 (17:25 +0000)]
Remove broken support for TS insn in s390 port. The
atomicity was not modelled.
The insn is not issued (gcc) or used (glibc, libdfp)
and is discouraged in the Principles of Operation.
No point spending time on it. Fixes #270796
Florian Krohm [Sun, 15 Jan 2012 21:01:16 +0000 (21:01 +0000)]
Add support for the s390's TROO insn. These are the VEX bits.
New hardware capability: VEX_HWCAPS_S390X_ETF2.
Patch by Divya Vyas (divyvyas@linux.vnet.ibm.com).
Partial fix of #273114
Florian Krohm [Thu, 20 Oct 2011 21:15:55 +0000 (21:15 +0000)]
Fix timerfd-syscall testcase on s390x.
This was caused by an interaction of resteering and the infamous
EX insn. This sequence
j someplace
ex ....
with the unconditional jump being subject to resteering caused madness.
Such a sequence is found in glibc's syscall.S with the effect that all
system calls > 255 would have run into the same problem as timerfd_*.
Patch by Christian Borntraeger (borntraeger@de.ibm.com).
Support ARM and Thumb "CLREX" instructions since Dalvik generates
them. Mucho hassle for something that is used considerably less often
than once in a blue moon.
Fix an obscure type error in printing of Neon instructions, that
could cause assertion failures under some circumstances. (How come
none of the static checkers etc picked this up before now?)
Add support for IBM Power ISA 2.06 -- stage 3.
The purpose of this bug is to add support for the third and final subset of the
new instructions in IBM Power ISA 2.06 (i.e., IBM POWER7 processor).
(VEX changes. Bug 279994 comment 1).
(Maynard Johnson, maynardj@us.ibm.com)
Tom Hughes [Thu, 11 Aug 2011 14:43:12 +0000 (14:43 +0000)]
Support FEMMS in x86 mode as we already do for amd64. Fix for #204574.
Note, from #124499 where this was discussed for amd64, that FEMMS is
a 3DNow instruction that has identical behaviour to EMMS and is only
supported on AMD processors for backwards compatibility.
Florian Krohm [Mon, 8 Aug 2011 18:22:58 +0000 (18:22 +0000)]
Handle the invalid opcode 0000.
This is sometimes used by applications on purpose.
Although never executed, we might still decode it because
of chasing unconditional goto/calls.
Florian Krohm [Mon, 1 Aug 2011 22:07:51 +0000 (22:07 +0000)]
For a special opcode the address of the next insn was
not computed correctly. It would point to an insn in
the middle of the pattern that identifies a special opcode.
That didn't hurt much but was confusing. Now fixed.
Fix an assert.
This occurred when we were chasing a branch insn (thereby setting the
disassembly result to Dis_ResteerU and the continueAt field to something
non-zero) and later changing the result kind to Dis_StopHere (because
the next insn is an EX insn). The continueAt field remained non-zero
in that case, causing an assert down the road.
This should fix the failing test memcheck/tests/linux/timerfd-syscall
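A minimal sketch of the invariant involved, under stated assumptions: the DisResult struct below is a cut-down stand-in, the field name whatNext and the helpers stop_here and dis_result_is_consistent are hypothetical, and only Dis_ResteerU, Dis_StopHere and continueAt are taken from the description above.

```c
#include <assert.h>

/* Cut-down stand-in for the disassembly result (assumed layout). */
typedef enum { Dis_StopHere, Dis_Continue, Dis_ResteerU } WhatNext;

typedef struct {
   WhatNext      whatNext;     /* hypothetical field name */
   unsigned long continueAt;   /* resteer target; 0 when not resteering */
} DisResult;

/* The property the assertion guards: a non-zero continueAt only makes
   sense while the result kind is a resteer. */
int dis_result_is_consistent(const DisResult* dres)
{
   if (dres->whatNext == Dis_ResteerU)
      return 1;
   return dres->continueAt == 0;
}

/* Downgrade a result to "stop here" and clear the stale resteer
   target at the same time, so the two fields cannot disagree. */
void stop_here(DisResult* dres)
{
   dres->whatNext   = Dis_StopHere;
   dres->continueAt = 0;
}
```

Changing only the result kind leaves the struct in the inconsistent state described above; going through a helper like stop_here is one way to keep both fields in step.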
And likewise for CmpNEZ operations.
This revision adds tree patterns to optimise some of those
comparisons.
This is particularly beneficial for s390x where moving the
condition code into a GPR is an expensive operation. With this
optimisation, a reduction of up to 8% in generated code was observed.
Fix BLX r14 in ARM mode, which was broken due to incorrect sequencing
of guest r14 reading vs writing. Thumb mode does not have the same
problem. Bug 277694. (Mans Rullgard, mans@mansr.com)
Complete the implementation of ARM atomic ops: {LD,ST}REX{,B,H,D} in
both ARM and Thumb encodings, for NEON and non-NEON capable backends.
Bug 266035 comments 4, 43, 51. Derived from patches by Jeff Brown
<jeffbrown@google.com>, Igor Saenko <igor.saenko@gmail.com> and
Dr. David Alan Gilbert <david.gilbert@linaro.org>.
Thumb2 front end: improved analysis of IT instructions that might
guard the one being translated, with the goal of proving this
isn't the case more of the time. Reduces the amount of generated
code by about 10% with --tool=none, and performance improvements
(also with --tool=none) of up to 25% have been observed.
Julian Seward [Thu, 16 Jun 2011 11:36:23 +0000 (11:36 +0000)]
Rename and rationalise the vector narrowing and widening primops, so
as to give them a consistent, understandable naming scheme. Finishes
off the process that was begun in r2159.
Julian Seward [Wed, 15 Jun 2011 15:09:37 +0000 (15:09 +0000)]
Partially fix underspecification of saturating narrowing primops that
became apparent whilst looking into the problem of implementing the
SSE4 packusdw instruction. Probably breaks Altivec.
Julian Seward [Tue, 7 Jun 2011 21:28:38 +0000 (21:28 +0000)]
Change the interface to LibVEX_Translate slightly, so as to make the
generation of self-modifying-code checks more flexible. With this
change, the decision about which parts (extents) of the newly created
IRSB need self-checks is deferred until after the IRSB has been
created. This allows the caller to decide, for each extent
individually, whether it needs a self-check, and the caller can make
those decisions based on the addresses of the guest instructions in
the extents.
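The per-extent decision can be pictured roughly as below. This is not the LibVEX_Translate interface: the Extent and Range types, the NeedsSelfCheck callback type and the functions select_self_checks and starts_in_range are all hypothetical names chosen for this sketch. It only illustrates a caller being asked, once the extents are known, which of them need a self-check.

```c
#include <assert.h>
#include <stdint.h>

/* One contiguous range of guest code covered by the block (assumed). */
typedef struct { uint64_t base; uint16_t len; } Extent;

/* Caller-supplied policy: non-zero means "this extent needs a check". */
typedef int (*NeedsSelfCheck)(void* opaque, const Extent* ext);

/* Ask the caller about each extent after the block has been built.
   Bit i of the result is set if extent i needs a self-check. */
uint32_t select_self_checks(const Extent* exts, int n_exts,
                            NeedsSelfCheck needs_check, void* opaque)
{
   uint32_t mask = 0;
   assert(n_exts >= 0 && n_exts <= 32);
   for (int i = 0; i < n_exts; i++) {
      if (needs_check(opaque, &exts[i]))
         mask |= (uint32_t)1 << i;
   }
   return mask;
}

/* Example policy: only code starting inside a given address range
   (say, a writable mapping) is checked. */
typedef struct { uint64_t lo, hi; } Range;

int starts_in_range(void* opaque, const Extent* ext)
{
   const Range* r = (const Range*)opaque;
   return ext->base >= r->lo && ext->base < r->hi;
}
```

Because the policy sees each extent's guest address, it can skip checks for code in read-only mappings while still checking the rest.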
Julian Seward [Sun, 5 Jun 2011 17:56:03 +0000 (17:56 +0000)]
Improvements to code generation for 32 bit instructions. When
appropriate, generate 32 bit add/sub/and/or/xor/cmp, so as to avoid a
bunch of cases where previously values would have been widened to 64
bits, or shifted left 32 bits, before being used. Reduces the size of
the generated code by up to 2.8%.
Julian Seward [Sun, 29 May 2011 09:29:18 +0000 (09:29 +0000)]
x86 and amd64 back ends: when generating transfers back to the
dispatcher, generate a jump either to the unassisted (GSP unchanged,
the common case) or assisted (GSP changed, request some action before
continuing) dispatcher. This removes two instructions per dispatch
for the common case. Changes for all other targets are interface-only
changes due to change in type of the emit_XXInstr functions.
Julian Seward [Fri, 27 May 2011 13:20:56 +0000 (13:20 +0000)]
Add a field 'UChar delta' to IRStmt_IMark, and use it to carry around
the T bit for the instruction when it is an ARM/Thumb instruction.
This more or less avoids introducing Thumb specific hacks in the IR,
yet makes it possible to identify, from an IMark, whether it refers to
a Thumb or ARM instruction. This is important for the GDB server
integration to work properly on Thumb code.
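As a rough illustration, a consumer of the mark could recover the instruction set like this. The IMark struct here is a simplified stand-in, and mk_imark and imark_is_thumb are hypothetical helpers; the sketch assumes delta is 1 for Thumb and 0 otherwise, which is one natural way to carry the T bit.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for an instruction mark (assumed layout). */
typedef struct {
   uint64_t addr;    /* guest address of the instruction */
   int      len;     /* instruction length in bytes */
   uint8_t  delta;   /* assumed: 1 for Thumb, 0 for ARM and other guests */
} IMark;

/* Build a mark, storing the T bit in the delta field. */
IMark mk_imark(uint64_t addr, int len, int is_thumb)
{
   IMark im;
   im.addr  = addr;
   im.len   = len;
   im.delta = is_thumb ? 1 : 0;
   return im;
}

/* Recover the T bit without any Thumb-specific statement kinds. */
int imark_is_thumb(const IMark* im)
{
   return (im->delta & 1) != 0;
}
```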
Julian Seward [Tue, 17 May 2011 16:18:36 +0000 (16:18 +0000)]
s390x: provide clock instructions like STCK
s390x provides user space accessible instructions to get the HW time (e.g. via
store clock STCK). While userspace programs should use gettimeofday and friends
to cope with ntp/system time etc, a lot of programs still make use of STCK.
Valgrind should implement these instructions.
(Christian Borntraeger <borntraeger@de.ibm.com> and Divya Vyas)
Julian Seward [Wed, 4 May 2011 09:50:48 +0000 (09:50 +0000)]
Tighten up condition code handling in the back end, so as to placate
IBM's BEAM checker. There is no error in the existing code. However
BEAM doesn't know that when PPCCondCode::test == Pct_ALWAYS then the
::flag field is irrelevant, and so it believes it is being used
uninitialised. Add a Pcf_NONE ::flag value for use in that case, and
add assertions to match. (Untested!)
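The shape of the change can be sketched like this. Only Pct_ALWAYS, Pcf_NONE and the test/flag field names come from the description above; the CondCode type, the other enum values and the helpers are hypothetical, and the sketch assumes the assertion ties the two fields together in both directions.

```c
#include <assert.h>

/* Pct_FALSE, Pct_TRUE, Pcf_LT and Pcf_EQ are placeholder values. */
typedef enum { Pct_FALSE, Pct_TRUE, Pct_ALWAYS } CondTest;
typedef enum { Pcf_LT, Pcf_EQ, Pcf_NONE } CondFlag;

typedef struct {
   CondTest test;
   CondFlag flag;
} CondCode;

/* An unconditional test carries the explicit "no flag" value, and a
   conditional test always names a real flag. */
int cond_is_well_formed(CondCode cc)
{
   return (cc.test == Pct_ALWAYS) == (cc.flag == Pcf_NONE);
}

/* Constructor that always initialises both fields and checks them,
   so no path leaves flag looking uninitialised to a checker. */
CondCode mk_cond(CondTest test, CondFlag flag)
{
   CondCode cc;
   cc.test = test;
   cc.flag = flag;
   assert(cond_is_well_formed(cc));
   return cc;
}
```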