Florian Krohm [Thu, 20 Oct 2011 21:15:55 +0000 (21:15 +0000)]
Fix timerfd-syscall testcase on s390x.
This was caused by an interaction of resteering and the infamous
EX insn. This sequence
j someplace
ex ....
with the unconditional jump being subject to resteering caused madness.
Such a sequence is found in glibc's syscall.S with the effect that all
system calls > 255 would have run into the same problem as timerfd_*.
Patch by Christian Borntraeger (borntraeger@de.ibm.com).
Support ARM and Thumb "CLREX" instructions since Dalvik generates
them. Mucho hassle for something that is used considerably less often
than once in a blue moon.
Fix an obscure type error in printing of Neon instructions, that
could cause assertion failures under some circumstances. (How come
none of the static checkers etc picked this up before now?)
Add support for IBM Power ISA 2.06 -- stage 3.
The purpose of this bug is to add support for the third and final subset of the
new instructions in IBM Power ISA 2.06 (i.e., IBM POWER7 processor).
(VEX changes. Bug 279994 comment 1).
(Maynard Johnson, maynardj@us.ibm.com)
Tom Hughes [Thu, 11 Aug 2011 14:43:12 +0000 (14:43 +0000)]
Support FEMMS in x86 mode as we already do for amd64. Fix for #204574.
Note, from #124499 where this was discussed for amd64, that FEMMS is
a 3DNow instruction that has identical behaviour to EMMS and is only
supported on AMD processors for backwards compatibility.
Florian Krohm [Mon, 8 Aug 2011 18:22:58 +0000 (18:22 +0000)]
Handle the invalid opcode 0000.
This is sometimes used by applications on purpose.
Although never executed, we might still decode it because
of chasing unconditional goto/calls.
Florian Krohm [Mon, 1 Aug 2011 22:07:51 +0000 (22:07 +0000)]
For a special opcode the address of the next insn was
not computed correctly. It would point to an insn in
the middle of the pattern that identifies a special opcode.
That didn't hurt much but was confusing. Now fixed.
Fix an assert.
This occurred when we were chasing a branch insn (thereby setting the
disassembly result to Dis_ResteerU and the continueAt field to something
non-zero) and later changing the result kind to Dis_StopHere (because
the next insn is an EX insn). The continueAt field remained non-zero
in that case, causing an assert to fail down the road.
This should fix the failing test memcheck/tests/linux/timerfd-syscall
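
The invariant behind this fix can be sketched as below. This is a minimal
illustration, not the VEX source: the Sketch* types and sketch_* functions are
hypothetical stand-ins, and only the names Dis_ResteerU, Dis_StopHere and
continueAt are taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the disassembly result described above. */
typedef enum { Dis_StopHere, Dis_Continue, Dis_ResteerU } SketchWhatNext;

typedef struct {
   SketchWhatNext whatNext;
   uint64_t       continueAt;   /* only meaningful when resteering */
} SketchDisResult;

/* Record that an unconditional branch to 'target' is being chased. */
static void sketch_resteer(SketchDisResult* dres, uint64_t target)
{
   dres->whatNext   = Dis_ResteerU;
   dres->continueAt = target;
}

/* Downgrade the result to "stop here" and clear the stale resteer
   target, so the two fields cannot disagree. */
static void sketch_stop_here(SketchDisResult* dres)
{
   dres->whatNext   = Dis_StopHere;
   dres->continueAt = 0;
}

/* The kind of consistency check that would otherwise fail later on:
   a target is present exactly when the result says "resteer". */
static int sketch_is_consistent(const SketchDisResult* dres)
{
   if (dres->whatNext == Dis_ResteerU)
      return dres->continueAt != 0;
   return dres->continueAt == 0;
}
```

Changing only whatNext after sketch_resteer() would leave the result in a
state that sketch_is_consistent() rejects, which is the situation described
above.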
And likewise for CmpNEZ operations.
This revision adds tree patterns to optimise some of those
comparisons.
This is particularly beneficial for s390x where moving the
condition code into a GPR is an expensive operation. With this
optimisation a reduction of up to 8% in generated code was observed.
Fix BLX r14 in ARM mode, which was broken due to incorrect sequencing
of guest r14 reading vs writing. Thumb mode does not have the same
problem. Bug 277694. (Mans Rullgard, mans@mansr.com)
Complete the implementation of ARM atomic ops: {LD,ST}REX{,B,H,D} in
both ARM and Thumb encodings, for NEON and non-NEON capable backends.
Bug 266035 comments 4, 43, 51. Derived from patches by Jeff Brown
<jeffbrown@google.com>, Igor Saenko <igor.saenko@gmail.com> and
Dr. David Alan Gilbert <david.gilbert@linaro.org>.
Thumb2 front end: improved analysis of IT instructions that might
guard the one being translated, with the goal of proving this
isn't the case more of the time. Reduces the amount of generated
code by about 10% with --tool=none, and performance improvements
(also with --tool=none) of up to 25% have been observed.
Julian Seward [Thu, 16 Jun 2011 11:36:23 +0000 (11:36 +0000)]
Rename and rationalise the vector narrowing and widening primops, so
as to give them a consistent, understandable naming scheme. Finishes
off the process that was begun in r2159.
Julian Seward [Wed, 15 Jun 2011 15:09:37 +0000 (15:09 +0000)]
Partially fix underspecification of saturating narrowing primops that
became apparent whilst looking into the problem of implementing the
SSE4 packusdw instruction. Probably breaks Altivec.
Julian Seward [Tue, 7 Jun 2011 21:28:38 +0000 (21:28 +0000)]
Change the interface to LibVEX_Translate slightly, so as to make the
generation of self-modifying-code checks more flexible. With this
change, the decision about which parts (extents) of the newly created
IRSB need self-checks is deferred until after the IRSB has been
created. This allows the caller to decide, for each extent
individually, whether it needs a self-check, and the caller can make
those decisions based on the addresses of the guest instructions in
the extents.
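
A per-extent decision of this kind might look roughly like the sketch below.
It is an illustration under stated assumptions, not the LibVEX_Translate
interface: the SketchExtents layout, the SketchWritableRange policy and the
bitmask return convention are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the guest address ranges (extents)
   covered by one translated block. */
typedef struct {
   uint64_t base[3];
   uint16_t len[3];
   int      n_used;
} SketchExtents;

/* Hypothetical caller-side policy: guest code inside [lo, hi) may be
   modified at run time and therefore needs checking. */
typedef struct { uint64_t lo, hi; } SketchWritableRange;

/* Called after the block has been built. Returns a bitmask in which
   bit i is set if extent i needs a self-check. */
static unsigned sketch_needs_self_check(const SketchWritableRange* w,
                                        const SketchExtents* vge)
{
   unsigned mask = 0;
   assert(vge->n_used >= 0 && vge->n_used <= 3);
   for (int i = 0; i < vge->n_used; i++) {
      uint64_t start = vge->base[i];
      uint64_t end   = start + vge->len[i];
      /* Flag the extent if it overlaps the writable range. */
      if (start < w->hi && end > w->lo)
         mask |= 1u << i;
   }
   return mask;
}
```

Because the decision is made from the extent addresses, a block that spans
both read-only and writable guest code only pays for checks on the writable
part.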
Julian Seward [Sun, 5 Jun 2011 17:56:03 +0000 (17:56 +0000)]
Improvements to code generation for 32 bit instructions. When
appropriate, generate 32 bit add/sub/and/or/xor/cmp, so as to avoid a
bunch of cases where previously values would have been widened to 64
bits, or shifted left 32 bits, before being used. Reduces the size of
the generated code by up to 2.8%.
Julian Seward [Sun, 29 May 2011 09:29:18 +0000 (09:29 +0000)]
x86 and amd64 back ends: when generating transfers back to the
dispatcher, generate a jump either to the unassisted (GSP unchanged,
the common case) or assisted (GSP changed, request some action before
continuing) dispatcher. This removes two instructions per dispatch
for the common case. Changes for all other targets are interface-only
changes due to change in type of the emit_XXInstr functions.
Julian Seward [Fri, 27 May 2011 13:20:56 +0000 (13:20 +0000)]
Add a field 'UChar delta' to IRStmt_IMark, and use it to carry around
the T bit for the instruction when the instruction is an ARM or Thumb one.
This more or less avoids introducing Thumb specific hacks in the IR,
yet makes it possible to identify, from an IMark, whether it refers to
a Thumb or ARM instruction. This is important for the GDB server
integration to work properly on Thumb code.
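
As a rough sketch of how a consumer could use such a field: the SketchIMark
struct below is a hypothetical stand-in for the real statement, and the
encoding (delta is 1 for Thumb and 0 for ARM) is an assumption made for
illustration. Placing the T bit in bit 0 of an address is the usual ARM
interworking convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an instruction mark carrying a 'delta'. */
typedef struct {
   uint64_t addr;    /* address of the guest instruction */
   int      len;     /* its length in bytes */
   uint8_t  delta;   /* assumed: 1 for Thumb, 0 for ARM */
} SketchIMark;

/* Tell Thumb from ARM using only the mark itself. */
static int sketch_is_thumb(const SketchIMark* im)
{
   assert(im->delta <= 1);
   return im->delta == 1;
}

/* Program counter value with the T bit folded into bit 0. */
static uint64_t sketch_tagged_pc(const SketchIMark* im)
{
   return im->addr | im->delta;
}
```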
Julian Seward [Tue, 17 May 2011 16:18:36 +0000 (16:18 +0000)]
s390x: provide clock instructions like STCK
s390x provides user space accessible instructions to get the HW time (e.g. via
store clock STCK). While userspace programs should use gettimeofday and friends
to cope with ntp/system time etc, a lot of programs still make use of STCK.
valgrind should implement these instructions.
(Christian Borntraeger <borntraeger@de.ibm.com> and Divya Vyas)
Julian Seward [Wed, 4 May 2011 09:50:48 +0000 (09:50 +0000)]
Tighten up condition code handling in the back end, so as to placate
IBM's BEAM checker. There is no error in the existing code. However
BEAM doesn't know that when PPCCondCode::test == Pct_ALWAYS then the
::flag field is irrelevant, and so it believes it is being used
uninitialised. Add a Pcf_NONE ::flag value for use in that case, and
add assertions to match. (Untested!)
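
The pairing rule and its assertion could be sketched like this. Only
Pct_ALWAYS and Pcf_NONE are names from the message above. The other
enumerators, the Sketch* types and the sketch_* functions are hypothetical,
and the real back end may arrange this differently.

```c
#include <assert.h>

/* Hypothetical, cut-down versions of the test and flag enums. */
typedef enum { Pct_SKETCH_FALSE, Pct_SKETCH_TRUE, Pct_ALWAYS } SketchCondTest;
typedef enum { Pcf_SKETCH_LT, Pcf_SKETCH_EQ, Pcf_NONE } SketchCondFlag;

typedef struct {
   SketchCondFlag flag;
   SketchCondTest test;
} SketchCondCode;

/* An unconditional test must carry Pcf_NONE, and a conditional test
   must name a real flag. */
static int sketch_cond_is_valid(SketchCondTest test, SketchCondFlag flag)
{
   return (test == Pct_ALWAYS) ? (flag == Pcf_NONE) : (flag != Pcf_NONE);
}

/* Constructor that always initialises both fields, so a checker never
   sees an uninitialised flag, and that asserts the pairing rule. */
static SketchCondCode sketch_mk_cond(SketchCondTest test, SketchCondFlag flag)
{
   SketchCondCode cc;
   assert(sketch_cond_is_valid(test, flag));
   cc.flag = flag;
   cc.test = test;
   return cc;
}
```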
Julian Seward [Mon, 2 May 2011 07:21:04 +0000 (07:21 +0000)]
Split up armg_calculate_flags_nzcv into four functions that compute
the flags individually. This seems to be a net performance win,
because often only one or two of the flags computed by
armg_calculate_flags_nzcv are needed, so time was wasted computing the
other ones.
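
To illustrate the idea, here is a sketch of per-flag helpers for a single
operation, a 32-bit subtraction. The sketch_* names and signatures are
hypothetical. The real helpers work from the flags thunk and cover every
operation kind, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* N: the sign bit of the result. */
static uint32_t sketch_sub_flag_n(uint32_t a, uint32_t b)
{
   return (a - b) >> 31;
}

/* Z: set when the result is zero. */
static uint32_t sketch_sub_flag_z(uint32_t a, uint32_t b)
{
   return (a - b) == 0;
}

/* C: on ARM, carry after a subtract means "no borrow". */
static uint32_t sketch_sub_flag_c(uint32_t a, uint32_t b)
{
   return a >= b;
}

/* V: signed overflow, i.e. the operands differ in sign and the result's
   sign differs from the first operand's. */
static uint32_t sketch_sub_flag_v(uint32_t a, uint32_t b)
{
   uint32_t res = a - b;
   return ((a ^ b) & (a ^ res)) >> 31;
}

/* The combined form, with N, Z, C, V in bits 31..28. A caller that only
   needs one flag can skip the other three computations. */
static uint32_t sketch_sub_flags_nzcv(uint32_t a, uint32_t b)
{
   return (sketch_sub_flag_n(a, b) << 31) | (sketch_sub_flag_z(a, b) << 30)
        | (sketch_sub_flag_c(a, b) << 29) | (sketch_sub_flag_v(a, b) << 28);
}
```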
Julian Seward [Sun, 1 May 2011 18:47:10 +0000 (18:47 +0000)]
Improvements to condition code handling on ARM.
(1) guest_arm_spechelper: add another spec rule for
armg_calculate_condition. Add spec rules for
armg_calculate_flag_c and armg_calculate_flag_v.
(2) guest_arm_toIR.c: when storing oldC (shifter carry out) and
oldV values in the thunk, be sure to ensure the top 31 bits
are zero. This improves the effectiveness of the new spec
rules (1) by avoiding getting into situations where we have
Mux0X(c, x, And32(x,1)), where in fact x has bits 31:1 as
zero. iropt can't fold that out. So make sure the spec
rules don't generate any unnecessary And32(x,1); hence the
above becomes Mux0X(c, x, x) which iropt can reduce simply
to "x".
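
The reasoning in (2) can be checked with a small model. The helpers below are
hypothetical C stand-ins for the IR operations, assuming Mux0X(c, e0, eX)
yields e0 when c is zero and eX otherwise. They are not part of VEX.

```c
#include <assert.h>
#include <stdint.h>

/* Model of Mux0X: select expr0 when cond is zero, else exprX. */
static uint32_t sketch_mux0x(uint32_t cond, uint32_t expr0, uint32_t exprX)
{
   return cond == 0 ? expr0 : exprX;
}

/* Model of And32. */
static uint32_t sketch_and32(uint32_t x, uint32_t y)
{
   return x & y;
}

/* Normalise a carry or overflow value before it is stored in the thunk,
   so that bits 31:1 are guaranteed to be zero. */
static uint32_t sketch_store_flag(uint32_t raw)
{
   return raw & 1;
}

/* With a normalised x, And32(x,1) equals x, so both arms of the Mux0X
   agree and the whole expression is just x. */
static int sketch_mux_folds_to_x(uint32_t cond, uint32_t x)
{
   return sketch_mux0x(cond, x, sketch_and32(x, 1)) == x;
}
```

For a value that was not normalised, such as 0x80000001, the two arms differ
when the condition is non-zero, so the simplification is only valid because
of the masking at store time.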
Julian Seward [Sun, 1 May 2011 18:36:51 +0000 (18:36 +0000)]
When simplifying (improving) the IR generated by the ARM front end, do
CSE by default. This significantly improves performance for ARM (not
Thumb) code that leans heavily on predicated instructions by commoning
up duplicate condition code evaluations within a single IRSB.
Handle Iop_Not64 when doing 32-bit code generation. Also, assert that
iselWordExpr_R is not asked to handle Iop_Not64 in 32-bit mode.
Fixes #270856. (Maynard Johnson, maynardj@us.ibm.com)
- Remove fixs390 regarding storing the instruction address in the
IP_AT_SYSCALL slot in the guest state. I'm not sure this is used
but it certainly makes sense.
- Remove fixs390 in function s390_irgen_XONC. This was missed in
VEX r2113.
Partial fix for #271501. (Florian Krohm, britzel@acm.org)
s390x: Implement Ist_MBE
VEX IR provides the statement Ist_MBE which is used to implement memory
barriers (Imbe_Fence). We use this statement to implement serialization which
is similar.
Fixes #271385. (Florian Krohm, britzel@acm.org)
s390x: invalid use of R0 as base register
When emitting code for a shift operation with the shift amount operand being in
memory we load the shift amount into R0 and use that register in SLAG etc.
That won't work because the contents of R0 will be ignored when used as a base
reg.
So, let's choose some other register and save/restore it.
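
Why R0 cannot serve as a base register can be shown with a small model of the
address calculation. The sketch_* functions are hypothetical. The model
assumes the architected rules that a base field of 0 means "no base register"
and that the shift amount is taken from the low six bits of the
second-operand address.

```c
#include <assert.h>
#include <stdint.h>

/* Model of base + displacement address generation. A base field of 0
   contributes zero, whatever R0 currently holds. */
static uint64_t sketch_effective_address(const uint64_t gpr[16],
                                         unsigned base, uint32_t disp)
{
   assert(base < 16);
   uint64_t b = (base == 0) ? 0 : gpr[base];
   return b + disp;
}

/* Shift amount as used by a 64-bit shift: the low six bits of the
   second-operand address. */
static unsigned sketch_shift_amount(const uint64_t gpr[16],
                                    unsigned base, uint32_t disp)
{
   return (unsigned)(sketch_effective_address(gpr, base, disp) & 63);
}
```

With the shift amount 5 loaded into both R0 and R2, naming R0 as the base
gives a shift of 0 while naming R2 gives the intended 5.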
s390x: fpr - gpr transfer facility
We need to introduce a new hwcap to model the presence of the fpr - gpr
transfer facility. If it is not available, we cannot use the LDGR and LGDR
insns and need to use a trick similar to what ppc does (write/read stack
location).
Fixes #268619 (vex side).
(Florian Krohm, britzel@acm.org)
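
The selection between the two strategies might be sketched as follows. The
capability bit SKETCH_HWCAP_FGX, the enum and the function names are
hypothetical, and the memcpy merely models the store-then-load round trip
through a stack slot. The expected bit pattern assumes IEEE 754 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capability bit for the fpr - gpr transfer facility. */
#define SKETCH_HWCAP_FGX (1u << 0)

typedef enum { Sketch_MoveDirect, Sketch_MoveViaStack } SketchMoveKind;

/* Pick a code generation strategy from the host capabilities. */
static SketchMoveKind sketch_select_move(uint32_t hwcaps)
{
   return (hwcaps & SKETCH_HWCAP_FGX) ? Sketch_MoveDirect
                                      : Sketch_MoveViaStack;
}

/* Model of the fallback: write the FP value to a memory slot and read
   the same bytes back as an integer. */
static uint64_t sketch_move_via_stack(double fpr_value)
{
   uint64_t slot;
   assert(sizeof slot == sizeof fpr_value);
   memcpy(&slot, &fpr_value, sizeof slot);
   return slot;
}
```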
Fix up some enum confusion to do with ARMNeonUnOp and ARMNeonUnOpS, as
found by "the IBM checker", and also by clang-2.9. Fixes #271820.
(Florian Krohm, britzel@acm.org)