Julian Seward [Wed, 4 May 2011 09:50:48 +0000 (09:50 +0000)]
Tighten up condition code handling in the back end, so as to placate
IBM's BEAM checker. There is no error in the existing code. However
BEAM doesn't know that when PPCCondCode::test == Pct_ALWAYS then the
::flag field is irrelevant, and so it believes it is being used
uninitialised. Add a Pcf_NONE ::flag value for use in that case, and
add assertions to match. (Untested!)
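
A minimal sketch of the invariant described above, not the actual back-end code. Only Pct_ALWAYS and Pcf_NONE come from the commit message; the type names, the other enumerators and the helper functions are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the back end's enums.  Only
   Pct_ALWAYS and Pcf_NONE are names taken from the commit message. */
typedef enum { Pct_FALSE, Pct_TRUE, Pct_ALWAYS } CondTest;
typedef enum { Pcf_LT, Pcf_GT, Pcf_EQ, Pcf_NONE } CondFlag;

typedef struct {
    CondFlag flag;
    CondTest test;
} CondCode;

/* Build a condition code, asserting that "always" never carries a real
   flag and that every other test names one.  With this in place the
   flag field is initialised on every path. */
static CondCode mk_cond_code(CondTest test, CondFlag flag)
{
    assert((test == Pct_ALWAYS && flag == Pcf_NONE) ||
           (test != Pct_ALWAYS && flag != Pcf_NONE));
    CondCode cc = { flag, test };
    return cc;
}

/* The flag is only meaningful when the test is not "always". */
static int cond_uses_flag(CondCode cc)
{
    return cc.test != Pct_ALWAYS;
}
```

A checker that cannot see the relationship between the two fields now finds a defined value in both of them.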
Julian Seward [Mon, 2 May 2011 07:21:04 +0000 (07:21 +0000)]
Split up armg_calculate_flags_nzcv into four functions that compute
the flags individually. This seems to be a net performance win,
because often only one or two of the flags computed by
armg_calculate_flags_nzcv are actually needed, so time was wasted
computing the other ones.
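
The idea can be sketched as follows for a 32-bit ADD only. The function names are hypothetical and the real helpers dispatch on a thunk describing the last flag-setting operation; this is an illustration of the split, not the VEX code.

```c
#include <stdint.h>

/* Per-flag helpers for a 32-bit ADD.  Each returns 0 or 1. */
static uint32_t flag_n_add(uint32_t a, uint32_t b)
{
    return (a + b) >> 31;               /* sign bit of the result */
}

static uint32_t flag_z_add(uint32_t a, uint32_t b)
{
    return (uint32_t)((a + b) == 0);    /* result is zero */
}

static uint32_t flag_c_add(uint32_t a, uint32_t b)
{
    return (uint32_t)((a + b) < a);     /* unsigned carry out */
}

static uint32_t flag_v_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    return ((a ^ r) & (b ^ r)) >> 31;   /* signed overflow */
}

/* The combined form, with N,Z,C,V in bits 31..28 as in the ARM CPSR.
   A caller that needs only C pays for all four computations here. */
static uint32_t flags_nzcv_add(uint32_t a, uint32_t b)
{
    return (flag_n_add(a, b) << 31) | (flag_z_add(a, b) << 30)
         | (flag_c_add(a, b) << 29) | (flag_v_add(a, b) << 28);
}
```

A caller that only needs the carry can call flag_c_add directly and skip the other three computations.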
Julian Seward [Sun, 1 May 2011 18:47:10 +0000 (18:47 +0000)]
Improvements to condition code handling on ARM.
(1) guest_arm_spechelper: add another spec rule for
armg_calculate_condition. Add spec rules for
armg_calculate_flag_c and armg_calculate_flag_v.
(2) guest_arm_toIR.c: when storing oldC (shifter carry out) and
oldV values in the thunk, ensure that the top 31 bits
are zero. This improves the effectiveness of the new spec
rules (1) by avoiding getting into situations where we have
Mux0X(c, x, And32(x,1)), where in fact x has bits 31:1 as
zero. iropt can't fold that out. So make sure the spec
rules don't generate any unnecessary And32(x,1); hence the
above becomes Mux0X(c, x, x) which iropt can reduce simply
to "x".
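
The reasoning in (2) can be checked with a small C model. The function mux0x below is a hypothetical stand-in that mirrors the IR semantics (select the first value when the condition is zero, the second otherwise); it is not VEX code.

```c
#include <stdint.h>

/* Model of Mux0X(cond, expr0, exprX): expr0 if cond == 0, else exprX. */
static uint32_t mux0x(uint8_t cond, uint32_t expr0, uint32_t exprX)
{
    return cond == 0 ? expr0 : exprX;
}

/* What a spec rule would produce if it masked defensively. */
static uint32_t with_mask(uint8_t cond, uint32_t x)
{
    return mux0x(cond, x, x & 1u);
}

/* What it produces when x is known to be 0 or 1 already: both arms are
   the same expression, so the whole thing is just x. */
static uint32_t without_mask(uint8_t cond, uint32_t x)
{
    return mux0x(cond, x, x);
}
```

For x restricted to 0 or 1 the two forms agree for every condition value, which is why storing normalised oldC/oldV values makes the mask redundant. For an unnormalised x such as 2 they differ, so the optimiser cannot drop the mask on its own.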
Julian Seward [Sun, 1 May 2011 18:36:51 +0000 (18:36 +0000)]
When simplifying (improving) the IR generated by the ARM front end, do
CSE by default. This significantly improves performance for ARM (not
Thumb) code that leans heavily on predicated instructions by commoning
up duplicate condition code evaluations within a single IRSB.
Handle Iop_Not64 when doing 32-bit code generation. Also, assert that
iselWordExpr_R is not asked to handle Iop_Not64 in 32-bit mode.
Fixes #270856. (Maynard Johnson, maynardj@us.ibm.com)
- Remove fixs390 regarding storing the instruction address in the
IP_AT_SYSCALL slot in the guest state. I'm not sure this is used
but it certainly makes sense.
- Remove fixs390 in function s390_irgen_XONC. This was missed in
VEX r2113.
Partial fix for #271501. (Florian Krohm, britzel@acm.org)
s390x: Implement Ist_MBE
VEX IR provides the statement Ist_MBE which is used to implement memory
barriers (Imbe_Fence). We use this statement to implement serialization which
is similar.
Fixes #271385. (Florian Krohm, britzel@acm.org)
s390x: invalid use of R0 as base register
When emitting code for a shift operation whose shift amount operand is in
memory, we load the shift amount into R0 and use that register in SLAG etc.
That won't work because the contents of R0 are ignored when it is used as a
base register.
So, let's choose some other register and save/restore it.
s390x: fpr - gpr transfer facility
We need to introduce a new hwcap to model the presence of the fpr - gpr
transfer facility. If it is not available, we cannot use the LDGR and LGDR
insns and need to use a trick similar to what ppc does (write/read stack
location).
Fixes #268619 (vex side).
(Florian Krohm, britzel@acm.org)
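
A C-level analogy of the fallback described above, not the emitted s390x code. The hwcap bit name and all function names are hypothetical, and the bit pattern in the usage note assumes IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical capability bit modelling "fpr - gpr transfer facility". */
enum { EXAMPLE_HWCAP_FPR_GPR_XFER = 1u << 0 };

/* Decide whether the direct register-to-register insns may be used. */
static int can_transfer_directly(unsigned hwcaps)
{
    return (hwcaps & EXAMPLE_HWCAP_FPR_GPR_XFER) != 0;
}

/* Fallback: move the bits through a memory slot (here a local on the
   stack) instead of using a direct transfer instruction. */
static uint64_t fpr_to_gpr_via_memory(double f)
{
    uint64_t bits;
    memcpy(&bits, &f, sizeof bits);   /* store as FP, reload as integer */
    return bits;
}

static double gpr_to_fpr_via_memory(uint64_t bits)
{
    double f;
    memcpy(&f, &bits, sizeof f);      /* store as integer, reload as FP */
    return f;
}
```

On an IEEE 754 host, fpr_to_gpr_via_memory(1.0) yields 0x3FF0000000000000, and converting back returns 1.0 unchanged.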
Fix up some enum confusion to do with ARMNeonUnOp and ARMNeonUnOpS, as
found by "the IBM checker", and also by clang-2.9. Fixes #271820.
(Florian Krohm, britzel@acm.org)
Fix up enum confusion between PPCAvOp and PPCAvFpOp, as found by
"the IBM checker", and also by clang-2.9. Fixes #271579.
(Florian Krohm, britzel@acm.org)
s390x: reconsider "long displacement" requirement. We currently
require that the host supports accessing memory using long
displacement. On older machines e.g. z900 that is an expensive
operation, because it is millicoded. It would be a performance win to
relax that requirement. (VEX side changes.) See #268620.
(Florian Krohm, britzel@acm.org)
s390x: minor code generation tweaks. There were a few loose ends
(identified by fixs390) in code generation that are fixed by the
attached patch:
- use of SLFI insn if available
- unnecessary vpanic
An out-of-date comment is also removed.
Julian Seward [Thu, 24 Mar 2011 11:14:02 +0000 (11:14 +0000)]
Handle more cases of SUB (SP minus immediate/register). Also
tighten up checks for SP plus register. Fixes #269078.
(Ulrich Weigand, uweigand@de.ibm.com)
Julian Seward [Tue, 22 Mar 2011 16:51:38 +0000 (16:51 +0000)]
Emit Ain_Imm64 (64-bit immediate constant loads to register) using a
short form when the immediate is < 2^20. Gives a 3% code size
reduction for Helgrind with --ignore-stack-refs=yes.
Julian Seward [Mon, 14 Mar 2011 12:35:18 +0000 (12:35 +0000)]
Wrap up "__attribute__((regparm(n)))" inside a macro so it is only
visible on x86, so as to avoid producing compiler warnings on targets
for which it is ignored. Fixes #247223. (Modified version of patch
from Bart Van Assche).
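
A minimal sketch of such a wrapper, assuming GCC or Clang. EXAMPLE_REGPARM and add_two are hypothetical names, not necessarily those used in the tree.

```c
/* Expand to the regparm attribute only on 32-bit x86, where it has an
   effect; on every other target expand to nothing, so the compiler
   never sees an attribute it would warn about. */
#if defined(__i386__)
#  define EXAMPLE_REGPARM(n) __attribute__((regparm(n)))
#else
#  define EXAMPLE_REGPARM(n) /* nothing */
#endif

/* On x86 the two arguments are passed in registers; elsewhere the
   default calling convention applies. */
static EXAMPLE_REGPARM(2) int add_two(int a, int b)
{
    return a + b;
}
```

Call sites are unchanged: add_two(2, 3) behaves identically on all targets, only the calling convention differs on x86.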
Julian Seward [Mon, 7 Mar 2011 16:04:07 +0000 (16:04 +0000)]
Add a port to IBM z/Architecture (s390x) running Linux -- VEX
side components. (Florian Krohm <britzel@acm.org> and Christian
Borntraeger <borntraeger@de.ibm.com>). Fixes #243404.
Julian Seward [Thu, 10 Feb 2011 12:20:02 +0000 (12:20 +0000)]
Handle Ico_V128(0xFFFF), created by more aggressive constant folding
in ir_opt.c. Fixes #262985 (a regression from 3.5.0).
(Maynard Johnson, maynardj@us.ibm.com)
Julian Seward [Mon, 11 Oct 2010 18:03:13 +0000 (18:03 +0000)]
Fix bogus register constraints for ARM mode LDREX and STREX.
Derived from a patch by Rodrigo Belem <rodrigo.belem@openbossa.org>
Partially fixes #253636.
Julian Seward [Tue, 5 Oct 2010 22:29:49 +0000 (22:29 +0000)]
Thumb instructions: instead of generating tons of lardy boilerplate IR
to compute the guarding condition for instructions, and then leaning
heavily on ir_opt to almost always fold it out, avoid generating it in
the first place if it's not necessary, as per the ITxxx optimisation
analysis. This reduces startup time of Thumb applications by 0%-30%
by reducing the amount of time the JIT has to spend translating. No
effect on ARM instructions since those don't require a complex IR
preamble to establish the gating condition.
Julian Seward [Fri, 1 Oct 2010 14:06:22 +0000 (14:06 +0000)]
Improve constant folding of expressions of the form 'op(t,t)'
where t is an IRTemp. This superficially fixes #213865, although
it doesn't actually fix all the Intel-prescribed dependency-breaking
cases tterrib listed there. The newly-handled cases here are:
Increase the size of the JIT's scratch working area from 4MB to 5MB.
This is needed to handle long blocks of NEON code with Memcheck
--track-origins=yes.
Support the DCBZL instruction. Also, query the host CPU at startup
time to find out how much space DCBZL really clears, and make the
guest CPU act accordingly. (VEX-side changes)
(Dave Goodell, goodell@mcs.anl.gov)