Julian Seward [Mon, 11 Oct 2010 18:03:13 +0000 (18:03 +0000)]
Fix bogus register constraints for ARM mode LDREX and STREX.
Derived from a patch by Rodrigo Belem <rodrigo.belem@openbossa.org>
Partially fixes #253636.
Julian Seward [Tue, 5 Oct 2010 22:29:49 +0000 (22:29 +0000)]
Thumb instructions: instead of generating tons of lardy boilerplate IR
to compute the guarding condition for instructions, and then leaning
heavily on ir_opt to almost always fold it out, avoid generating it in
the first place if it's not necessary, as per the ITxxx optimisation
analysis. This reduces startup time of Thumb applications by 0%-30%
by reducing the amount of time the JIT has to spend translating. No
effect on ARM instructions since those don't require a complex IR
preamble to establish the gating condition.
Julian Seward [Fri, 1 Oct 2010 14:06:22 +0000 (14:06 +0000)]
Improve constant folding of expressions of the form 'op(t,t)'
where t is an IRTemp. This superficially fixes #213865, although
it doesn't actually fix all the Intel-prescribed dependency-breaking
cases tterrib listed there. The newly-handled cases here are:
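As a rough illustration of folding 'op(t,t)', the sketch below uses a
hypothetical toy IR. ToyOp, FoldKind and fold_same_temp are invented for
this example and are not VEX's real IR types. The operations shown are
illustrative only and are not the list of cases handled by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy IR operations; not VEX's IROp. */
typedef enum { OP_SUB32, OP_XOR32, OP_AND32, OP_OR32, OP_ADD32 } ToyOp;

/* How a binary expression simplifies, if at all. */
typedef enum {
    FOLD_NONE,     /* leave the expression alone              */
    FOLD_TO_ZERO,  /* replace with the constant 0             */
    FOLD_TO_ARG    /* replace with the (single) temporary t   */
} FoldKind;

/* Decide how op(t1, t2) simplifies when t1 and t2 are temporaries,
   identified here by plain integer ids. */
static FoldKind fold_same_temp(ToyOp op, uint32_t t1, uint32_t t2)
{
    if (t1 != t2)
        return FOLD_NONE;          /* only op(t,t) is of interest */
    switch (op) {
        case OP_SUB32:             /* t - t == 0 */
        case OP_XOR32:             /* t ^ t == 0 */
            return FOLD_TO_ZERO;
        case OP_AND32:             /* t & t == t */
        case OP_OR32:              /* t | t == t */
            return FOLD_TO_ARG;
        default:
            return FOLD_NONE;      /* e.g. t + t has no simpler form here */
    }
}
```

Folding Xor(t,t) and Sub(t,t) to zero also removes the apparent data
dependency on t, which is the point of the dependency-breaking idioms.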
Increase the size of the JIT's scratch working area from 4MB to 5MB.
This is needed to handle long blocks of NEON code with Memcheck
--track-origins=yes.
Support the DCBZL instruction. Also, query the host CPU at startup
time to find out how much space DCBZL really clears, and make the
guest CPU act accordingly. (VEX-side changes)
(Dave Goodell, goodell@mcs.anl.gov)
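A minimal sketch of what "clearing a cache block" amounts to once the block
size is known, assuming guest memory is modelled as a flat byte array.
zero_cache_block and its parameters are hypothetical and are not the VEX
helper. The block size is passed in because, as described above, it is
discovered from the host at startup rather than hard-wired.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the naturally aligned block of 'block_size' bytes that contains
   offset 'addr' within 'mem'. 'block_size' must be a power of two. */
static void zero_cache_block(uint8_t *mem, size_t mem_size,
                             size_t addr, size_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    size_t start = addr & ~(block_size - 1);  /* round down to block start */
    assert(start + block_size <= mem_size);   /* stay inside the array     */
    memset(mem + start, 0, block_size);
}
```

With a 128-byte block, an address of 130 clears bytes 128..255. With a
32-byte block, an address of 40 clears bytes 32..63.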
Julian Seward [Sun, 22 Aug 2010 12:59:02 +0000 (12:59 +0000)]
Merge from branches/THUMB: new IR primops and associated
infrastructure, needed to represent NEON instructions. Way more new
ones than I would like, but I can't see a way to avoid having them.
Julian Seward [Sun, 22 Aug 2010 12:54:56 +0000 (12:54 +0000)]
Merge from branches/THUMB: hwcaps for ARM. May get simplified since
in fact ARM v5 and v6 are not supported targets -- ARMv7 remains the
minimum supported target.
Julian Seward [Sun, 22 Aug 2010 12:44:20 +0000 (12:44 +0000)]
Merge from branches/THUMB: front end changes to support:
* Thumb integer instructions
* NEON in both ARM and Thumb mode
* VFP in both ARM and Thumb mode
* infrastructure to support APSR.Q flag representation
Julian Seward [Sun, 22 Aug 2010 12:38:53 +0000 (12:38 +0000)]
Merge from branches/THUMB: A spechelper interface change that allows
the helper to look back at the previous IR statements. May be backed
out if it turns out no longer to be needed for optimising Thumb
translations.
Julian Seward [Tue, 17 Aug 2010 22:52:08 +0000 (22:52 +0000)]
Add a moderately comprehensive implementation of the SSE4.2 string
instructions PCMP{I,E}STR{I,M}. They are an absolute nightmare of
complexity. Most of the 8-bit data processing variants are supported,
but none of the 16-bit variants.
Also add support for PINSRB and PTEST.
With these changes, I believe Valgrind supports all the SSE4.2
instructions used in glibc-2.11 on x86_64-linux, as well as anything
that gcc can emit. So that gives fairly good coverage.
Currently these instructions are handled, but CPUID still claims to be
an older, non-SSE4 capable Core 2, so that software that correctly
checks CPU features should not use them. Following further testing I
will enable the relevant SSE4.2 bits in CPUID.
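For the PCMP{I,E}STR{I,M} instructions above, much of the complexity comes
from the imm8 control byte. The sketch below decodes its fields following
the layout in Intel's instruction reference. PcmpstrControl and
decode_pcmpstr_imm8 are hypothetical names and this is not Valgrind's
decoder. It only separates 8-bit from 16-bit variants and does not model
which sub-cases Valgrind actually accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_16bit;     /* imm8[0]: 0 = byte elements, 1 = word elements */
    bool     is_signed;    /* imm8[1]: 0 = unsigned, 1 = signed             */
    unsigned aggregation;  /* imm8[3:2]: 0 = equal any, 1 = ranges,
                                         2 = equal each, 3 = equal ordered  */
    unsigned polarity;     /* imm8[5:4]: 0 = positive, 1 = negative,
                                         2 = masked positive,
                                         3 = masked negative                */
    bool     out_select;   /* imm8[6]: index form: 0 = least, 1 = most
                              significant index; mask form: 0 = bit mask,
                              1 = byte/word mask                            */
} PcmpstrControl;

static PcmpstrControl decode_pcmpstr_imm8(uint8_t imm8)
{
    PcmpstrControl c;
    c.is_16bit    = (imm8 & 0x01) != 0;
    c.is_signed   = (imm8 & 0x02) != 0;
    c.aggregation = (imm8 >> 2) & 3u;
    c.polarity    = (imm8 >> 4) & 3u;
    c.out_select  = (imm8 & 0x40) != 0;
    return c;
}
```

For example, imm8 = 0x0C decodes as unsigned bytes with "equal ordered"
aggregation, and imm8 = 0x1A as signed bytes, "equal each", negative
polarity.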
Julian Seward [Fri, 6 Aug 2010 07:59:38 +0000 (07:59 +0000)]
Add partial support for the SSE 4.2 PCMPISTRI instruction, at least
for (some of) the sub-cases that glibc uses (64-bit mode only). Also,
prepare for transitioning CPUID in 64-bit mode to indicate SSE4.2
support (not yet enabled).
Be warned, this commit will require a from-clean rebuild of Valgrind.
Don't trash the ELF ABI redzone for amd64 when emulating BT{,S,R,C}
reg,reg. Fixes (well, at least, makes an appalling kludge a bit less
appalling) #245925.
Handle mov[ua]pd G(xmm) -> E(xmm) case, which is something binutils
doesn't produce, presumably because it uses the E->G encoding for xmm
reg-reg moves. Fixes #238713. (Pierre Willenbrock,
pierre@pirsoft.de).
Support the SSE4 insn 'roundss' in 32-bit mode. Lack of this was
causing problems for people running 32-bit apps on MacOSX 10.6 on
newer hardware. Fixes #241377.
Julian Seward [Fri, 18 Jun 2010 08:17:41 +0000 (08:17 +0000)]
Implement SSE4 instructions: PCMPGTQ PMAXUD PMINUD PMAXSB PMINSB PMULLD
I believe this covers everything that gcc-4.4 and gcc-4.5 will generate
with "-O3 -msse4.2". Note, this commit changes the set of IR ops and so
requires a from-scratch rebuild of the tree.
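For reference, the per-lane behaviour of the six instructions can be
written as scalar C. This is a sketch of the architectural semantics of one
lane each, with hypothetical function names. It is not the IR that the
front end generates, and the real instructions apply these to every lane of
a 128-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* PCMPGTQ: signed 64-bit compare; all ones if dst > src, else zero. */
static uint64_t pcmpgtq_lane(int64_t dst, int64_t src)
{
    return dst > src ? UINT64_MAX : 0;
}

/* PMAXUD / PMINUD: unsigned 32-bit maximum / minimum. */
static uint32_t pmaxud_lane(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t pminud_lane(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* PMAXSB / PMINSB: signed 8-bit maximum / minimum. */
static int8_t pmaxsb_lane(int8_t a, int8_t b) { return a > b ? a : b; }
static int8_t pminsb_lane(int8_t a, int8_t b) { return a < b ? a : b; }

/* PMULLD: low 32 bits of the 32x32 product. The low half is the same for
   signed and unsigned multiplication, so it is computed unsigned to avoid
   signed overflow in C. */
static uint32_t pmulld_lane(int32_t a, int32_t b)
{
    uint64_t product = (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
    return (uint32_t)product;
}
```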
Julian Seward [Mon, 7 Jun 2010 16:22:22 +0000 (16:22 +0000)]
Implement SIDT and SGDT as pass-throughs to the host. It's a pretty
bad thing to do, but I can't think of a way to virtualise these
properly. Patch from Alexander Potapenko. See
https://bugs.kde.org/show_bug.cgi?id=205241#c38
Julian Seward [Tue, 4 May 2010 08:48:43 +0000 (08:48 +0000)]
Handle v7 memory fence instructions: ISB DSB DMB and their v6 equivalents:
mcr 15,0,r0,c7,c5,4 mcr 15,0,r0,c7,c10,4 mcr 15,0,r0,c7,c10,5
respectively. Re-emit them in the v6 form so as not to inhibit possible
support for v6-only platforms in the future. Extended version of a patch
from Alexander Potapenko (glider@google.com). Fixes bug 228060.
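The v7-to-v6 correspondence listed above can be held in a small table. The
sketch below is illustrative only: V6Fence, find_v6_fence and
format_v6_fence are hypothetical names, not part of the VEX ARM back end,
and the formatting simply reproduces the operand order used in this log
entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One v7 barrier and the CP15 operands of its v6 equivalent
   (mcr 15,0,r0,c<crn>,c<crm>,<opc2>). */
typedef struct {
    const char *v7_name;
    unsigned    crn, crm, opc2;
} V6Fence;

static const V6Fence v6_fences[] = {
    { "ISB", 7,  5, 4 },
    { "DSB", 7, 10, 4 },
    { "DMB", 7, 10, 5 },
};

/* Look up the v6 form for a v7 barrier name; NULL if it is not a fence. */
static const V6Fence *find_v6_fence(const char *v7_name)
{
    for (size_t i = 0; i < sizeof v6_fences / sizeof v6_fences[0]; i++)
        if (strcmp(v6_fences[i].v7_name, v7_name) == 0)
            return &v6_fences[i];
    return NULL;
}

/* Render the v6 form as text, e.g. "mcr 15,0,r0,c7,c10,5". */
static void format_v6_fence(const V6Fence *f, char *buf, size_t n)
{
    snprintf(buf, n, "mcr 15,0,r0,c%u,c%u,%u", f->crn, f->crm, f->opc2);
}
```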
Julian Seward [Sun, 21 Feb 2010 20:40:53 +0000 (20:40 +0000)]
CVTPI2PD (which converts 2 x I32 in M64 or MMX to 2 x F64 in XMM):
only switch the x87 FPU to MMX mode in the case where the source
operand is in an MMX register, not in memory. This fixes #210264.
This is all very fishy.
* it's inconsistent with all other instructions which convert between
values in (MMX or M64) and XMM, in that they put the FPU in MMX mode
even if the source is memory, not MMX. (for example, CVTPI2PS).
At least, that's what the Intel docs appear to say.
* the AMD documentation makes no mention at all of this. For example
it makes no differentiation in this matter between CVTPI2PD and
CVTPI2PS.
I wonder if Intel surreptitiously changed the behaviour of CVTPI2PD
since this code was written circa 5 years ago. Or, whether the Intel
and AMD implementations differ in this respect.
Julian Seward [Sun, 17 Jan 2010 15:47:01 +0000 (15:47 +0000)]
x86/amd64 front ends: don't chase a conditional branch that leads
back to the start of the trace. It's better to leave the IR loop
unroller to handle such cases.