Julian Seward [Fri, 3 Feb 2006 16:08:03 +0000 (16:08 +0000)]
An overhaul of VEX's floating point handling, to facilitate correct
simulation of IEEE rounding modes in all FP operations.
The fundamental change is to add a third argument to the basic
floating point primops, eg AddF64, MulF64, etc, indicating the
(IR-encoded) rounding mode to be used for that operation.
Unfortunately IR did not have any way to support three-argument
primops, which means a new kind of IRExpr has been added: a ternary
op, IRExpr_Triop, which is simply a 3-argument form of the existing IR
binary operation node. The unfortunate side effect is that the size
of the union type IRExpr has increased from 16 to 20 bytes on 32-bit
platforms, and hence the JIT chews through more memory, but this does
not appear to have a measurable effect on the JIT's performance, at
least as measured by Valgrind's perf suite.
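As a rough illustration of why a ternary node enlarges the expression type, here is a minimal sketch. The names BinopNode, TriopNode and Expr are hypothetical and greatly simplified; they are not VEX's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Expr Expr;

/* Hypothetical node payloads; the real IRExpr has many more variants. */
typedef struct { int op; Expr *arg1, *arg2; } BinopNode;
typedef struct { int op; Expr *arg1, *arg2, *arg3; } TriopNode;

struct Expr {
    int tag;                 /* says which variant of the union is live */
    union {
        BinopNode Binop;
        TriopNode Triop;     /* the new, largest variant */
    } u;
};

/* Extra bytes the ternary payload needs compared with the binary one. */
static size_t triop_growth(void)
{
    return sizeof(TriopNode) - sizeof(BinopNode);
}
```

In this sketch the growth is exactly one pointer, which with 4-byte pointers is consistent with the 16 to 20 byte increase described above.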
* Add IRExpr_Triop, and add handling code to dozens of places which
examine IRExprs.
* Rename/retype a bunch of floating point IR primops to take a 3rd
rounding mode argument (which is always the first arg).
* Add extra primops AddF64r32 et al, which do double-precision FP
operations and then round to single precision, still within a 64-bit
type. This is needed to simulate PPC's fadds et al without double
rounding.
* Adjust the PPC->IR front end, to generate these new primops and
rounding modes.
* Cause the IR optimiser to do a CSE pass on blocks containing any
floating point operations. This commons up the IR rounding mode
computations, which is important for generating efficient code from
the backend.
* Adjust the IR->PPC back end, so as to emit instructions to set the
rounding mode before each FP operation. Well, at least in
principle. In practice there is a bit of cleverness to avoid
repeatedly setting it to the same value. This depends on both the
abovementioned CSE pass, and on the SSA property of IR (cool stuff,
SSA!). The effect is that for most blocks containing FP code, the
rounding mode is set just once, at the start of the block, and the
resulting overhead is minimal. See comment on
set_FPU_rounding_mode().
This change requires followup changes in memcheck. Also, the
x86/amd64 front/back ends are temporarily broken.
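The idea of not re-setting the rounding mode can be sketched as follows. This is an illustrative model only, not the code in set_FPU_rounding_mode(): the FpOp type and count_rm_sets() are hypothetical. It assumes that, after CSE, operations using the same rounding mode refer to the same IR temporary, and that SSA guarantees a temporary never changes value once assigned.

```c
#include <assert.h>

/* Hypothetical model: each FP op names the SSA temp holding its
   rounding mode. */
typedef struct { int rm_tmp; } FpOp;

/* Count the "set rounding mode" instructions a selector would emit
   for one block if it re-sets the mode only when the temp differs
   from the one most recently used. */
static int count_rm_sets(const FpOp *ops, int n)
{
    int sets = 0;
    int last_tmp = -1;       /* -1: nothing is known at block start */
    for (int i = 0; i < n; i++) {
        if (ops[i].rm_tmp != last_tmp) {
            sets++;          /* would emit: set FPU mode from rm_tmp */
            last_tmp = ops[i].rm_tmp;
        }
    }
    return sets;
}
```

For a block whose FP operations all share one rounding-mode temporary, the model emits a single set, matching the "set just once" behaviour described above.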
Julian Seward [Fri, 27 Jan 2006 21:20:15 +0000 (21:20 +0000)]
Change the way Vex represents architecture variants into something
more flexible. Prior to this change, the type VexSubArch effectively
imposed a total ordering on subarchitecture capabilities, which was
overly restrictive. This change moves to effectively using a bit-set,
allowing some features (instruction groups) to be supported or not
supported independently of each other.
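A minimal sketch of the bit-set approach, using made-up feature names (FEAT_FP, FEAT_VECTOR, FEAT_OPT_FP are assumptions for illustration, not the identifiers Vex uses):

```c
#include <assert.h>

/* Hypothetical feature flags, one bit per instruction group. */
enum {
    FEAT_FP     = 1 << 0,
    FEAT_VECTOR = 1 << 1,
    FEAT_OPT_FP = 1 << 2
};

/* True if every feature in 'wanted' is present in 'caps'. */
static int has_features(unsigned int caps, unsigned int wanted)
{
    return (caps & wanted) == wanted;
}
```

A variant described as FEAT_FP | FEAT_OPT_FP has the optional FP group without the vector group, a combination that a single totally ordered enumeration cannot express.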
Julian Seward [Wed, 25 Jan 2006 21:29:48 +0000 (21:29 +0000)]
Change the way the ppc backend does ppc32/64 float-integer
conversions. fctiw/fctid/fcfid/stfiwx are now represented explicitly
and are generated by the instruction selector. This removes the need
for hdefs.c to know anything about scratch areas on the stack and
scratch FP registers.
Julian Seward [Tue, 24 Jan 2006 01:01:17 +0000 (01:01 +0000)]
Vex can't simulate floor() or ceil() correctly on ppc32/64 from
glibc-2.3.4 onwards, so just replace the functions with the older
glibc implementation. This is an ugly kludge.
Julian Seward [Sun, 22 Jan 2006 01:15:36 +0000 (01:15 +0000)]
Two unrelated changes:
- create an IMark at the start of the IR for the ppc64 magic return stub
as cachegrind will barf if it doesn't find one in a BB
- ppc64: for the same reason that _NRADDR is set to zero at the start of
a redirect block which is a function replacement entry (as opposed to a
function wrapper entry), also set _NRADDR_GPR2 to zero.
Julian Seward [Sun, 22 Jan 2006 01:12:51 +0000 (01:12 +0000)]
Index the BB_info table by redirected guest address, not
non-redirected guest address. This is a small but significant change
needed to make function wrapping work. The problem is that with
function wrapping two different translations are associated with the
non-redirected address (of a wrapped function entry point), and so
cachegrind asserts. Whereas the redirected guest addresses reflect
the reality of only one translation associated with each address. So
use them instead.
Julian Seward [Fri, 20 Jan 2006 16:48:31 +0000 (16:48 +0000)]
Yet another possible output, due to trivial differences in backtraces.
This is getting ridiculous. We need a better way to compare
backtraces in regression test outputs.
Julian Seward [Fri, 20 Jan 2006 14:31:57 +0000 (14:31 +0000)]
Changes to make function wrapping work better on ppc64-linux:
- when recording the non-redirected address in guest_NRADDR, also
snapshot the current R2 value, as that will be needed to run the
original safely
- As a consequence, the original-function information extracted by
VALGRIND_GET_ORIG_FN is different on ppc64-linux (2 words) from
all other platforms (1 word). So change the type of it from
void* to a new type OrigFn which can be defined differently for
each platform.
- Change the CALL_FN_* macros for ppc64-linux to save/restore
R2 values appropriately.
- ppc64-linux: detect overflow/underflow of the redirect stack
and bring Valgrind to a halt if this happens
- Update VG_CLREQ_SZB for ppc32/64 (was out of date).
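The one-word versus two-word difference can be pictured with the sketch below. The type and field names are assumptions made for illustration; consult valgrind.h for the real definition of OrigFn.

```c
#include <assert.h>

/* Sketch only: most platforms need just the original function's
   address. */
typedef struct {
    unsigned long nraddr;    /* non-redirected address */
} OrigFnOneWord;

/* Sketch only: ppc64-linux also needs the R2 value snapshotted at
   the divert point. */
typedef struct {
    unsigned long nraddr;    /* non-redirected address */
    unsigned long r2;        /* R2 at the divert point */
} OrigFnTwoWords;
```

Hiding the layout behind a named type lets wrapper code pass the value around without depending on how many words a given platform needs.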
Julian Seward [Fri, 20 Jan 2006 14:19:25 +0000 (14:19 +0000)]
More ppc64-only function wrapping hacks:
- increase size of redirect stack from 8 to 16 elems
- augment the _NRADDR pseudo-register with _NRADDR_GPR2,
which is the value of R2 at the most recent divert point.
This is needed in the ELF ppc64 ABI in order to safely run
the function being wrapped.
- add a pseudo-instruction to read _NRADDR_GPR2 into _GPR3.
- related change: always keep R2 up to date wrt possible memory
exceptions (no specific reason, just being conservative)
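A fixed-size stack of 16 elements with overflow and underflow checks might look like the sketch below. RedirStack and its functions are hypothetical names. The sketch reports errors through return values, whereas the later entry above says Valgrind halts when the redirect stack overflows or underflows.

```c
#include <assert.h>

#define REDIR_STACK_ELEMS 16     /* size taken from the entry above */

typedef struct {
    unsigned long elems[REDIR_STACK_ELEMS];
    int sp;                      /* index of top element, -1 if empty */
} RedirStack;

static void redir_init(RedirStack *s)
{
    s->sp = -1;
}

/* Returns 1 on success, 0 if the push would overflow the stack. */
static int redir_push(RedirStack *s, unsigned long v)
{
    if (s->sp + 1 >= REDIR_STACK_ELEMS)
        return 0;
    s->elems[++s->sp] = v;
    return 1;
}

/* Returns 1 and stores the top value in *out, or 0 on underflow. */
static int redir_pop(RedirStack *s, unsigned long *out)
{
    if (s->sp < 0)
        return 0;
    *out = s->elems[s->sp--];
    return 1;
}
```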
Julian Seward [Thu, 19 Jan 2006 03:52:19 +0000 (03:52 +0000)]
Clever handling of partially defined equality does not work on
ppc32/64 at the moment. Make this test handle that whilst still
testing the facility on x86/amd64.
Julian Seward [Thu, 19 Jan 2006 03:50:48 +0000 (03:50 +0000)]
This was segfaulting on ppc64-linux, even natively. These changes
stop it doing that. Am not convinced this is a good fix -- I don't
really understand how this program works.
Julian Seward [Wed, 18 Jan 2006 04:23:10 +0000 (04:23 +0000)]
Fix an all-platforms bug introduced by the recent overhaul of function
interception and wrapping. This was causing failures matching
function names in suppressions to function names in backtraces when
the latter names were Z-encoded. That typically caused all leak
suppressions to fail, because they contain names such as malloc,
which are Z-encoded.
Julian Seward [Wed, 18 Jan 2006 04:20:04 +0000 (04:20 +0000)]
To reduce the endless nuisance of multiple different names for "the
frame below main()" screwing up the testsuite, change all known
incarnations of said into a single name, "(below main)".
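The renaming amounts to a small lookup, sketched here. The alias list is an assumption for illustration and is not claimed to be the set of names Valgrind recognises. normalise_below_main() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Map any known name for the frame below main() to one canonical
   name; other names are returned unchanged. */
static const char *normalise_below_main(const char *fn_name)
{
    /* Assumed examples of names seen for that frame. */
    static const char *const aliases[] = {
        "__libc_start_main",
        "generic_start_main",
    };
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (strcmp(fn_name, aliases[i]) == 0)
            return "(below main)";
    }
    return fn_name;
}
```

With every variant collapsed to the same string, expected test outputs no longer need one file per libc naming convention.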
Julian Seward [Wed, 18 Jan 2006 04:14:52 +0000 (04:14 +0000)]
For ppc64, emit AbiHints from the front end so as to tell tools when
the 288-byte area below the stack pointer should be regarded as
having become undefined as per the ppc64 ELF ABI.
Julian Seward [Tue, 17 Jan 2006 02:06:39 +0000 (02:06 +0000)]
These files all speak about instrumentation functions.
Instrumentation functions now take a callback closure structure
(VgCallbackClosure*), so this commit changes the signatures
accordingly.