For each backend, unify the sets of IRJumpKinds handled for Ist_Exit
and iselNext, so as to avoid potential failures caused by branch sense
switching at the IR level.
tchain optimisation for s390 (VEX bits)
Loading a 64-bit immediate into a register requires 4 insns on a
z900 machine, the oldest model supported. Depending on hardware
capabilities, newer machines can do the same using 2 insns.
Naturally, we want to take advantage of that.
However, currently, in disp_cp_chain_me_to_slowEP/fastEP we assume that
the length of loading a 64-bit immediate is a compile time constant:
S390_TCHAIN_LOAD64_LEN
For what we want to do, this length needs to be determined at run time.
So in this patch we move this address arithmetic out of the dispatch
code. The general idea is that the value in %r1 does not need to
be adjusted to recover the place to patch: upon reaching
disp_cp_chain_me_to_slowEP/fastEP, %r1 contains the correct address.
Be lenient if the machine model could not be determined. Assume it is
a new machine rather than one that is too old.
Patch by Christian Borntraeger (borntraeger@de.ibm.com) with additional
commentary. Fixes 298394.
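A minimal sketch of choosing the load length at run time instead of through a compile-time constant. This is not the VEX code: load64_len and has_newer_insns are hypothetical names, and the byte sizes (4-byte insns on z900, 6-byte insns on newer machines) are assumptions for illustration. Only the insn counts (4 vs 2) come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not the VEX implementation.
   Length in bytes of the insn sequence that loads a 64-bit immediate.
   Assumed sizes: 4 insns of 4 bytes each on z900, 2 insns of 6 bytes
   each when the hardware offers the newer insns. */
static size_t load64_len(bool has_newer_insns)
{
   const size_t old_len = 4 * 4;   /* z900: 4 insns */
   const size_t new_len = 2 * 6;   /* newer machines: 2 insns */
   return has_newer_insns ? new_len : old_len;
}
```

Because the result depends on a hardware capability that is only known when Valgrind runs, any address arithmetic based on it has to live in C code that can query the capability, not in assembly that bakes in a constant.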
Consolidate and update information about dependencies of
VG_(machine_get_hwcaps) for all architectures in pub_core_machine.h
and avoid double maintenance.
Last optimisation for the day: change VG_(stats__n_xindirs) in such a
way that the fast-path through VG_(disp_cp_xindir) only has to
increment a 32 bit counter, saving memory bandwidth on 32 bit
platforms compared to a 64-bit inc. The overall numbers of XIndirs
can still be 64 bit though.
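A minimal sketch of the split-counter idea, assuming the 32-bit counter is folded into a 64-bit total from some slower path before it can wrap. All names here are hypothetical and are not the ones used in Valgrind.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; illustrates the technique only. */
static uint32_t n_xindirs_32 = 0;      /* bumped on the fast path */
static uint64_t n_xindirs_total = 0;   /* full 64-bit running total */

/* Fast path: a single 32-bit increment. */
static inline void fast_path_hit(void)
{
   n_xindirs_32++;
}

/* Slow path: assumed to run often enough that the 32-bit counter
   cannot wrap.  Folds the 32-bit count into the total and resets it. */
static void fold_counter(void)
{
   n_xindirs_total += n_xindirs_32;
   n_xindirs_32 = 0;
}

/* Overall count, still 64 bits wide. */
static uint64_t get_n_xindirs(void)
{
   return n_xindirs_total + n_xindirs_32;
}
```

On a 32-bit host the fast path then touches one word instead of two, while readers of the statistic still see a 64-bit value.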
We incorrectly stored the archinfo_host argument of iselSB_S390 into
a global variable not realising it points to a stack-allocated variable.
This caused s390_archinfo_host->hwcaps member to change its value
randomly over time. It could have caused invalid code to be generated.
Curious that it did not surface.
More fixes:
- A few dummy_put_IA's were missing, causing asserts to fire.
Mostly for the "load/store conditional" kind of insns
- EX needed some finishing touches
- Assignments to irsb->next are forbidden. We had a few in the "special
opcodes" section. Now fixed, I hope.
With this patch most regressions run through. I see 3 failures in none
and a few more in the memcheck bucket.
add some .globl or used attribute to avoid link failures with gold linker + LTO
Experiments with gcc 4.7.0 and link time optimisation hit
link failures on amd64, which were solved by adding
.globl and the used attribute.
=> added .globl in similar places for arm/x86/ppc32/s390.
Did not touch darwin (whose asm seems somewhat different).
Change permission mask for FIFOs and shared memory to 0600 instead of 0666
Following a discussion about which user can debug which Valgrind gdbserver:
The default umask will remove the "other" and "group" write bits.
Without the w bits, nothing works in any case.
Moreover, if the vgdb process does not belong to the user running the
Valgrind gdbserver, connections are also not possible.
=> remove useless/confusing bits.
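A small sketch of how the umask interacts with the requested mode, assuming a typical default umask of 022 (the actual umask depends on the user's environment). The helper name effective_mode is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>

/* Permission bits actually applied when a file or FIFO is created:
   the requested bits with the umask bits cleared. */
static mode_t effective_mode(mode_t requested, mode_t mask)
{
   return requested & ~mask & 0777;
}
```

With a umask of 022, a request for 0666 yields 0644, so group and other can read but not write. A request for 0600 is unaffected and stays 0600, which states directly what is usable.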
Fix s390_tchain_patch_load64; some bytes were mixed up.
Fix unchainXDirect_S390; modified place_to_unchain address
before patching the code there.
Add some convenience functions for insn verification in
chain/unchain machinery.
Avoid magic constants.
patch fixing 297991: mmap changing a file descriptor current position
Bug caused by the following problem:
for each mmap, Valgrind reads the 1st 1024 bytes to detect
if this is an mmap-ed file containing debug info to decode.
Reading this 1 KB is done with VG_(pread). VG_(pread) should be
the equivalent of the pread syscall, but on Linux it was implemented as
a seek+read.
The patch implements VG_(pread) in terms of the underlying pread syscall.
Test mmap_fcntl_bug.c extended to also verify the fd current position
before and after the mmap.
Tested on Linux x86/amd64/ppc32/ppc64/s390.
(not tested on Darwin)
(manually tested on arm-android)
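The difference between pread and seek+read can be checked with a small standalone program. This is a sketch using plain POSIX calls, not the Valgrind test; the helper name and the temporary file path are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if pread() of the first bytes of a file leaves the file
   descriptor's current position unchanged, 0 otherwise. */
static int pread_keeps_offset(void)
{
   char path[] = "/tmp/pread_demoXXXXXX";
   char buf[4];
   int fd = mkstemp(path);
   if (fd < 0) return 0;
   unlink(path);                      /* file goes away on close */
   if (write(fd, "0123456789", 10) != 10) { close(fd); return 0; }
   if (lseek(fd, 7, SEEK_SET) != 7)   { close(fd); return 0; }

   off_t before = lseek(fd, 0, SEEK_CUR);
   ssize_t n = pread(fd, buf, sizeof buf, 0);   /* read from offset 0 */
   off_t after = lseek(fd, 0, SEEK_CUR);
   close(fd);

   return n == 4 && buf[0] == '0' && before == 7 && after == 7;
}
```

A seek+read emulation would leave the position at 4 after the read (unless it explicitly restored it), which is the kind of side effect the bug report describes.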
Extend CSE to cover CSEing of clean helper calls. This gives a
significant performance improvement in the baseline simulator (20%) on
some pieces of ARM code.
POWER Processor decimal floating point instruction support: part 2
(bug #297497) (Carl Love, carll@us.ibm.com) (VEX side)
This commit adds the second set of patches to add decimal floating
point (DFP) support for POWER to Valgrind. Bugzilla 295221 contains
the first set of patches for adding POWER support for the DFP
32, 64 and 128-bit sizes. The first set of patches also added support
for the 64 and 128-bit DFP arithmetic instructions and user test code
for the new DFP instructions. The second set of patches, submitted
in this bugzilla, includes support for the DFP shift
instructions and format conversion instructions. Specifically, the
list of Power instructions is: dctdp, drsp, dctfix, dcffix, dctqpq,
dctfixq, drdpq, dcffixq, dscri, dscriq, dscli, dscliq.
TCHAIN: avoid calls to search_transtab and return to scheduler by first using tt_fast
This slightly improves some perf tests (e.g. heap).
An unexplained "real time" slowdown of bz2 between trunk/svn tchain
and this patch was analysed with callgrind/cachegrind.
The real-time slowdown is attributed to the Pentium 4 cache being
unfriendly to self-modifying code (the callgrind/cachegrind cache
simulation does not understand self-modifying code).
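The general technique of consulting a small fast cache before falling back to a full table search can be sketched as follows. This is a generic direct-mapped cache, not Valgrind's tt_fast; every name, the table size and the index function are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAST_BITS 4
#define FAST_SIZE (1u << FAST_BITS)

/* One slot of a hypothetical direct-mapped guest->host cache. */
typedef struct {
   uintptr_t guest;   /* guest code address */
   void     *host;    /* translated code, NULL if the slot is empty */
} FastEntry;

static FastEntry fast_cache[FAST_SIZE];

static unsigned fast_index(uintptr_t guest)
{
   return (unsigned)(guest >> 2) & (FAST_SIZE - 1);
}

/* Record a translation, evicting whatever occupied the slot. */
static void fast_insert(uintptr_t guest, void *host)
{
   FastEntry *e = &fast_cache[fast_index(guest)];
   e->guest = guest;
   e->host  = host;
}

/* Cheap lookup; returns NULL on a miss, in which case the caller
   would fall back to the slower full search. */
static void *fast_lookup(uintptr_t guest)
{
   FastEntry *e = &fast_cache[fast_index(guest)];
   return (e->host != NULL && e->guest == guest) ? e->host : NULL;
}
```

A hit costs one index computation and one compare, so trying this first avoids the expensive search and the round trip through the scheduler in the common case.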
Android's libc includes advertise a "malloc_usable_size", but the
libc.so contains no such symbol; rather a "dlmalloc_usable_size"
(great, huh :-) So intercept that too, on Android.
Improve the behaviour of 64-to/from-80 bit QNaN conversions, so that
the QNaN produced is "canonical". SNaN conversions are unchanged
(because I don't have a definition of what a canonical SNaN is)
although there are some comment updates. Fixes Mozilla bug #738117.
outer/inner setup: new perf/vg_perf options to run perf tests + support translation chaining in inner.
* perf/vg_perf:
Similarly to tests/vg_regtest, perf/vg_perf now accepts the 3
optional arguments:
--outer-valgrind
--outer-tool
--outer-args
This allows easy analysis or comparison of performance between
different Valgrind versions (e.g. using callgrind, or cachegrind/cg_diff).
* See README_DEVELOPERS for more details.
* vg_regtest modified so as to use the 'in-place' build of inner, rather
than the installed version.
* added option --smc-check=all-non-file to vg_perf and vg_regtest
outer default arguments (needed when evaluating a Valgrind which does
translation chaining).
TCHAIN: remove caused_discard* argument to VG_(translate)
This is the followup to rev 12488.
With this revision, translation chaining is not done
if the translation for the 'from' address no longer
exists (discarded or erased).
The assumption documented in 12488 comment has been checked by:
* first reproduce a crash in Firefox when always setting
caused discard to False
* then upgrade to rev 12488
* with this upgrade, no crash anymore.
=> this verifies that the caused discard logic is properly
replaced by revision 12488.
Fix assert due to gdbserver discarding translation
The fix consists of checking whether the translation
of the 'from' address still exists.
Patch also contains a big comment explaining why it is
safe to discard/erase the current translation being
executed.
In a follow-up patch, the Bool in VG_(translate) will
be removed:
Bool VG_(translate) ( /*OUT*/Bool* caused_discardP,
(if experiment confirms the hypothesis that it is
safe to discard current translation).