gdbserver: introduce support to show the AVX registers.
This implies to change the interface between the
arch independent gdbserver files and the arch dependent files
as AVX implies a choice of xml files at run time.
In valgrind-low-amd64.c, the xml files and the nr of registers
are different depending on AVX support or not.
Other platforms still have a fully static nr of registers.
Julian Seward [Thu, 24 May 2012 06:17:14 +0000 (06:17 +0000)]
Fix incorrect uses of disAMode in some SSE4 instructions that have an
immediate byte as a subopcode. Fixes #294260. (Patrick J. LoPresti,
lopresti@gmail.com)
Prepare for AVX support : restructure gdbsrv/target/valgrind-low/arch low
AVX support implies to have target xml files which are selected
according to the machine hwcaps.
This change improves the structure of the gdbserver software layering
to prepare for this.
Basically, the protocol files (e.g. server.c) are now calling directly
the valgrind target operations which are now defined in target.h/target.c
(before, there was a level of indirection inheritated from the GDB
structure which was useless for valgrind gdbserver).
+ clarified some comments
Julian Seward [Tue, 22 May 2012 23:12:13 +0000 (23:12 +0000)]
Implement
VMOVQ xmm1, r64 = VEX.128.66.0F.W1 7E /r (reg case only)
If this is documented in the Intel manuals, I can't find it.
GNU binutils and GDB seem to have heard of it, though.
Florian Krohm [Mon, 21 May 2012 16:18:23 +0000 (16:18 +0000)]
Add -fomit-frame-pointer for s390. The GCC maintainer was telling me that
this has been the preferred way to compile for quite a while. So let's follow
suit. The perf bucket did not reveal any measurable difference.
Julian Seward [Mon, 21 May 2012 15:45:34 +0000 (15:45 +0000)]
Ensure s390x guest state size is 32-byte aligned, as per increase in
alignment requirements resulting from r12569/r2330.
(Christian Borntraeger <borntraeger@de.ibm.com>)
name_of_sched_event was missing some values and returning "??UNKNOWN??" instead.
* re-ordered the values to match the declaration order in
libvex_trc_values.h and pub_core_dispatch_asm.h
* added missing values
Bypass gcc 4.4/4.5 compilation bug by moving -fomit-frame-pointer to Makefile.all.am
gcc 4.4 and 4.5 has a bug which causes miscompilation of mc_main.c:
args are not correctly given to VG_(am_munmap_valgrind).
This causes the secondary map entries to not be unmapped
(which can cause unlimited memory growth)
and/or causes the assert on VG_(am_munmap_valgrind) result to fail.
Removing the pragma optimize from mc_main.c and inserting it instead
in Makefile.all.am for x86 solves the gcc bug.
Add assertion that the munmap of the secmap succeeds.
It is suspected that there is a bug in the call to VG_(am_munmap_valgrind).
At first sight, it looks like a bug in gcc version 4.4.5 (Debian 4.4.5-8)
which seems to pass wrong arguments from mc_main.c to aspace mgr function.
Some tests are failing on gcc20 with this assert a.o.
./vg-in-place ./perf/bz2 x
gives an assert.
The bug does not happen if Valgrind is compiled with gcc 4.7.0.
On gcc20, the new tests failing with this assert are:
memcheck/tests/linux/lsframe1 (stderr)
memcheck/tests/linux/lsframe2 (stderr)
memcheck/tests/linux/stack_switch (stderr)
memcheck/tests/origin5-bz2 (stdout)
memcheck/tests/vcpu_bz2 (stdout)
memcheck/tests/vcpu_bz2 (stderr)
The assert is committed so as to see other platforms
where this is failing.
Florian Krohm [Sat, 12 May 2012 15:26:44 +0000 (15:26 +0000)]
Eliminate helper s390_calculate_icc. Rewrite and factor the code to use
s390_calculate_cond instead. The benefit is that the latter has comprehensive
spec_helpers whereas the former had not.
Florian Krohm [Sat, 12 May 2012 03:44:49 +0000 (03:44 +0000)]
Back out VEX r2326. It was not working correctly. The guard condition
has to be evaluated after argument evaluation. Add clarifying comments
in libvex_ir.h
fix 219156 support static malloc or alternate malloc lib (e.g. tcmalloc) with new option --soname-synonyms
* pub_tool_redir.h : define the prefix to be used for "soname synonym"
place holder
* vg_replace_malloc.c : define synonym place holder for malloc related
functions
* m_redir.c : when detecting a soname synonym place holder redir spec, search
in clo_soname_synonyms if there is a synonym pattern.
If yes, replace the soname pattern. If not, ignore the redir spec.
* various files: implement or document the new clo --soname-synonyms
* new test memcheck/tests/static_malloc.vgtest
Florian Krohm [Wed, 9 May 2012 13:31:09 +0000 (13:31 +0000)]
Improve insn selection for helper calls. Attempt to evaluate arguments
into the real register that is mandated by the ABI instead of evaluating
it in a virtual register and then move the result.
Observed savings in insns between 0.5% and 1.4%.
Probably an overrated optimization given current helper functions which
rarely take more than one argument.
Florian Krohm [Sun, 6 May 2012 03:51:00 +0000 (03:51 +0000)]
Avoid regtest failures on x86_64 and ppc64 when toolchains for the
seconday platform (x86 and ppc32, respectively) is not available.
Add -DVGA_SEC_xxxxx and -DVGP_SEC_... to the GCC command line
indicating that a seconday platform is supported. Make arch_test.c
recognise those flags.
Fixes bugzilla #296983.
Florian Krohm [Sun, 6 May 2012 03:37:25 +0000 (03:37 +0000)]
Require automake-1.10 for proper handling of include file dependencies
in .S files. Also included here is some cleanup, including a reversion
of r10378. Fixes bugzilla #197914.
Florian Krohm [Sun, 6 May 2012 03:34:55 +0000 (03:34 +0000)]
Add the counter pseudo register to the list of guest registers to
be tracked during insn selection. Saves 0.2% or so of insns depending on
how often insns with implicit loops like MVC are being used.
Florian Krohm [Sat, 5 May 2012 00:01:16 +0000 (00:01 +0000)]
Add NC and OC to the list of insns that get special treatment under EX.
Refactored code such that s390_irgen_xonc can be reused thereby avoiding
code duplication.
Improve m_redir.c debug trace by adding filename.
Many objects (shared or non shared) have no soname.
In such case, showing the filename clarifies where the
redir spec is coming from.
Test cases for POWER Power Decimal Floating Point (DFP) test class,
test group and test exponent instructions dtstdc, dtstdcq, dtstdg,
dtstdgq, dtstex and dtstexq. Bug #298862. (Carl Love,
carll@us.ibm.com and Maynard Johnson, maynardj@us.ibm.com)
Add support for POWER Power Decimal Floating Point (DFP) test class,
test group and test exponent instructions dtstdc, dtstdcq, dtstdg,
dtstdgq, dtstex and dtstexq. Bug #298862. (Carl Love,
carll@us.ibm.com)
add optional arg [aspacemgr] to v.info memory to show aspacemgr segments.
When investigating Valgrind out of memory situation,
it is useful to be able to output the list of segments of the
aspacemgr at any moment.
The GDB monitor command "v.info memory" has now an optional
argument allowing to output this list of segments
For --profile-flags=, weight the counts by the number of guest insns
in each IRSB, rather than considering each IRSB to have a weight of 1.
This probably gives more representative profiles, especially post
t-chain merge, which made inter-SB transitions more or less free
compared to what they were before.
For each backend, unify the sets of IRJumpKinds handled for Ist_Exit
and iselNext, so as to avoid potential failures caused by branch sense
switching at the IR level.
tchain optimisation for s390 (VEX bits)
Loading a 64-bit immediate into a register requires 4 insns on a
z900 machine, the oldest model supported. Depending on hardware
capabilities, newer machines can do the same using 2 insns.
Naturally, we want to take advantage of that.
However, currently, in disp_cp_chain_me_to_slowEP/fastEP we assume that
the length of loading a 64-bit immediate is a compile time constant:
S390_TCHAIN_LOAD64_LEN
For what we want to do this constant needs to be a runtime constant.
So in this patch we move this address arithmetic out of the dispatch
code. The general idea being that the value in %r1 does not need to
be adjusted to recover the place to patch. Upon reaching
disp_cp_chain_me_to_slowEP/fastEP %r1 contains the correct address.
Be lenient if the machine model could not be determined. Assume it's
a new machine as opposed to a too old machine.
Patch by Christian Borntraeger (borntraeger@de.ibm.com) with additional
commentary. Fixes 298394.
Consolidate and update information about dependencies of
VG_(machine_get_hwcaps) for all architectures in pub_core_machine.h
and avoid double maintenance.
Last optimisation for the day: change VG_(stats__n_xindirs) in such a
way that the fast-path through VG_(disp_cp_xindir) only has to
increment a 32 bit counter, saving memory bandwidth on 32 bit
platforms compared to a 64-bit inc. The overall numbers of XIndirs
can still be 64 bit though.