DW_CFA_def_cfa_expression: don't push the CFA on the stack before
evaluation starts. For DW_CFA_val_expression and DW_CFA_expression
doing so is correct, but not for DW_CFA_def_cfa_expression.
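A minimal sketch of the distinction, not Valgrind's actual evaluator. The toy stack machine and the opcode names OP_LIT and OP_PLUS are hypothetical; only the rule about the initial push comes from the description above.

```c
#include <stddef.h>

/* Hypothetical toy opcodes; real DWARF expressions use DW_OP_* codes. */
enum { OP_LIT = 1, OP_PLUS = 2 };

/* Evaluate a toy expression. If push_cfa_first is nonzero, the CFA is
   pushed before evaluation starts (the DW_CFA_expression and
   DW_CFA_val_expression case). For DW_CFA_def_cfa_expression the stack
   must start empty, so pass 0. Returns 0 on success, -1 on error. */
int eval_expr(const long *code, size_t n, long cfa,
              int push_cfa_first, long *result)
{
   long stack[16];
   size_t sp = 0;
   if (push_cfa_first)
      stack[sp++] = cfa;
   for (size_t i = 0; i < n; i++) {
      if (code[i] == OP_LIT) {
         /* OP_LIT is followed by its literal operand. */
         if (i + 1 >= n || sp >= 16) return -1;
         stack[sp++] = code[++i];
      } else if (code[i] == OP_PLUS) {
         /* Pop two values, push their sum. */
         if (sp < 2) return -1;
         sp--;
         stack[sp - 1] += stack[sp];
      } else {
         return -1;
      }
   }
   if (sp == 0) return -1;
   *result = stack[sp - 1];
   return 0;
}
```

An expression that relies on the CFA already being on the stack only evaluates when push_cfa_first is set; with an empty initial stack it fails with a stack underflow.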
Back out most of r15145 which reports bug fixes for various altivec insns.
Either those bugs were fixed a long time ago, or the reporter ran
on a host without altivec capabilities, or those insns were actually
e500 insns which are not supported at all at this point.
Follow up on VEX r3144 and remove VexGuestTILEGXStateAlignment.
Also fix the alignment check which should be mod 16 not mod 8.
Well, actually, it should be mod LibVEX_GUEST_STATE_ALIGN but
that is another patch.
Fix BZ #342683. Based on patch by Ivo Raisr.
What this does is to make sure that the initial client data segment
is marked as unaddressable. This is consistent with the behaviour of
brk when the data segment is shrunk. The "freed" memory is marked
as unaddressable.
Special tweaks were needed for s390 which was returning early from
the function to avoid sloppy register definedness initialisation.
Replace adler32 by sdbm_hash in m_deduppoolalloc.c
adler32 is not very good as a hash function.
sdbm_hash gives more distinct keys than adler32,
and in a large majority of the cases, shorter chains.
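For reference, a minimal sketch of the classic sdbm hash. The function name and signature are illustrative assumptions; the code in m_deduppoolalloc.c may differ in detail.

```c
#include <stddef.h>

/* Classic sdbm hash: h = c + (h << 6) + (h << 16) - h,
   which is the same as h * 65599 + c in unsigned arithmetic. */
unsigned int sdbm_hash_sketch(const unsigned char *buf, size_t len)
{
   unsigned int h = 0;
   for (size_t i = 0; i < len; i++)
      h = buf[i] + (h << 6) + (h << 16) - h;
   return h;
}
```

Every input byte is mixed into the whole word by the multiplication, whereas adler32 is a checksum built from two running sums, which tends to cluster for short inputs.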
Fix an assertion in the address space manager. BZ #345887.
The VG_(extend_stack) call needs to be properly guarded because the
passed-in address is not necessarily part of an extensible stack
segment. And an extensible stack segment is the only thing that
function should have to deal with.
Previously, the function VG_(am_addr_is_in_extensible_client_stack)
was introduced to guard VG_(extend_stack) but it was not added in all
places it should have been.
Also, extending the client stack during signal delivery (in sigframe-common.c)
was simply calling VG_(extend_stack) hoping it would do the right thing.
But that was not always the case. The new testcase
none/tests/linux/pthread-stack.c exercises this (3.10.1 errors out on it).
Renamed ML_(sf_extend_stack) to ML_(sf_maybe_extend_stack) and add
proper guard logic for VG_(extend_stack).
Testcases none/tests/{amd64|x86}-linux/bug345887.c by Ivo Raisr.
Carl Love [Wed, 22 Apr 2015 21:17:48 +0000 (21:17 +0000)]
There is an ABI change in how the PPC64 gcc compiler aligns 128 bit arguments
with GCC 5.0. The compiler generates a "note" about this starting
with GCC 4.9. To avoid generating the "note", the arguments are now
passed via a pointer, i.e. by reference rather than by
value.
Carl Love [Wed, 22 Apr 2015 16:17:06 +0000 (16:17 +0000)]
Add support for the TEXASRU register. This register contains the upper
32-bits of the transactional memory instruction summary information.
Note, the valgrind implementation of transactional memory instructions is
limited. Currently, the contents of the TEXASRU register will always read
as 0. The lower 64-bits of the transaction information in the TEXASR
register will contain the failure information as set up by Valgrind.
The vex commit 3143 contains the changes needed to support the TEXASRU
register on PPC64.
The support requires changing the value of MAX_REG_WRITE_SIZE in
memcheck/mc_main.c from 1696 to 1712. The change is made in this
valgrind commit.
Add some stats to helgrind stats:
* nr of client malloc-ed blocks
* how many OldRef helgrind has, and the distribution
of these OldRef according to the nr of accs they have
Do RCEC_GC when approaching the max nr of RCEC, not when reaching it.
Otherwise, long running applications still see the max nr of RCEC
slowly growing, which increases the memory usage and
makes the (fixed) contextTab hash table slower to search.
Without this margin, the max could increase as the GC code
is not called at exactly the moment we reach the previous max,
but rather when a thread has run a bunch of basic blocks.
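A minimal sketch of the "approaching the max" check. The function name and the margin parameter are hypothetical; the margin actually used by helgrind is not stated above.

```c
#include <stdbool.h>

/* Decide whether the RCEC GC is due. The check fires once the current
   count comes within 'margin' entries of the previous maximum, instead
   of waiting until that maximum is reached or exceeded. */
bool rcec_gc_is_due(unsigned long n_rcec, unsigned long prev_max,
                    unsigned long margin)
{
   /* Avoid unsigned underflow when the previous max is below the margin. */
   unsigned long threshold = prev_max > margin ? prev_max - margin : 0;
   return n_rcec >= threshold;
}
```

Because the check only runs after a thread has executed a bunch of basic blocks, the margin leaves room for the count to grow between two checks without exceeding the previous maximum.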
increase function size even more (see r15095). On s390 this testcase
might use a relative load (e.g. via load address relative long(larl)
for the address) into the literal pool for some constants. 1280 seems
to be enough that the r/o data is copied along the function.
Carl Love [Mon, 20 Apr 2015 23:38:33 +0000 (23:38 +0000)]
Add support for the lbarx, lharx, stbcx and sthcx instructions.
One of the expect files was missing. Also found that there
was a bug in the stq, stqcx, lq and lqarx instructions for LE.
The VEX commit for the instruction fix was 3138.
This commit updates the expect files for the corrected instructions
and adds the missing expect files.
The bugzilla for the original issue of the missing instructions
is 346324.
This patch changes the policy that does the GC of OldRef and RCEC
conflict cache size.
The current policy is:
A 'more or less' LRU policy is implemented by giving
to each OldRef a generation nr in which it was last touched.
A new generation is created every 50000 new accesses.
GC is done when the nr of OldRef reaches --conflict-cache-size.
The GC consists in removing enough generations to free
half of the entries.
After GC of OldRef, the RCEC (Ref Counted Exe Contexts)
not referenced anymore are GC-ed.
The new policy is:
An exact LRU policy is implemented using a doubly linked list
of OldRef.
When reaching --conflict-cache-size, the LRU entry is re-used.
The not referenced RCEC are GC-ed when less than 75% of the RCEC
are referenced, and the nr of RCEC is 'big' (at least half the
size of the contextTab, and at least the max nr of RCEC reached
previously).
(note: tried to directly recover a unref'ed RCEC when recovering
the LRU oldref, but that gives a lot of re-creation of RCEC).
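The exact LRU policy can be sketched as follows. This is an illustrative sketch only: Node is a hypothetical stand-in for OldRef that keeps just the two list pointers and the accessed address, and the function names are not helgrind's.

```c
#include <stddef.h>

/* Hypothetical stand-in for OldRef: only the LRU links and the address. */
typedef struct Node {
   struct Node *prev, *next;
   unsigned long ga;          /* accessed address */
} Node;

/* Circular list with a sentinel: head->next is the most recently used
   entry, head->prev is the least recently used one. */
void lru_init(Node *head) { head->prev = head->next = head; }

static void lru_unchain(Node *n)
{
   n->prev->next = n->next;
   n->next->prev = n->prev;
}

void lru_insert_mru(Node *head, Node *n)
{
   n->next = head->next;
   n->prev = head;
   head->next->prev = n;
   head->next = n;
}

/* On a re-access: unchain the entry and rechain it at the MRU end. */
void lru_touch(Node *head, Node *n)
{
   lru_unchain(n);
   lru_insert_mru(head, n);
}

/* When the cache is full: recycle the LRU entry for a new address. */
Node *lru_reuse_oldest(Node *head, unsigned long new_ga)
{
   Node *n = head->prev;
   if (n == head) return NULL;   /* empty list */
   lru_unchain(n);
   n->ga = new_ga;
   lru_insert_mru(head, n);
   return n;
}
```

Both operations are constant time and no entry is ever freed, at the price of two extra pointers per entry and an unchain/rechain on every re-access.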
The new policy has the following advantages/disadvantages:
1. It is faster (at least for big applications)
On a firefox startup/exit, we gain about 1m30s out of 11m.
Similar 5..10% speed up encountered on other big applications
or on the new perf/memrw test.
The speed increase depends on the amount of memory
touched by the application. For applications with a
working set fitting in conflict-cache-size, the new policy
might be marginally slower than previous policy on platforms
having a small cache : the current policy only sets a generation
nr when an address is re-accessed, while the new policy
has to unchain and rechain the OldRef access in the LRU
doubly linked list.
2. It uses less memory (at least for big applications)
Firefox startup/exit "core" arena max use decreases from
1175MB mmap-ed/1060MB alloc-ed
to
994MB mmap-ed/913MB alloc-ed
The decrease in memory is the result of having a lot less RCEC:
The current policy lets the nr of RCEC grow till the conflict
cache size is GC-ed.
The new policy limits the nr of RCEC to 133% of the RCEC
really referenced. So, we end up with a max nr of RCEC
a lot smaller with the new policy : max RCEC 191000
versus 1317000, for a total nr of discard RCEC operations
almost the same: 33M versus 32M.
Also, the current policy allocates a big temporary array
to do the GC of OldRef.
With the new policy, size of an OldRef increases because
we need 2 pointers for the LRU doubly linked list, and
we need the accessed address.
In total, the OldRef increase is limited to one Word,
as we do not need anymore the gen, and the 'magic'
for sanity check was removed (the check somewhat
becomes less needed, because an OldRef is never freed
anymore. Also, we do a new cross-check between
the ga in the OldRef and the sparseWA key).
For applications using small memory and having
a small nr of different stack traces accessing memory,
the new policy causes an increase in memory (one Word
per OldRef).
3. Functionally, the new policy gives better past information:
once the steady state is reached (i.e. the conflict cache
is full), the new policy has always --conflict-cache-size
entries of past information.
The current policy has a nr of past information varying
between --conflict-cache-size/2 and --conflict-cache-size
(so on average, 75% of conflict-cache-size).
4. The new code is a little bit smaller/simpler:
The generation based GC is replaced by a simpler LRU policy.
So, in summary, this patch should allow big applications
to use less cpu/memory, while having very little
or no impact on memory/cpu of small applications.
Note that the OldRef data structure LRU policy
is not really explicitly tested by a regtest.
Not easy at first sight to make such a test portable
between platforms/OS/compilers/....
For ppc64, use the endianness of the running program, rather
than a hardcoded endness.
(this is because ppc64 supports 2 endness, decided at runtime)
For mips, use BE if running on a non mips system, otherwise
use the endness of the running program
(this is because mips supports 2 endness, but decided at compile time).
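A minimal sketch of detecting the endness of the running program. The enum and function names are hypothetical and are not VEX's types.

```c
#include <stdint.h>
#include <string.h>

typedef enum { END_LE, END_BE } Endness;   /* hypothetical names */

/* Detect the byte order of the running program by looking at how a
   known 32-bit value is laid out in memory. */
Endness running_endness(void)
{
   uint32_t probe = 0x01020304u;
   unsigned char first;
   memcpy(&first, &probe, 1);   /* lowest-addressed byte of the value */
   return first == 0x04 ? END_LE : END_BE;
}
```

A runtime probe like this suits ppc64, where the same build can run in either byte order; for mips the byte order is fixed at compile time, so a preprocessor test is enough there.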
fix 346307 fuse filesystem syscall deadlocks
Mark 2 additional syscalls as 'mayblock' when fuse-compatible hint
is given.
Patch from aozgovde@ralota.com
Factor out the 'extend' function. We only need one version for Linux and
one for Darwin. Down from 11.
Carve out a new function 'track_frame_memory' that communicates to the
tool the allocation of a new stack frame. This was slightly different on
Linux and Darwin but should be the same on both platforms.
New files: priv_sigframe.h and sigframe-common.c
Carl Love [Fri, 17 Apr 2015 23:43:36 +0000 (23:43 +0000)]
Add support for the lbarx, lharx, stbcx and sthcx instructions.
The instructions are part of the ISA 2.06 but were not implemented
in all versions of hardware. The four instructions are all supported
in ISA 2.07. The instructions were put under the ISA 2.07 category
of supported instructions in this patch.
Followup to r14974. That revision oversimplified a condition, part
of which was presumed to be redundant but wasn't. This caused code
to hang due to an infinite signal-delivery loop. Observed and
tracked down by Austin English.
Add 2 tests none/tests/libvex_test and libvexmultiarch_test
The objective of libvex_test is to verify that the VEX lib
can be used in 'single arch mode' (host == guest).
The objective of libvexmultiarch_test is to verify that the VEX lib
can be used in 'multi arch mode' (freely choose host and guest).
(but not many combinations are working: if wordsize or endianness
differs, then libVEX quickly asserts somewhere).
libvex_test.c is somewhat bizarre, as it uses the architecture
for which we have compiled as the guest, and use a 'foreign' arch
as the host.
That avoids having to define in the test a bunch
of arch specific asm instructions : the test just decodes a part
of its own code, and translates it to other archs.
By default, only the combination host == guest is run.
Arguments must be given to run other combinations.
See libvex_test.c for a description on how to specify which combinations
to run.
LibVEX host != guest does not (yet?) work when endianness or word size differs
between host and guest.
Also, currently, TILEGX host is not working properly (unless guest is also
TILEGX), as the evcheck instructions generated differ according to
the offset of the host_EvC_{FAILADDR,COUNTER}.
So, using TILEGX as host is only done when guest is also TILEGX.
Note that it is possible to specify a specific host arch to use.
For example, to force TILEGX to be used, do:
./none/tests/libvexmultiarch_test 1034
(where 1034 is the decimal value corresponding to the enum VexArchTILEGX).
This currently aborts with:
...
------------------------ Assembly ------------------------
vex: priv/host_tilegx_defs.c:2353 (emit_TILEGXInstr): Assertion `evCheckSzB_TILEGX() == (UChar*)p - (UChar*)p0' failed.
//// failure exit called by libVEX
When TILEGX is fixed, we can remove the specific condition that avoids using
TILEGX as host.
Small changes have been done on VEX to allow more combinations
to work:
* host_mips_defs.c : when not compiled on mips,
a lot of mips specific code is not compiled at all, because
one of _MIPSEL or _MIPSEB must be defined to have either the
little endian code or big endian code.
emit32 function must however work to use mips as host.
So, for this function, if _MIPSEL is not defined, then
the big endian code is compiled in by default.
(the mips endianness should probably be handled like the ppc
endianess, for which the endianness to use is decided at runtime).
* host_arm64_isel.c : addition of a 'do not emit anything' for
ABI HINT (avoid an assert e.g. for amd64 guest, arm64 host)
* libvex_guest_amd64.h : when I was still hoping to mix amd64 and x86,
a first assert was firing up due to size/alignment
of VexGuestAMD64State when compiled in 32 bits.
=> addition of pad elements to ensure the size and alignment
of VexGuestAMD64State stays the same when compiled in 32 and
64 bits (the 64 bits layout is unchanged).
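The emit32 fallback described for host_mips_defs.c above can be sketched as follows. The function names are hypothetical and this is not the VEX source; only the rule "big endian unless _MIPSEL is defined" is taken from the description.

```c
#include <stdint.h>

/* Store a 32-bit instruction word, least significant byte first. */
unsigned char *emit32_le(unsigned char *p, uint32_t w)
{
   *p++ = (unsigned char)(w & 0xff);
   *p++ = (unsigned char)((w >> 8) & 0xff);
   *p++ = (unsigned char)((w >> 16) & 0xff);
   *p++ = (unsigned char)((w >> 24) & 0xff);
   return p;
}

/* Store a 32-bit instruction word, most significant byte first. */
unsigned char *emit32_be(unsigned char *p, uint32_t w)
{
   *p++ = (unsigned char)((w >> 24) & 0xff);
   *p++ = (unsigned char)((w >> 16) & 0xff);
   *p++ = (unsigned char)((w >> 8) & 0xff);
   *p++ = (unsigned char)(w & 0xff);
   return p;
}

/* Little endian only when _MIPSEL is defined; big endian otherwise,
   which includes builds on a non-mips machine. */
unsigned char *emit32_sketch(unsigned char *p, uint32_t w)
{
#if defined(_MIPSEL)
   return emit32_le(p, w);
#else
   return emit32_be(p, w);
#endif
}
```

Handling the mips endianness like ppc would mean replacing the preprocessor test with a runtime parameter selecting between the two helpers.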
The new tests have been run on x86/amd64/ppc64/s390x.
It is very well possible that the tests will fail on untested archs
(ppc32 or mips* or arm* or tilegx)
(e.g. because the hardcoded hwcaps in libvex_test.c are not ok).
It should be relatively trivial to fix these hwcaps problems.
Some other problems might be less easy to understand and fix
(e.g. similar to the TILEGX evcheck or mips emit32 problem).
Remove useless arguments in sparsewa, that were inherited from WordFM
These arguments are not needed for sparsewa, as they can only
return the key given in input.
Have the event map GC use the same approach as the other GC
done from libhb_maybe_GC, i.e. check the condition in
libhb_maybe_GC, and call the (non inlined) GC only if
a GC is needed.
Carl Love [Thu, 9 Apr 2015 16:23:20 +0000 (16:23 +0000)]
Add AT_DCACHEBSIZE and AT_HWCAP2 support for POWER PC
Valgrind currently does not support the following AUX vector entries:
AT_DCACHEBSIZE, and AT_HWCAP2. By default these entries are suppressed by
Valgrind. The attached patch adds the needed support so the user level programs
can correctly determine which hardware level they are running on. Specifically
that the ISA 2.07 for Power 8 is supported.
Bugzilla 345695
This fix adds the needed support. It makes a minor change to allow the
VEX settings of the host platform to be passed down so they can be checked
against the HWCAP values.
The files touched are:
coregrind/m_initimg/initimg-linux.c
coregrind/pub_core_initimg.h
coregrind/m_main.c
Assorted cleanups: remove magic constants and unneeded header file. Update
a few comments. Exit with code 127 in bash emulation mode when file was
not found.
Certain kernels on s390 provide extra read permissions on executable
segments. See discussion here: https://bugs.kde.org/show_bug.cgi?id=345824#c4
Making sure that rx and x compare equal.
Followup to r14898 which changes the storage of segment names by
putting them into a string table.
This patch adds reference counting to segment names and frees them
when they are no longer used. The so freed memory can be reclaimed to
store future segment names.
New file coregrind/m_aspacemgr/aspacemgr-segnames.c which has all the
code dealing with segment names. Carved out of aspacemgr-linux.c
Detailed comments in the code.
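A simplified sketch of reference-counted name storage. It uses fixed-size slots rather than a real string table, and all names and sizes here are hypothetical; see aspacemgr-segnames.c for the actual scheme.

```c
#include <string.h>

#define N_SLOTS       8    /* hypothetical capacity */
#define SLOT_NAME_LEN 32   /* hypothetical maximum name length + 1 */

typedef struct { char name[SLOT_NAME_LEN]; unsigned refcount; } Slot;
static Slot slots[N_SLOTS];

/* Return the slot index holding 'name', bumping its reference count.
   A slot whose count has dropped to zero is free and gets reused.
   Returns -1 if the name is too long or the table is full. */
int segname_get(const char *name)
{
   int free_ix = -1;
   if (strlen(name) >= SLOT_NAME_LEN) return -1;
   for (int i = 0; i < N_SLOTS; i++) {
      if (slots[i].refcount > 0 && strcmp(slots[i].name, name) == 0) {
         slots[i].refcount++;
         return i;
      }
      if (slots[i].refcount == 0 && free_ix < 0)
         free_ix = i;   /* remember the first reclaimable slot */
   }
   if (free_ix < 0) return -1;
   strcpy(slots[free_ix].name, name);
   slots[free_ix].refcount = 1;
   return free_ix;
}

/* Drop one reference; the slot becomes reclaimable at zero. */
void segname_put(int ix)
{
   if (ix >= 0 && ix < N_SLOTS && slots[ix].refcount > 0)
      slots[ix].refcount--;
}
```

Segments that share a name share one slot, and a slot released by its last user is handed to the next new name.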
Fixes BZ 344559.
The linux launcher showed some odd behaviour. When given a shell script
named 'now' with these contents:
#!
/bin/date
the platform selection logic does this:
--11196:1:launcher no tool requested, defaulting to 'memcheck'
--11196:2:launcher selecting platform for './now'
--11196:2:launcher selecting platform for './now'
--11196:2:launcher opened './now'
--11196:2:launcher read 13 bytes from './now'
--11196:2:launcher selecting platform for ''
--11196:2:launcher selecting platform for '/home/florian/bin/'
--11196:2:launcher opened '/home/florian/bin/'
--11196:2:launcher selected platform 'unknown'
--11196:1:launcher no platform detected, defaulting platform to 'amd64-linux'
That is not quite right. Instead the platform should be determined by
examining the default shell.
Additionally, define VKI_BINPRM_BUF_SIZE because on linux only that many
characters are considered on a #! line. C.f. <linux>/fs/binfmt_script.c
m_ume/* needs to be adapted as well but that is a different patch.
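A minimal sketch of the "#!" handling described above, not the launcher's actual code. The function name is hypothetical, and the limit of 128 is an assumption modelled on Linux's BINPRM_BUF_SIZE of that era; the value of VKI_BINPRM_BUF_SIZE is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 128, as in Linux's BINPRM_BUF_SIZE at the time. */
#define SKETCH_BINPRM_BUF_SIZE 128

/* Copy the interpreter named on a "#!" line into out.
   Returns 1 if an interpreter was found. Returns 0 if the buffer is not
   a script or the "#!" line names no interpreter; in the latter case the
   caller would examine the default shell instead. */
int shebang_interpreter(const char *buf, size_t len, char *out, size_t outsz)
{
   size_t i = 2, n = 0;
   /* Only the first SKETCH_BINPRM_BUF_SIZE characters are considered. */
   if (len > SKETCH_BINPRM_BUF_SIZE) len = SKETCH_BINPRM_BUF_SIZE;
   if (outsz == 0 || len < 2 || buf[0] != '#' || buf[1] != '!') return 0;
   /* Skip blanks between "#!" and the interpreter path. */
   while (i < len && (buf[i] == ' ' || buf[i] == '\t')) i++;
   /* The path ends at the first blank or at the end of the line. */
   while (i < len && buf[i] != ' ' && buf[i] != '\t' && buf[i] != '\n'
          && n + 1 < outsz)
      out[n++] = buf[i++];
   out[n] = '\0';
   return n > 0;
}
```

For the 13-byte 'now' script shown above the function returns 0, because the "#!" line ends before any interpreter is named and /bin/date on the next line is not part of it.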
Add testcase for BZ 231357.
To do that a small enhancement to vg_regtest was needed:
(1) New declaration to allow specifying an environment variable
that is set prior to invoking valgrind.
eg: env: VAR=VAL
There can be more than one such declaration
(2) prog-asis: program_name
This is like prog: except the program name is not prefixed with
the testdir.
Further reduction of the size of the sector TTE tables
(For the default memcheck configuration, 32 bits) this patch
decreases the size by 13.6 MB, i.e. from 89945856 to 76317696.
Note that the type EClassNo is introduced only for readability
purposes (and to avoid some casts). That does not change the size
of the TTEntry.
The TTEntry size is reduced by using unions and/or Bool on 1 bit.
No performance impact detected (outer callgrind/inner memcheck bz2
on x86 shows a small improvement).
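A minimal sketch of the size-reduction technique. The two structs are hypothetical and do not reproduce the fields of TTEntry; they only show how a union and 1-bit Bool fields shrink an entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry before: two fields that are never live at the
   same time, plus two flags stored as full ints. */
typedef struct {
   void         *ptr;
   unsigned long index;
   int           is_ptr;
   int           in_use;
} WideEntry;

/* Hypothetical entry after: the exclusive fields share storage in a
   union and each flag takes a single bit. */
typedef struct {
   union { void *ptr; unsigned long index; } u;
   bool is_ptr : 1;
   bool in_use : 1;
} CompactEntry;

/* Bytes saved for a table of n_entries entries. */
size_t bytes_saved(size_t n_entries)
{
   return n_entries * (sizeof(WideEntry) - sizeof(CompactEntry));
}
```

The saving per entry is small, but it is multiplied by the number of entries in every sector table.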