Add intercepts for operator new(unsigned long) and operator
new[](unsigned long). The 32-bit ones take unsigned int args, not
unsigned longs, and so the existing name-set did not capture them.
Add 64-bit fast case handlers for loads/stores. On amd64,
MC_(helperc_LOADV8) compiles down to just 12 instructions for the
fast-path, which is pretty darn good.
Allow memcheck to take account of VGA_STACK_REDZONE_SIZE -- that is,
account for the fact that on amd64 (really, on amd64-linux) the area
up to 128 bytes below the stack pointer is accessible. This meant
moving the definitions of VGA_STACK_REDZONE_SIZE to tool-visible
places.
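As a rough illustration of the redzone rule described above, here is a
minimal sketch in C. The constant and function names are hypothetical
stand-ins (the real constant is VGA_STACK_REDZONE_SIZE), and the
half-open range used is an assumption about how "up to 128 bytes below
the stack pointer" is interpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VGA_STACK_REDZONE_SIZE on amd64-linux. */
#define STACK_REDZONE_SIZE 128u

/* True if addr lies in the redzone just below sp, that is, in the
   range [sp - STACK_REDZONE_SIZE, sp). */
static bool in_stack_redzone(uint64_t sp, uint64_t addr)
{
    return addr < sp && sp - addr <= STACK_REDZONE_SIZE;
}
```

A tool honouring the redzone would treat such addresses as accessible
even though they sit below the stack pointer.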
Handle 8-byte value-check failures using a special fast-case fn (as is
already done for sizes 0, 1 and 4) rather than the generic one. Remove
the size-2 case since that never seems to get used.
* Crank up the memcheck event-counting system, and enhance it to
name the events, rather than just number them, which makes it a
lot easier to use.
* Based on that, fill in some fast-path cases
{LOAD,STORE}V{4,2,1}. The assembly code looks about the same
length as it did before, on x86. Fast-path cases for the
stack have yet to be done.
Fix a bunch of 64-bit cases required by amd64. Stop to ponder whether
there is a better way to handle the 'pessimising cast' family of
operations in such a way that Vex's back-end instruction selectors can
generate better code than they do now, with less verbosity and general
confusingness in the insn selectors.
Initial rehash of Memcheck's shadow-space management to support both
32- and 64-bit targets, little- and big-endian. It does more or less
work on x86 as-is, although it is unusably slow since I have knocked out
all the fast-path cases and am concentrating on getting the baseline
functionality correct. The fast cases will go back in in due course.
The fundamental idea is to retain the old 2-level indexing for speed,
even on a 64-bit target. Since a primary map covering the whole address
space is clearly unviable on a 64-bit target, the primary map handles
only the first N gigabytes of address space (probably to be set to 16,
32 or 64G). Addresses above that are
handled slowly using an auxiliary primary map which explicitly lists
(base, &-of-secondary-map) pairs. The goal is to have the
address-space-manager try and put everything below the 16/32/64G
boundary, so we hit the fast cases almost all the time.
Performance of the 32-bit case should be unaffected since the fast map
will always cover at least the lowest 4G of address space.
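The lookup scheme described above can be sketched roughly as follows.
This is not Memcheck's code: all names, the 64KB secondary-map size,
the 4G fast-map coverage and the fixed-size linear auxiliary table are
assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEC_BITS   16            /* assumed: each secondary covers 64KB  */
#define N_PRIMARY  (1u << 16)    /* assumed: fast map covers lowest 4G   */
#define N_AUX      64            /* assumed: capacity of auxiliary map   */

typedef struct { uint8_t shadow[1u << SEC_BITS]; } SecMap;
typedef struct { uint64_t base; SecMap *sm; } AuxEntry;

static SecMap  *primary_map[N_PRIMARY];  /* fast path: direct indexing   */
static AuxEntry aux_map[N_AUX];          /* slow path: (base, sm) pairs  */
static size_t   aux_used;

/* Return the secondary map covering address a, or NULL if none. */
static SecMap *find_secmap(uint64_t a)
{
    uint64_t idx = a >> SEC_BITS;
    if (idx < N_PRIMARY)
        return primary_map[idx];         /* one index, no search         */

    /* Address is above the fast map: search the auxiliary list. */
    uint64_t base = a & ~(((uint64_t)1 << SEC_BITS) - 1);
    assert(aux_used <= N_AUX);
    for (size_t i = 0; i < aux_used; i++)
        if (aux_map[i].base == base)
            return aux_map[i].sm;
    return NULL;
}
```

The point of the design is that the common case costs a shift, a
compare and one array index, while only addresses above the fast-map
boundary pay for the search.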
There are many word-size and endianness cleanups.
Jeremy's distinguished-map space-compression scheme is retained, in
modified form, as it is simple and seems effective at reducing
Memcheck's space use.
Add another redirect that we need. This has no effect at present
because the redirect syms are set up only after the initial read of
/proc/self/maps and by then ld-linux.so.2 is already aboard. Fixing
this properly requires fixing the address space management stuff
properly.
Renamed vg_errcontext.c as errormgr.c, and carved off the relevant parts of
core.h and tool.h into pub_core_errormgr.h and pub_tool_errormgr.h. All
just to improve general modularity.
Added new assert macros vg_assert2 and tl_assert2 which allow you to print a
string explaining more detail if the assertion fails (eg. the value of the
bogus variable) using printf-style format arguments.
One consequence of this is that you can do something like
vg_assert2(0, "bad bad bad");
instead of calling VG_(core_panic). The advantage of the new approach is
that it shows the file/function/line info for the failing code, whereas
VG_(core_panic)() does not.
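For readers unfamiliar with the pattern, a printf-style assertion macro
of this kind can be sketched as below. This is not the vg_assert2
implementation; my_assert2, format_assert_msg and the message layout
are hypothetical, and the sketch uses the C library where Valgrind
would use its own routines.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes the location, the failed expression and
   a printf-style detail string into buf (truncating if necessary). */
static void format_assert_msg(char *buf, size_t n, const char *file,
                              int line, const char *func,
                              const char *expr, const char *fmt, ...)
{
    assert(n > 0);
    snprintf(buf, n, "%s:%d (%s): Assertion '%s' failed: ",
             file, line, func, expr);
    size_t used = strlen(buf);
    if (used + 1 < n) {
        va_list ap;
        va_start(ap, fmt);
        vsnprintf(buf + used, n - used, fmt, ap);
        va_end(ap);
    }
}

/* Hypothetical macro: like assert(), plus an explanatory message. */
#define my_assert2(expr, ...)                                         \
    do {                                                              \
        if (!(expr)) {                                                \
            char msg_[256];                                           \
            format_assert_msg(msg_, sizeof msg_, __FILE__, __LINE__,  \
                              __func__, #expr, __VA_ARGS__);          \
            fprintf(stderr, "%s\n", msg_);                            \
            abort();                                                  \
        }                                                             \
    } while (0)
```

With such a macro, my_assert2(n < limit, "n = %d", n) reports both the
failing location and the offending value.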
Fix a nasty assembler bug, in the handling of Set64, arising from
confusion over whether we were looking at a complete integer register
number or just the lower 3 bits of it. Rename functions pertaining to
messing with integer register numbers in an attempt to stop this
happening in future.
When generating IR for movsd mem->reg, don't first write the entire
guest reg with zeroes and then overwrite the lower half. This forces
the back end to generate code which creates huge write-after-write
stalls in the memory system of P4s due to the different sized writes.
This apparently small change reduces the run-time of one
sse2-intensive floating point program from 145 seconds to 90 seconds
(--tool=none).
Get rid of the --sloppy-malloc= flag and the functionality it
controlled (rounding user malloc requests up to a multiple of 4).
Subsequent changes to memcheck made it more or less pointless, it is a
time waster in the malloc/free path, and nobody ever used it AFAIK.
Renamed and retyped the fields relating to valgrind's stack in os_state_t to
make their role clearer and their behaviour more consistent with the fields
describing the client's stack. Also made the code in x86-linux/syscalls.c
and amd64-linux/syscalls.c more word-size-independent, which is not strictly
necessary but makes the code similarities between the two files more
obvious.
One consequence of this is that Valgrind's stack on AMD64 is now 16384 * 8
bytes, rather than 16384 * 4 bytes.
Whenever the flags thunk is set, fill in all the fields, even NDEP
which isn't usually used. This makes redundant-PUT elimination work
better, fixing a rather subtle optimisation bug. For at least one
floating-point case this gives a significant speedup. Consider a bb
like this:
(no flag setting insns before inc)
inc ...
(no flag setting insns)
add ...
inc sets CC_OP, CC_DEP1 and CC_NDEP; the latter is expensive because a
call to calculate_eflags_c is required.
add sets CC_OP, CC_DEP1 and CC_DEP2. The CC_NDEP value is now
irrelevant, but because CC_NDEP is not overwritten, iropt cannot
remove the previous assignment to it, and so the expensive helper call
remains even though it is irrelevant.
This commit fixes that: By always setting NDEP to zero whenever its
value will be unused, any previous assignment to it will be removed by
iropt.
This change should be propagated to the amd64 front end too.
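The effect on redundant-PUT elimination can be illustrated with a toy
model. The sketch below is not iropt: the Put type and the backward
pass are hypothetical, and it assumes there are no reads of the thunk
fields between the two instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CC_OP, CC_DEP1, CC_DEP2, CC_NDEP, N_FIELDS };

typedef struct {
    int  field;      /* which thunk field this PUT writes              */
    bool expensive;  /* value comes from a helper call                 */
    bool dead;       /* set by the pass if the PUT can be removed      */
} Put;

/* Walk the block backwards. A PUT is redundant if a later PUT writes
   the same field (assuming no intervening read). Returns the number
   of PUTs marked dead. */
static size_t mark_redundant_puts(Put *puts, size_t n)
{
    bool overwritten[N_FIELDS] = { false };
    size_t removed = 0;
    for (size_t i = n; i-- > 0; ) {
        assert(puts[i].field >= 0 && puts[i].field < N_FIELDS);
        if (overwritten[puts[i].field]) {
            puts[i].dead = true;
            removed++;
        }
        overwritten[puts[i].field] = true;
    }
    return removed;
}
```

If the add writes only CC_OP, CC_DEP1 and CC_DEP2, the inc's expensive
CC_NDEP PUT survives the pass; once the add also writes CC_NDEP (as
zero), the pass can mark it dead.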
#include <rude_words.h>. The recent change of denotation of no-op IR
statements from NULL to IRStmt_NoOp screwed up the how-much-to-unroll
heuristics in iropt, resulting in iropt being significantly less
enthusiastic about unrolling than it was before the change. Gaaah!
This commit should fix it.
Remove the x86-specific is_valid_data_size() test. Also, make any dataSize
greater than MIN_LINE_SIZE equal to MIN_LINE_SIZE. This makes the
x86/fpu-28-108 regression test pass.
Minor cleanup of the stack-related fields in ThreadState:
- removed "stack_base" which wasn't used in any meaningful way
- added "client_" prefix to make it clear they concern the client's stack
- renamed "stack_size" as "client_stack_szB" to make the units clear
In vg_memory.c, allow the stack-change threshold to be specified by a
command-line flag (--max-stackframe=number), rather than hardwiring it
to 2000000. This is helpful for dealing with unruly Fortran programs
which want to allocate large arrays on the stack.
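A minimal sketch of the kind of threshold test involved, assuming the
heuristic is "a stack pointer move larger than the threshold is treated
as a switch to a different stack rather than as a large frame". The
function name is hypothetical and this is not the code in vg_memory.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the move from old_sp to new_sp exceeds max_stackframe, in
   which case it is assumed to be a stack switch. */
static bool looks_like_stack_switch(uint64_t old_sp, uint64_t new_sp,
                                    uint64_t max_stackframe)
{
    uint64_t delta = old_sp > new_sp ? old_sp - new_sp
                                     : new_sp - old_sp;
    return delta > max_stackframe;
}
```

Under this reading, a program that puts a 3000000-byte array on the
stack trips the old hardwired limit of 2000000, and raising the limit
with --max-stackframe avoids the misclassification.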
A major overhaul of how malloc/free intercepts are done. The general
idea is the same -- write functions with special names encoding
sonames and fn names, and have the redir mechanism notice them.
However the way the functions are generated is significantly changed:
* The name mangling scheme has been replaced with one which is just about
simple enough not to need a preprocessing phase. Hence
vg_replace_malloc.c.base is replaced by vg_replace_malloc.c, and
the preprocessor disappears. The demangler in vg_symtab2.c changes
accordingly.
* Kill off the horrendous LIBALIAS macro. In return we have to
enumerate all the redirections longhand, but this is not a big deal.
* Remove use of the GNUisms "attribute alias" and "attribute
protected".
* Remove the hardwired assumption that any C++ new/new[]/etc symbols
we might want to intercept are mangled in GNU style.
Remember to clear C2 after fsincos, as that actually makes it work
right with reasonable-sized inputs. This confirms fsincos as the
golden lemon of x87 floating point instructions, since Vex has by now
chomped through vast amounts of floating point code on x86 and this is
the first time this bug has come to light.
Tom Hughes [Fri, 1 Apr 2005 18:58:09 +0000 (18:58 +0000)]
Rework the vsyscall redirections to work in pie code - the old form
seemed to completely confuse the compiler and it was generating
nonsense code to get the address of the replacement routines.
Tom Hughes [Fri, 1 Apr 2005 08:07:54 +0000 (08:07 +0000)]
Run "make all" before "make install" as older versions of automake
don't put a dependency between the install target and $(BUILT_SOURCES)
so doing a straight install doesn't work.
Julian Seward [Thu, 31 Mar 2005 15:48:57 +0000 (15:48 +0000)]
Increase maximum translation size. The old limit can be exceeded when
translating long sequences of x86 insns with IR optimisation disabled,
since then the tag-checking crap doesn't get knocked out like it
usually does.
Tom Hughes [Thu, 31 Mar 2005 10:19:59 +0000 (10:19 +0000)]
Rework the nightly build script to stop as soon as one of the build
steps fails instead of carrying on with the other steps - this should
help ensure that the log fragment sent out contains useful information.
A second change is to ensure that if the regression tests complete
then the full results are included in the email - if they don't
complete then just the last 20 lines of output are sent as before.
Rename VG_(tool_interface), which is overly general and a bit verbose, as
VG_(tdict). Also make the typing more meaningful in vg_mallocfuncs_info.
And (barely) start removing the use of "TL_" names in the core.
This change reduces the number of calls to dlsym() when loading tools from a
lot to one. This required two basic changes.
1. Tools are responsible for telling the core about any functions they
provide that the core may call. This includes basic functions like
TL_(instrument)(), functions that assist core services such as
TL_(pp_Error)(), and malloc-replacement-related functions like
TL_(malloc)().
2. Tools that replace malloc now specify the size of the heap block redzones
through an arg to the VG_(malloc_funcs)() function, rather than with a
variable VG_(vg_malloc_redzone_szB).
One consequence of these changes is that VG_(tool_init_dlsym)() no longer
needs to be generated by gen_toolint.pl.
There are a number of further improvements that could follow on from this one.
- Avoid the confusingly different definitions of the TL_() macro in the
core vs. for tools. Indeed, the functions provided by the tools now don't
need to use the TL_() macro at all, as they can have arbitrary names.
- Remove a lot of the auto-generated stuff in vg_toolint.c and vg_toolint.h
(indeed, it might be possible to not auto-generate these at all, which
would be nice).
- The handling of VgToolInterface is currently split across vg_needs.c and
vg_toolint.c, which isn't nice.
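The registration idea can be sketched as a table of function pointers
that the tool fills in through core-provided calls. Everything below is
illustrative: ToolDict, core_basic_tool_funcs, core_malloc_funcs and
the ex_* functions are hypothetical names, not the actual VG_(tdict)
layout or the real registration functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dictionary of tool-provided functions. */
typedef struct {
    int    (*instrument)(int bb_id);
    void  *(*tool_malloc)(size_t n);
    size_t  malloc_redzone_szB;
} ToolDict;

static ToolDict tdict;   /* owned by the core */

/* Core-side registration entry points (illustrative names). */
static void core_basic_tool_funcs(int (*instrument)(int))
{
    assert(instrument != NULL);
    tdict.instrument = instrument;
}

static void core_malloc_funcs(void *(*tool_malloc)(size_t),
                              size_t redzone_szB)
{
    assert(tool_malloc != NULL);
    tdict.tool_malloc = tool_malloc;
    tdict.malloc_redzone_szB = redzone_szB;  /* passed as an arg */
}

/* Tool side: functions can have arbitrary names. */
static int   ex_instrument(int bb_id) { return bb_id + 1; }
static void *ex_malloc(size_t n)      { return malloc(n); }

/* The one entry point the core would need to look up in the tool. */
static void ex_tool_init(void)
{
    core_basic_tool_funcs(ex_instrument);
    core_malloc_funcs(ex_malloc, 16);
}
```

After ex_tool_init() runs, the core reaches every tool function through
the table, so only the init function itself needs a symbol lookup.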
Julian Seward [Wed, 30 Mar 2005 18:42:59 +0000 (18:42 +0000)]
Get rid of the use of VG_(instr_ptr_offset) since we know what that is
at system-build time: OFFSET_amd64_RIP. This saves an instruction on
the fast path, and reduces the number of PIE-difficulties by one.