Mark Wielaard [Mon, 1 Sep 2014 15:29:55 +0000 (15:29 +0000)]
Bug 338703 helgrind on arm-linux gets false positives in dynamic loader.
There are a couple of issues with helgrind on arm-linux with glibc:
- Thread creation stack traces cannot unwind through clone
(cfi ends right after syscall)
- ld.so has a special "hard float" name that isn't recognized as special
(ld-linux-armhf.so.3)
- Races are found when manipulating GOT sections.
Improve description of an address that is on a stack but below sp.
An address below the sp will be described as being on a stack, but below sp.
The stack for such an address is found in the registered stacks.
Also, if there is a guard page at the end of the stack (lowest address)
an address in this page will be described as being in thread guard page.
A guard page is recognised as being a page not readable/writable/executable.
Mark Wielaard [Sat, 30 Aug 2014 20:37:40 +0000 (20:37 +0000)]
Bug 338681 Enable clone backtrace hack for i386-linux in helgrind.
glibc doesn't provide CFI unwind information right after the clone call
(because it would be invalid in the child). Enable the same workaround
for i386-linux that is already used for amd64-linux (subtract 3 from ip).
Julian Seward [Sat, 30 Aug 2014 19:21:48 +0000 (19:21 +0000)]
Helgrind needs to know the soname of ld.so, and on arm64-linux
it is different (ld-linux-aarch64.so.1) from all other targets.
(Why?) This makes Helgrind at least somewhat usable on arm64-linux.
The semantic of the stack bounds is not consistent or is not described.
At various places, there were either some assumption that the 'end'
boundary (highest address) was either not included, included,
or was the highest addressable word, or the highest addressable byte.
This e.g. was very visible when doing:
./vg-in-place -d -d ./helgrind/tests/tc01_simple_race|&grep regi
giving
--24040:2:stacks register 0xBEDB4000-0xBEDB4FFF as stack 0
--24040:2:stacks register 0x402C000-0x4A2C000 as stack 1
showing that the main stack end was (on x86) not the highest word
but the highest byte, while for the thread 1, the registered end
was a byte not part of the stack.
The attached patch ensures that stack bounds semantic are documented and
consistent. Also, some of the stack handling code is factorised.
The convention that the patch ensures and documents is:
start is the lowest addressable byte, end is the highest addressable byte.
(the words 'min' and 'max' have been kept when already used, as this wording is
consistent with the new semantic of start/end).
In various debug log, used brackets [ and ] to make clear that
both bounds are included.
The code to guess and register the client stack was duplicated
in all the platform specific syswrap-<plat>-<os>.c files.
Code has been factorised in syswrap-generic.c
The patch has been regression tested on
x86, amd64, ppc32/64, s390x.
It has been compiled and one test run on arm64.
Not compiled/not tested on darwin, android, mips32/64, arm
More in details, the patch does the following:
coregrind/pub_core_aspacemgr.h
include/valgrind.h
include/pub_tool_machine.h
coregrind/pub_core_scheduler.h
coregrind/pub_core_stacks.h
- document start/end semantic in various functions
also in pub_tool_machine.h:
- replaces unclear 'bottommost address' by 'lowest address'
(unclear as stack bottom is or at least can be interpreted as
the 'functional' bottom of the stack, which is the highest
address for 'stack growing downwards').
coregrind/pub_core_initimg.h
replace unclear clstack_top by clstack_end
coregrind/m_main.c
updated to clstack_end
coregrind/pub_core_threadstate.h
renamed client_stack_highest_word to client_stack_highest_byte
coregrind/m_scheduler/scheduler.c
computes client_stack_highest_byte as the highest addressable byte
Update comments in call to VG_(show_sched_status)
coregrind/m_machine.c
coregrind/m_stacktrace.c
updated to client_stack_highest_byte, and switched
stack_lowest/highest_word to stack_lowest/highest_byte accordingly
coregrind/m_stacks.c
clarify semantic of start/end,
added a comment to indicate why we invert start/end in register call
(note that the code find_stack_by_addr was already assuming that
end was included as the checks were doing e.g.
sp >= i->start && sp <= i->end
coregrind/pub_core_clientstate.h
coregrind/m_clientstate.c
renames Addr VG_(clstk_base) to Addr VG_(clstk_start_base)
(start to indicate it is the lowest address, base suffix kept
to indicate it is the initial lowest address).
coregrind/m_initimg/initimg-darwin.c
updated to VG_(clstk_start_base)
replace unclear iicii.clstack_top by iicii.clstack_end
updated clstack_max_size computation according to both bounds included.
coregrind/m_initimg/initimg-linux.c
updated to VG_(clstk_start_base)
updated VG_(clstk_end) computation according to both bounds included.
replace unclear iicii.clstack_top by iicii.clstack_end
coregrind/pub_core_aspacemgr.h
extern Addr VG_(am_startup) : clarify semantic of the returned value
coregrind/m_aspacemgr/aspacemgr-linux.c
removed a copy of a comment that was already in pub_core_aspacemgr.h
(avoid double maintenance)
renamed unclear suggested_clstack_top to suggested_clstack_end
(note that here, it looks like suggested_clstack_top was already
the last addressable byte)
* factorisation of the stack guessing and registration causes
mechanical changes in the following files:
coregrind/m_syswrap/syswrap-ppc64-linux.c
coregrind/m_syswrap/syswrap-x86-darwin.c
coregrind/m_syswrap/syswrap-amd64-linux.c
coregrind/m_syswrap/syswrap-arm-linux.c
coregrind/m_syswrap/syswrap-generic.c
coregrind/m_syswrap/syswrap-mips64-linux.c
coregrind/m_syswrap/syswrap-ppc32-linux.c
coregrind/m_syswrap/syswrap-amd64-darwin.c
coregrind/m_syswrap/syswrap-mips32-linux.c
coregrind/m_syswrap/priv_syswrap-generic.h
coregrind/m_syswrap/syswrap-x86-linux.c
coregrind/m_syswrap/syswrap-s390x-linux.c
coregrind/m_syswrap/syswrap-darwin.c
coregrind/m_syswrap/syswrap-arm64-linux.c
Some files to look at more in details:
syswrap-darwin.c : the handling of sysctl(kern.usrstack) looked
buggy to me, and has probably be made correct by the fact that
VG_(clstk_end) is now the last addressable byte. However,unsure
about this, as I could not find any documentation about
sysctl(kern.usrstack). I only find several occurences on the web,
showing that the result of this is page aligned, which I guess
means it must be 1+ the last addressable byte.
syswrap-x86-darwin.c and syswrap-amd64-darwin.c
I suspect the code that was computing client_stack_highest_word
was wrong, and the patch makes it correct.
syswrap-mips64-linux.c
not sure what to do for this code. This is the only code
that was guessing the stack differently from others.
Kept (almost) untouched. To be discussed with mips maintainers.
coregrind/pub_core_libcassert.h
coregrind/m_libcassert.c
* void VG_(show_sched_status):
renamed Bool valgrind_stack_usage to Bool stack_usage
if stack_usage, shows both the valgrind stack usage and
the client stack boundaries
coregrind/m_scheduler/scheduler.c
coregrind/m_gdbserver/server.c
coregrind/m_gdbserver/remote-utils.c
Updated comments in callers to VG_(show_sched_status)
Julian Seward [Fri, 29 Aug 2014 19:12:38 +0000 (19:12 +0000)]
run_thread_for_a_while: Make the computation of done_this_time less
bogus, and in particular ensure that it can't be zero if in fact the
thread did do some useful work. Fix up a couple of associated
assertions. Fixes #336435.
Mark Wielaard [Fri, 29 Aug 2014 14:28:30 +0000 (14:28 +0000)]
Use getdents64 syscall on linux.
getdents has been deprecated since linux 2.4 and newer arches (arm64)
might no longer provide the getdents syscall. Use getdents64 for reading
the /proc/self/fd/ dir so --track-fds=yes works reliable on all arches.
Without this the none/tests/fdleak*vgtest might fail.
Fix a bunch of defined(VGA_ppc64)
(a.o. this was making leak_cpp_interior test failing,
as the ppc64 specific code in mc_leakcheck.c was not compiled in)
Mark Wielaard [Thu, 28 Aug 2014 14:59:04 +0000 (14:59 +0000)]
Bug 338615 suppress glibc 2.20 optimized strcmp implementation for ARMv7.
Add an add_hardwired_spec for strcmp in VG_(redir_initialise) for
ld-linux.so.3 and ld-linux-armhf.so.3 to use a simple strcmp
implementation in m_trampoline.S (compiled from the trivial .c code
to asm with gcc like the other implementations in that file).
338499 --sim-hints parsing broken due to wrong order in tokens (after introduction of no-nptl-pthread-stackcache)
Fix the token order in m_main.c
Somewhat retested by running the regression tests
(testing no-nptl-pthread-stackcache) and testing in an outer/inner setup
(testing enable-outer,no-inner-prefix).
It seems there is no regtest for the 2 other flags (lax-ioctls,fuse-compatible)
Julian Seward [Fri, 22 Aug 2014 19:07:12 +0000 (19:07 +0000)]
mc_LOADV_128_or_256_slow: change a constant from V_BITS8_DEFINED
to V_BITS64_DEFINED so as to be consistent with the rest of the
types in this function. Since both values are zero it gives no
functional change.
Mark Wielaard [Fri, 22 Aug 2014 10:14:28 +0000 (10:14 +0000)]
Tweak gdbserver_tests/hgtls.stdoutB.exp filter_gdb a little for older GDB.
Older GDB (7.2 on i386) don't print out which variable+offset an argument
pointer in a breakpoint function points to. The hgtls test already tests
whether the p pointer/test points to the expected tests array element.
So don't expect gdb to print it also and filter it out with filter_gdb.
Follow up to r14313: disable stack cache earlier
glibc is recycling memory for detached threads before a thread
termination => disable the stack cache earlier (i.e. once
a 'non main thread' is seen)
Mark Wielaard [Thu, 21 Aug 2014 10:04:04 +0000 (10:04 +0000)]
Check some known PATHs for mpicc in configure.
On some distributions (fedora) mpicc not installed on the default PATH.
Add a search path for finding mpicc by default if it is installed.
The user can still override the used mpicc compiler using --with-mpicc=.
Explicitly say we are checking for -mpreferred-stack-boundary=2.
Only 2 is ever used when it is supported. Some gcc versions/arches
(e.g. GCC 4.8 for x86_64) support -mpreferred-stack-boundary, but
only between 4 and 12. The message that -mpreferred-stack-boundary
wasn't supported was a little confusing. So explicitly say we checked
for -mpreferred-stack-boundary=2.
Florian Krohm [Wed, 20 Aug 2014 21:04:14 +0000 (21:04 +0000)]
Clean up confusion about VG_(args_the_exename) which was believed to
possibly be NULL in several places. Nowadays, VG_(ii_create_image) will
terminate the process if VG_(args_the_exename) is NULL.
Julian Seward [Wed, 20 Aug 2014 17:45:00 +0000 (17:45 +0000)]
Kind of a follow-up to r14237.
pre_mem_read_sockaddr: in the case where the caller doesn't
specify any address family (that is, the family is AF_UNSPEC)
don't perform any further checks on the supplied |sa| address
block, since doing so merely gives rise to false uninitialised
value errors.
Mark Wielaard [Wed, 20 Aug 2014 16:11:53 +0000 (16:11 +0000)]
configure should check for warning flags supported to disable them (#338205).
Configure would check whether gcc supported -Wno... flags. But gcc always
does. It is happy to just not warn about anything. So flip all configure
checks to test for the warning and only when gcc accepts the warning flag
use -Wno-...
Introduces two helper functions to make it easier to add new flag checks.
AC_GCC_WARNING_COND and AC_GCC_WARNING_SUBST_NO.
Add option a new sim-hint no-nptl-pthread-stackcache.
Activating this hint using --sim-hints=no-nptl-pthread-stackcache
means the glibc nptl stack cache will be disabled.
Disabling this stack/tls cache avoids helgrind false positive race conditions
errors when using __thread variables.
Note: disabling the stack cache is done by a kludge, dependent on
internal knowledge of glibc code, and using libpthread debug info.
So, this kludge might be broken with newer glibc version.
This has been tested on various platforms and various
glibc versions 2.11, 2.16 and 2.18
To check if the disabling works, you can do:
valgrind --tool=helgrind --sim-hints=no-nptl-pthread-stackcache -d -v ./helgrind/tests/tls_threads |& grep kludge
If you see the below 2 lines, then hopefully the stack cache has been disabled.
--12624-- deactivate nptl pthread stackcache via kludge: found symbol stack_cache_actsize at addr 0x3AF178
--12624:1:sched pthread stack cache size disabling done via kludge
The attached patch cleanups the clo processing
of clo which are (or should be) 'enum set'.
* pub_tool_options.h : add new macrox VG_USET_CLO and VG_USETX_CLO to
parse an 'enum set' command line option (with or without "all" keyword).
* use VG_USET_CLO for existing enum set clo options:
memcheck --errors-for-leak-kinds, --show-leak-kinds, --leak-check-heuristics
coregrind --vgdb-stop-at
* change --sim-hints and --kernel-variants to enum set
(this allows to detect user typos: currently, a typo in a sim-hint
or kernel variant is silently ignored. Now, an error will be given
to the user)
* The 2 new sets (--sim-hints and --kernel-variants) should not make
use of the 'all' keyword => VG_(parse_enum_set) has a new argument
to enable/disable the use of the "all" keyword.
* The macros defining an 'all enum' set definition was duplicating
all enum values (so addition of a new enum value could easily
give a bug). Removing these macros as they are unused
(to the exception of the leak-kind set).
For this set, the 'all macro' has been replaced by an 'all function',
coded using parse_enum_set parsing the "all" keyword.
* Unbreak build on macos (and on other platforms) that do not have
HAVE_DLINFO_RTLD_DI_TLS_MODID.
* add the copyright notice in auxprogs/getoff.c
* add more prereq to hgtls.vgtest
Some cleanup post QGetTlsAddr commit
* remove useless commented line in hgtls.vgtest
* avoid some #ifdef in arch specific code, hoping to discover more compile
time errors.
fix 338160: Implement QGetTlsAddr query so that GDB+V gdbsrv can print __thread variables.
To implement QGetTlsAddr, gdbsrv has to know how to get the glibc dtv
address and the module id from the link_map.
These 2 things are dependent on the internals of glibc.
The dependency is mostly isolated in a few lines of arch dependent
code or in an external utility that used a hack + -ldl lib to find
the offset of the modid in the link_map structure.
Tested on x86/amd64/ppc64/s390x. Somewhat tested on ppc32 and arm64.
Untested/a few #ifdef-ed lines not compiled on arm/mips32/mips64
and darwin.
For more background info about thread local storage handling, see
'ELF Handling For Thread-Local Storage' http://www.akkadia.org/drepper/tls.pdf
Changes:
* auxprogs/getoff.c new auxilliary program to get platform specific offsets
(currently only the offset for the module id in struct link_map).
* configure.ac : check for dlinfo(RTLD_DI_TLS_MODID) needed for getoff.c
* new gdbserver_tests/hgtls, testing various types of __thread variables
* various m_gdbserver files:
- implement decoding of the QGetTlsAddr query
- for each platform: platform specific code to get the dtv
- call to external program getoff-<platform> the first time an
__thread variable is printed.
Carl Love [Thu, 14 Aug 2014 16:54:48 +0000 (16:54 +0000)]
This commit is an update to Bugzilla 334836
There are two copies of the round test in none/tests/ppc32/round.c
and none/tests/ppc64/round.c. The two source files should be
identical. The LE functional test commit updated the round.c test for
ppc64 but was missing the ppc32 round updates. The round.c test was
updated to fix an issue where we were getting different outputs
depending on the compiler. The output is now consistent for the
compilers allowing the removal of the additional expect files for
ppc32 and ppc64.
Reduce memory needed for symbols, by having the tocptr and local_ep
(used for ppc64 platforms) #ifdef-ed and accessed by macros
that becomes NOP on non ppc64 platforms.
This decreases the debuginfo memory by about 2.5 Mb on a big 32 bit application.
Note : doing that, some questions were encountered in the way
tocptr and local_ep have (or do not have) to be copied/maintained
in storage.c canonicaliseSymtab
Florian Krohm [Tue, 12 Aug 2014 11:43:17 +0000 (11:43 +0000)]
Remove function VG_(sysnum_string_extra) as it was just a wrapper
around VG_(sysnum_string). Also remove associated macro
VG_SYSNUM_STRING_EXTRA.
The VG_SYSNUM_STRING macro returned a pointer to a variable which
is out of scope. Using that value may cause undefined behaviour.
Change VG_(sysnum_string) to return pointer to static buffer instead.
Fix call sites.