Julian Seward [Wed, 6 Aug 2014 19:52:12 +0000 (19:52 +0000)]
pre_mem_read_sockaddr: properly handle the NETLINK address family
rather than throwing to the default case. This stops Memcheck
reporting false positives for the NETLINK case.
fix 338024 inlined functions are not shown if DW_AT_ranges is used
Based on investigation and patch by Matthias Schwarzott.
(no small test found that reproduced the problem,
but the equivalent patch given in bug 338024 fixed the inlined stack
trace in a big shared lib).
Would be nice however to have a small test case ...
Bart Van Assche [Tue, 5 Aug 2014 11:35:48 +0000 (11:35 +0000)]
Linux system call wrappers: truncate ioctl request number to 32 bits
As explained in https://bugs.kde.org/show_bug.cgi?id=331829, when passing
an ioctl request number as an int to a function the request number will
be sign-extended to 64 bits on 64-bit systems. Avoid that this causes
Valgrind to fail to recognize an ioctl by truncating the request number
to 32 bits.
Fix dangling ref in m_errormgr.c + report all uninit fields in a syscall param
Some syscall verification code is allocating memory to generate
the string used to build an error, e.g. syswrap-generic.c verifying fields of
e.g socket addresses (pre_mem_read_sockaddr) or sendmsg/recvmsg args
(msghdr_foreachfield)
The allocated pointer was copied in the error created by VG_(maybe_record_error).
This was wrong for 2 reasons:
1. If the error is a new error, it is stored in a list of errors,
but the string memory was freed by pre_mem_read_sockaddr, msghdr_foreachfield, ...
This causes a dangling reference. Was at least visible when giving -v, which
re-prints all errors at the end of execution.
Probably this could have some consequences during run while generating new errors,
and comparing for equality with a recorded error having a dangling reference.
2. the same allocated string is re-used for each piece/field of the verified struct.
The code in mc_errors.c that checks that 2 errors are identical was then wrongly
considereing that 2 successive errors for 2 different fields for the same syscall
arg are identical, just because the error string happened to be produced at
the same address.
(it is believed that initially, the error string was assumed to be a static
string, which is not the case anymore, causing the above 2 problems).
Changes:
* The fix consists in duplicating in m_errormgr.c the given error string when
the error is recorded. In other words, the error string is now duplicated similarly
to the (optional) extra component of the error.
* memcheck/tests/linux/rfcomm.c test modified as now an error is reported
for each uninit field.
* socketaddr unknown family is also better reported (using sa_data field name,
rather than an empty field name.
* minor reformatting in m_errormgr.c, to be below 80 characters.
Some notes:
1. the string is only duplicated if the error is recorded
(ie. printed or the first time an error matches a suppression).
The string is not duplicated for duplicated errors or following errors
matching the first (suppressed) error.
The string is also not duplicated for 'unique errors' (that are printed
and then not recorded).
2. duplicating the string for each recorded error is not deemed to
use a lot of memory:
* error strings are usually NULL or short (often 10 bytes or so).
* we expect no program has a huge number of errors
If ever this string duplicate would be significant, having a DedupPoolAlloc
in m_errormgr.c for these strings would reduce this memory (as we expect to
have very few different strings, even with millions of errors).
* Add lock announcements in various helgrind errors that were not
announcing the locks
* ensure locks are also announced in xml (note that this is compatible
with xml protocol version 4, so no impact on GUI which properly
implement the protocol)
Changes done:
* Like other HG record_error functions, HG_(record_error_LockOrder) is
now passing Lock* rather than lock guest addresses.
* update exp files for tests that were showing locks without announcing them
* change tc14_laog_dinphils.c and tc15_laog_lockdel.c so as to
have same sizes on 32 and 64 bits systems for allocated or symbol sizes.
* factorise all code that was announcing first lock observation
* enable xml lock announcement
glibc 2.3.4 does not appear to define PTRACE_GETSIGINFO. This was
observed on a RHEL5 system on s390. Provide a suitable definition.
Tweak gdbserver_tests/filter_stderr to ignore messages related to
interrupted poll system calls.
Describe the lock address in a lock announcement message.
(note that some error messages are not announcing the lock,
which is not that nice).
At least the lock order violation message do not announce locks.
That should be improved/fixed
Factor out VG_(exit_now) to contain the syscall incantation to terminate
the process. Make ML_(am_exit) and VG_(exit) use it, thereby avoiding
double maintenance.
Introduce libcbase_assert macro and use it in VG_(strncpy_safely) to
document the case that function cannot handle.
Add stub functions to memcheck/tests/unit_libcbase.c to satisfy new
dependencies.
optimise readpdb.c filename and dirname handling, following r14158
r14158 introduced a dedup pool to store pairs (filename,dirname).
The windows debug info reader (readpdb.c) performance was still to be
improved, as calls to ML_(addFnDn) were done for each line loc to add.
With this patch, the nr of calls to ML_(addFnDn) should be reduced
significantly.
Code has been compiled and regtested on linux, but no windows/wine test
was done.
Add a new heuristic 'length64' to detect interior pointers
pointing at offset 64bit of a block, when the first 8 bytes contains
the block size - 8. This is e.g. used by sqlite3MemMalloc.
Patch adding (or showing the proper/not confusing) helgrind thread nr for block
and stack address description.
* A race condition on an allocated block shows the stacktrace, but
does not show the thread # that allocated the block.
This patch adds the output of the thread # that allocated the block.
* The patch also fixes the confusion that might appear between
the core threadid and the helgrind thread nr in Stack address description:
A printed stack addrinfo was containing a thread id, while all other helgrind
messages are using (supposed to use) an 'helgrind thread #' which
is used in the thread announcement.
Basically, the idea is to let a tool set a "tool specific thread nr'
in an addrinfo.
The pretty printing of the addrinfo is then by preference showing this
thread nr (if it was set, i.e. different of 0).
Currently, only helgrind uses this addrinfo tnr.
Note: in xml mode, the output is matching the protocol description.
I.e., GUI should not be impacted by this change, if they properly implement
the xml protocol.
* Also, make the output produced by m_addrinfo consistent:
The message 'block was alloc'd at' is changed to be like all other
output : one character indent, and starting with an uppercase
Mark Wielaard [Thu, 17 Jul 2014 13:29:43 +0000 (13:29 +0000)]
Omit frame pointer also for main in ppc ldst_multiple test.
Other functions already explicitly omitted the frame pointer. Also
do that for main to prevent gcc 4.8.2 complaining:
ldst_multiple.c: In function ‘main’:
ldst_multiple.c:180:5: error: frame pointer required, but reserved
int main(void)
^
ldst_multiple.c:31:18: note: for ‘r31’
register HWord_t r31 asm("r31");
Mark Wielaard [Thu, 17 Jul 2014 10:56:26 +0000 (10:56 +0000)]
ppc64 ifunc_wrapper add casts suggested by gcc warning.
vg_preloaded.c: In function ‘_vgnU_ifunc_wrapper’:
vg_preloaded.c:91:13: warning: assignment makes integer from pointer without a cast [enabled by default]
arm64: get_Dwarf_Reg: at least handle the case of requesting XSP
instead of failing. This makes some of the memcheck/tests/varinfo*
tests work somewhat correctly on arm64-linux.
Mark Wielaard [Tue, 15 Jul 2014 15:07:01 +0000 (15:07 +0000)]
Bug 337094 ifunc wrapper is broken on ppc64.
ppc64 uses function descriptors, so we need to get the actual function
entry address for the VG_USERREQ__ADD_IFUNC_TARGET client request, but
we need to return the function descriptor itself from the ifunc_wrapper.
So, peak dinfo memory decreases by about 36Mb, and final by 15Mb.
(for info, valgrind 3.9.0 uses
dinfo: 158941184/109666304 max/curr mmap'd, 156775944/107590656 max/curr
So, compared to 3.9.0, dinfo peak decreases by about 40%, and the final
memory is divided by more than 2).
The memory decrease is obtained by:
* using a dedup pool to store filename/dirname pair for the loctab source/line
information.
As typically, there is not a lot of such pairs, typically a UShort is
good enough to identify a fn/dn pair in a dedup pool.
To avoid losing memory due to alignment, the fndn indexes are stored
in a "parallel" array to the DiLoc loctab array, with entries having
1, or 2 or 4 bytes according to the nr of fn/dn pairs in the dedup pool.
See priv_storage.h comments for details.
(there was a extensible WordArray local implementation in readdwarf.c.
As with this change, we use an xarray, the local implementation was
removed).
* the memory needed for --read-inline-info is slightly decreased (-2Mb)
by removing the (unused) dirname from the DiInlLoc struct.
Handling dirname for inlined function caller implies to rework
the dwarf3 parser read_filename_table common to the var and inlinfo parser.
Waiting for this to be done, the dirname component is removed from DiInlLoc.
* the stabs reader (readstabs.c) is broken since 3.9.0.
For this change, the code has been updated to make it compile with the new
DiLoc/FnDn dedup pool. As the code is completely broken, a vg_assert(0)
has been put at the begin of the stabs reader.
* the pdb reader (readpdb.c) has been trivially updated and should still work.
It has not been tested (how do we test this ?).
A follow-up patch will be done to avoid doing too many calls to
ML_(addFnDn) : instead of having one call per ML_(addLineInfo), one
should have a single call done when reading the filename table.
This has also be tested in an outer/inner setup, to verify no
memory leak/bugs.
Rollback the (functional) effect of 13944 and 14134
Re-opening the FIFO before closing it gives (difficult to understand)
problems => rollback the change that keeps the FIFO opened.
Rather handle the race condition by retrying at vgdb side.
See extensive comments in remote-utils.c
Apply text_debug_bias to inline IP extracted from dwarf3
Without this biasing, inline info is not correct for shared objects.
Updated test varinfo5 to use --read-inline-info=yes and added
an inline test case.
Note: the varinfo reader does not understand the inlining info, and
so variables in inlined functions are not properly described.
OSX 10.9/10.8: Debuginfo reading FSM: enable recording of r-- mappings
so as to enable arrival at acceptance states via calls to
VG_(di_notify_vm_protect).
Darwin only: don't tell aspacemgr about the kernel commpage -- only
tell the tool. This is because telling the aspacemgr about it causes
the sync checker to fail entirely on Darwin.
Add -Wno-tautological-compare to the standard compile flags, if that
is accepted. With XCode 5.5.1 -Wtautological-compare appears to come
as standard, and it generates a lot of mostly useless noise.
Follow up to rev 13944
13944 objective was to avoid having a vgdb that connects to a just forked child
that would have the FIFO still opened, while its parent would close it.
However, in case a previous vgdb closed the FIFO, the read FIFO in the parent
is put in 'eof status' by the kernel. So, readchar will then return eof
in the parent unless another vgdb re-opens the FIFO in write.
So, gdbsrv does not stop anymore on error if needed, due to this readchar
giving eof.
The only way to reset this eof condition is to close the fd.
But we must always have the FIFO open (to avoid the race condition that
rev 13944 fixed)
=> in case of error, first re-open the FIFO, before closing the (previous)
FIFO fd (which is in eof state and cannot be properly used anymore).
Small fixes/improvements post-cfsi_m improvement
* Avoid printing the size of a null dedup pool
* Avoid warnings of 2 unused variables on some platforms
So, peak dinfo memory decreases by 21Mb, and final by 36Mb.
The memory decrease is obtained by:
* using a dedup pool to store the machine dependent part (cfsi_m)
of the cfsi information as this information is highly duplicated.
For x86 and arm64, the duplication factor of cfsi machine dependent
part is very high (up to a factor 60).
For arm64, it is more like a factor 3.
A 'variable size' (1, 2 or 4 bytes) is automatically used to identify
the cfsi_m, if there is less than or more than 255/64K different cfsi_m.
* not storing explicitely the length of a range for which a cfsi_m
is to be used: in a large majority of the cases, ranges are
consecutive, and so the end of a range is just one byte before
the start of the next range.
So, we do not store the length of the ranges.
If there is a hole between 2 ranges, the hole is stored explicitely
as a range in which we have no cfsi_m information.
On x86 and amd64, we have quite some holes (something like one hole
every 7 cfsi). On arm64, we have very few holes (less than one hole
every 50 cfsi).
Even with the nr of holes on x86/amd64, it is more memory efficient
to store the holes rather than to store the length of each cfsi.
* Merging consecutive ranges that have the same cfsi_m info:
Many cfsi are "mergeable": there is no hole between 2 cfsi, and their
machine dependent part is identical
(I guess the unwind info needed by valgrind is subset of the full
unwind info, and so, the cfsi entries are not merged by the compiler,
but can be merged for simple unwind). Depending on the platform
(x86, amd64, arm64) and of the library/object file, we can have a
significant nr of mergeable entries.
The patch is not very small, but a lot is mechanical changes.
The patch has been compiled and tested on x86/amd64/ppc32/ppc64
(but ppc does not use cfsi so that just verifies it compiles).
It has been compiled on arm64, and "tested" by launching valgrind on
one executable.
It has not been compiled on s390 and mips.
With some luck, maybe it will compile on these platforms.
And if that uses the whole provision of luck for 2014, it might even work
on these platforms :).
If it does not compile, the fix should be straightforward.
Runtime problems might be more tricky (but arm64 "worked out of the box"
once x86/amd64 were ok).
This has also be tested in an outer/inner setup, to verify no memory leak/bugs.
Shadow registers wronly shown by gdb on avx machine
For an unclear reason, the orig_rax register and its shadows are described in the
xml file using a register number.
This register number is correct on non avx machine, but is wrong on
avx machine, as these have more registers, which means that orig_raxs1 and s2
should have different numbers.
As no reason was found to have a register number explicitely give, remove this
regnnr from the xml file, and let gdb calculate it.
Fix a bug in the "numbering" dedup pool: as indicated in
pub_tool_deduppoolalloc.h, for "numbering" pool, there is no guarantee
that the address of an element is stable if a new element is inserted.
But m_deduppoolalloc.c was itself not taking this 'no guarantee' into account.
So, when the addresses of the elements are changed due to reallocation
of the only pool, apply an offset to the element addresses stored in
the dedup hash table.
Implement VG_(arena_realloc_shrink) similar to realloc, but can
only decrease the size of a block, does not change the address,
does not need to alloc another block and copy the memory,
and (if big enough) makes the excess memory available for other
allocations.
VG_(arena_realloc_shrink) is then used for debuginfo storage.c
(replacing an allocation + copy).
Also use it in the dedup pool, to recuperate the unused
memory of the last pool.
This also allows to re-increase the string pool size to the original
3.9.0 value of 64Kb. All this slightly decrease the peak and in use
memory of dinfo.
VG_(arena_realloc_shrink) will also be used to implement (in another patch)
a dedup pool which "numbers" the allocated elements.