This might happen when the source contains something like
if (something_involving_pcmpxstrx && foo) { .. }
which might use amd64g_dirtyhelper_PCMPxSTRx.
mips: Fix return from syscall mechanism for nanoMIPS
- Restore guest sigmask in VG_(sigframe_destroy)
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in VG_(nanomips_linux_SUBST_FOR_rt_sigreturn)
- Call ML_(fixup_guest_state_to_restart_syscall) from PRE(sys_rt_sigreturn)
- Tiny code refactor of sigframe-nanomips-linux.c
mips: Fix UASWM and UALWM instructions for nanoMIPS
UASWM and UALWM were not implemented correctly: the code used to
implement SWM and LWM was reused without making all of the required
adjustments.
During a save (push) instruction, the SP must be adjusted before doing
a store; otherwise Memcheck reports a warning because of a write operation
outside of the stack area.
Petar Jovanovic [Tue, 14 Jan 2020 09:31:48 +0000 (09:31 +0000)]
mips: Fix clone syscall for nanoMIPS
- Reset syscall return register (a0) in clone_new_thread()
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in ML_(call_on_new_stack_0_1)()
- Optimize stack usage in ML_(call_on_new_stack_0_1)()
- Code refactor of ML_(call_on_new_stack_0_1)()
This partially fixes all tests that use the clone system call, e.g. none/tests/pth_atfork1.
Julian Seward [Thu, 2 Jan 2020 08:32:19 +0000 (09:32 +0100)]
amd64 insn selector: improved handling of Or1/And1 trees.
This splits the function iselCondCode into iselCondCode_C and iselCondCode_R.
The former is the old one, which computes boolean expressions into an amd64
condition code; the latter is new, and computes boolean expressions
into the lowest bit of an integer register. This enables much better code
generation for Or1/And1 trees, which now result quite commonly from the new
&&-recovery machinery in the front end.
Julian Seward [Thu, 2 Jan 2020 08:23:46 +0000 (09:23 +0100)]
amd64 back end: generate 32-bit shift instructions for 32-bit IR shifts.
Until now these have been handled by possibly widening the value to 64 bits,
if necessary, followed by a 64-bit shift. That wastes instructions and code
space.
Julian Seward [Thu, 2 Jan 2020 08:10:06 +0000 (09:10 +0100)]
Enable expensive handling of CmpEQ64/CmpNE64 for amd64 by default.
This has unfortunately become necessary because optimising compilers are
generating 64-bit equality comparisons on partially defined values on this
target. There will shortly be two followup commits which partially mitigate
the resulting performance loss.
Julian Seward [Sun, 15 Dec 2019 19:14:37 +0000 (20:14 +0100)]
'grail' fixes for MIPS:
This isn't a good result. It merely disables the new functionality on MIPS
because enabling it causes segfaults, even with --tool=none, for reasons
that are not obvious. It is only chasing through conditional branches that
is disabled, though. Chasing through unconditional branches (jumps and calls
to known destinations) is still enabled.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on MIPS.
Julian Seward [Sun, 1 Dec 2019 06:01:20 +0000 (07:01 +0100)]
'grail' fixes for s390x:
This isn't a good result. It merely disables the new functionality on s390x,
for the reason stated below.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on s390x, since it causes Memcheck to crash for
reasons I couldn't figure out. It also exposes some missing Iex_ITE cases
in the s390x insn selector, although those shouldn't be a big deal to fix.
Maybe it's some strangeness to do with the s390x "ex" instruction. I don't
exactly understand how that trickery works, but from some study of it, I
didn't see anything obviously wrong.
It is only chasing through conditional branches that is disabled for s390x.
Chasing through unconditional branches (jumps and calls to known
destinations) is still enabled.
* host_s390_isel.c s390_isel_cc(): No functional change. Code has been added
here to handle the new Iop_And1 and Iop_Or1, and it is somewhat tested, but
is not needed until conditional branch chasing is enabled on s390x.
Julian Seward [Thu, 21 Nov 2019 19:03:47 +0000 (20:03 +0100)]
Tidy up ir_opt.c aspects relating to the 'grail' work. In particular:
* Rewrite do_minimal_initial_iropt_BB so it doesn't do full constant folding;
that is unnecessary expense at this point, and later passes will do it
anyway
* do_iropt_BB: don't flatten the incoming block, because
do_minimal_initial_iropt_BB will have run earlier and done so. But at least
for the moment, assert that it really is flat.
* VEX/priv/guest_generic_bb_to_IR.c create_self_checks_as_needed: generate
flat IR so as not to fail the abovementioned assertion.
I believe this completes the target-independent aspects of this work, and also
the x86_64 specifics (of which there are very few).
Julian Seward [Mon, 18 Nov 2019 18:12:49 +0000 (19:12 +0100)]
Rationalise --vex-guest* flags in the new IRSB construction framework
* removes --vex-guest-chase-cond=no|yes. This was never used in practice.
* rename --vex-guest-chase-thresh=<0..99> to --vex-guest-chase=no|yes. In
other words, downgrade it from a numeric flag to a boolean one that can
simply disable all chasing if required. (Some tools, notably Callgrind,
force-disable block chasing, so this functionality at least needs to be
retained).
test %al,%al // "if (return value of call is nonzero) { .."
je 7ed9e08 // "je after"
..
after:
That is, the && has been evaluated right-to-left. This is a correct
transformation if the compiler can prove that the call to |isRect| returns
|false| along any path on which it does not write its out-parameter
|&isClosed|.
In general, for the lazy-semantics (L->R) C-source-level && operator, we have
|A && B| == |B && A| if you can prove that |B| is |false| whenever |A| is
undefined. I assume that clang has some kind of interprocedural analysis that
tells it that. The compiler is further obliged to show that |B| won't trap,
since it is now being evaluated speculatively, but that's no big deal to
prove.
A similar result holds, per de Morgan, for transformations involving the C
language ||.
Memcheck correctly handles bitwise &&/|| in the presence of undefined inputs.
It has done so since the beginning. However, it assumes that every
conditional branch in the program is important -- any branch on uninitialised
data is an error. This idiom demonstrates otherwise. It defeats
Memcheck's existing &&/|| handling because the &&/|| is spread across two
basic blocks, rather than being bitwise.
This initial commit contains a complete initial implementation to fix that.
The basic idea is to detect the && condition spread across two blocks, and
transform it into a single block using bitwise &&. Then Memcheck's existing
accurate instrumentation of bitwise && will correctly handle it. The
transformation is
<contents of basic block A>
C1 = ...
if (!C1) goto after
.. falls through to ..
<contents of basic block B>
C2 = ...
if (!C2) goto after
.. falls through to ..
after:
===>
<contents of basic block A>
C1 = ...
<contents of basic block B, conditional on C1>
C2 = ...
if (!C1 && !C2) goto after
.. falls through to ..
after:
This assumes that <contents of basic block B> can be conditionalised, at the
IR level, so that the guest state is not modified if C1 is |false|. That's
not possible for all IRStmt kinds, but it is possible for a large enough
subset to make this transformation feasible.
There is no corresponding transformation that recovers an || condition,
because, per de Morgan, that merely corresponds to swapping the side exits vs
fallthroughs, and inverting the sense of the tests, and the pattern-recogniser
as implemented checks all possible combinations already.
The analysis and block-building is performed on the IR returned by the
architecture-specific front ends, so they are almost unmodified: in
fact they are simplified because all logic related to chasing through
unconditional and conditional branches has been removed from them, redone at
the IR level, and centralised.
The only file with big changes is the IRSB constructor logic,
guest_generic_bb_to_IR.c (a.k.a the "trace builder"). This is a complete
rewrite.
There is some additional work for the IR optimiser (ir_opt.c), since that
needs to do a quick initial simplification pass of the basic blocks, in order
to reduce the number of different IR variants that the trace-builder has to
pattern match on. An important followup task is to further reduce this cost.
There are two new IROps to support this: And1 and Or1, which both operate on
Ity_I1. They are regarded as evaluating both arguments, consistent with AndXX
and OrXX for all other sizes. It would be possible to synthesise them at the
IR level by widening the values to Ity_I8 or above, doing a bitwise And/Or,
and re-narrowing, but that gives inefficient code, so I chose to represent
them directly.
The transformation appears to work for amd64-linux. In principle -- because
it operates entirely at the IR level -- it should work for all targets,
providing the initial pre-simplification pass can normalise the block ends
into the required form. That will no doubt require some tuning. And1 and Or1
will have to be implemented in all instruction selectors, but that's easy
enough.
Remaining FIXMEs in the code:
* Rename `expr_is_speculatable` et al to `expr_is_conditionalisable`. These
functions merely conditionalise code; the speculation has already been done
by gcc/clang.
* `expr_is_speculatable`: properly check that Iex_Unop/Binop don't contain
operations that might trap (Div, Rem, etc).
* `analyse_block_end`: recognise all block ends, and abort on ones that can't
be recognised. Needed to ensure we don't miss any cases.
* maybe: guest_amd64_toIR.c: generate better code for And1/Or1
* ir_opt.c, do_iropt_BB: remove the initial flattening pass since presimp
will already have done it
* ir_opt.c, do_minimal_initial_iropt_BB (a.k.a. presimp). Make this as
cheap as possible. In particular, calling `cprop_BB_wrk` is total overkill
since we only need copy propagation.
* ir_opt.c: once the above is done, remove boolean parameter for `cprop_BB_wrk`.
sigprocmask should ignore the HOW argument when SET is NULL.
Specific use case bug found in SysRes VG_(do_sys_sigprocmask).
Fix for the case when the 'set' parameter is NULL.
In this case the 'how' parameter should be ignored, because we are
only asking the kernel to put the current signal mask into 'oldset'.
But instead we determined the action based on the 'how' parameter and
therefore made the system call fail when it should pass.
Taken from the Linux man pages (sigprocmask).
Andreas Arnez [Thu, 5 Dec 2019 17:22:43 +0000 (18:22 +0100)]
s390x: Fix offsets in comments to VexGuestS390XState
Each member of the structure declaration for `VexGuestS390XState' is
commented with its offset within the structure. But starting with
`guest_r0' and for all remaining members, these comments indicate the
wrong offsets, and the actual offsets are 8 bytes higher. Adjust the
comments accordingly.
Petar Jovanovic [Wed, 27 Nov 2019 12:22:46 +0000 (12:22 +0000)]
mips: enable sloppyXcheck for mips32 and mips64
Newer MIPS kernels (post 4.7.0) assign execute permissions to loadable
program segments which originally did not have them, as per the
information provided in the ELF file itself.
Include mips32/mips64 in the list of architectures for which the address
space manager should allow the kernel to report execute permissions in
sync_check_mapping_callback.
Petar Jovanovic [Thu, 14 Nov 2019 12:32:50 +0000 (12:32 +0000)]
mips64: upgrade parts of valgrind's fast cache for the n32 abi
Update the list of architectures to differentiate between the n32 and n64 abi
for mips64 when defining the fast cache macros in
coregrind/pub_core_transtab_asm.h.
Also amend the VG_(disp_cp_xindir) function in
coregrind/m_dispatch/dispatch-mips64-linux.S to use word-sized loads in case
of the n32 abi since the FastCacheSet structure members are now 4 bytes in
size for mips64 n32.
The estimate instructions (rcpss, rcpps, rsqrtps, rsqrtss) are, as the
name suggests, not expected to give a fully accurate result. They may
produce slightly different results on different CPU families because
their results are not defined by the IEEE standard. This is the
reason the avx-1 test now fails on AMD.
This patch assumes there are only two implementations, the Intel and the
AMD one. It moves these estimate instructions out of avx-1 into
their own testcase - avx_estimate_insn - and creates two different .exp
files for Intel and AMD.
Repair --px-file-backed broken due to dynamic option change.
Commit 3a803036f7 (Allow the user to change a set of command line options
during execution) mistakenly removed the code handling the option
--px-file-backed.
Add it back, and modify trivialleak.vgtest to use the 'VEX registers'
options setting (and their synonyms) to do a minimal verification that
the options and synonyms are accepted.
The options specify the default values, so they should not influence
the result of the test.
Andreas Arnez [Wed, 23 Oct 2019 18:35:50 +0000 (20:35 +0200)]
callgrind_annotate, cg_annotate: don't truncate function names at '#'
C++ function names can contain substrings like "{lambda()#1}". But
callgrind_annotate and cg_annotate interpret the '#'-character as a
comment marker anywhere on each input line, and thus truncate such names
there.
On the other hand, the documentation in docs/cl-format.xml states:
Everywhere, comments on own lines starting with '#' are allowed.
This seems to imply that a comment line must start with '#' in the first
column. Thus skip exactly such lines in the input file and don't handle
'#' as a comment marker anywhere else.
Signed-off-by: Philippe Waroquiers <philippe.waroquiers@skynet.be>
Have callgrind produce 'event:' lines before the 'events:' line.
callgrind_annotate expects the 'events:' line to be the last line
of the header of a Part.
When 'event:' lines come after the 'events:' line, these 'event:' lines are
handled by the callgrind_annotate body-line logic, which does not recognise
them and generates warnings such as:
WARNING: line 18 malformed, ignoring
line: 'event: sysTime : sysTime (elapsed ns)'
WARNING: line 19 malformed, ignoring
line: 'event: sysCpuTime : sysCpuTime (system cpu ns)'
- The command option --collect-systime has been enhanced to specify
the unit used to record the elapsed time spent during system calls.
The command option now accepts the values no|yes|msec|usec|nsec,
where yes is a synonym of msec. When giving the value nsec, the
system cpu time of system calls is also recorded.
Note that the nsec option is not supported on Darwin.
include/vki: fix vki_siginfo_t definition on amd64, arm64, and ppc64
As it turned out, the size of vki_siginfo_t is incorrect on these 64-bit
architectures:
(gdb) p sizeof(vki_siginfo_t)
$1 = 136
(gdb) ptype struct vki_siginfo
type = struct vki_siginfo {
int si_signo;
int si_errno;
int si_code;
union {
int _pad[29];
struct {...} _kill;
struct {...} _timer;
struct {...} _rt;
struct {...} _sigchld;
struct {...} _sigfault;
struct {...} _sigpoll;
} _sifields;
}
It looks like, for these architectures, __VKI_ARCH_SI_PREAMBLE_SIZE
hadn't been defined properly, which resulted in an incorrect
VKI_SI_PAD_SIZE calculation (29 instead of 28).
This issue has been discovered with strace's "make check-valgrind-memcheck",
which produced false out-of-bounds writes on ptrace(PTRACE_GETSIGINFO) calls:
SYSCALL[24264,1](101) sys_ptrace ( 16898, 24283, 0x0, 0x606bd40 )
==24264== Syscall param ptrace(getsiginfo) points to unaddressable byte(s)
==24264== at 0x575C06E: ptrace (ptrace.c:45)
==24264== by 0x443244: next_event (strace.c:2431)
==24264== by 0x443D30: main (strace.c:2845)
==24264== Address 0x606bdc0 is 0 bytes after a block of size 144 alloc'd
(Note that the address passed is 0x606bd40 and the address reported is
0x606bdc0).
After the patch, no such errors observed.
* include/vki/vki-amd64-linux.h [__x86_64__ && __ILP32__]
(__vki_kernel_si_clock_t): New typedef.
[__x86_64__ && __ILP32__] (__VKI_ARCH_SI_CLOCK_T,
__VKI_ARCH_SI_ATTRIBUTES): New macros.
[__x86_64__ && !__ILP32__] (__VKI_ARCH_SI_PREAMBLE_SIZE): New macro,
define to 4 ints.
* include/vki/vki-arm64-linux.h (__VKI_ARCH_SI_PREAMBLE_SIZE): Likewise.
* include/vki/vki-ppc64-linux.h [__powerpc64__] (__VKI_ARCH_SI_PREAMBLE_SIZE):
Likewise.
* include/vki/vki-linux.h [!__VKI_ARCH_SI_CLOCK_T]
(__VKI_ARCH_SI_CLOCK_T): New macro, define to vki_clock_t.
[!__VKI_ARCH_SI_ATTRIBUTES] (__VKI_ARCH_SI_ATTRIBUTES): New macro,
define to nil.
(struct vki_siginfo): Use __VKI_ARCH_SI_CLOCK_T type for _utime and
_stime fields. Add __VKI_ARCH_SI_ATTRIBUTES.
Announce fix 411134 Allow the user to change a set of command line options during execution
Note that the fix for 411134 contains a bunch of white space only changes.
To see the diff without the white spaces, do:
git diff -w 3a803036^..3a803036
Allow the user to change a set of command line options during execution.
This patch changes the option parsing framework to allow a set of
core or tool (currently only memcheck) options to be changed dynamically.
Here is a summary of the new functionality (extracted from NEWS):
* It is now possible to dynamically change the value of many command
line options while your program (or its children) are running under
Valgrind.
To see the list of dynamically changeable options, run
valgrind --help-dyn-options
You can change the options from the shell by using vgdb to launch
the monitor command "v.clo <clo option>...".
The same monitor command can be used from a gdb connected
to the valgrind gdbserver.
Your program can also change the dynamically changeable options using
the client request VALGRIND_CLO_CHANGE(option).
Here is a brief description of the code changes.
* the command line options parsing macros are now checking a 'parsing' mode
to decide if the given option must be handled or not.
(more about the parsing mode below).
* the 'main' command option parsing code has been split into a function
'process_option' that can now be called by:
- early_process_cmd_line_options
(looping over args, calling process_option in mode "Early")
- main_process_cmd_line_options
(looping over args, calling process_option in mode "Processing")
- the new function VG_(process_dynamic_option) called from
gdbserver or from VALGRIND_CLO_CHANGE (calling
process_option in mode "Dynamic" or "Help")
* So, now, during startup, process_option is called twice for each arg:
- once during Early phase
- once during normal Processing
Then process_option can be called again during execution.
The parsing mode is defined so that the option parsing code
behaves differently (e.g. handles the option or not)
depending on the mode.
// Command line option parsing happens in the following modes:
// cloE : Early processing, used by coregrind m_main.c to parse the
// command line options that must be handled early on.
// cloP : Processing, used by coregrind and tools during startup, when
// doing command line options Processing.
// cloD : Dynamic, used to dynamically change options after startup.
// A subset of the command line options can be changed dynamically
// after startup.
// cloH : Help, special mode to produce the list of dynamically changeable
// options for --help-dyn-options.
typedef
enum {
cloE = 1,
cloP = 2,
cloD = 4,
cloH = 8
} Clo_Mode;
The option parsing macros in pub_tool_options.h now all have a new variant
*_CLOM with the mode(s) in which the given option is accepted.
The old variant is kept and calls the new variant with mode cloP.
The function VG_(check_clom) in the macro compares the current mode
with the modes allowed for the option, and returns True if qq_arg
should be further processed.
For example:
// String argument, eg. --foo=yes or --foo=no
(VG_(check_clom) \
(qq_mode, qq_arg, qq_option, \
VG_STREQN(VG_(strlen)(qq_option)+1, qq_arg, qq_option"=")) && \
({const HChar* val = &(qq_arg)[ VG_(strlen)(qq_option)+1 ]; \
if VG_STREQ(val, "yes") (qq_var) = True; \
else if VG_STREQ(val, "no") (qq_var) = False; \
else VG_(fmsg_bad_option)(qq_arg, "Invalid boolean value '%s'" \
" (should be 'yes' or 'no')\n", val); \
True; }))
VG_BOOL_CLOM(cloP, qq_arg, qq_option, qq_var)
To make an option dynamically changeable, it is typically enough to replace
VG_BOOL_CLO(...)
by
VG_BOOL_CLOM(cloPD, ...)
For example:
- else if VG_BOOL_CLO(arg, "--show-possibly-lost", tmp_show) {
+ else if VG_BOOL_CLOM(cloPD, arg, "--show-possibly-lost", tmp_show) {
cloPD means the option value is set/changed during the main command
Processing (P) and Dynamically during execution (D).
Note that the 'body/further processing' of a command is only executed when
the option is recognised and the current parsing mode is ok for this option.
Mark Wielaard [Fri, 23 Aug 2019 20:17:57 +0000 (22:17 +0200)]
arm64 fixup for statx support on older kernels
It turns out (older) arm64 Linux kernels don't have statx, but also lack
the stat64 and stat syscalls; they use fstatat instead. The new statx
patch also added a check for stat, so that needs a special case for
arm64.
Petar Jovanovic [Wed, 21 Aug 2019 16:08:42 +0000 (16:08 +0000)]
mips: Add nanoMIPS support to Valgrind 1/4
Necessary changes to support nanoMIPS on Linux.
Part 1/4 - VEX changes
Patch by Aleksandar Rikalo, Dimitrije Nikolic, Tamara Vlahovic and
Aleksandra Karadzic.
nanoMIPS architecture in brief
Designed for embedded devices, nanoMIPS is a variable-length instruction
set architecture (ISA) offering high performance at substantially reduced
code size.
The nanoMIPS ISA combines recoded and new 16-, 32-, and 48-bit instructions
to achieve an ideal balance of performance and code density.
It incorporates all MIPS32 instructions and architecture modules including
MIPS DSP and MIPS MT, as well as new instructions for advanced code size
reduction.
nanoMIPS is supported in release 6 of the MIPS architecture. It is first
implemented in the new MIPS I7200 multi-threaded multi-core processor
series. Compiler support is included in the MIPS GNU-based development
tools.
Fix compilation problem when __NR_preadv2 and __NR_pwritev2 are undefined
check_preadv2_pwritev2.c: In function ‘main’:
check_preadv2_pwritev2.c:12:12: error: ‘__NR_preadv2’ undeclared (first use in this function)
syscall(__NR_preadv2, 0, NULL, 0, 0, 0);
^
check_preadv2_pwritev2.c:12:12: note: each undeclared identifier is reported only once for each function it appears in
check_preadv2_pwritev2.c:15:12: error: ‘__NR_pwritev2’ undeclared (first use in this function)
syscall(__NR_pwritev2, 0, NULL, 0, 0, 0);