Andreas Arnez [Tue, 25 Feb 2020 14:54:46 +0000 (15:54 +0100)]
s390x: Replace use of deprecated Iop_Clz64 operator
The operator Iop_Clz64 has been deprecated. Drop it in the s390x backend
and replace it by Iop_ClzNat64. Previously s390_irgen_FLOGR() handled the
value zero specially and replaced it by 1 before applying Iop_Clz64. With
Iop_ClzNat64 this is no longer needed, so remove this special-case
handling.
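A minimal sketch of why the special case can go away, assuming that Iop_Clz64 leaves the result for a zero operand undefined while Iop_ClzNat64 yields the operand width (64). The C functions below only model those semantics; their names are hypothetical and this is not the VEX or s390x backend code.

```c
#include <assert.h>
#include <stdint.h>

/* Models "natural" count-leading-zeros: a zero operand yields the
   operand width (64). Assumed semantics, not VEX source code. */
unsigned clz_nat64(uint64_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 64;
    while ((x >> 63) == 0) {   /* shift left until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
}

/* Models the older pattern: the primitive must never see zero, so the
   caller substitutes 1 and then corrects the result separately. */
unsigned clz64_with_zero_workaround(uint64_t x)
{
    uint64_t nonzero = (x == 0) ? 1 : x;
    unsigned n = clz_nat64(nonzero);   /* stand-in for the old operator */

    return (x == 0) ? 64 : n;
}

/* Both approaches agree; the second simply needs more code. */
void check_clz_models(void)
{
    assert(clz_nat64(0) == 64);
    assert(clz_nat64(1) == 63);
    assert(clz64_with_zero_workaround(0) == clz_nat64(0));
}
```

With the natural operator the substitution and the separate fix-up collapse into a single call.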
Carl Love [Fri, 21 Feb 2020 23:22:26 +0000 (17:22 -0600)]
PPC64, fix for alignment of the rt_sigframe data structure.
The PPC64 implementation checks that the data structure is aligned. The
changes in the commit listed below break the alignment. This patch adds an
explicit alignment directive to ensure the data structure is allocated
with the required alignment. This fixes 31 stderr failures and 10 stdout
failures on the Power 7, Power 8 and Power 9 platforms.
Tom Hughes [Thu, 20 Feb 2020 09:14:24 +0000 (09:14 +0000)]
Allow clone with CLONE_VFORK and no CLONE_VM
The CLONE_VFORK flag causes the parent to suspend until the child
exits or execs, so without the memory sharing that CLONE_VM would give,
this is really closer to fork. But we convert vfork to fork by
removing CLONE_VM anyway, so there is no reason not to allow this.
Andreas Arnez [Wed, 12 Feb 2020 13:13:55 +0000 (14:13 +0100)]
Bug 417452 - s390x: Force 12-bit amode for vector stores in isel
It was seen that the s390 instruction selector chose a wrong addressing
mode for storing a vector register. The VST instruction only handles
short (12-bit unsigned) displacements, but a long (20-bit signed)
displacement was generated instead, resulting in a panic:
vex: the `impossible' happened:
s390_insn_store_emit: unknown dst->tag for HRcVec128
The fix prevents long displacements for vector store operations. It also
optimizes vector store operations from an Iex_Get, by converting them to a
memory copy. This optimization was already performed for integer
registers.
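The two displacement ranges mentioned above can be sketched as follows. This is an illustration only: the function and enum names are hypothetical and do not come from host_s390_isel.c, and the policy shown (fall back to computing the address in a register) is an assumption about what "preventing long displacements" amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Short displacement: 12-bit unsigned, 0 .. 4095. */
bool fits_short_disp(int64_t d)
{
    return d >= 0 && d <= 0xFFF;
}

/* Long displacement: 20-bit signed, -524288 .. 524287. */
bool fits_long_disp(int64_t d)
{
    return d >= -(INT64_C(1) << 19) && d < (INT64_C(1) << 19);
}

/* Hypothetical selector policy: vector stores may only use the short
   form; anything else must have its address computed into a register. */
typedef enum { AMODE_SHORT, AMODE_LONG, AMODE_VIA_REGISTER } amode_kind;

amode_kind choose_amode(int64_t d, bool is_vector_store)
{
    if (fits_short_disp(d))
        return AMODE_SHORT;
    if (!is_vector_store && fits_long_disp(d))
        return AMODE_LONG;
    return AMODE_VIA_REGISTER;
}

void check_amode_choice(void)
{
    assert(choose_amode(4096, false) == AMODE_LONG);
    assert(choose_amode(4096, true) == AMODE_VIA_REGISTER);
}
```

A displacement of 4096 is the smallest value that is acceptable for an integer store but must not be encoded directly in a vector store.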
Andreas Arnez [Mon, 10 Feb 2020 12:37:03 +0000 (13:37 +0100)]
s390x: Fix printing of virtual register numbers
As noticed by Julian Seward, the code for printing s390x register names
currently does not show the virtual register numbers correctly. Although
it distinguishes between virtual and real registers, it uses the hardware
register number for both cases. This is fixed.
Andreas Arnez [Thu, 16 Jan 2020 12:49:10 +0000 (13:49 +0100)]
Bug 416301 - s390x: Support "compare and signal" instructions
Add VEX support for the s390x "compare and signal" instructions KEBR,
KDBR, KXBR, KEB, and KDB. For now, let them behave exactly like their
non-signalling counterparts. Enhance the bfp-4 test case to cover these
instructions as well. Update the list of supported instructions in
s390-opcodes.csv. Add a disclaimer to README.s390, explaining that FP
signalling is not handled accurately on s390x at the moment.
Khem Raj [Tue, 28 Jan 2020 03:50:04 +0000 (19:50 -0800)]
drd/tests/pth_detached3: Make pthread_detach() call portable across platforms
pthread_t is an opaque type, so we cannot apply simple arithmetic to
variables of pthread_t type. This test needs to pass an invalid pthread_t
handle; typecasting to uintptr_t works too and is portable across glibc and
musl.
Fixes
| pth_detached3.c:24:25: error: invalid use of undefined type 'struct __pthread'
| 24 | pthread_detach(thread + 8);
| | ^
[ bvanassche: reformatted patch description and fixed up line numbers ]
Rhys Kidd [Tue, 28 Jan 2020 08:33:03 +0000 (19:33 +1100)]
Fix non-glibc build of the test suite with s390x_features
s390x_features is built unconditionally on a range of platforms, so
any non-portable or glibc-specific functionality must be guarded.
Fixes the error reported when running 'make check' or 'make regtest' on a
platform with an alternative libc that Valgrind supports, in this case Apple's libc:
s390x_features.c:13:10: fatal error: 'features.h' file not found
#include <features.h> // __GLIBC_PREREQ
^
1 error generated.
Fixes: 161d22f0a ("s390x: Fix vector facility (vx) check in test suite")
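One possible shape for such a guard is sketched below. It is not necessarily the guard the commit introduced; the macro DEMO_RECENT_GLIBC, the function name and the version numbers are made up for illustration. The sketch relies on glibc defining __GLIBC__ and __GLIBC_PREREQ through any of its standard headers, so <features.h> never has to be named directly.

```c
#include <assert.h>
#include <stdlib.h>   /* any libc header; on glibc it defines __GLIBC__ */

/* __GLIBC_PREREQ is a glibc-only macro, so test for it before use.
   The nested #if keeps other preprocessors from parsing the call. */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 4)          /* version chosen for illustration */
#  define DEMO_RECENT_GLIBC 1
# endif
#endif
#ifndef DEMO_RECENT_GLIBC
# define DEMO_RECENT_GLIBC 0       /* non-glibc or older glibc */
#endif

int demo_recent_glibc(void)
{
    return DEMO_RECENT_GLIBC;
}

void check_guard_compiles_everywhere(void)
{
    assert(demo_recent_glibc() == 0 || demo_recent_glibc() == 1);
}
```

The point of the nesting is that a libc without __GLIBC_PREREQ never sees the function-like macro invocation at all.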
Mark Wielaard [Sat, 25 Jan 2020 17:34:58 +0000 (18:34 +0100)]
x86 and amd64 tests: Use .text and .previous around all top-level asm.
GCC10 defaults to -fno-common which exposes some latent bugs in
some of the top-level asm code in various .c test files. Some of the
tests started to segfault (even if not run under valgrind). Such code
needs to be wrapped inside a .text and a .previous asm statement to
make sure the code is generated in the .text code section and to
make sure the compiler doesn't lose track of the section currently
being used to generate data or code in. Without it code might be
generated inside a data section or the other way around.
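A minimal sketch of the wrapping described above, assuming an x86-64 ELF target and a GNU-compatible assembler. The function name asm_return_42 and the variable are hypothetical; on other targets the sketch falls back to plain C so that it still compiles.

```c
#include <assert.h>

/* A tentative definition; with -fno-common the compiler places it in
   .bss, which can then be the current section when the asm is emitted. */
int demo_counter;

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".text\n"                        /* force the code into .text */
    ".globl asm_return_42\n"
    ".type asm_return_42, @function\n"
    "asm_return_42:\n"
    "    movl $42, %eax\n"
    "    ret\n"
    ".size asm_return_42, .-asm_return_42\n"
    ".previous\n"                    /* restore the compiler's section */
);
int asm_return_42(void);
#else
/* Fallback for targets where the asm above does not apply. */
int asm_return_42(void) { return 42; }
#endif

void check_asm_function(void)
{
    assert(asm_return_42() == 42);
    assert(demo_counter == 0);
}
```

Without the leading .text the instructions could be assembled into whatever section the compiler last selected, and without the trailing .previous the compiler's following output could land in .text by mistake.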
Mark Wielaard [Sat, 25 Jan 2020 16:44:43 +0000 (17:44 +0100)]
Revert accidentally added changes in commit ce094ba912
These changes were part of my local testing of bug 416667
(gcc10 ppc64le impossible constraint in 'asm' in test_isa)
and shouldn't have been committed before review.
Mark Wielaard [Fri, 24 Jan 2020 10:26:25 +0000 (11:26 +0100)]
Fix tests/x86/incdec_alt.c asm for GCC10.
Thanks to Jakub Jelinek. The test is broken. It blindly assumes the
toplevel inline asm is placed into some sensible section, but that is
a wrong assumption. The right thing is to start the inline asm with a
.text directive and end it with .previous. The reason gcc 10 breaks it
is the -fno-common default: the int r1, ... vars are emitted into the
.bss section, and that is the section that is current when the inline
asm is emitted. Previously they were in .common at the end of the
assembly file.
Mark Wielaard [Thu, 23 Jan 2020 20:30:59 +0000 (21:30 +0100)]
Fix GCC10 issue in guest_s390_defs.h typedef enum type s390x_vec_op_t.
GCC10 defaults to -fno-common which produces this error:
guest_s390_defs.h:291: multiple definition of `s390x_vec_op_t'
This is because GCC10 detects there are multiple definitions of the
variable s390x_vec_op_t. We don't want to define a variable though.
We had wanted to define a type (one that currently isn't used).
Fix this by making it a typedef enum.
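The difference between the two declarations can be sketched like this. The type name demo_vec_op_t and its enumerators are hypothetical stand-ins, not the contents of guest_s390_defs.h.

```c
#include <assert.h>

/* Without "typedef", a declaration of this shape in a header
 *
 *     enum { DEMO_OP_INVALID, DEMO_OP_PACK } demo_vec_op_t;
 *
 * defines a variable called demo_vec_op_t in every file that includes
 * it, and -fno-common turns those duplicates into a link error. */

/* With "typedef", the same name is a type and no object is emitted. */
typedef enum {
    DEMO_OP_INVALID,
    DEMO_OP_PACK
} demo_vec_op_t;

int demo_is_valid(demo_vec_op_t op)
{
    return op != DEMO_OP_INVALID;
}

void check_typedef_enum(void)
{
    demo_vec_op_t op = DEMO_OP_PACK;   /* usable as a type name */
    assert(demo_is_valid(op));
}
```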
Guard withinEpsOf[FD] within none/tests/mips32/msa_fpu.c
Enclose the recently introduced functions with preprocessor guards,
much like the rest of the code is inside the main function.
Also mark the functions as static.
Minor code formatting.
mips64: rework math tests to take into account allowed approximation
Change the math tests to check whether the results are approximate to the
expected values instead of checking for exact matches since the calculations
in question are allowed to be approximate.
This fixes
/none/tests/mips64/test_math and
/none/tests/mips64/msa_fpu
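A tolerance check of this kind might look roughly like the sketch below. The helper name within_eps_of, the relative-tolerance formula and the handling of NaN and infinities are assumptions made for illustration; the actual test helpers may compare differently.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns true when "actual" is close enough to "expected".
   The tolerance is relative for large magnitudes, absolute near zero. */
bool within_eps_of(double expected, double actual, double eps)
{
    if (expected != expected || actual != actual)     /* NaN involved */
        return expected != expected && actual != actual;
    if (expected == actual)             /* exact match, equal infinities */
        return true;
    if (abs_d(expected) > DBL_MAX || abs_d(actual) > DBL_MAX)
        return false;                   /* only one side is infinite */

    double scale = abs_d(expected) > 1.0 ? abs_d(expected) : 1.0;
    return abs_d(expected - actual) <= eps * scale;
}

void check_within_eps(void)
{
    assert(within_eps_of(1.0, 1.0000001, 1e-6));
    assert(!within_eps_of(1.0, 1.1, 1e-6));
}
```

A test would then print or compare the boolean result instead of the raw floating-point value, so small permitted deviations no longer change the expected output.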
This might happen when the source contains something like
if (something_involving_pcmpxstrx && foo) { .. }
which might use amd64g_dirtyhelper_PCMPxSTRx.
mips: Fix return from syscall mechanism for nanoMIPS
- Restore guest sigmask in VG_(sigframe_destroy)
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in VG_(nanomips_linux_SUBST_FOR_rt_sigreturn)
- Call ML_(fixup_guest_state_to_restart_syscall) from PRE(sys_rt_sigreturn)
- Tiny code refactor of sigframe-nanomips-linux.c
mips: Fix UASWM and UALWM instructions for nanoMIPS
UASWM and UALWM have not been implemented correctly.
Code used to implement SWM and LWM has been reused without making all of
the required adjustments.
During a save (push) instruction, the SP must be adjusted before doing
a store; otherwise Memcheck reports a warning because of a write operation
outside of the stack area.
Petar Jovanovic [Tue, 14 Jan 2020 09:31:48 +0000 (09:31 +0000)]
mips: Fix clone syscall for nanoMIPS
- Reset syscall return register (a0) in clone_new_thread()
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in ML_(call_on_new_stack_0_1)()
- Optimize stack usage in ML_(call_on_new_stack_0_1)()
- Code refactor of ML_(call_on_new_stack_0_1)()
It partially fixes all tests which use the clone system call, e.g. none/tests/pth_atfork1.
Julian Seward [Thu, 2 Jan 2020 08:32:19 +0000 (09:32 +0100)]
amd64 insn selector: improved handling of Or1/And1 trees.
This splits function iselCondCode into iselCondCode_C and iselCondCode_R, the
former of which is the old one that computes boolean expressions into an amd64
condition code, but the latter being new, and computes boolean expressions
into the lowest bit of an integer register. This enables much better code
generation for Or1/And1 trees, which now result quite commonly from the new
&&-recovery machinery in the front end.
Julian Seward [Thu, 2 Jan 2020 08:23:46 +0000 (09:23 +0100)]
amd64 back end: generate 32-bit shift instructions for 32-bit IR shifts.
Until now these have been handled by possibly widening the value to 64 bits,
if necessary, followed by a 64-bit shift. That wastes instructions and code
space.
Julian Seward [Thu, 2 Jan 2020 08:10:06 +0000 (09:10 +0100)]
Enable expensive handling of CmpEQ64/CmpNE64 for amd64 by default.
This has unfortunately become necessary because optimising compilers are
generating 64-bit equality comparisons on partially defined values on this
target. There will shortly be two followup commits which partially mitigate
the resulting performance loss.
Julian Seward [Sun, 15 Dec 2019 19:14:37 +0000 (20:14 +0100)]
'grail' fixes for MIPS:
This isn't a good result. It merely disables the new functionality on MIPS
because enabling it causes segfaults, even with --tool=none, the cause of
which is not obvious. It is only chasing through conditional branches that
is disabled, though. Chasing through unconditional branches (jumps and calls
to known destinations) is still enabled.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on MIPS.
Julian Seward [Sun, 1 Dec 2019 06:01:20 +0000 (07:01 +0100)]
'grail' fixes for s390x:
This isn't a good result. It merely disables the new functionality on s390x,
for the reason stated below.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on s390x, since it causes Memcheck to crash for
reasons I couldn't figure out. It also exposes some missing Iex_ITE cases
in the s390x insn selector, although those shouldn't be a big deal to fix.
Maybe it's some strangeness to do with the s390x "ex" instruction. I don't
exactly understand how that trickery works, but from some study of it, I
didn't see anything obviously wrong.
It is only chasing through conditional branches that is disabled for s390x.
Chasing through unconditional branches (jumps and calls to known
destinations) is still enabled.
* host_s390_isel.c s390_isel_cc(): No functional change. Code has been added
here to handle the new Iop_And1 and Iop_Or1, and it is somewhat tested, but
is not needed until conditional branch chasing is enabled on s390x.
Julian Seward [Thu, 21 Nov 2019 19:03:47 +0000 (20:03 +0100)]
Tidy up ir_opt.c aspects relating to the 'grail' work. In particular:
* Rewrite do_minimal_initial_iropt_BB so it doesn't do full constant folding;
that is unnecessary expense at this point, and later passes will do it
anyway
* do_iropt_BB: don't flatten the incoming block, because
do_minimal_initial_iropt_BB will have run earlier and done so. But at least
for the moment, assert that it really is flat.
* VEX/priv/guest_generic_bb_to_IR.c create_self_checks_as_needed: generate
flat IR so as not to fail the abovementioned assertion.
I believe this completes the target-independent aspects of this work, and also
the x86_64 specifics (of which there are very few).
Julian Seward [Mon, 18 Nov 2019 18:12:49 +0000 (19:12 +0100)]
Rationalise --vex-guest* flags in the new IRSB construction framework
* Remove --vex-guest-chase-cond=no|yes. This was never used in practice.
* Rename --vex-guest-chase-thresh=<0..99> to --vex-guest-chase=no|yes. In
other words, downgrade it from a numeric flag to a boolean one, that can
simply disable all chasing if required. (Some tools, notably Callgrind,
force-disable block chasing, so this functionality at least needs to be
retained).
test %al,%al // "if (return value of call is nonzero) { .."
je 7ed9e08 // "je after"
..
after:
That is, the && has been evaluated right-to-left. This is a correct
transformation if the compiler can prove that the call to |isRect| returns
|false| along any path on which it does not write its out-parameter
|&isClosed|.
In general, for the lazy-semantics (L->R) C-source-level && operator, we have
|A && B| == |B && A| if you can prove that |B| is |false| whenever |A| is
undefined. I assume that clang has some kind of interprocedural analysis that
tells it that. The compiler is further obliged to show that |B| won't trap,
since it is now being evaluated speculatively, but that's no big deal to
prove.
A similar result holds, per de Morgan, for transformations involving the C
language ||.
Memcheck correctly handles bitwise &&/|| in the presence of undefined inputs.
It has done so since the beginning. However, it assumes that every
conditional branch in the program is important -- any branch on uninitialised
data is an error. However, this idiom demonstrates otherwise. It defeats
Memcheck's existing &&/|| handling because the &&/|| is spread across two
basic blocks, rather than being bitwise.
This initial commit contains a complete initial implementation to fix that.
The basic idea is to detect the && condition spread across two blocks, and
transform it into a single block using bitwise &&. Then Memcheck's existing
accurate instrumentation of bitwise && will correctly handle it. The
transformation is
<contents of basic block A>
C1 = ...
if (!C1) goto after
.. falls through to ..
<contents of basic block B>
C2 = ...
if (!C2) goto after
.. falls through to ..
after:
===>
<contents of basic block A>
C1 = ...
<contents of basic block B, conditional on C1>
C2 = ...
if (!(C1 && C2)) goto after
.. falls through to ..
after:
This assumes that <contents of basic block B> can be conditionalised, at the
IR level, so that the guest state is not modified if C1 is |false|. That's
not possible for all IRStmt kinds, but it is possible for a large enough
subset to make this transformation feasible.
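The control-flow equivalence behind the transformation can be checked with a small model. This is a sketch in plain C rather than VEX IR: the function names are hypothetical, and "conditionalising block B" is modelled simply by ignoring B's result when C1 is false.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: two basic blocks, each ending in its own conditional exit. */
bool falls_through_two_blocks(bool c1, bool c2)
{
    if (!c1) goto after;        /* end of block A */
    if (!c2) goto after;        /* end of block B */
    return true;                /* reached the fall-through code */
after:
    return false;
}

/* After: one block. Block B's work is guarded by c1, and a single exit
   tests the non-lazy (bitwise) AND of both conditions. */
bool falls_through_merged(bool c1, bool c2_from_block_b)
{
    bool c2 = c1 ? c2_from_block_b : false;   /* B conditionalised on c1 */
    unsigned both = (unsigned)c1 & (unsigned)c2;

    if (!both) goto after;
    return true;
after:
    return false;
}

void check_merge_equivalence(void)
{
    for (int c1 = 0; c1 <= 1; c1++)
        for (int c2 = 0; c2 <= 1; c2++)
            assert(falls_through_two_blocks(c1, c2) ==
                   falls_through_merged(c1, c2));
}
```

In the merged form there is only one branch, and it depends on a bitwise combination of the two conditions, which is the shape Memcheck's existing instrumentation can reason about precisely.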
There is no corresponding transformation that recovers an || condition,
because, per de Morgan, that merely corresponds to swapping the side exits vs
fallthroughs, and inverting the sense of the tests, and the pattern-recogniser
as implemented checks all possible combinations already.
The analysis and block-building is performed on the IR returned by the
architecture specific front ends. So they are almost not modified at all: in
fact they are simplified because all logic related to chasing through
unconditional and conditional branches has been removed from them, redone at
the IR level, and centralised.
The only file with big changes is the IRSB constructor logic,
guest_generic_bb_to_IR.c (a.k.a. the "trace builder"). This is a complete
rewrite.
There is some additional work for the IR optimiser (ir_opt.c), since that
needs to do a quick initial simplification pass of the basic blocks, in order
to reduce the number of different IR variants that the trace-builder has to
pattern match on. An important followup task is to further reduce this cost.
There are two new IROps to support this: And1 and Or1, which both operate on
Ity_I1. They are regarded as evaluating both arguments, consistent with AndXX
and OrXX for all other sizes. It is possible to synthesise them at the IR level by
widening the value to Ity_I8 or above, doing bitwise And/Or, and re-narrowing
it, but this gives inefficient code, so I chose to represent them directly.
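The widen, combine, narrow alternative mentioned above can be modelled in C as follows. The helper names are hypothetical and only mirror the idea; they are not VEX IROps or helper functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of widening a 1-bit value to 8 bits and narrowing it back. */
uint8_t widen_1_to_8(bool b)
{
    return b ? 1 : 0;
}

bool narrow_8_to_1(uint8_t v)
{
    return (v & 1) != 0;
}

/* Synthesised form: three operations per And/Or. */
bool and1_by_widening(bool a, bool b)
{
    return narrow_8_to_1((uint8_t)(widen_1_to_8(a) & widen_1_to_8(b)));
}

bool or1_by_widening(bool a, bool b)
{
    return narrow_8_to_1((uint8_t)(widen_1_to_8(a) | widen_1_to_8(b)));
}

/* Direct form: one non-lazy operation; both operands are evaluated. */
bool and1_direct(bool a, bool b) { return a & b; }
bool or1_direct(bool a, bool b)  { return a | b; }

void check_and1_or1_models(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            assert(and1_by_widening(a, b) == and1_direct(a, b));
            assert(or1_by_widening(a, b) == or1_direct(a, b));
        }
}
```

Both forms compute the same result; the direct form just avoids the two conversions around every operation.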
The transformation appears to work for amd64-linux. In principle -- because
it operates entirely at the IR level -- it should work for all targets,
providing the initial pre-simplification pass can normalise the block ends
into the required form. That will no doubt require some tuning. And1 and Or1
will have to be implemented in all instruction selectors, but that's easy
enough.
Remaining FIXMEs in the code:
* Rename `expr_is_speculatable` et al to `expr_is_conditionalisable`. These
functions merely conditionalise code; the speculation has already been done
by gcc/clang.
* `expr_is_speculatable`: properly check that Iex_Unop/Binop don't contain
operations that might trap (Div, Rem, etc).
* `analyse_block_end`: recognise all block ends, and abort on ones that can't
be recognised. Needed to ensure we don't miss any cases.
* maybe: guest_amd64_toIR.c: generate better code for And1/Or1
* ir_opt.c, do_iropt_BB: remove the initial flattening pass since presimp
will already have done it
* ir_opt.c, do_minimal_initial_iropt_BB (a.k.a. presimp). Make this as
cheap as possible. In particular, calling `cprop_BB_wrk` is total overkill
since we only need copy propagation.
* ir_opt.c: once the above is done, remove boolean parameter for `cprop_BB_wrk`.
sigprocmask should ignore HOW argument when SET is NULL.
A bug for a specific use case was found in SysRes VG_(do_sys_sigprocmask).
This is a fix for the case when the "set" parameter is NULL.
In this case the "how" parameter should be ignored, because we are
only asking the kernel to put the current signal mask into "oldset".
Instead we determined the action based on the "how" parameter and
therefore made the system call fail when it should pass.
Taken from the Linux man pages (sigprocmask).
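The situation can be reproduced with a few lines of C. This is a sketch under the assumption of a POSIX system, where the value of "how" is not significant when "set" is a null pointer; the function names and the bogus value 12345 are chosen purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Query the current signal mask while passing a meaningless "how".
   Because "set" is NULL, "how" is expected to be ignored. */
int query_mask_with_bogus_how(sigset_t *oldset)
{
    const int bogus_how = 12345;   /* not SIG_BLOCK/SIG_UNBLOCK/SIG_SETMASK */

    return sigprocmask(bogus_how, NULL, oldset);
}

/* Returns nonzero when the query succeeded. */
int demo_query_succeeds(void)
{
    sigset_t current;

    return query_mask_with_bogus_how(&current) == 0;
}

void check_query(void)
{
    assert(demo_query_succeeds());
}
```

A program like this is the kind of caller that would see a spurious failure if the "how" value were validated before checking whether "set" is NULL.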
Andreas Arnez [Thu, 5 Dec 2019 17:22:43 +0000 (18:22 +0100)]
s390x: Fix offsets in comments to VexGuestS390XState
Each member of the structure declaration for `VexGuestS390XState' is
commented with its offset within the structure. But starting with
`guest_r0' and for all remaining members, these comments indicate the
wrong offsets, and the actual offsets are 8 bytes higher. Adjust the
comments accordingly.
Petar Jovanovic [Wed, 27 Nov 2019 12:22:46 +0000 (12:22 +0000)]
mips: enable sloppyXcheck for mips32 and mips64
Newer mips kernels (post 4.7.0) assign execute permissions to loadable
program segments which originally did not have them as per the
information provided in the elf file itself.
Include mips32/mips64 in the list of architectures for which the address
space manager should allow the kernel to report execute permissions in
sync_check_mapping_callback.