Andreas Arnez [Wed, 18 Mar 2020 11:24:25 +0000 (12:24 +0100)]
Bug 417281 - s390x: Fix register usage of conditional moves
The s390x register usage callback marks the target register of a
conditional move as HRmWrite only. It fails to mention the fact that the
target register is also an input to the insn (unless the condition is
"never" or "always").
This was discovered while investigating "grail" failures on s390x and
fixes the majority of them.
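Why the target register counts as an input can be seen from a plain C model of a conditional move. This is only an illustrative sketch; cond_move and its parameters are hypothetical names, not VEX code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a conditional move: dst receives src only if
   the condition holds. Otherwise the OLD value of dst survives, so dst
   is read as well as written (a "modify", not a pure "write"). */
uint64_t cond_move(bool cond, uint64_t dst, uint64_t src)
{
    return cond ? src : dst;
}
```

If the register allocator is told that dst is only written, it may assume the old value is dead, which breaks the "condition false" path.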
Andreas Arnez [Fri, 13 Mar 2020 16:20:20 +0000 (17:20 +0100)]
s390x: Actually use "load on condition" for conditional moves
Although the implementation of the cond_move insn is prepared to emit
"load on condition" instructions, it doesn't, because of a reversed check.
The check is supposed to prevent emitting LOCx instructions when the
condition code mask is set to "always", but it's accidentally negated.
Fix the reversal of the check, so LOCx instructions are actually emitted
when applicable.
Andreas Arnez [Mon, 9 Mar 2020 16:26:26 +0000 (17:26 +0100)]
s390x: Mark register usage with HRmModify when applicable
Instead of marking register usage for the same register with HRmRead and
HRmWrite separately, use HRmModify instead. This makes the code a bit
easier to read.
Andreas Arnez [Mon, 9 Mar 2020 14:14:16 +0000 (15:14 +0100)]
s390x: Enable 1- and 2-byte operands for v-test
The v-test operation tests its operand against zero and sets the condition
code accordingly. So far the operation was only supported for 4- and
8-byte operands.
Lift this restriction and enable 1- and 2-byte operands for v-test, using
the z/Architecture "test under mask" instructions TM, TMY, and TMLL.
Exploit this in the instruction selector, getting rid of the conversion to
a 4-byte operand. This slightly reduces the generated code on s390x.
Andreas Arnez [Wed, 5 Feb 2020 17:18:49 +0000 (18:18 +0100)]
s390x: Support And1/Or1, improve handling of Int1 expressions
This provides an instruction selector for Int1-expressions that supports
And1 and Or1. This implementation tries to keep values in registers as
much as possible, to avoid too many conversions from a Boolean value to a
condition code or vice versa. To this end, the new function
s390_isel_int1_expr() is added, which handles bit-typed expressions that
are supposed to end up in a register.
Also change the representation of Int1 values in registers and always
sign-extend them to 64 bits.
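A 1-bit value sign-extended to 64 bits is either 0 or all ones. The sketch below models that representation in plain C; the function names are hypothetical and this is not the s390x backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: represent a 1-bit value as a sign-extended
   64-bit register value, i.e. 0 for false and all ones for true. */
uint64_t int1_to_reg(bool bit)
{
    return bit ? UINT64_MAX : 0;
}

/* In this representation a logical AND/OR of two such values can be
   computed with plain bitwise operations on the full registers. */
uint64_t and1(uint64_t a, uint64_t b) { return a & b; }
uint64_t or1(uint64_t a, uint64_t b)  { return a | b; }
```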
Andreas Arnez [Tue, 10 Mar 2020 16:18:48 +0000 (17:18 +0100)]
s390x: Fix down-cast from memory operand with size < 8
A down-cast always copies 8 bytes from the source operand, even if the
operand is actually smaller. This doesn't matter for register operands,
but it does for memory operands. Fix this and copy the correct number of
bytes instead.
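The difference between register and memory operands can be illustrated with a small C model. The helper name load_operand is hypothetical, and where the bytes land inside the 64-bit value depends on host endianness; the point is only that no more than `size` bytes are read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of loading an operand of `size` bytes (1, 2, 4
   or 8) from memory. Copying 8 bytes unconditionally would read past
   the end of a smaller operand; copying exactly `size` bytes does not. */
uint64_t load_operand(const void *mem, size_t size)
{
    uint64_t value = 0;
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    memcpy(&value, mem, size);
    return value;
}
```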
Andreas Arnez [Fri, 13 Mar 2020 16:18:55 +0000 (17:18 +0100)]
s390x: Mark VRs as clobbered by helper calls
According to the s390x ABI, all vector registers are call-clobbered
(except for their portions that overlap with the call-saved FPRs). But
the s390x backend doesn't mark them as such when determining the register
usage of helper call insns.
Fix this in s390_insn_get_reg_usage when handling S390_INSN_HELPER_CALL.
Julian Seward [Mon, 9 Mar 2020 08:22:31 +0000 (09:22 +0100)]
Bug 415136 - ARMv8.1 Compare-and-Swap instructions are not supported. (TEST CASES).
This commit provides test cases for ARMv8.1 CAS instructions, support for
which was added in the previous commit.
Patch by Assad Hashmi <assad.hashmi@linaro.org>.
Julian Seward [Mon, 9 Mar 2020 08:18:09 +0000 (09:18 +0100)]
Bug 415136 - ARMv8.1 Compare-and-Swap instructions are not supported.
This commit implements ARMv8.1 CAS instructions. It does not contain
test cases; those will be in a subsequent commit.
Patch by Assad Hashmi <assad.hashmi@linaro.org>.
Andreas Arnez [Mon, 2 Mar 2020 15:22:59 +0000 (16:22 +0100)]
Bug 418435 - s390x: Avoid extra value dependency in CLC implementation
The test memcheck/tests/memcmp currently fails on s390x because it yields
the expected "conditional jump or move depends on uninitialised value(s)"
message twice instead of just once.
This is caused by the handling of the s390x instruction CLC, see
s390_irgen_CLC_EX(). When comparing two bytes from the two input strings,
the implementation uses the comparison result for a conditional branch to
the next instruction. But if no further bytes need to be compared, the
comparison result is also used for generating the resulting condition
code.
There are two cases: Either the inputs are equal; then the resulting
condition code is zero. This is what happens in the memcmp test case. Or
the inputs are different; then the resulting condition code is 1 or 2 if
the first operand is lower or higher than the second, respectively.
At least in the first case it is easy to avoid the additional dependency,
by clearing the condition code explicitly. Just do this.
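The condition code computation can be modelled in C as follows. This is a simplified sketch that follows the z/Architecture definition of CLC (0: equal, 1: first operand low, 2: first operand high); it is not the IR that s390_irgen_CLC_EX() generates, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the CLC condition code. */
int clc_condition_code(const uint8_t *op1, const uint8_t *op2, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (op1[i] != op2[i]) {
            /* The first differing byte decides, compared as unsigned. */
            return op1[i] < op2[i] ? 1 : 2;
        }
    }
    /* All bytes equal: the result is the constant 0 and needs no
       further dependency on the compared data. */
    return 0;
}
```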
Mark Wielaard [Fri, 28 Feb 2020 12:36:31 +0000 (13:36 +0100)]
Add 32bit time64 syscalls for arm, mips32, ppc32 and x86.
This patch adds syscall wrappers for the following syscalls which
use a 64bit time_t on 32bit arches: gettime64, settime64,
clock_getres_time64, clock_nanosleep_time64, timer_gettime64,
timer_settime64, timerfd_gettime64, timerfd_settime64,
utimensat_time64, pselect6_time64, ppoll_time64, recvmmsg_time64,
mq_timedsend_time64, mq_timedreceive_time64, semtimedop_time64,
rt_sigtimedwait_time64, futex_time64 and sched_rr_get_interval_time64.
Still missing are clock_adjtime64 and io_pgetevents_time64.
For the more complicated syscalls futex[_time64], pselect6[_time64]
and ppoll[_time64] there are shared pre and/or post helper functions.
Other functions just have their own PRE and POST handler.
Note that the vki_timespec64 struct really is the struct as used by
glibc (it internally translates a 32bit timespec struct to a 64bit
timespec64 struct before passing it to any of the time64 syscalls).
The kernel uses a 64-bit signed int, but ignores the upper 32 bits
of the tv_nsec field. It does always write the full struct though.
So skipping the padding check is only needed for PRE_MEM_READ.
There are two helpers, pre_read_timespec64 and pre_read_itimerspec64,
to check the new structs.
Mark Wielaard [Wed, 4 Mar 2020 13:23:37 +0000 (14:23 +0100)]
Add suppressions for glibc DTV leaks
The glibc DTV (Dynamic Thread Vector) for the main thread is never
released, not even through __libc_freeres. This causes it to always
show up as a reachable block when used, and sometimes, when it is
extended and then reduced, as a possible leak when memcheck cannot
find a pointer to the start of the block.
Improve line info tracing, in particular when using lto.
With gcc 9 and --enable-lto, we now get spurious warnings saying
that the line information in the debug info has huge line numbers,
greater than the (valgrind) maximum of 2^20.
These spurious warnings make all tests fail.
This change modifies the tracing/debugging of the line info to:
* disable by default the warning for line info greater than 2^20.
When using -d, such warnings are however still shown (once).
* allow all such warnings to be seen when using at least -d -d -d -d
Allow valgrind to find debug info in a 'usr merge' setup.
On ubuntu 19.10, valgrind fails, reporting that it cannot find
the mandatory redirection for strlen in ld-linux-x86-64.so.2.
This is due to /bin being a symlink to usr/bin: ld is found
in /usr/lib/x86_64-linux-gnu/ld-2.30.so
but its debug info is
in /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.30.so
Without this patch, valgrind searches for the debug info (among other places)
in /usr/lib/debug/usr/lib/x86_64-linux-gnu/ld-2.30.so
so using the concatenation of /usr/lib/debug
and /usr/lib/x86_64-linux-gnu/ld-2.30.so,
but the debug info is located at the concatenation of
/usr/lib/debug and /lib/x86_64-linux-gnu/ld-2.30.so
(so without the leading /usr).
Modify the debug info search so as to try with and without the /usr.
Patch derived from the patch done by Mathieu Trudel-Lapierre
to solve https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1808508
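The "with and without /usr" search can be sketched as a small path-building helper. This is an assumption-laden illustration, not Valgrind's actual debug info search; debug_candidate and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a candidate debug info path by
   concatenating `debugdir` and the object path. If `strip_usr` is
   nonzero and the object path starts with "/usr/", the leading "/usr"
   is dropped first. Returns 0 on success, -1 if `out` is too small. */
int debug_candidate(char *out, size_t outlen, const char *debugdir,
                    const char *objpath, int strip_usr)
{
    if (strip_usr && strncmp(objpath, "/usr/", 5) == 0)
        objpath += 4;   /* keep the '/' that follows "/usr" */
    int n = snprintf(out, outlen, "%s%s", debugdir, objpath);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

A caller would try the candidate built with strip_usr set to 0 and then the one built with strip_usr set to 1, using whichever file exists.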
Andreas Arnez [Thu, 27 Feb 2020 14:52:53 +0000 (15:52 +0100)]
s390x: Add CPU model for z15
Make the z15 CPU models known to Valgrind. Add test case output for z15
to the "ecag" test. Also ensure that the facility bits for CPU facilities
unsupported by Valgrind are unset, particularly for the new
deflate-conversion facility.
mips: Fix linking errors for none/tests/mips[32|64]/msa_fpu
Some older toolchains (e.g. Codescape GNU Tools 2016.05-03 for MIPS
MTI Linux 4.9.2) require explicit inclusion of the "math" library in
order to link to the fpclassify() function.
Andreas Arnez [Wed, 26 Feb 2020 16:46:45 +0000 (17:46 +0100)]
s390x: Fix possible false positives with mul-z14 test case
The output of the tests for msrkc and msgrkc in "none/tests/s390x/mul-z14"
can differ from the expected output, because it depends on undetermined
data. The test always prints the register pair r2/r3, but the
instructions msrkc and msgrkc only write to r2, and msrkc even affects
only its lowest half.
Fix the undetermined output by initializing r2 and r3 with zero first.
Andreas Arnez [Wed, 5 Feb 2020 18:28:53 +0000 (19:28 +0100)]
s390x: Exploit LOCGHI for converting from CC to Int1
Whenever converting a condition code to a Boolean value, the current
implementation in s390_insn_cc2bool_emit() generates six instructions
including "insert program mask" (IPM). On systems with the
load/store-on-condition facility 2, this can be done in two instructions
instead, using "load halfword immediate on condition" (LOCGHI).
Add the new hardware capability VEX_HWCAPS_S390X_LSC2 and the respective
macro s390_host_has_lsc2. In s390_insn_cc2bool_emit(), check for the
facility and exploit it if available.
A conditional move from an immediate value can be slightly improved with
LOCGHI as well, so do that in s390_insn_cond_move_emit() if possible.
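The conversion from a condition code to a Boolean can be modelled in C. On s390x a 4-bit mask selects condition codes 0..3 through the bit values 8, 4, 2 and 1. The two-step shape below (load 0, then conditionally load 1) is one plausible reading of the two-instruction sequence and is an assumption, as is the function name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of testing a condition code against an s390x condition mask.
   The mask bits 8, 4, 2, 1 select condition codes 0, 1, 2, 3. */
uint64_t cc_to_bool(unsigned mask, unsigned cc)
{
    assert(mask <= 15 && cc <= 3);
    uint64_t result = 0;        /* step 1: load 0 unconditionally */
    if (mask & (8u >> cc))      /* step 2: load 1 if the mask selects cc */
        result = 1;
    return result;
}
```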
Andreas Arnez [Tue, 25 Feb 2020 14:54:46 +0000 (15:54 +0100)]
s390x: Replace use of deprecated Iop_Clz64 operator
The operator Iop_Clz64 has been deprecated. Drop it in the s390x backend
and replace it by Iop_ClzNat64. Previously s390_irgen_FLOGR() handled the
value zero specially and replaced it by 1 before applying Iop_Clz64. With
Iop_ClzNat64 this is no longer needed, so remove this special-case
handling.
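The "natural" semantics that make the special case unnecessary can be modelled as below: the operation is defined for every input, and a zero input yields the word size. The function name is hypothetical; this is a reference model, not VEX code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of count-leading-zeros with "natural" semantics: a zero input
   is well defined and yields 64. */
unsigned clz_nat64(uint64_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 64;
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```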
Carl Love [Fri, 21 Feb 2020 23:22:26 +0000 (17:22 -0600)]
PPC64, fix for alignment of the rt_sigframe data structure.
The PPC64 implementation checks that the data structure is aligned. The
changes in the commit listed below break the alignment. This patch adds an
explicit alignment directive to ensure the data structure is allocated
with the required alignment. This fixes 31 stderr failures, 10 stdout
failures on the Power 7, Power 8 and Power 9 platforms.
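An explicit alignment directive can look like the following C11 sketch. The structure, its field and the 16-byte figure are illustrative assumptions and are not taken from the patch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a signal frame structure. The alignas on
   the first member raises the alignment of the whole struct to 16. */
struct demo_sigframe {
    alignas(16) unsigned char data[40];
};

/* Returns nonzero if `p` is aligned to `align` bytes (a power of two). */
int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```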
Tom Hughes [Thu, 20 Feb 2020 09:14:24 +0000 (09:14 +0000)]
Allow clone with CLONE_VFORK and no CLONE_VM
The CLONE_VFORK flag causes the parent to suspend until the child
exits or execs. Without the memory sharing that CLONE_VM would give,
this is really closer to fork, but we convert vfork to fork by
removing CLONE_VM anyway, so there is no reason not to allow this.
Andreas Arnez [Wed, 12 Feb 2020 13:13:55 +0000 (14:13 +0100)]
Bug 417452 - s390x: Force 12-bit amode for vector stores in isel
It was seen that the s390 instruction selector chose a wrong addressing
mode for storing a vector register. The VST instruction only handles
short (12-bit unsigned) displacements, but a long (20-bit signed)
displacement was generated instead, resulting in a panic:
vex: the `impossible' happened:
s390_insn_store_emit: unknown dst->tag for HRcVec128
The fix prevents long displacements for vector store operations. It also
optimizes vector store operations from an Iex_Get, by converting them to a
memory copy. This optimization was already performed for integer
registers.
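The two displacement ranges mentioned above can be written down as range checks. This is a minimal sketch with hypothetical function names; it only encodes the ranges of a 12-bit unsigned and a 20-bit signed field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Short displacement: 12-bit unsigned, 0 .. 4095. */
bool fits_disp12(int64_t d) { return d >= 0 && d <= 4095; }

/* Long displacement: 20-bit signed, -524288 .. 524287. */
bool fits_disp20(int64_t d) { return d >= -524288 && d <= 524287; }
```

An offset such as 4096 or -8 fits a long displacement but not a short one, so it has to be computed into a register before an instruction that only takes a short displacement can use it.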
Andreas Arnez [Mon, 10 Feb 2020 12:37:03 +0000 (13:37 +0100)]
s390x: Fix printing of virtual register numbers
As noticed by Julian Seward, the code for printing s390x register names
currently does not show the virtual register numbers correctly. Although
it distinguishes between virtual and real registers, it uses the hardware
register number for both cases. This is fixed.
Andreas Arnez [Thu, 16 Jan 2020 12:49:10 +0000 (13:49 +0100)]
Bug 416301 - s390x: Support "compare and signal" instructions
Add VEX support for the s390x "compare and signal" instructions KEBR,
KDBR, KXBR, KEB, and KDB. For now, let them behave exactly like their
non-signalling counterparts. Enhance the bfp-4 test case to cover these
instructions as well. Update the list of supported instructions in
s390-opcodes.csv. Add a disclaimer to README.s390, explaining that FP
signalling is not handled accurately on s390x at the moment.
Khem Raj [Tue, 28 Jan 2020 03:50:04 +0000 (19:50 -0800)]
drd/tests/pth_detached3: Make pthread_detach() call portable across platforms
pthread_t is an opaque type, therefore we cannot apply simple arithmetic to
variables of pthread_t type. This test needs to pass an invalid pthread_t
handle; typecasting to uintptr_t works too and is portable across glibc and
musl.
Fixes
| pth_detached3.c:24:25: error: invalid use of undefined type 'struct __pthread'
| 24 | pthread_detach(thread + 8);
| | ^
[ bvanassche: reformatted patch description and fixed up line numbers ]
Rhys Kidd [Tue, 28 Jan 2020 08:33:03 +0000 (19:33 +1100)]
Fix non-glibc build of the test suite with s390x_features
s390x_features is built unconditionally on a range of platforms; accordingly,
any non-portable or glibc-specific functionality must be guarded.
Fixes error reported when running 'make check' or 'make regtest' on a platform
with an alternative libc that Valgrind supports, in this case Apple's libc:
s390x_features.c:13:10: fatal error: 'features.h' file not found
#include <features.h> // __GLIBC_PREREQ
^
1 error generated.
Fixes: 161d22f0a ("s390x: Fix vector facility (vx) check in test suite")
Mark Wielaard [Sat, 25 Jan 2020 17:34:58 +0000 (18:34 +0100)]
x86 and amd64 tests: Use .text and .previous around all top-level asm.
GCC10 defaults to -fno-common which exposes some latent bugs in
some of the top-level asm code in various .c test files. Some of the
tests started to segfault (even if not run under valgrind). Such code
needs to be wrapped inside a .text and a .previous asm statement to
make sure the code is generated in the .text code section and to
make sure the compiler doesn't lose track of the section currently
being used to generate data or code in. Without it code might be
generated inside a data section or the other way around.
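The wrapping pattern looks roughly like the sketch below. The function asm_answer is a hypothetical example, not one of the Valgrind tests; the assembly variant is only used on x86-64 ELF targets, and a plain C fallback keeps the example buildable elsewhere.

```c
#include <assert.h>

int asm_answer(void);

#if defined(__x86_64__) && defined(__ELF__)
/* Top-level asm: switch to .text explicitly, then restore whatever
   section the compiler was emitting into with .previous. */
__asm__(
    ".text\n"
    ".globl asm_answer\n"
    ".type asm_answer, @function\n"
    "asm_answer:\n"
    "    movl $42, %eax\n"
    "    ret\n"
    ".size asm_answer, .-asm_answer\n"
    ".previous\n"
);
#else
/* Fallback for other targets, so the example still builds. */
int asm_answer(void) { return 42; }
#endif
```

Without the .text/.previous pair, the function body would land in whichever section happened to be current at that point in the file.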
Mark Wielaard [Sat, 25 Jan 2020 16:44:43 +0000 (17:44 +0100)]
Revert accidentally added changes in commit ce094ba912
These changes were part of my local testing of bug 416667
(gcc10 ppc64le impossible constraint in 'asm' in test_isa)
and shouldn't have been committed before review.
Mark Wielaard [Fri, 24 Jan 2020 10:26:25 +0000 (11:26 +0100)]
Fix tests/x86/incdec_alt.c asm for GCC10.
Thanks to Jakub Jelinek. The test is broken. It blindly assumes the
toplevel inline asm is placed into some sensible section, but that is
a wrong assumption. The right thing is to start the inline asm with
.text directive and end with .previous. The reason gcc 10 breaks it
is the -fno-common default: the int r1, ... vars are emitted into the .bss
section, and that is the section that is current when the inline asm is
emitted. Previously they were in .common at the end of the assembly file.
Mark Wielaard [Thu, 23 Jan 2020 20:30:59 +0000 (21:30 +0100)]
Fix GCC10 issue in guest_s390_defs.h typedef enum type s390x_vec_op_t.
GCC10 defaults to -fno-common which produces this error:
guest_s390_defs.h:291: multiple definition of `s390x_vec_op_t'
This is because GCC10 detects there are multiple definitions of the
variable s390x_vec_op_t. We don't want to define a variable though.
We had wanted to define a type (one that currently isn't used).
Fix this by making it a typedef enum.
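The difference between the two declaration forms is sketched below. The enumerator and function names are hypothetical; only the typedef-versus-variable distinction is the point.

```c
#include <assert.h>
#include <string.h>

/* Without `typedef`, a declaration of this shape in a header defines a
   VARIABLE in every file that includes it:

       enum { DEMO_OP_A, DEMO_OP_B } demo_op_t;

   With -fcommon such tentative definitions were merged; with
   -fno-common they clash at link time. The typedef form below only
   declares a type. */
typedef enum {
    DEMO_OP_A,
    DEMO_OP_B
} demo_op_t;

/* Uses demo_op_t as a type, which the non-typedef form would not allow. */
const char *demo_op_name(demo_op_t op)
{
    return op == DEMO_OP_A ? "A" : "B";
}
```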
Guard withinEpsOf[FD] within none/tests/mips32/msa_fpu.c
Enclose the recently introduced functions with preprocessor guards,
much like the rest of the code is inside the main function.
Also mark the functions as static.
Minor code formatting.
mips64: rework math tests to take into account allowed approximation
Change the math tests to check whether the results are approximate to the
expected values instead of checking for exact matches since the calculations
in question are allowed to be approximate.
This fixes
/none/tests/mips64/test_math and
/none/tests/mips64/msa_fpu
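An approximate comparison of this kind can be sketched as follows. The name within_eps and the tolerance values are hypothetical; the tests themselves may use a different helper and epsilon.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if `actual` lies within `eps` of `expected`. The
   absolute difference is computed by hand, so no math library is
   needed at link time. */
bool within_eps(double expected, double actual, double eps)
{
    double diff = expected - actual;
    if (diff < 0)
        diff = -diff;
    return diff <= eps;
}
```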
This might happen when the source contains something like
if (something_involving_pcmpxstrx && foo) { .. }
which might use amd64g_dirtyhelper_PCMPxSTRx.
mips: Fix return from syscall mechanism for nanoMIPS
- Restore guest sigmask in VG_(sigframe_destroy)
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in VG_(nanomips_linux_SUBST_FOR_rt_sigreturn)
- Call ML_(fixup_guest_state_to_restart_syscall) from PRE(sys_rt_sigreturn)
- Tiny code refactor of sigframe-nanomips-linux.c
mips: Fix UASWM and UALWM instructions for nanoMIPS
UASWM and UALWM have not been implemented correctly.
Code used to implement SWM and LWM has been reused without making all of
the required adjustments.
During a save (push) instruction, adjusting the SP is required before doing
a store; otherwise Memcheck reports a warning because of a write operation
outside of the stack area.
Petar Jovanovic [Tue, 14 Jan 2020 09:31:48 +0000 (09:31 +0000)]
mips: Fix clone syscall for nanoMIPS
- Reset syscall return register (a0) in clone_new_thread()
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in ML_(call_on_new_stack_0_1)()
- Optimize stack usage in ML_(call_on_new_stack_0_1)()
- Code refactor of ML_(call_on_new_stack_0_1)()
This partially fixes all tests which use the clone system call, e.g. none/tests/pth_atfork1.
Julian Seward [Thu, 2 Jan 2020 08:32:19 +0000 (09:32 +0100)]
amd64 insn selector: improved handling of Or1/And1 trees.
This splits function iselCondCode into iselCondCode_C and iselCondCode_R, the
former of which is the old one that computes boolean expressions into an amd64
condition code, but the latter being new, and computes boolean expressions
into the lowest bit of an integer register. This enables much better code
generation for Or1/And1 trees, which now result quite commonly from the new
&&-recovery machinery in the front end.
Julian Seward [Thu, 2 Jan 2020 08:23:46 +0000 (09:23 +0100)]
amd64 back end: generate 32-bit shift instructions for 32-bit IR shifts.
Until now these have been handled by possibly widening the value to 64 bits,
if necessary, followed by a 64-bit shift. That wastes instructions and code
space.
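That the two schemes compute the same value for shift amounts 0..31 can be checked with a small C model. Function names are hypothetical; this models the arithmetic only, not the instruction selector.

```c
#include <assert.h>
#include <stdint.h>

/* Older scheme as described: widen to 64 bits, shift, then truncate. */
uint32_t shl32_via_64(uint32_t x, unsigned n)
{
    assert(n < 32);
    return (uint32_t)((uint64_t)x << n);
}

/* Direct 32-bit shift; same result for shift amounts 0..31. */
uint32_t shl32_direct(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x << n;
}
```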
Julian Seward [Thu, 2 Jan 2020 08:10:06 +0000 (09:10 +0100)]
Enable expensive handling of CmpEQ64/CmpNE64 for amd64 by default.
This has unfortunately become necessary because optimising compilers are
generating 64-bit equality comparisons on partially defined values on this
target. There will shortly be two followup commits which partially mitigate
the resulting performance loss.
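The idea behind the more precise handling can be sketched with explicit definedness masks. This is a simplified model under stated assumptions, not Memcheck's actual instrumentation: each value carries a mask in which a 1 bit means "undefined", and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the outcome of comparing a and b for equality is
   fully determined, given undefinedness masks va and vb (1 = undefined).
   If some bit position is defined in both operands and differs, the
   values are certainly unequal, whatever the undefined bits hold.
   Otherwise the outcome is only determined if nothing is undefined. */
bool cmp_eq64_result_defined(uint64_t a, uint64_t va,
                             uint64_t b, uint64_t vb)
{
    uint64_t undefined = va | vb;
    uint64_t defined_and_different = (a ^ b) & ~undefined;
    if (defined_and_different != 0)
        return true;
    return undefined == 0;
}
```

A cheaper scheme would flag the comparison as soon as any input bit is undefined, which gives false positives in the first case.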
Julian Seward [Sun, 15 Dec 2019 19:14:37 +0000 (20:14 +0100)]
'grail' fixes for MIPS:
This isn't a good result. It merely disables the new functionality on MIPS
because enabling it causes segfaults, even with --tool=none, the cause of
which is not obvious. It is only chasing through conditional branches that
is disabled, though. Chasing through unconditional branches (jumps and calls
to known destinations) is still enabled.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on MIPS.
Julian Seward [Sun, 1 Dec 2019 06:01:20 +0000 (07:01 +0100)]
'grail' fixes for s390x:
This isn't a good result. It merely disables the new functionality on s390x,
for the reason stated below.
* guest_generic_bb_to_IR.c bb_to_IR(): Disable, hopefully temporarily, the key
&&-recovery transformation on s390x, since it causes Memcheck to crash for
reasons I couldn't figure out. It also exposes some missing Iex_ITE cases
in the s390x insn selector, although those shouldn't be a big deal to fix.
Maybe it's some strangeness to do with the s390x "ex" instruction. I don't
exactly understand how that trickery works, but from some study of it, I
didn't see anything obviously wrong.
It is only chasing through conditional branches that is disabled for s390x.
Chasing through unconditional branches (jumps and calls to known
destinations) is still enabled.
* host_s390_isel.c s390_isel_cc(): No functional change. Code has been added
here to handle the new Iop_And1 and Iop_Or1, and it is somewhat tested, but
is not needed until conditional branch chasing is enabled on s390x.
Julian Seward [Thu, 21 Nov 2019 19:03:47 +0000 (20:03 +0100)]
Tidy up ir_opt.c aspects relating to the 'grail' work. In particular:
* Rewrite do_minimal_initial_iropt_BB so it doesn't do full constant folding;
that is unnecessary expense at this point, and later passes will do it
anyway
* do_iropt_BB: don't flatten the incoming block, because
do_minimal_initial_iropt_BB will have run earlier and done so. But at least
for the moment, assert that it really is flat.
* VEX/priv/guest_generic_bb_to_IR.c create_self_checks_as_needed: generate
flat IR so as not to fail the abovementioned assertion.
I believe this completes the target-independent aspects of this work, and also
the x86_64 specifics (of which there are very few).
Julian Seward [Mon, 18 Nov 2019 18:12:49 +0000 (19:12 +0100)]
Rationalise --vex-guest* flags in the new IRSB construction framework
* remove --vex-guest-chase-cond=no|yes. This was never used in practice.
* rename --vex-guest-chase-thresh=<0..99> to --vex-guest-chase=no|yes. In
other words, downgrade it from a numeric flag to a boolean one, that can
simply disable all chasing if required. (Some tools, notably Callgrind,
force-disable block chasing, so this functionality at least needs to be
retained).