mips: treat delay slot as part of the previous instruction
Do so by recursively calling disInstr_MIPS_WRK() if the instruction
currently being disassembled is a branch/jump, effectively combining them
into one IR instruction.
A notable change is that the branch/jump + delay slot combination now forms
an eight-byte instruction.
It only ever worked on x86 and amd64, and even on those it had a high false
positive rate and was slow. Everything it does, ASan can do faster, better,
and on more architectures. So there's no reason to keep this tool any more.
drd/drd_pthread_intercepts: Add a workaround for what is probably a compiler bug
Without this patch drd produces incorrect output for some test cases. It
seems like without this patch an incorrect value is passed as the sixth
argument of VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__POST_SEM_OPEN, ...):
$ ./vg-in-place --tool=drd --trace-semaphore=yes drd/tests/sem_open -m -p
drd, a thread error detector
Copyright (C) 2006-2017, and GNU GPL'd, by Bart Van Assche.
Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
Command: drd/tests/sem_open -m -p
[1] sem_open 0x4029000 name /drd-sem-open-test-27725 oflag 0xc0 mode 0600 value 0
s_d1 = 1 (should be 1)
[2] sem_wait 0x4029000 value 0 -> 4294967295
Thread 2:
Invalid semaphore: semaphore 0x4029000
at 0x484ADC7: sem_wait_intercept (drd_pthread_intercepts.c:1436)
by 0x484ADC7: sem_wait@* (drd_pthread_intercepts.c:1441)
by 0x4014A9: thread_func (sem_open.c:114)
by 0x483FEA6: vgDrd_thread_wrapper (drd_pthread_intercepts.c:449)
by 0x4886EF9: start_thread (in /lib64/libpthread-2.31.so)
by 0x499F3BE: clone (in /lib64/libc-2.31.so)
semaphore 0x4029000 was first observed at:
at 0x484A395: sem_open_intercept (drd_pthread_intercepts.c:1403)
by 0x484A395: sem_open (drd_pthread_intercepts.c:1409)
by 0x4012CE: main (sem_open.c:63)
[2] sem_post 0x4029000 value 4294967295 -> 0
[1] sem_wait 0x4029000 value 0 -> 4294967295
Thread 1:
Invalid semaphore: semaphore 0x4029000
at 0x484ADC7: sem_wait_intercept (drd_pthread_intercepts.c:1436)
by 0x484ADC7: sem_wait@* (drd_pthread_intercepts.c:1441)
by 0x40139D: main (sem_open.c:90)
semaphore 0x4029000 was first observed at:
at 0x484A395: sem_open_intercept (drd_pthread_intercepts.c:1403)
by 0x484A395: sem_open (drd_pthread_intercepts.c:1409)
by 0x4012CE: main (sem_open.c:63)
Conflicting load by thread 1 at 0x00404108 size 8
at 0x40139E: main (sem_open.c:91)
Allocation context: BSS section of /home/bart/software/valgrind.git/drd/tests/sem_open
Other segment start (thread 2)
(thread finished, call stack no longer available)
Other segment end (thread 2)
(thread finished, call stack no longer available)
Conflicting store by thread 1 at 0x00404108 size 8
at 0x4013B2: main (sem_open.c:91)
Allocation context: BSS section of /home/bart/software/valgrind.git/drd/tests/sem_open
Other segment start (thread 2)
(thread finished, call stack no longer available)
Other segment end (thread 2)
(thread finished, call stack no longer available)
[1] sem_post 0x4029000 value 4294967295 -> 0
s_d2 = 2 (should be 2)
s_d3 = 5 (should be 5)
[1] sem_close 0x4029000 value 0
For lists of detected and suppressed errors, rerun with: -s
ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 18 from 8)
The test code in drd/tests/trylock.c attempts to write-lock a POSIX rwlock
twice. The code expects the second attempt to return an error, but POSIX
doesn't require that behaviour, and FreeBSD's implementation deadlocks
instead.
See also https://bugs.kde.org/show_bug.cgi?id=403212
Andreas Arnez [Fri, 3 Apr 2020 17:16:01 +0000 (19:16 +0200)]
s390x: Drop spurious register moves in CDAS instruction selector
The s390x instruction selector for Ist_CAS, in its handling of "compare
double and swap", adds spurious register moves after the CDAS operation
itself. These moves overwrite registers returned by calls to
s390_isel_int_expr(), potentially causing corruption of temp values.
Andreas Arnez [Thu, 2 Apr 2020 18:40:02 +0000 (20:40 +0200)]
s390x: Fix Iex_Load instruction selectors for F128/D128 types
The s390x instruction selectors for Iex_Load of Ity_F128 and Ity_D128
types had a common typo that would lead to crashes when used. So far this
bug didn't surface because Iex_Load is not emitted on s390x with these
types.
Andreas Arnez [Thu, 2 Apr 2020 16:00:13 +0000 (18:00 +0200)]
s390x: Introduce and exploit new ALU operator S390_ALU_ILIH
The handlers of Iop_8HLto16, Iop_16HLto32, and Iop_32HLto64 in
s390_isel_int_wrk() yield a sequence of "shift", "and", and "or" ALU
operations, the second of which modifies a register returned from a call
to s390_isel_int_expr(). While this approach does not lead to wrong code
generation (because only the register's upper bits are changed which are
not relevant to the IR type), it violates the general "no-modify" rule.
Replace this sequence of ALU operations by a single ALU operation
S390_ALU_ILIH that inserts the low half of its second operand into the
high half of its first operand. Use the z/Architecture instruction
RISBG ("rotate then insert selected bits") for implementing it.
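As a rough illustration of the semantics described above, the following C
sketch models what "insert the low half of the second operand into the high
half of the first operand" computes. The function name ilih() and its
half_bits parameter are hypothetical and only serve this example; this is a
model of the arithmetic, not the VEX code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the ILIH semantics. half_bits is the width of
   one half (8, 16 or 32). The low half of op1 is kept, and the low half
   of op2 is placed into the high half of the result. Neither input is
   modified in place. */
uint64_t ilih(uint64_t op1, uint64_t op2, unsigned half_bits)
{
   assert(half_bits == 8 || half_bits == 16 || half_bits == 32);
   uint64_t mask = (1ULL << half_bits) - 1;
   return ((op2 & mask) << half_bits) | (op1 & mask);
}
```

Under this model, an expression like Iop_32HLto64(hi, lo) would correspond
to ilih(lo, hi, 32).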
Andreas Arnez [Wed, 18 Mar 2020 17:59:15 +0000 (18:59 +0100)]
s390x: Drop register arg to s390_isel_int1_expr()
Restructure the interface of s390_isel_int1_expr() such that no
destination register is passed to it any more. Adjust all its callers
accordingly. Ensure that callers never modify the returned register, but
make a copy and modify that instead.
Andreas Arnez [Tue, 11 Feb 2020 17:02:38 +0000 (18:02 +0100)]
s390x: Activate "grail"
Now that the known problems with activating "grail" on s390x have been
fixed, there is no need to disable it for s390x guests any more. Remove
the appropriate check in "guest_generic_bb_to_IR.c".
Andreas Arnez [Thu, 19 Mar 2020 16:35:55 +0000 (17:35 +0100)]
Bug 418997 - s390x: Support Iex_ITE for float and vector expressions
The s390x backend supports Iex_ITE expressions for integer types I8, I16,
I32, and I64 only. But "grail" can now generate such expressions for
guarding any kind of Ist_Put statements; see add_guarded_stmt_to_end_of()
in "guest_generic_bb_to_IR.c". On s390x this means that F64 and V128 can
occur as well, in which case a crash would result. And such crashes are
actually seen when running the test suite with "grail" enabled.
Extend Iex_ITE support to the floating-point types F32 and F64 and to the
vector type V128. Do this by extending S390_INSN_COND_MOVE as needed.
Andreas Arnez [Tue, 3 Mar 2020 15:42:04 +0000 (16:42 +0100)]
s390x: Add directReload function for the register allocator
This adds the function directReload_S390() and wires it up to the register
allocator, enabling "direct reloading" for various types of instructions.
Direct reloading, when applied, avoids loading an operand into a register
and thus may reduce spilling. On s390x this slightly reduces the
generated code.
In order to determine which instructions are relevant for direct
reloading, it was tested which direct reloads the register allocator tries
to perform in simple programs.
Andreas Arnez [Wed, 18 Mar 2020 11:24:25 +0000 (12:24 +0100)]
Bug 417281 - s390x: Fix register usage of conditional moves
The s390x register usage callback marks the target register of a
conditional move as HRmWrite only. It fails to mention the fact that the
target register is also an input to the insn (unless the condition is
"never" or "always").
This was discovered while investigating "grail" failures on s390x and
fixes the majority of them.
Andreas Arnez [Fri, 13 Mar 2020 16:20:20 +0000 (17:20 +0100)]
s390x: Actually use "load on condition" for conditional moves
Although the implementation of the cond_move insn is prepared to emit
"load on condition" instructions, it doesn't, because of a reversed check.
The check is supposed to prevent emitting LOCx instructions when the
condition code mask is set to "always", but it's accidentally negated.
Fix the reversal of the check, so LOCx instructions are actually emitted
when applicable.
Andreas Arnez [Mon, 9 Mar 2020 16:26:26 +0000 (17:26 +0100)]
s390x: Mark register usage with HRmModify when applicable
Instead of marking register usage for the same register with HRmRead and
HRmWrite separately, use HRmModify instead. This makes the code a bit
easier to read.
Andreas Arnez [Mon, 9 Mar 2020 14:14:16 +0000 (15:14 +0100)]
s390x: Enable 1- and 2-byte operands for v-test
The v-test operation tests its operand against zero and sets the condition
code accordingly. So far the operation was only supported for 4- and
8-byte operands.
Lift this restriction and enable 1- and 2-byte operands for v-test, using
the z/Architecture "test under mask" instructions TM, TMY, and TMLL.
Exploit this in the instruction selector, getting rid of the conversion to
a 4-byte operand. This slightly reduces the generated code on s390x.
Andreas Arnez [Wed, 5 Feb 2020 17:18:49 +0000 (18:18 +0100)]
s390x: Support And1/Or1, improve handling of Int1 expressions
This provides an instruction selector for Int1-expressions that supports
And1 and Or1. This implementation tries to keep values in registers as
much as possible, to avoid too many conversions from a Boolean value to a
condition code or vice versa. To this end, the new function
s390_isel_int1_expr() is added, which handles bit-typed expressions that
are supposed to end up in a register.
Also change the representation of Int1 values in registers and always
sign-extend them to 64 bits.
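A minimal sketch of what a sign-extended Int1 representation looks like,
assuming false is stored as 0 and true as all ones. The helper names are
hypothetical. With such an encoding, And1 and Or1 can be modelled as plain
bitwise operations on the full register, which is shown here only as an
illustration and not as a description of the actual instruction selector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register encoding of an Int1 value, sign-extended to
   64 bits: 0 for false, -1 (all ones) for true. */
int64_t int1_to_reg(int bit)
{
   assert(bit == 0 || bit == 1);
   return -(int64_t)bit;
}

/* On sign-extended values, bitwise AND/OR yield sign-extended results. */
int64_t and1(int64_t a, int64_t b) { return a & b; }
int64_t or1(int64_t a, int64_t b)  { return a | b; }
```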
Andreas Arnez [Tue, 10 Mar 2020 16:18:48 +0000 (17:18 +0100)]
s390x: Fix down-cast from memory operand with size < 8
A down-cast always copies 8 bytes from the source operand, even if the
operand is actually smaller. This doesn't matter for register operands,
but it does for memory operands. Fix this and copy the correct number of
bytes instead.
Andreas Arnez [Fri, 13 Mar 2020 16:18:55 +0000 (17:18 +0100)]
s390x: Mark VRs as clobbered by helper calls
According to the s390x ABI, all vector registers are call-clobbered
(except for their portions that overlap with the call-saved FPRs). But
the s390x backend doesn't mark them as such when determining the register
usage of helper call insns.
Fix this in s390_insn_get_reg_usage when handling S390_INSN_HELPER_CALL.
Julian Seward [Mon, 9 Mar 2020 08:22:31 +0000 (09:22 +0100)]
Bug 415136 - ARMv8.1 Compare-and-Swap instructions are not supported. (TEST CASES).
This commit provides test cases for ARMv8.1 CAS instructions, support for
which was added in the previous commit.
Patch by Assad Hashmi <assad.hashmi@linaro.org>.
Julian Seward [Mon, 9 Mar 2020 08:18:09 +0000 (09:18 +0100)]
Bug 415136 - ARMv8.1 Compare-and-Swap instructions are not supported.
This commit implements ARMv8.1 CAS instructions. It does not contain
test cases; those will be in a subsequent commit.
Patch by Assad Hashmi <assad.hashmi@linaro.org>.
Andreas Arnez [Mon, 2 Mar 2020 15:22:59 +0000 (16:22 +0100)]
Bug 418435 - s390x: Avoid extra value dependency in CLC implementation
The test memcheck/tests/memcmp currently fails on s390x because it yields
the expected "conditional jump or move depends on uninitialised value(s)"
message twice instead of just once.
This is caused by the handling of the s390x instruction CLC, see
s390_irgen_CLC_EX(). When comparing two bytes from the two input strings,
the implementation uses the comparison result for a conditional branch to
the next instruction. But if no further bytes need to be compared, the
comparison result is also used for generating the resulting condition
code.
There are two cases: Either the inputs are equal; then the resulting
condition code is zero. This is what happens in the memcmp test case. Or
the inputs are different; then the resulting condition code is 1 or 2 if
the first or second operand is greater, respectively.
At least in the first case it is easy to avoid the additional dependency,
by clearing the condition code explicitly. Just do this.
Mark Wielaard [Fri, 28 Feb 2020 12:36:31 +0000 (13:36 +0100)]
Add 32bit time64 syscalls for arm, mips32, ppc32 and x86.
This patch adds syscall wrappers for the following syscalls which
use a 64bit time_t on 32bit arches: gettime64, settime64,
clock_getres_time64, clock_nanosleep_time64, timer_gettime64,
timer_settime64, timerfd_gettime64, timerfd_settime64,
utimensat_time64, pselect6_time64, ppoll_time64, recvmmsg_time64,
mq_timedsend_time64, mq_timedreceive_time64, semtimedop_time64,
rt_sigtimedwait_time64, futex_time64 and sched_rr_get_interval_time64.
Still missing are clock_adjtime64 and io_pgetevents_time64.
For the more complicated syscalls futex[_time64], pselect6[_time64]
and ppoll[_time64] there are shared pre and/or post helper functions.
Other functions just have their own PRE and POST handler.
Note that the vki_timespec64 struct really is the struct as used by
glibc (it internally translates a 32bit timespec struct to a 64bit
timespec64 struct before passing it to any of the time64 syscalls).
The kernel uses a 64-bit signed int, but is ignoring the upper 32 bits
of the tv_nsec field. It does always write the full struct though.
So skipping the check of the padding is only needed for PRE_MEM_READ.
There are two helpers, pre_read_timespec64 and pre_read_itimerspec64,
to check the new structs.
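The read/write asymmetry described above can be sketched as follows. The
struct is a hypothetical little-endian layout with an explicit padding
field, and the two helper functions are invented for this example; they
only compute how many bytes a pre-syscall read check and a post-syscall
write would cover under that assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical little-endian layout: 64-bit seconds, 32-bit nanoseconds,
   and 32 bits of padding that the kernel ignores on input. */
struct timespec64_sketch {
   int64_t tv_sec;
   int32_t tv_nsec;
   int32_t tv_pad;
};

/* Bytes that must be initialised before the syscall: everything up to,
   but not including, the padding. */
size_t timespec64_read_len(void)
{
   /* tv_nsec directly follows tv_sec, so one contiguous range suffices. */
   assert(offsetof(struct timespec64_sketch, tv_nsec) == sizeof(int64_t));
   return offsetof(struct timespec64_sketch, tv_pad);
}

/* Bytes written by the kernel on output: the full struct. */
size_t timespec64_write_len(void)
{
   return sizeof(struct timespec64_sketch);
}
```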
Mark Wielaard [Wed, 4 Mar 2020 13:23:37 +0000 (14:23 +0100)]
Add suppressions for glibc DTV leaks
The glibc DTV (Dynamic Thread Vector) for the main thread is never
released, not even through __libc_freeres. This causes it to always
show up as a reachable block when used, and sometimes, when it is
extended and then reduced, as a possible leak when memcheck cannot
find a pointer to the start of the block.
Improve line info tracing, in particular when using lto.
With gcc 9 and --enable-lto, we now have spurious warnings telling
that the line information in the debug info has huge line numbers,
greater than the (valgrind) maximum of 2^20.
These spurious warnings cause all tests to fail.
This change modifies the tracing/debugging of the line info to:
* disable by default the warning for line info greater than 2^20.
When using -d, such warnings are however still shown (once).
* allow all such warnings to be seen when using at least -d -d -d -d
Allow valgrind to find debug info in a 'usr merge' setup.
On ubuntu 19.10, valgrind fails telling that it cannot find
the mandatory redirection for strlen in ld-linux-x86-64.so.2.
This is due to /bin being a symlink to usr/bin: ld is found
in /usr/lib/x86_64-linux-gnu/ld-2.30.so
but its debug info is
in /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.30.so
Without this patch, valgrind searches the debug info (a.o.)
in /usr/lib/debug/usr/lib/x86_64-linux-gnu/ld-2.30.so
so using the concatenation of /usr/lib/debug
and /usr/lib/x86_64-linux-gnu/ld-2.30.so,
but the debug info is located at the concatenation of
/usr/lib/debug and /lib/x86_64-linux-gnu/ld-2.30.so
(so without the leading /usr).
Modify the debug info search so as to try with and without the /usr.
Patch derived from the patch done by Mathieu Trudel-Lapierre
to solve https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1808508
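The search order described above can be sketched as a small helper that
builds both candidate paths. The function name debug_candidates() and the
fixed buffer size are assumptions made for this example; the sketch only
covers the case of dropping a leading /usr and is not the code from the
patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills cand[] with the debug info paths to try
   for objpath and returns how many were written (1 or 2).
   cand[0] is debugdir + objpath; cand[1], if present, is the same
   with the leading "/usr" of objpath removed. */
int debug_candidates(const char *debugdir, const char *objpath,
                     char cand[2][512])
{
   int n = 0;
   assert(objpath[0] == '/');
   snprintf(cand[n++], 512, "%s%s", debugdir, objpath);
   if (strncmp(objpath, "/usr/", 5) == 0) {
      /* Skip the four characters of "/usr", keep the following '/'. */
      snprintf(cand[n++], 512, "%s%s", debugdir, objpath + 4);
   }
   return n;
}
```

For /usr/lib/x86_64-linux-gnu/ld-2.30.so and /usr/lib/debug this yields the
two locations named in the description, in that order.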
Andreas Arnez [Thu, 27 Feb 2020 14:52:53 +0000 (15:52 +0100)]
s390x: Add CPU model for z15
Make the z15 CPU models known to Valgrind. Add test case output for z15
to the "ecag" test. Also ensure that the facility bits for CPU facilities
unsupported by Valgrind are unset, particularly for the new
deflate-conversion facility.
mips: Fix linking errors for none/tests/mips[32|64]/msa_fpu
Some older toolchains (e.g. Codescape GNU Tools 2016.05-03 for MIPS
MTI Linux 4.9.2) require explicit inclusion of the "math" library in
order to link to the fpclassify() function.
Andreas Arnez [Wed, 26 Feb 2020 16:46:45 +0000 (17:46 +0100)]
s390x: Fix possible false positives with mul-z14 test case
The output of the tests for msrkc and msgrkc in "none/tests/s390x/mul-z14"
can differ from the expected output, because it depends on undetermined
data. The test always prints the register pair r2/r3, but the
instructions msrkc and msgrkc only write to r2, and msrkc even affects
only its lowest half.
Fix the undetermined output by initializing r2 and r3 with zero first.
Andreas Arnez [Wed, 5 Feb 2020 18:28:53 +0000 (19:28 +0100)]
s390x: Exploit LOCGHI for converting from CC to Int1
Whenever converting a condition code to a Boolean value, the current
implementation in s390_insn_cc2bool_emit() generates six instructions
including "insert program mask" (IPM). On systems with the
load/store-on-condition facility 2, this can be done in two instructions
instead, using "load halfword immediate on condition" (LOCGHI).
Add the new hardware capability VEX_HWCAPS_S390X_LSC2 and the respective
macro s390_host_has_lsc2. In s390_insn_cc2bool_emit(), check for the
facility and exploit it if available.
A conditional move from an immediate value can be slightly improved with
LOCGHI as well, so do that in s390_insn_cond_move_emit() if possible.
Andreas Arnez [Tue, 25 Feb 2020 14:54:46 +0000 (15:54 +0100)]
s390x: Replace use of deprecated Iop_Clz64 operator
The operator Iop_Clz64 has been deprecated. Drop it in the s390x backend
and replace it by Iop_ClzNat64. Previously s390_irgen_FLOGR() handled the
value zero specially and replaced it by 1 before applying Iop_Clz64. With
Iop_ClzNat64 this is no longer needed, so remove this special-case
handling.
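To illustrate why the special case becomes unnecessary, here is a small
reference model of a "natural" count-leading-zeros operation, which is
defined for every input and yields 64 for zero. The function name
clz_nat64() is made up for this sketch, and the loop is written for clarity
rather than speed.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of a "natural" count of leading zeros on 64 bits:
   well defined for all inputs, returning 64 when x is zero. */
unsigned clz_nat64(uint64_t x)
{
   unsigned n = 0;
   if (x == 0)
      return 64;
   /* Shift left until the top bit is set, counting the steps. */
   while ((x & (1ULL << 63)) == 0) {
      x <<= 1;
      n++;
   }
   assert(n < 64);
   return n;
}
```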
Carl Love [Fri, 21 Feb 2020 23:22:26 +0000 (17:22 -0600)]
PPC64, fix for alignment of the rt_sigframe data structure.
The PPC64 implementation checks that the data structure is aligned. The
changes in the commit listed below break the alignment. This patch adds an
explicit alignment directive to ensure the data structure is allocated
with the required alignment. This fixes 31 stderr failures, 10 stdout
failures on the Power 7, Power 8 and Power 9 platforms.
Tom Hughes [Thu, 20 Feb 2020 09:14:24 +0000 (09:14 +0000)]
Allow clone with CLONE_VFORK and no CLONE_VM
The CLONE_VFORK flag causes the parent to suspend until the child
exits or execs. Without the memory sharing that CLONE_VM would give,
this is really closer to fork, but we convert vfork to fork by
removing CLONE_VM anyway, so there is no reason not to allow this.
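A minimal sketch of the flag handling this implies, assuming the Linux
values for CLONE_VM and CLONE_VFORK. The SK_ prefixed constants and the
function name are local to this example and do not come from the Valgrind
sources.

```c
#include <assert.h>

/* Linux flag values, redefined locally so the sketch is self-contained. */
#define SK_CLONE_VM    0x00000100UL
#define SK_CLONE_VFORK 0x00004000UL

/* Hypothetical helper: if CLONE_VFORK is requested, with or without
   CLONE_VM, strip both flags so the call is handled like a fork.
   All other flags are passed through unchanged. */
unsigned long vfork_to_fork_flags(unsigned long flags)
{
   if (flags & SK_CLONE_VFORK)
      flags &= ~(SK_CLONE_VFORK | SK_CLONE_VM);
   assert(!(flags & SK_CLONE_VFORK));
   return flags;
}
```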
Andreas Arnez [Wed, 12 Feb 2020 13:13:55 +0000 (14:13 +0100)]
Bug 417452 - s390x: Force 12-bit amode for vector stores in isel
It was seen that the s390 instruction selector chose a wrong addressing
mode for storing a vector register. The VST instruction only handles
short (12-bit unsigned) displacements, but a long (20-bit signed)
displacement was generated instead, resulting in a panic:
vex: the `impossible' happened:
s390_insn_store_emit: unknown dst->tag for HRcVec128
The fix prevents long displacements for vector store operations. It also
optimizes vector store operations from an Iex_Get, by converting them to a
memory copy. This optimization was already performed for integer
registers.
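The displacement constraint can be sketched as a range check. The enum and
function names below are hypothetical, and AMODE_NONE simply stands for
"the displacement cannot be encoded directly"; how the instruction selector
deals with that case is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a base+displacement addressing mode. */
enum amode_kind { AMODE_B12, AMODE_B20, AMODE_NONE };

/* Short displacement: 12-bit unsigned. */
static bool fits_unsigned_12(int64_t d) { return d >= 0 && d <= 0xfff; }

/* Long displacement: 20-bit signed. */
static bool fits_signed_20(int64_t d) { return d >= -0x80000 && d <= 0x7ffff; }

/* Vector stores may only use the short form; other stores may fall
   back to the long form when the short one does not fit. */
enum amode_kind choose_amode(int64_t disp, bool is_vector_store)
{
   enum amode_kind kind;
   if (fits_unsigned_12(disp))
      kind = AMODE_B12;
   else if (!is_vector_store && fits_signed_20(disp))
      kind = AMODE_B20;
   else
      kind = AMODE_NONE;
   assert(!(is_vector_store && kind == AMODE_B20));
   return kind;
}
```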
Andreas Arnez [Mon, 10 Feb 2020 12:37:03 +0000 (13:37 +0100)]
s390x: Fix printing of virtual register numbers
As noticed by Julian Seward, the code for printing s390x register names
currently does not show the virtual register numbers correctly. Although
it distinguishes between virtual and real registers, it uses the hardware
register number for both cases. This is fixed.
Andreas Arnez [Thu, 16 Jan 2020 12:49:10 +0000 (13:49 +0100)]
Bug 416301 - s390x: Support "compare and signal" instructions
Add VEX support for the s390x "compare and signal" instructions KEBR,
KDBR, KXBR, KEB, and KDB. For now, let them behave exactly like their
non-signalling counterparts. Enhance the bfp-4 test case to cover these
instructions as well. Update the list of supported instructions in
s390-opcodes.csv. Add a disclaimer to README.s390, explaining that FP
signalling is not handled accurately on s390x at the moment.
Khem Raj [Tue, 28 Jan 2020 03:50:04 +0000 (19:50 -0800)]
drd/tests/pth_detached3: Make pthread_detach() call portable across platforms
pthread_t is an opaque type, therefore we cannot apply simple arithmetic
to variables of pthread_t type. This test needs to pass an invalid
pthread_t handle; typecasting to uintptr_t works too and is portable
across glibc and musl.
Fixes
| pth_detached3.c:24:25: error: invalid use of undefined type 'struct __pthread'
| 24 | pthread_detach(thread + 8);
| | ^
[ bvanassche: reformatted patch description and fixed up line numbers ]
Rhys Kidd [Tue, 28 Jan 2020 08:33:03 +0000 (19:33 +1100)]
Fix non-glibc build of the test suite with s390x_features
s390x_features is built unconditionally on a range of platforms; accordingly,
any non-portable or glibc-specific functionality must be guarded.
Fixes error reported when running 'make check' or 'make regtest' on a platform
with an alternative libc that Valgrind supports, in this case Apple's libc:
s390x_features.c:13:10: fatal error: 'features.h' file not found
#include <features.h> // __GLIBC_PREREQ
^
1 error generated.
Fixes: 161d22f0a ("s390x: Fix vector facility (vx) check in test suite")
Mark Wielaard [Sat, 25 Jan 2020 17:34:58 +0000 (18:34 +0100)]
x86 and amd64 tests: Use .text and .previous around all top-level asm.
GCC10 defaults to -fno-common which exposes some latent bugs in
some of the top-level asm code in various .c test files. Some of the
tests started to segfault (even if not run under valgrind). Such code
needs to be wrapped inside a .text and a .previous asm statement to
make sure the code is generated in the .text code section and to
make sure the compiler doesn't lose track of the section currently
being used to generate data or code in. Without it code might be
generated inside a data section or the other way around.
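The wrapping described above might look like the following sketch. The
function add_one() is invented for this example, the assembly is written
for x86-64 ELF targets only, and a plain C fallback is used elsewhere so
that the file still builds; none of this is taken from the test suite.

```c
#include <assert.h>

int add_one(int x);

#if defined(__x86_64__) && defined(__ELF__)
/* Top-level asm: switch to .text explicitly, emit the code, then
   restore whatever section the compiler was using with .previous. */
__asm__(
   ".text\n"
   ".globl add_one\n"
   ".type add_one, @function\n"
   "add_one:\n"
   "   leal 1(%rdi), %eax\n"   /* return x + 1 (first arg in edi) */
   "   ret\n"
   ".size add_one, .-add_one\n"
   ".previous\n"
);
#else
/* Fallback for targets the asm above does not cover. */
int add_one(int x) { return x + 1; }
#endif

/* Small self-check that the symbol is callable from C. */
void check_add_one(void) { assert(add_one(41) == 42); }
```

Without the .text/.previous pair, the code would land in whichever section
happens to be current at that point in the compiler's output.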
Mark Wielaard [Sat, 25 Jan 2020 16:44:43 +0000 (17:44 +0100)]
Revert accidentally added changes in commit ce094ba912
These changes were part of my local testing of bug 416667
gcc10 ppc64le impossible constraint in 'asm' in test_isa
And shouldn't have been committed yet before review.
Mark Wielaard [Fri, 24 Jan 2020 10:26:25 +0000 (11:26 +0100)]
Fix tests/x86/incdec_alt.c asm for GCC10.
Thanks to Jakub Jelinek. The test is broken. It blindly assumes the
toplevel inline asm is placed into some sensible section, but that is
a wrong assumption. The right thing is to start the inline asm with
.text directive and end with .previous. The reason gcc 10 breaks it
is the -fno-common default: the int r1, ... vars are emitted into the .bss
section, and that is the section that is current when the inline asm is
emitted. Previously they were in .common at the end of the assembly file.
Mark Wielaard [Thu, 23 Jan 2020 20:30:59 +0000 (21:30 +0100)]
Fix GCC10 issue in guest_s390_defs.h typedef enum type s390x_vec_op_t.
GCC10 defaults to -fno-common which produces this error:
guest_s390_defs.h:291: multiple definition of `s390x_vec_op_t'
This is because GCC10 detects there are multiple definitions of the
variable s390x_vec_op_t. We don't want to define a variable though.
We had wanted to define a type (one that currently isn't used).
Fix this by making it a typedef enum.
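The difference between the two forms can be seen in a short sketch. All
identifiers below are hypothetical and chosen for this example only. The
first declaration, without typedef, defines a variable, and if it sat in a
header included by several source files, each of them would define that
variable, which -fno-common turns into a link error. The second only names
a type.

```c
#include <assert.h>

/* Without "typedef": this defines a VARIABLE named sk_vec_op_var whose
   type is an anonymous enum. */
enum { SK_OP_ADD_V, SK_OP_SUB_V } sk_vec_op_var;

/* With "typedef": this only introduces the TYPE sk_vec_op_t, so it is
   safe to place in a header included from many source files. */
typedef enum { SK_OP_ADD, SK_OP_SUB } sk_vec_op_t;

/* Example use of the type. */
sk_vec_op_t pick_op(int subtract)
{
   assert(subtract == 0 || subtract == 1);
   return subtract ? SK_OP_SUB : SK_OP_ADD;
}
```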
Guard withinEpsOf[FD] within none/tests/mips32/msa_fpu.c
Enclose the recently introduced functions with preprocessor guards,
much like the rest of the code is inside the main function.
Also mark the functions as static.
Minor code formatting.
mips64: rework math tests to take into account allowed approximation
Change the math tests to check whether the results are approximate to the
expected values instead of checking for exact matches since the calculations
in question are allowed to be approximate.
This fixes
/none/tests/mips64/test_math and
/none/tests/mips64/msa_fpu
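An approximate comparison of this kind could be sketched as below. The
function name within_eps(), the use of a relative tolerance, and the
treatment of NaN and infinities are assumptions of this example and may
differ from what the tests actually do. The sketch deliberately avoids
<math.h> so that it does not depend on linking the math library.

```c
#include <assert.h>
#include <stdbool.h>

static double abs_d(double x) { return x < 0 ? -x : x; }

/* Hypothetical helper: true if actual is within a relative tolerance
   rel_eps of expected. Two NaNs compare as matching, and infinities
   only match when they are exactly equal. */
bool within_eps(double expected, double actual, double rel_eps)
{
   assert(rel_eps >= 0.0);
   if (expected != expected || actual != actual)   /* NaN check */
      return expected != expected && actual != actual;
   if (expected == actual)
      return true;
   double diff = abs_d(actual - expected);
   if (diff - diff != 0.0)   /* diff is infinite: never "close" */
      return false;
   return diff <= rel_eps * abs_d(expected);
}
```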
This might happen when the source contains something like
if (something_involving_pcmpxstrx && foo) { .. }
which might use amd64g_dirtyhelper_PCMPxSTRx.
mips: Fix return from syscall mechanism for nanoMIPS
- Restore guest sigmask in VG_(sigframe_destroy)
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in VG_(nanomips_linux_SUBST_FOR_rt_sigreturn)
- Call ML_(fixup_guest_state_to_restart_syscall) from PRE(sys_rt_sigreturn)
- Tiny code refactor of sigframe-nanomips-linux.c
mips: Fix UASWM and UALWM instructions for nanoMIPS
UASWM and UALWM have not been implemented correctly.
Code used to implement SWM and LWM has been reused without making all of
the required adjustments.
During a save (push) instruction adjusting the SP is required before doing
a store, otherwise Memcheck reports warning because of a write operation
outside of the stack area.
Petar Jovanovic [Tue, 14 Jan 2020 09:31:48 +0000 (09:31 +0000)]
mips: Fix clone syscall for nanoMIPS
- Reset syscall return register (a0) in clone_new_thread()
- Use "syscall[32]" asm idiom instead of "syscall" with immediate parameter
in ML_(call_on_new_stack_0_1)()
- Optimize stack usage in ML_(call_on_new_stack_0_1)()
- Code refactor of ML_(call_on_new_stack_0_1)()
It partially fixes all tests which use the clone system call, e.g. none/tests/pth_atfork1.