git.ipfire.org Git - thirdparty/valgrind.git/log

s390x: Fix compile warnings in test cases

Some GCC versions emit the following warnings for some s390x-specific test
cases:

   warning: listing the stack pointer register '15' in a clobber list is
   deprecated

   warning: this 'else' clause does not
   guard... [-Wmisleading-indentation] ...this statement, but...

Fix these.

Most of the inline assemblies declaring r15 as clobbered do not actually
change its value.  Only in stmg_wrap() does it become necessary to save
and restore r15.

Hook up acct syscall for amd64, mips32, ppc32 and ppc64

There was already a generic linux wrapper for acct, but it was not
hooked up for all linux arches.

Adjust filter_gdb for arm64 with eglibc 2.19 and gdb 7.7.1

Older ubuntu arm64 setups used eglibc 2.19 and gdb 7.7.1. In that
case select.c could be under linux/generic and the select argument
list could be split up differently over several lines. Adjust
filter_gdb to catch those differences.

Also checked against a Debian arm64 with glibc 2.31 and gdb 10.1.

Add shell.stderr.exp-dash4 to none/tests/scripts/Makefile.am EXTRA_DIST

Extend filter_vgdb for GNU gdb (Debian 10.1-2) 10.1.90.20210103-git

On Debian 11.0 arm64 gdb will emit a similar (stray) ENOTTY message
as on SLES11, but for _exit.c instead of rtld.c.

Add none/tests/scripts/shell.stderr.exp-dash4 for dash 0.5.11

dash 0.5.11 produces slightly different error messages.
The new exp file is similar to shell.stderr.exp-dash3 but
with the extra (second) "shell: " output removed.

Add generated glibc drd and helgrind files to .gitignore

fix compiler print format warnings in test_isa_3_0.c

GCC fixed the compiler warnings for long long types. Add explicit
casts so gcc will not generate compile warnings.
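
A minimal sketch of the kind of explicit cast described above. The
function name format_u64_hex is hypothetical and not taken from
test_isa_3_0.c; it only shows how casting to unsigned long long keeps the
argument type in line with the "%llx" conversion.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as hex into buf.  uint64_t may be defined as
   'unsigned long' or 'unsigned long long' depending on the target, so
   the explicit cast makes the argument match "%llx" on both. */
int format_u64_hex(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "0x%016llx", (unsigned long long)value);
}
```

Without the cast, gcc may warn under -Wformat on targets where uint64_t
is 'unsigned long'.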

Fix compiler warnings for subnormal_test.c

GCC fixed the compiler warnings for long long types. Add explicit
casts so gcc will not generate compile warnings.

Fixes for the lxvx and stxvx instructions

The lxvx and stxvx tests are moved into their own separate
tests. Add the expect files for the new test.

Update the expected results for the altivec test.

Fix tests for mfspr

Split out the mfspr tests into a separate test using command line option
"-M". The value in the LR and CTR registers changed. It appears the
changes are due to changes in the test program jm-insns.c. Splitting
these instructions out will help to minimize the size of future updates
when the test program changes.

fix sraw, srawi, srad, sradi instructions

For ISA 3.0 and beyond, the instructions also write the XER register.

Split the instructions out to a new command line option so we can create
an ISA 2.07 expect file, ISA 3.0 LE and ISA 3.0 BE expect file. The new
command line option is "-s" to run just these four instructions.

New test for the ISA 3.0 mcrxrx instruction.

Add new test.

Add support for the mcrxrx instruction.

The mcrxrx instruction was introduced in ISA 3.0. It was missed when the
ISA 3.0 support was added to Valgrind.

The mcrxr instruction is not supported on ISA 3.0 and beyond. Both
instructions move bits to the condition register; however, mcrxrx
moves [OV|OV32|CA|CA32], whereas mcrxr moves XER[32:35]
(SO, OV, and CA bits) to the CR.
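
As a rough illustration of the mcrxrx field layout named above, the
following sketch packs the four flags into a 4-bit value in the order
OV, OV32, CA, CA32, with OV as the most significant bit. The XerFlags
type and the function name are hypothetical; this is not the VEX
implementation.

```c
#include <stdint.h>

/* Hypothetical model of the four XER flags; not a Valgrind type. */
typedef struct {
    unsigned ov   : 1;
    unsigned ov32 : 1;
    unsigned ca   : 1;
    unsigned ca32 : 1;
} XerFlags;

/* Pack the flags as [OV|OV32|CA|CA32], OV being the most significant
   of the four bits written to the CR field. */
uint8_t mcrxrx_field(XerFlags f)
{
    return (uint8_t)((f.ov << 3) | (f.ov32 << 2) | (f.ca << 1) | f.ca32);
}
```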

Fixes for mcrxr instruction

Add .machine directives to ensure the mcrxr instruction is assembled
for power 6. The instruction is not supported on later platforms.

Fix dfp tests.

Due to changes between the compiler and linker, we need to add .machine
arguments to configure file to properly detect the availability of the
dfp instructions.

Add print statement if HAS_DFP is not enabled to make it
easier to determine when HAS_DFP is not enabled.

Add .machine directives for the darn instruction

Fixes to add .machine directives for assembly instructions

powerpc: Add .machine directives for scv, copy, paste, cpabort instructions

GCC is no longer passing the "-many" flag to the assembler. So, the
inline assembly statements need to use the .machine directives
for the specific platform.

(gcc commit e154242724b084380e3221df7c08fcdbd8460674 ; "[RS6000] Don't
pass -many to the assembler")

Hardware sync instruction (hwsync) added after the copy, paste and cpabort
instructions to improve the reliability of the test.

Configure, makefile and test case fixes for older powerpc targets.

Assorted changes to fix up compile issues as seen during regression
testing of VG on hardware back as far as Power 6 (ISA 2.05).

Add generated man pages to .gitignore

Remove VEX/nanoarm.orig file

configure.ac: Avoid the use of "which"

The "which" command is not always installed, but configure.ac uses it in
the function AC_HWCAP_CONTAINS_FLAG to force invocation of the executable
"true" rather than the shell builtin with the same name. (The point here
is to get LD_SHOW_AUXV=1 evaluated by the dynamic loader.)

Another option might be to hard-wire the location /bin/true, because the
filesystem hierarchy standard requires it to be there. However, the FHS
doesn't apply to BSDs and at least some FreeBSD versions do not stick to
that specific rule.

On the other hand, the "env" command seems to be available on all relevant
platforms, so use that instead.

Update libiberty demangler

Update the libiberty demangler using the auxprogs/update-demangler
script to gcc git commit b3585c0836e729bed56b9afd4292177673a25ca0.

This update includes:

- prevent null dereferencing on dlang_type
- prevent buffer overflow when decoding user input
- Add support for demangling local D template declarations
- Add support for demangling D function literals as template
value parameters
- Add support for D `typeof(*null)' types
- Fix -Wundef warnings in ansidecl.h
- Fix endian bug in rust demangler
- Adjust mangling of __alignof__
- Avoid -Wstringop-truncation

Update libiberty demangler to support Rust v0 name mangling

Update the libiberty demangler using the auxprogs/update-demangler
script to the gcc git 01d92cfd79872e4cffc78bf233bb9b767336beb8.
Updates rust demangling to support the new v0 mangling scheme.

This includes the following changes:
- Update the update-demangler script to use gcc git instead of svn.
- The result of running the updated script to get an updated
  demangler and resolving the merge conflicts.
- A change to long_namespace_xml.stderr.exp because two overly long
  symbols aren't demangled anymore, but just returned as is.
- an update to the m_demangle/demangle.c source to deal with Rust
  demangling in cp_demangle, which now directly demangles old and
  new style rust symbols.

NEWS: Add 442061 - very slow execution under Fedora 34 (readdwarf3)

readdwarf3: Introduce abbv_state to read .debug_abbrev more lazily

With the inline parser often a lot of DIEs are skipped, so reading
all abbrevs up front wastes time and memory. A lot of time and memory
can be saved by reading the abbrevs on demand. Do this by introducing
an abbv_state that is used to keep track of the abbrevs already read.
This does technically make the CUConst struct not const.

readdwarf3: Reuse abbrev if possible between units

Instead of destroying the ht_abbrvs after processing a CU, save it
and the offset so it can be reused for the next CU if that happens
to have the same abbrev offset. dwz compressed DWARF often reuses
the same abbrev for multiple CUs.
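
The reuse idea can be sketched as a one-entry cache keyed by the abbrev
offset. The AbbrevCache type and abbrev_cache_get are hypothetical names
for illustration only; the real code keeps a hash table of abbrevs, which
is reduced here to a rebuild counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a saved abbrev table plus its offset. */
typedef struct {
    uint64_t offset;   /* .debug_abbrev offset the table was built for */
    int      builds;   /* how many times the table had to be rebuilt */
    bool     valid;    /* false until the first table is built */
} AbbrevCache;

/* Return true if the saved table can be reused for this CU,
   false if it had to be rebuilt for a different offset. */
bool abbrev_cache_get(AbbrevCache *c, uint64_t abbrev_offset)
{
    if (c->valid && c->offset == abbrev_offset)
        return true;              /* same offset as the previous CU */
    c->offset = abbrev_offset;    /* stand-in for rebuilding the table */
    c->valid  = true;
    c->builds++;
    return false;
}
```

With consecutive CUs sharing one abbrev offset, only the first lookup
pays for a rebuild.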

readdwarf3: Immediately skip to end of CU when not parsing children

readdwarf3: Reuse fndn_ix_Table as much as possible

Both the var parser and the inl parser kept a fndn_ix_Table.
Initialize only one per debuginfo read pass and reuse if the stmt offset
is the same as last time (CUs can share the same line table and alt
files do share one for all units).

readdwarf3: Only read line table for units with addresses for inlined functions

When parsing DIEs for inlined functions, only read the line table for
units which can actually contain inlined_subroutines.

readdwarf3: Skip units without addresses when looking for inlined functions

When a unit doesn't cover any addresses skip it because no actual code
will be inside. Also use skip_DIE instead of read_DIE when not parsing
(skipping) children.

amd64: add spec rules for: S/NS after ADDL, S after ADDQ.

m_debuginfo: Handle DW_TAG_atomic_type

DW_TAG_atomic_type is a DWARF5 qualifier tag like DW_TAG_volatile_type,
DW_TAG_const_type and DW_TAG_restrict_type.

s390x: Fix 64-bit shift in s390_irgen_VSTRS

The function s390_irgen_VSTRS in guest_s390_toIR.c contains a shift
operation that is intended to yield a 64-bit number but uses 1UL instead
of 1ULL. This doesn't work on systems where 'unsigned long' is only 32
bits wide. Fix by replacing 1UL by 1ULL.
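
A small sketch of why the suffix matters. The helper name bit64 is made
up for this example. 'unsigned long long' is at least 64 bits wide on
every conforming C implementation, so 1ULL << n is well defined for
n < 64, whereas 1UL << n is undefined for n >= 32 when 'unsigned long'
has only 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Return a 64-bit value with only bit n set.  Using 1ULL keeps the
   shift in a 64-bit type even where 'unsigned long' is 32 bits. */
uint64_t bit64(unsigned n)
{
    assert(n < 64);
    return 1ULL << n;
}
```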

Remove deprecated regression tests for mftgpr and mffgpr.

The mftgpr and mffgpr instructions are deprecated.  Added comments in
VEX/priv/guest_ppc_toIR.c for the instructions stating the instructions
are deprecated.  Valgrind support can be removed if the opcodes get reused
in the future.  For now, leaving the functional support in Valgrind
for the instructions.

Removed the regression test power6_mf_gpr.c, expect files and vgtest file
from none/tests/ppc64.

Cleanup of none/tests/ppc64/Makefile.am

Fix indentation and move the jm_insns_CFLAGS next to the other CFLAGS
definitions.

No functional changes.

Add getoff-arm64-linux to .gitignore

Update the expected output for test_isa_3_1_VRT.

The inputs into the vinsdlx,vinsdrx instructions changed as a result of
the impossible constraint issue fix. This patch updates the expected
results.

https://bugs.kde.org/show_bug.cgi?id=441534

Fix impossible constraint issue in P10 testcase.

This reworks the modulo operation as seen in
valgrind/none/tests/ppc64/test_isa_3_1_common.c:
initialize_source_registers().

Due to a GCC issue (PR101882), we will try to avoid a modulo operation with
both input and outputs set to a hard register. In this case, we can apply
the modulo operation to the args[] array value used to initialize the ra
value.

https://bugs.kde.org/show_bug.cgi?id=440906

Remove an unneeded / unnecessary prefix check.

The pstxvp instruction is valid for R=1, i.e. use pc relative addressing.
The test should have been removed before committing the ISA 3.1 support.

https://bugs.kde.org/show_bug.cgi?id=441512

s390x: Wrap up misc-insn-3 and vec-enh-2 support

Wrap up support for the miscellaneous-instruction-extensions facility 3
and the vector-enhancements facility 2: Add 'case' statements for the
remaining unhandled arch13 instructions to 'guest_s390_toIR.c', document
the new support in 's390-opcodes.csv', adjust 's390-check-opcodes.pl', and
announce the new feature in 'NEWS'.

s390x: Vec-enh-2, test cases

Add test cases for verifying the new/enhanced instructions in the
vector-enhancements facility 2. For "vector string search" VSTRS add a
memcheck test case.

s390x: Mark arch13 features as supported

Make the STFLE instruction report the miscellaneous-instruction-extensions
facility 3 and the vector-enhancements facility 2 as supported. Indicate
support for the latter in the HWCAP vector as well.

s390x: Vec-enh-2, VSTRS

Support the new "vector string search" instruction VSTRS. The
implementation is a full emulation and follows a similar approach as for
the other vector string instructions.

s390x: Vec-enh-2, VSLD and VSRD

Support the new "vector shift left/right double by bit" instructions VSLD
and VSRD.

s390x: Vec-enh-2, VLBR and friends

Add support for the new byte- and element-swapping vector load/store
instructions VLEBRH, VLEBRG, VLEBRF, VLLEBRZ, VLBRREP, VLBR, VLER,
VSTEBRH, VSTEBRG, VSTEBRF, VSTBR, and VSTER.

s390x: Vec-enh-2, extend VCDG, VCDLG, VCGD, and VCLGD

The vector-enhancements facility 2 extends the vector floating-point
conversion instructions VCDG, VCDLG, VCGD, and VCLGD. In addition to
64-bit elements, they now also handle 32-bit elements. Add support for
these new forms.

s390x: Vec-enh-2, extend VSL, VSRA, and VSRL

The vector-enhancements facility 2 extends the existing bitwise vector
shift instructions VSL, VSRA, and VSRL. Now they allow the shift
vector (the third operand) to contain different shift amounts for each
byte. Add support for these new forms.

s390x: Misc-insn-3, test case

Add a test case for the new instructions in the miscellaneous instruction
extensions facility 3.

s390x: Misc-insn-3, MVCRL

Add support for the "move right to left" instruction MVCRL.

s390x: Misc-insn-3, new POPCNT variant

Add support for the new POPCNT variant that has bit 0 of the M3 field set
and yields the total number of one bits in its 64-bit operand.
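
To illustrate the difference, here is a plain C model of the two
results. It assumes, as far as I understand the older form, that POPCNT
without the M3 bit yields one count per byte; the function names are
hypothetical and this is not the IR that Valgrind generates.

```c
#include <stdint.h>

/* Total number of one bits in a 64-bit value (the new variant). */
unsigned popcount64_total(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Assumed older behaviour: byte i of the result holds the number of
   one bits in byte i of x. */
uint64_t popcount64_per_byte(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t byte = (x >> (8 * i)) & 0xff;
        r |= (uint64_t)popcount64_total(byte) << (8 * i);
    }
    return r;
}
```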

s390x: Misc-insn-3, "select" instructions

Add support for the instructions SELR, SELGR, and SELFHR.

s390x: Misc-insn-3, bitwise logical 3-way instructions

Add support for the instructions NCRK, NCGRK, NNRK, NNGRK, NORK, NOGRK,
NXRK, NXGRK, OCRK, and OCGRK. Introduce a common helper and use it for
the existing instructions NRK, NGRK, XRK, XGRK, ORK, and OGRK as well.
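
A common helper of this kind might look roughly like the sketch below,
written as plain C on 64-bit values rather than as IR. The enum, the
function name, and the mapping of mnemonic families to operations (NC*
as "and with complement", NN* as NAND, NO* as NOR, NX* as "not
exclusive or", OC* as "or with complement", with the last operand being
the complemented one) are assumptions for illustration.

```c
#include <stdint.h>

typedef enum {
    OP_AND, OP_OR, OP_XOR,           /* existing NRK/ORK/XRK families */
    OP_AND_COMPL, OP_NAND, OP_NOR,   /* assumed: NC*, NN*, NO* */
    OP_NXOR, OP_OR_COMPL             /* assumed: NX*, OC* */
} BitOp;

/* One helper for all two-input bitwise operations. */
uint64_t bitwise3(BitOp op, uint64_t a, uint64_t b)
{
    switch (op) {
    case OP_AND:       return a & b;
    case OP_OR:        return a | b;
    case OP_XOR:       return a ^ b;
    case OP_AND_COMPL: return a & ~b;
    case OP_NAND:      return ~(a & b);
    case OP_NOR:       return ~(a | b);
    case OP_NXOR:      return ~(a ^ b);
    case OP_OR_COMPL:  return a | ~b;
    }
    return 0;   /* not reached for valid op values */
}
```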

unhandled ppc64le-linux syscall: 252 (statfs64) and 253 (fstatfs64)

glibc 2.34 consolidated all statfs implementations. All other arches
that have statfs64/fstatfs64 (including ppc32) already had that syscall
hooked up, it was just ppc64 that was missing it.

https://bugs.kde.org/show_bug.cgi?id=440670

Generate an ENOSYS (sys_ni_syscall) for clone3 on all linux arches

glibc 2.34 will try to use clone3 first before falling back to
the clone syscall. So implement clone3 as sys_ni_syscall which
simply returns ENOSYS without producing a warning.
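
The caller-side pattern that this relies on can be sketched as follows.
The two stub functions stand in for the real syscalls and are purely
illustrative, as is the convention of returning a negative errno value;
this is not glibc's code.

```c
#include <errno.h>

typedef long (*spawn_fn)(void);

/* Try the newer entry point first and fall back to the older one only
   when the newer one reports that it is not implemented. */
long spawn_with_fallback(spawn_fn newer, spawn_fn older)
{
    long r = newer();
    if (r == -ENOSYS)
        r = older();
    return r;
}

/* Illustrative stand-ins: a clone3 that is not implemented and a
   clone that succeeds with a made-up child id. */
long stub_clone3_unimplemented(void) { return -ENOSYS; }
long stub_clone_ok(void)             { return 1234; }
```

Because the fallback is silent, answering clone3 with ENOSYS lets the
program continue through the existing clone wrapper.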

https://bugs.kde.org/show_bug.cgi?id=439590

Add 439590 glibc-2.34 breaks suppressions to NEWS

Update helgrind and drd suppression libc and libpthread paths in glibc 2.34

glibc 2.34 moved all pthread functions into the main libc library.
And it changed the (in memory) path of the main libc library to
libc.so.6 (before it was libc-2.xx.so).

This breaks various standard suppressions for helgrind and drd.
Fix this by doing a configure check for whether we are using glibc
2.34 by checking whether pthread_create is in libc instead of in
libpthread. If we are using glibc then define GLIBC_LIBC_PATH and
GLIBC_LIBPTHREAD_PATH variables that point to the (regexp) path
of the library that contains all libc functions and pthread functions
(which will be the same path for glibc 2.34+).

Rename glibc-2.34567-NPTL-helgrind.supp to glibc-2.X-helgrind.supp.in
and glibc-2.X-drd.supp to glibc-2.X-drd.supp.in and replace the
GLIBC_LIBC_PATH and GLIBC_LIBPTHREAD_PATH at configure time.

The same could be done for the glibc-2.X.supp.in file, but hasn't
yet because it looks like most suppressions in that file are obsolete.

m_debuginfo/debuginfo.c VG_(get_fnname_kind) _start is below main

With glibc 2.34 we might see the _start symbol as the frame that
called main instead of directly after __libc_start_main or
generic_start_main.

Fixes memcheck/tests/badjump[2], memcheck/tests/origin4-many,
helgrind/tests/tc04_free_lock, helgrind/tests/tc09_bad_unlock
and helgrind/tests/tc20_verifywrap.

gdbserver_tests: update filters for newer glibc/gdb

With newer glibc/gdb we might see a __select call without anything
following on the line. Also when gdb cannot find a file it might
now print "Inappropriate ioctl for device" instead of the message
"No such file or directory"

Un-break arm64 isel following 22bae4b1544fc5d82f131ef8fde4cea7666112c2

22bae4b1544fc5d82f131ef8fde4cea7666112c2 introduced an iropt-level rewrite rule

  64to16( 32Uto64 ( x )) --> 32to16(x)

that creates Iop_32to16 nodes.  The arm64 isel apparently has never seen these
before and so asserts.  This is a 1-liner fix.
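
The rewrite rule is value-preserving, which can be checked with the C
equivalents of the conversions: zero-extending a 32-bit value to 64 bits
and then truncating to 16 bits gives the same result as truncating
directly. The function names below are made up for this sketch.

```c
#include <stdint.h>

/* 64to16( 32Uto64( x ) ): widen unsigned, then keep the low 16 bits. */
uint16_t narrow_via_64(uint32_t x)
{
    return (uint16_t)(uint64_t)x;
}

/* 32to16( x ): keep the low 16 bits directly. */
uint16_t narrow_direct(uint32_t x)
{
    return (uint16_t)x;
}
```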

amd64 front end: Make uses of 8- and 16-bit GPRs GET the entire containing register.

Until now, a read of a 32-bit GPR (eg, %ecx) in the amd64 front end actually
involved GETting the containing 64-bit reg (%rcx) and dropping off its top
32-bits, in the IR translation.  This makes IR optimisation work well for code
that mixes 32 and 64 bit integer operations, which is very common.  In
particular it helps guarantee that PUT-to-GET and redundant-GET optimisations
work, hence that constant propagation/folding across such boundaries works,
and indirectly helps to avoid generating code in the back end that suffers
from store-forwarding or partial-register-read stalls.

This commit partially extends those advantages to 8- and 16-bit GPR reads.  In
particular, all 16-bit GPR fetches are now a GET of the whole 64-bit register
followed by an Iop_64to16 cast.  The same scheme is used for 8-bit register
fetches, except for the "anomalous four" (%ah, %bh, %ch, %dh), whose handling
is left unchanged.

With this in place, a wider write followed by a narrower read now plays
nicely with constant folding and propagation.  For example (somewhat
artificially):

   movl $17, %ecx    // 32-bit write of %rcx
   shrl %cl, %r15    // 8-bit read of %rcx

The 17 will be propagated, in IR, up to the shift.
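
In C terms, the scheme amounts to always fetching the full 64-bit
register value and narrowing it afterwards. The sketch below uses
hypothetical helper names and models only the values, not the IR. The
"anomalous four" differ because they read bits 8..15 rather than the
lowest bits.

```c
#include <stdint.h>

/* Each read starts from the whole 64-bit register and narrows it. */
uint32_t read_gpr32(uint64_t reg)    { return (uint32_t)reg; }
uint16_t read_gpr16(uint64_t reg)    { return (uint16_t)reg; }
uint8_t  read_gpr8_low(uint64_t reg) { return (uint8_t)reg; }

/* %ah, %bh, %ch, %dh: bits 8..15 of the containing register. */
uint8_t  read_gpr8_high(uint64_t reg) { return (uint8_t)(reg >> 8); }
```

For a register holding 17, read_gpr8_low yields 17, matching the %cl
read in the example above.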

The commit also adds a couple more rewrite rules in ir_opt.c to remove some of
the resulting pointless conversion pairings.

Consistently set CC_NDEP when setting the flags thunk.

For most settings of the flags thunk (guest_CC_{OP,DEP1,DEP2,NDEP}), the value
of the NDEP field is irrelevant, because of the setting of the OP field, and
so it is usually not set in such cases, which are the vast majority.  This
saves a store (a PUT) in the final generated code.  But it has the bad effect
that the IR optimiser cannot know that preceding PUTs to the field are
possibly dead and can be removed.  Most of the time that is not important, but
just occasionally it can cause a lot of pointless extra computation (calling
of amd64g_calculate_rflags_all) to happen.  This was observed in a long basic
block involved in a hash calculation, like this:

   rolq ..   // sets CC_NDEP to the previous value of the flags,
             // as calculated by amd64g_calculate_rflags_all
   mulq ..
   (rolq/mulq repeated several times)

   addq ..   // effect is, all of the flag computation done for the rol/mul
             // sequence is irrelevant, but iropt can't see that

Setting CC_NDEP consistently to zero, even if it isn't needed, avoids the
problem.

amd64 front end: more spec rules: S/NS after LOGICW, S after SHRL, Z after SHRW, C after SUBW.

This adds a few more spec rules that seem useful for running Firefox built
with gcc-O3 and clang-O3. At least one of them removes a false Memcheck
error.

There is also some improved debug printing, currently #if 0'd.

Remove redundant assertions and conditionals in move_CEnt_to_top.

move_CEnt_to_top is on the hot path when reading large amounts of debug info,
especially Dwarf inlined-function info. It shows up in 'perf' profiles. This
commit removes assertions which are asserted elsewhere, and tries to avoid a
couple of conditional branches.

Reimplement h_generic_calc_GetMSBs8x16 to be more efficient.

h_generic_calc_GetMSBs8x16 concatenates the top bit of each 8-bit lane in a
128-bit value, producing a 16-bit scalar value. (It is PMOVMSKB, really).
The existing implementation is excessively inefficient and shows up sometimes
in 'perf' profiles of V. This commit replaces it with a logarithmic (4-stage)
algorithm which is hopefully much faster.
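
One way such a staged gather can be written is sketched below. This is
an illustration of the general technique, not necessarily the code that
was committed, and the function names are made up. Each stage merges
adjacent groups of collected bits, so a 64-bit half needs three stages
and combining the two halves is the fourth.

```c
#include <stdint.h>

/* Gather the top bit of each of the 8 bytes of x into bits 0..7. */
static uint64_t msbs8x8(uint64_t x)
{
    /* Move each byte's top bit down to bit 0 of that byte. */
    x = (x >> 7) & 0x0101010101010101ULL;
    /* Stage 1: merge neighbouring bytes, 2 bits per 16-bit lane. */
    x = (x | (x >> 7)) & 0x0003000300030003ULL;
    /* Stage 2: merge neighbouring 16-bit lanes, 4 bits per 32-bit lane. */
    x = (x | (x >> 14)) & 0x0000000F0000000FULL;
    /* Stage 3: merge the two 32-bit lanes into one 8-bit result. */
    x = (x | (x >> 28)) & 0x00000000000000FFULL;
    return x;
}

/* Stage 4: combine both 64-bit halves of the 128-bit value. */
uint16_t msbs8x16(uint64_t hi, uint64_t lo)
{
    return (uint16_t)((msbs8x8(hi) << 8) | msbs8x8(lo));
}
```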

Ignore redundant REX.W for some MOVDQU variants

Fixes BZ#438871

Bug 438630 Adds zero variants of arm64 v8.2 FP compare instructions.

This patch adds half-precision floating-point support for the following:
FCMEQ <Hd>, <Hn>, #0.0
FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0
FCMGE <Hd>, <Hn>, #0.0
FCMGE <Vd>.<T>, <Vn>.<T>, #0.0
FCMGT <Hd>, <Hn>, #0.0
FCMGT <Vd>.<T>, <Vn>.<T>, #0.0
FCMLE <Hd>, <Hn>, #0.0
FCMLE <Vd>.<T>, <Vn>.<T>, #0.0
FCMLT <Hd>, <Hn>, #0.0
FCMLT <Vd>.<T>, <Vn>.<T>, #0.0

Fixes https://bugs.kde.org/show_bug.cgi?id=438630

Bug 438038 Adds arm64 v8.2 FP compare & conditional compare instructions.

This patch adds half-precision floating-point support for the following:
FCCMP <Hn>, <Hm>, #<nzcv>, <cond>
FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>
FCMEQ <Hd>, <Hn>, <Hm>
FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FCMGE <Hd>, <Hn>, <Hm>
FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FCMGT <Hd>, <Hn>, <Hm>
FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Fixes https://bugs.kde.org/show_bug.cgi?id=438038

PPC64 Enable the MMA bit in the HWCAP.

The MMA bit should have been enabled when the last of the MMA instructions
were committed. Also, the header comments about filtering out the DARN
and SCV support should have been updated when DARN and SCV support was added.

Bug 436873 Added arm64 v8.2 vector FABD, FACGE, FACGT and FADD

This patch adds FP half-precision support for the following:
FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Fixes https://bugs.kde.org/show_bug.cgi?id=436873

Bug 436411 Added arm64 v8.2 scalar FABD, FACGE, FACGT and FADD

This patch adds FP half-precision support for the following:
FADD <Hd>, <Hn>, <Hm>
FABD <Hd>, <Hn>, <Hm>
FACGT <Hd>, <Hn>, <Hm>
FACGE <Hd>, <Hn>, <Hm>

Fixes https://bugs.kde.org/show_bug.cgi?id=436411

s390x: Don't emit "vector or with complement" on z13

The z/Architecture instruction "vector or with complement" (VOC) can be
used as an optimization to combine "vector or" with "vector nor".  This is
exploited in Valgrind since commit 6c1cb1a0128b00858b973e.  However, VOC
requires the vector-enhancements facility 1, which is not installed on a
z13 CPU.  Thus Valgrind can now run into SIGILL on z13 when trying to
execute vector string instructions.

Fix this by suppressing the VOC optimization unless the
vector-enhancements facility 1 is recognized on the host.

PPC64: Add support for copy, cpabort, paste instructions

Don't look for separate debuginfo if the image has a .debug_info section

Fixes BZ#435908

configure.ac: Fix portability of test(1) expression for C99 support

The == operator is non-standard, use = instead.

[ bvanassche: added "configure.ac: " prefix in front of patch subject ]

s390x: Add missing stdout.exp for vector string memcheck test

The file vistr.stdout.exp was missing from commit 32312d588. Add it.

s390x: Fix/optimize Iop_64HLtoV128

In s390_vr_fill() in guest_s390_toIR.c, filling a vector with two copies
of a 64-bit value is realized with Iop_64HLtoV128, since there is no such
operator as Iop_Dup64x2.  But the two args to Iop_64HLtoV128 use the same
expression, referenced twice.  Although this hasn't been seen to cause
real trouble yet, it's problematic and potentially inefficient, so change
it: Assign to a temp and pass that twice instead.

In the instruction selector, if Iop_64HLtoV128 is found to be used for a
duplication as above, select "v-vdup" instead of "v-vinitfromgprs".  This
mimics the behavior we'd get if there actually was an operator
Iop_Dup64x2.

s390x: Add support for emitting "vector or with complement"

In the instruction selector, look out for IR expressions that fit "vector
or with complement (VOC)". Emit when applicable.

This slightly reduces the generated code sometimes, such as for certain
vector string instructions, where such expressions occur quite frequently.

s390x: Rework insn "v-vdup" and add "v-vrep"

So far the only s390x insn for filling a vector with copies of the same
element is "v-vdup" (S390_VEC_DUPLICATE), which replicates the first
element of its vector argument.  This is fairly restrictive and can lead
to unnecessarily long code sequences.

Redefine "v-vdup" to replicate any scalar value instead.  And add
"v-vrep" (S390_INSN_VEC_REPLICATE) for replicating any given element of a
vector.  Select the latter for suitable expressions like

  Iop_Dup8x16(Iop_GetElem8x16(vector_expr, i))

This improves the generated code for some vector string instructions,
where a lot of element replications are performed.

s390x: Improve handling of amodes without base register

Addressing modes without a base or index register represent constants.
They can occur in some special cases such as shift operations and when
accessing individual vector elements. Perform some minor improvements to
the handling of such amodes.

Bug 434296 - s390x: Add memcheck test cases for vector string insns

Bug 434296 addresses memcheck false positives with the vector string
instructions VISTR, VSTRC, VFAE, VFEE, and VFENE. Add test cases that
verify the fix for that bug. Without the fix, memcheck yields many
complaints with these tests, most of which are false positives.

Bug 434296 - s390x: Rework IR conversion of VISTR

The z/Architecture instruction VISTR is currently transformed to a dirty
helper that executes the instruction. This can cause false positives with
memcheck if the input string contains undefined characters after the
string terminator. Implement without a dirty helper and emulate the
instruction instead.

Bug 434296 - s390x: Rework IR conversion of VFENE

So far the z/Architecture instruction "vector find element not
equal" (VFENE) is transformed to a loop. This can cause spurious
"conditional jump or move depends on uninitialised value(s)" messages by
memcheck. Re-implement without a loop.

Bug 434296 - s390x: Rework IR conversion of VSTRC, VFAE, and VFEE

The z/Architecture instructions "vector string range compare" (VSTRC),
"vector find any element equal" (VFAE), and "vector find element
equal" (VFEE) are each implemented with a dirty helper that executes the
instruction.  Unfortunately this approach leads to memcheck false
positives, because these instructions may yield a defined result even if
parts of the input vectors are undefined.  There are multiple ways this
can happen: Wherever the flags in the fourth operand to VSTRC indicate
"match always" or "match never", the corresponding elements in the third
operand don't affect the result.  The same is true for the elements
following the first zero-element in the second operand if the ZS flag is
set, or for the elements following the first matching element, if any.

Re-implement the instructions without dirty helpers and transform into
lengthy IR instead.

s390x: Add convenience function mkV128()

Provide mkV128() as a short-hand notation for creating a vector constant from
a bit pattern, similar to other such functions like mkU64().

s390x: Support "expensive" comparisons Iop_ExpCmpNE32/64

Add support for Iop_ExpCmpNE32 and Iop_ExpCmpNE64 in the s390x instruction
selector. Handle them exactly like the "inexpensive" variants Iop_CmpNE32
and Iop_CmpNE64.

PPC64: add support for the vectored system call instruction scv.

PPC64: Add support for the darn instruction

Bug 433863 - s390x: Remove memcheck test cases for cs, cds, and csg

The fix for bug 429864 - "s390x: C++ atomic test_and_set yields
false-positive memcheck diagnostics" changes the memcheck behavior at
various compare-and-swap instructions.  The comparison between the old and
expected value now always yields a defined result, even if the input
values are (partially) undefined.  However, some existing test cases
explicitly verify that memcheck complains about the use of uninitialised
values here.  These test cases are no longer valid.  Remove them.

s390x: Add missing UNOP insns to s390_insn_as_string

Some unary operator insns are not handled by s390_insn_as_string(). If
they are encountered while the appropriate trace flag is set, a vpanic
occurs. Fix this: add handling for the missing insns.

drd/tests/swapcontext: Add SIGALRM handler to avoid stacktrace

During testing for oe-core build on QEMU,
SIGALRM can trigger during nanosleep.
This results in a different stderr output than expected.

```
==277== Process terminating with default action of signal 14 (SIGALRM)
==277== at 0x36C74C3943: clock_nanosleep@@GLIBC_2.17 (clock_nanosleep.c:43)
==277== by 0x36C74C8726: nanosleep (nanosleep.c:25)
```

This stacktrace printing will not occur
if we add a handler that simply exits.
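
A handler of that kind could be installed roughly as follows. This is a
generic POSIX sketch with hypothetical function names, not the code
added to drd/tests/swapcontext.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Exit quietly instead of letting the default action terminate the
   process.  _exit() is async-signal-safe, unlike exit(). */
static void exit_on_alarm(int sig)
{
    (void)sig;
    _exit(0);
}

/* Install the handler for SIGALRM; returns 0 on success, -1 on error. */
int install_alarm_exit_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = exit_on_alarm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}
```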

https://bugs.kde.org/show_bug.cgi?id=435160

Signed-off-by: Yi Fan Yu <yifan.yu@windriver.com>

Callgrind: Broader handling of _dl_runtime_resolve variants

This is a supplement to commit 86277041

To improve its results, Callgrind does special handling for
the runtime linker entry point to resolve symbols. However,
it only used the exact symbol name "_dl_runtime_resolve",
as well as specific machine code templates (when the runtime
linker was stripped from symbol names) as basis.
Recent glibc added multiple similar symbol names as variants,
such as _dl_runtime_resolve_xsave.

The above-mentioned commit 86277041 solves this by extending
the check for machine code templates for specific Linux
distributions.
This patch extends this for more architectures and variants
by checking if a function starts with "_dl_runtime_resolve".
Furthermore, the original function names of the variants
still are visible in the output (and not forced to the prefix).
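
The prefix test itself is simple; a minimal sketch, with a made-up
function name rather than Callgrind's own, could be:

```c
#include <stdbool.h>
#include <string.h>

/* True if fnname starts with "_dl_runtime_resolve", which also covers
   variants such as _dl_runtime_resolve_xsave. */
bool is_runtime_resolve(const char *fnname)
{
    static const char prefix[] = "_dl_runtime_resolve";
    return strncmp(fnname, prefix, sizeof prefix - 1) == 0;
}
```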

While the heuristic of treating every function symbol starting
with the prefix "_dl_runtime_resolve" as an entry point
into the runtime linker for resolving a function address may
be a bit rough, this prefix is not expected to be used often in
other source code for anything else.

The worst case is a slightly misleading call graph only
visible in a very specific situation: if the wrongly-detected
function does a tail call (ie instead of returning, jumping
to another function), it will be shown as 2 calls in a row
from the original caller.

PPC64 rename xvcvbf16sp to xvcvbf16spn. Fix up comments for xvcvspbf16 and xvcvbf16spn instructions.

Record BZ#423963 fix

Only process clone results in the parent thread

Fixes BZ#423963

Create initial new release entry in NEWS file for a future release.

Reduced precision Missing Integer based outer tests

PPC64: Reduced-Precision: Missing Integer-based Outer Product Operations

Add support for:

pmxvi16ger2 VSX Vector 16-bit Signed Integer GER (rank-2 update), Prefixed
   Masked
pmxvi16ger2pp VSX Vector 16-bit Signed Integer GER (rank-2 update) (Positive
   multiply, Positive accumulate), Prefixed Masked
pmxvi8ger4spp VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) with
   Saturation (Positive multiply, Positive accumulate), Prefixed Masked
xvi16ger2 VSX Vector 16-bit Signed Integer GER (rank-2 update)
xvi16ger2pp VSX Vector 16-bit Signed Integer GER (rank-2 update) (Positive
   multiply, Positive accumulate)
xvi8ger4spp VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) with
   Saturation (Positive multiply, Positive accumulate)

Reduced Precision bfloat16 outer product tests