git.ipfire.org Git - thirdparty/valgrind.git/log

Keep on churning.

Without #define _XOPEN_SOURCE macports clang 9.0.1 on OSX 10.7.5 was
giving me

In file included from swapcontext.c:12:
/usr/include/ucontext.h:43:2: error: The deprecated ucontext routines require
      _XOPEN_SOURCE to be defined
^
swapcontext.

So I added #define _XOPEN_SOURCE

But that gives, on Solaris 11.3

In file included from /usr/include/limits.h:12:0,
                 from /usr/gcc/4.8/lib/gcc/i386-pc-solaris2.11/4.8.2/include-fixed/limits.h:168,
                 from /usr/gcc/4.8/lib/gcc/i386-pc-solaris2.11/4.8.2/include-fixed/syslimits.h:7,
                 from /usr/gcc/4.8/lib/gcc/i386-pc-solaris2.11/4.8.2/include-fixed/limits.h:34,
                 from swapcontext.c:7:
/usr/include/sys/feature_tests.h:354:2: error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications  and pre-2001 POSIX applications"
#error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
  ^

So make the #define _XOPEN_SOURCE conditional on darwin.

Modify cxx17_aligned_new testcase to accommodate clang.

Explicitly use ordinary scalar delete and update the expecteds.
Otherwise g++ uses sized scalar delete whilst clang uses
ordinary scalar delete, which causes a diff.

Added one redir too many on Solaris, which causes a runtime error.

Fix compilation on OS X

Bug 388787 - Support for C++17 new/delete

These over-aligned new and delete operators were added in C++ 17.

Remove deep-D.post.exp-ppc64 from EXTRA_DIST.

massif/tests/deep-D.post.exp-ppc64 was removed in commit 24a94df73
"VG_(get_fnname_kind): Recognize gcc "optimized" below main functions."
but was still listed in massif/tests/Makefile.am (EXTRA_DIST), causing
make dist to fail.

VG_(get_fnname_kind): Recognize gcc "optimized" below main functions.

The VG_(get_fnname_kind) function detects some special "below main"
function names. Specifically __libc_start_main and generic_start_main
both of which are used to call the actual main () function from the
application. We already recognized one variant, generic_start_main.isra.0,
but only for powerpc. Recognize all possible specialized optimized variants
gcc can produce by simply checking for the function name followed by a dot
as a prefix. This fixes the memcheck/tests/supp_unknown.vgtest and
massif/tests/deep-D.vgtest tests with gcc 11.
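Here is a minimal sketch of the prefix check described above. It is written as a standalone helper with a hypothetical name (is_below_main_fnname), and it uses plain libc string functions rather than Valgrind's own VG_(...) utilities. It accepts a known name exactly. It also accepts the name immediately followed by a '.' suffix, which is how GCC names its .isra/.constprop clones.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not Valgrind's actual code): returns 1 if NAME is
   one of the known "below main" functions, either exactly or as a
   GCC-specialised clone such as "generic_start_main.isra.0". */
static int is_below_main_fnname(const char *name)
{
   static const char *const below_main[] = {
      "__libc_start_main",
      "generic_start_main",
   };
   assert(name != NULL);
   for (size_t i = 0; i < sizeof below_main / sizeof below_main[0]; i++) {
      size_t len = strlen(below_main[i]);
      if (strncmp(name, below_main[i], len) != 0)
         continue;
      /* Exact match, or the base name followed by a '.' suffix. */
      if (name[len] == '\0' || name[len] == '.')
         return 1;
   }
   return 0;
}
```

Requiring the dot stops unrelated names that merely share the prefix, such as a hypothetical __libc_start_main_impl, from being misclassified.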

We can now also get rid of the special cases in
massif/tests/deep-D.post.exp-ppc64 and memcheck/tests/supp_unknown.supp.

https://bugs.kde.org/show_bug.cgi?id=430158

sys_newfstatat: don't complain if |file_name| is NULL.

This is a followup to 2a7d3ae76, for the case where Rust code runs against a
glibc that supports statx but a kernel that doesn't, in which case glibc
falls back to fstatat.

https://bugs.kde.org/show_bug.cgi?id=433641

Fix README spelling mistake adb -> and

Reported by: satbek@unist.ac.kr

https://bugs.kde.org/show_bug.cgi?id=433629

Use pkglibexec as vglibdir.

vglibdir is the directory from where valgrind loads its internal tool
executables and vgpreloads. Currently vglibdir is pkglibdir, so those
internal tools are intermingled with normal executables and libraries
that the user might use directly.

Make vglibdir equal to pkglibexecdir so the internal tools get installed
and loaded from libexec and don't get stored under lib.

This leaves just the static archives and the mpiwrapper libraries that
the user would link/load themselves under pkglibdir.

This seems more in line with the FHS lib/libexec standard and makes it
slightly easier to combine the tools from a multilib target (say the
memcheck-amd64-linux and memcheck-x86-linux tools) because they would
be installed under the same directory, while the pkglibdir can differ
depending on arch/target (lib/lib64).

https://bugs.kde.org/show_bug.cgi?id=433323

none/tests/ifunc.c: Fix a compiler warning

Fix the following compiler warning:

ifunc.c:9:15: warning: 'ifunc' resolver for 'test' should return 'void (*)(int)' [-Wattribute-alias=]
9 | static void (*resolve_test(void))(void)
| ^~~~~~~~~~~~

gdbserver_tests: filter out Download failed: messages.

gdb can also use debuginfod and is excessively chatty when downloads
fail (even when DEBUGINFOD_URLS isn't set). Filter those messages out
of the gdb output.

syswrap-ppc64-linux.c: Use FMT_REGWORD and [S]ARG[1234] in PRINT macros.

This prevents some print formatting related warnings.

Make the dwarf3 reader more robust and less chatty when things go wrong

Skip some stuff when seeing an unknown language, be less chatty about
parser issues.

All the issues seem to come from the multi-file, that is the shared
(supplementary or alt) file containing debuginfo shared by all the
gcc/runtime libraries.

There are a couple of issues that this patch works around:

- The multifile contains entries for the 'D' language, which has some
  constructs we don't expect.
- We don't read partial units correctly, which means we often don't know
  the language we are looking at.
- The parser is very chatty about issues it didn't expect (even if they
  are ignored, it will still output something)

It only shows up with --read-var-info=yes which some tests enable, but
which is disabled by default.

Also increase the timeout of drd/tests/pth_cleanup_handler.c because
DWARF reading is so slow.

https://bugs.kde.org/show_bug.cgi?id=433500

PR432215 Add debuginfod functionality

debuginfod is an HTTP server for distributing ELF/DWARF debugging
information. When a debuginfo file cannot be found locally, Valgrind
is able to query debuginfod servers for the file using its build-id.

readelf.c: Add debuginfod_find_debug_file(). Spawns a child process to
exec `debuginfod-find` in order to query servers for the debuginfo
file. Also add helper debuginfod_find_path().
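The child-process approach can be sketched with plain POSIX calls. This is not Valgrind's readelf.c code, which cannot use libc. find_debug_file is a hypothetical name. The program to run is a parameter, so the sketch can be exercised without debuginfod-find installed. The sketch assumes the tool is invoked as "debuginfod-find debuginfo BUILDID" and prints the resulting path on stdout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run "PROG debuginfo BUILD_ID" in a child process and
   return the first line it prints (for debuginfod-find, the local path of
   the downloaded file), or NULL on failure.  The caller frees the result. */
static char *find_debug_file(const char *prog, const char *build_id)
{
   int fds[2];
   assert(prog != NULL && build_id != NULL);
   if (pipe(fds) != 0)
      return NULL;

   pid_t pid = fork();
   if (pid < 0) {
      close(fds[0]);
      close(fds[1]);
      return NULL;
   }
   if (pid == 0) {
      /* Child: route stdout into the pipe, then exec the query tool. */
      close(fds[0]);
      if (dup2(fds[1], STDOUT_FILENO) < 0)
         _exit(127);
      close(fds[1]);
      execlp(prog, prog, "debuginfo", build_id, (char *)NULL);
      _exit(127);  /* only reached if exec failed */
   }

   /* Parent: collect the output, draining the pipe fully so the child
      can never block on a full pipe while we wait for it. */
   close(fds[1]);
   char buf[4096];
   char chunk[512];
   size_t used = 0;
   ssize_t n;
   while ((n = read(fds[0], chunk, sizeof chunk)) > 0) {
      size_t room = sizeof buf - 1 - used;
      size_t take = (size_t)n < room ? (size_t)n : room;
      memcpy(buf + used, chunk, take);
      used += take;
   }
   close(fds[0]);
   buf[used] = '\0';

   int status;
   if (waitpid(pid, &status, 0) < 0
       || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
      return NULL;

   buf[strcspn(buf, "\n")] = '\0';  /* keep only the first line */
   return buf[0] != '\0' ? strdup(buf) : NULL;
}
```

A non-zero exit status, including a failed exec (status 127), is treated as "not found". Under that rule, a missing debuginfod-find behaves like a server that has no matching file.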

pub_core_pathscan.h: Moved from priv_initimg_pathscan.h in order to use
VG_(find_executable)() in readelf.c.

docs: Add information regarding debuginfod to valgrind.1

memcheck/tests/linux: Add new test debuginfod-check.

tests/vg_regtest.in: Clear $DEBUGINFOD_URLS before running any tests.

https://bugs.kde.org/show_bug.cgi?id=432215

PPC64: 128-bit Binary Integer Operations, part tests.

PPC64: ISA 3.1 (new Iops) 128-bit Binary Integer Operations

Add support for:

  dcffixqq DFP Convert From Fixed Quadword Quad
  dctfixqq DFP Convert To Fixed Quadword Quad

  vdivesq Vector Divide Extended Signed Quadword
  vdiveuq Vector Divide Extended Unsigned Quadword
  vdivsq Vector Divide Signed Quadword
  vdivuq Vector Divide Unsigned Quadword

  vmodsq Vector Modulo Signed Quadword
  vmoduq Vector Modulo Unsigned Quadword
  vmulesd Vector Multiply Even Signed Doubleword
  vmuleud Vector Multiply Even Unsigned Doubleword
  vmulosd Vector Multiply Odd Signed Doubleword
  vmuloud Vector Multiply Odd Unsigned Doubleword
  vmsumcud Vector Multiply-Sum & write Carry-out Unsigned Doubleword

  xscvqpsqz VSX Scalar Convert with round to zero Quad-Precision to Signed
    Quadword
  xscvqpuqz VSX Scalar Convert with round to zero Quad-Precision to Unsigned
    Quadword
  xscvsqqp VSX Scalar Convert Signed Quadword to Quad-Precision
  xscvuqqp VSX Scalar Convert Unsigned Quadword to Quad-Precision

PPC64: Fix naming trinary to ternary

PPC64: Add ACC register file registers to get_otrack_shadow_offset_wrk()

PPC64: Fix V-bit casting for existing Iops.

Iop_TruncF128toI64S, Iop_TruncF128toI32S, Iop_TruncF128toI64U,
Iop_TruncF128toI32U, Iop_ReinterpI32asF32, Iop_ReinterpF32asI32,
Iop_ReinterpF64asI64, Iop_ReinterpI64asF64, Iop_ReinterpI64asD64,
Iop_ReinterpD64asI64

Hopefully the last small changes for the drd swapcontext test

drd/tests/swapcontext: Improve the portability of this test further

- Remove the VALGRIND_STACK_REGISTER() invocation for the initial thread
  stack since it is superfluous. Remove the pthread_attr_getstack() call
  that became superfluous by this change.
- Change SIGINT into SIGALRM for FreeBSD since pthread_kill(..., SIGINT)
  causes the application to return a SIGINT status.
- Reduce the stack size of the threads created by this test.

Filter out unsupported instructions from HWCAP2 on powerpc.

Valgrind currently doesn't support the DARN random number instruction
and the SCV syscall instruction. Filter them out of HWCAP2 so glibc
and applications don't try to use them when running under valgrind.

Also suppress printing a log message for scv instructions in the
instruction stream.

Reported by: Florian Weimer <fweimer@redhat.com>

DARN bug: https://bugs.kde.org/show_bug.cgi?id=411189
SCV bug: https://bugs.kde.org/show_bug.cgi?id=431157
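The filtering can be sketched on a simplified auxiliary vector. The demo_ struct and constants are stand-ins. The numeric values of AT_HWCAP2 and of the PPC_FEATURE2_DARN/PPC_FEATURE2_SCV bits are as I recall them from the Linux headers and should be checked against <elf.h> and asm/cputable.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values (verify against the Linux headers). */
#define DEMO_AT_NULL            0
#define DEMO_AT_HWCAP2          26
#define DEMO_PPC_FEATURE2_DARN  0x00200000u
#define DEMO_PPC_FEATURE2_SCV   0x00100000u

/* A minimal stand-in for an auxv entry. */
typedef struct {
   uint64_t a_type;
   uint64_t a_val;
} demo_auxv_t;

/* Clear the HWCAP2 bits for instructions the emulator cannot run, so
   glibc and the application never select code paths that use them. */
static void filter_hwcap2(demo_auxv_t *auxv)
{
   assert(auxv != NULL);
   for (; auxv->a_type != DEMO_AT_NULL; auxv++) {
      if (auxv->a_type == DEMO_AT_HWCAP2)
         auxv->a_val &= ~(uint64_t)(DEMO_PPC_FEATURE2_DARN
                                    | DEMO_PPC_FEATURE2_SCV);
   }
}
```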

gdbserver_tests/hgtls.vgtest: Make sure gdb is installed before running

The other gdbserver_tests that need to run gdb make sure it is actually
available before trying to run it, otherwise the test is skipped. Do the
same to hgtls.vgtest by adding test -e gdb to the prereq.

Fix typo in DWARF 5 line table readers

This typo meant the directory entry was most often zero, which
happened to be sometimes correct anyway (since zero is the compdir).
So for simple testcases it looked correct. But it would be wrong for
compilation units not in the current compdir, such as files compiled with
a relative or absolute path (and then combined into the same compilation
unit with LTO).

The same typo was in both readdwarf.c (read_dwarf2_lineblock) and
readdwarf3.c (read_filename_table). read_dwarf2_lineblock also had
an extra "dwarf" string in the --debug-dump=line output.

https://bugzilla.redhat.com/show_bug.cgi?id=1927153

Add enomem and swapcontext tests to .gitignore

swapcontext.vgtest fails with glibc-debuginfo installed

With debuginfo installed the backtrace contains the swapcontext.S
source file. Filter that out, like the clone.S source file is in
drd/tests/filter_stderr.

drd/tests/swapcontext: Improve portability

Linux: Add wrapper for fcntl(F_{GET,ADD}_SEALS)

Also add a testcase to memcheck/tests/linux, enabled according to a new
check for memfd_create() in configure.ac.
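The sketch below exercises the F_ADD_SEALS/F_GET_SEALS calls that the new wrapper has to handle. It assumes Linux with a glibc that declares memfd_create() (2.27 or later). On other systems the helper, memfd_seals_roundtrip (a hypothetical name), just reports -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a sealed memfd reports the seals we added, 0 if it does
   not, and -1 if memfd sealing is unavailable on this platform. */
static int memfd_seals_roundtrip(void)
{
#if defined(__linux__) && defined(MFD_ALLOW_SEALING) && defined(F_ADD_SEALS)
   int fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);
   if (fd < 0)
      return -1;
   /* Give the file a size, then forbid it from shrinking or growing. */
   if (ftruncate(fd, 4096) != 0
       || fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) != 0) {
      close(fd);
      return -1;
   }
   int seals = fcntl(fd, F_GET_SEALS);
   close(fd);
   if (seals < 0)
      return -1;
   return (seals & (F_SEAL_SHRINK | F_SEAL_GROW))
          == (F_SEAL_SHRINK | F_SEAL_GROW);
#else
   return -1;
#endif
}
```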

https://bugs.kde.org/show_bug.cgi?id=361770

Fix valgrind.h include in drd/tests/swapcontext.c

In-tree tests should include "valgrind.h", not <valgrind/valgrind.h>;
the latter might pick up the system-installed valgrind.h and doesn't
work when srcdir != builddir.

drd/tests/swapcontext: Add a swapcontext() test

drd: Process stack registration client requests

Reset stack information if the client registers a new stack

core: Pass stack change user requests on to tools

Since DRD tracks the lowest and highest stack address that has been used,
it needs to know about stack registration events. Hence pass on stack
registration events to tools.

Update NEWS with some core and platform (s390) changes and bug fixes.

Mention the new DWARF version 5 support needed with GCC 11.
s390 now supports z14 vector instructions.
Add missing bugs fixed and sort them by bug number (n-i-bz last).
Pull in 3.16.1 release data.

PR217695 malloc/calloc/realloc/memalign failure doesn't set errno to ENOMEM

When one of the allocation functions in vg_replace_malloc failed,
it returned NULL but didn't set errno. This is slightly tricky since
errno is implementation defined and might be a macro. In the case of
glibc errno is defined as:

extern int *__errno_location (void) __THROW __attribute__ ((__const__));
#define errno (*__errno_location ())

We can use the same trick as we use for __libc_freeres in
coregrind/vg_preloaded.c. Define the function as "weak". This means
it will only be defined if another library (glibc in this case)
actually provides a definition. Otherwise it will be NULL.
So we will only call it if it is defined and one of the allocation
functions failed and returned NULL.
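Here is a sketch of the weak-symbol trick. It assumes an ELF target and glibc, where errno expands to (*__errno_location ()). The helper name set_enomem_if_possible is hypothetical. On a C library that does not provide __errno_location, the weak reference is NULL and the helper does nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Declared weak: if no library defines __errno_location, the symbol
   resolves to NULL instead of causing a link error. */
extern int *__errno_location(void) __attribute__((weak));

/* Hypothetical helper: after an allocation failure, set the caller's
   errno to ENOMEM, but only when the C library provides
   __errno_location.  Returns 1 if errno was set, 0 otherwise. */
static int set_enomem_if_possible(void)
{
   if (__errno_location == NULL)
      return 0;
   *__errno_location() = ENOMEM;
   return 1;
}
```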

Include a new linux only memcheck testcase, enomem.vgtest.

https://bugs.kde.org/show_bug.cgi?id=217695

Fix compilation on macOS with new debuginfo reader

PR432809 VEX should support REX.W + POPF

It seems a REX.W prefix simply explicitly sets the operand size to 8,
and so can/must be ignored as redundant. This is what we already do
for PUSH, POP and PUSHF. All of these instructions are described as "When in
64-bit mode, instruction defaults to 64-bit operand size and cannot
encode 32-bit operand size." in the instruction manual.
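The resulting operand-size rule for POPF (opcode 0x9D) in 64-bit mode can be written as a hypothetical decoder helper. It assumes the general x86-64 rule that REX.W takes precedence over a 0x66 operand-size prefix.

```c
#include <assert.h>

/* Hypothetical decoder helper: operand size in bytes of POPF (0x9D) in
   64-bit mode.  The default is 64 bits and 32 bits cannot be encoded;
   0x66 selects 16 bits, and REX.W merely restates the 64-bit default,
   so it can be accepted and otherwise ignored. */
static int popf_operand_size_64bit_mode(int has_66_prefix, int has_rex_w)
{
   if (has_rex_w)
      return 8;
   return has_66_prefix ? 2 : 8;
}
```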

Original patch and analysis by Mike Dalessio <mike.dalessio@gmail.com>

https://bugs.kde.org/show_bug.cgi?id=432809

PPC, modsw and modsd instruction fix

vg_regtest: test-specific environment variables not reset between tests

Test-specific environment variables set in .vgtest files are not reset
between tests. This can result in tests running with environment variables
intended for a previously run test. This can be easily fixed by clearing
the @env and @envB arrays in tests/vg_regtest:read_vgtest_file()

Original patch by Aaron Merey <amerey@redhat.com>

https://bugs.kde.org/show_bug.cgi?id=432672

PR140939 --track-fds reports leakage of stdout/in/err and doesn't respect -q

Make --track-fds=yes not report on file descriptors 0, 1, and 2 (stdin,
stdout, and stderr) by default. Add a new option --track-fds=all that does
report on the std file descriptors still being open. Update testsuite and
documentation.
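The reporting rule is sketched below, with a hypothetical enum standing in for the option's value.

```c
#include <assert.h>

/* Hypothetical model of the --track-fds setting. */
typedef enum { TRACK_FDS_NO, TRACK_FDS_YES, TRACK_FDS_ALL } track_fds_mode;

/* Should an fd that is still open at exit be reported?
   "yes" skips stdin/stdout/stderr (fds 0, 1, 2); "all" reports them too. */
static int should_report_open_fd(track_fds_mode mode, int fd)
{
   assert(fd >= 0);
   switch (mode) {
   case TRACK_FDS_NO:  return 0;
   case TRACK_FDS_YES: return fd > 2;
   case TRACK_FDS_ALL: return 1;
   }
   return 0;
}
```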

Original patch by Peter Kelly <pmk@cs.adelaide.edu.au>
Updated by Daniel Fahlgren <daniel@fahlgren.se>

https://bugs.kde.org/show_bug.cgi?id=140939

PR140178 Support opening /proc/self/exe

Some programs open /proc/self/exe to read some data. Currently valgrind
supports following the /proc/self/exe link (to the original binary, so you
could then open that), but directly opening /proc/self/exe will open the
valgrind tool, not the executable file itself.

Add ML_(handle_self_exe_open) which dups VG_(cl_exec_fd) if the file
to open is /proc/self/exe or /proc/<pid>/exe. And do the same for openat.
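The sketch below shows the path test such a helper needs. It uses a hypothetical name and plain libc calls instead of Valgrind's internal ones. When the test matches, the open() wrapper would return a dup() of the file descriptor Valgrind keeps open for the client executable. Opening the path directly would otherwise yield the tool binary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: does PATH name the executable of process PID,
   either as /proc/self/exe or as /proc/<pid>/exe? */
static int is_self_exe_path(const char *path, pid_t pid)
{
   char buf[64];
   assert(path != NULL);
   if (strcmp(path, "/proc/self/exe") == 0)
      return 1;
   snprintf(buf, sizeof buf, "/proc/%ld/exe", (long)pid);
   return strcmp(path, buf) == 0;
}
```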

https://bugs.kde.org/show_bug.cgi?id=140178

PR423361 Adds io_uring support on arm64/aarch64 (and all other arches)

io_uring syscalls were only enabled on x86/amd64, but they can be enabled on
all arches. Based on a patch by Nathan Ringo <nathan@remexre.xyz>.

https://bugs.kde.org/show_bug.cgi?id=423361

PR422261 platform selection fails for unqualified client name

Bug introduced with commit f15beea76
"Fix memory leak in launcher-linux.c"

Need to try opening the actual 'client' path, not just the 'clientname'
file name.

Reported-by: Michael Wojcik <michael.wojcik@microfocus.com>
https://bugs.kde.org/show_bug.cgi?id=422261

syswrap-linux.c: Pass implicit VKI_IPC_64 for shmctl also on arm64.

The shmctl syscall on amd64, arm64 and riscv (but we don't have a port
for that last one) always uses IPC_64. Explicitly pass it to the generic
PRE/POST handlers so they select the correct (64bit) data structures on
those architectures.

https://bugzilla.redhat.com/show_bug.cgi?id=1909548

Fix shmat() on Linux nanomips and x86

On Linux, there are two variants of the direct shmctl syscall:
- sys_shmctl: always uses shmid64_ds, does not accept IPC_64
- sys_old_shmctl: uses shmid_ds or shmid64_ds depending on IPC_64

The following Linux ABIs have the sys_old_shmctl variant:
alpha, arm, microblaze, mips n32/n64, xtensa

Other ABIs (and future ABIs) have the sys_shmctl variant, including ABIs
that only got sys_shmctl in Linux 5.1 (such as x86, mips o32, ppc,
s390x).

We incorrectly assume the sys_old_shmctl variant on nanomips and x86,
causing shmat() calls under valgrind to fail with EINVAL.

On x86, the issue was previously masked by the non-existence of
__NR_shmctl until a9fc7bceeb0b0 ("Update Linux x86 system call number
definitions") in 2019.

On mips o32, ppc, and s390x this issue is not visible as our headers do
not have __NR_shmctl for those ABIs (396 since Linux 5.1).

Fix the issue by correcting the preprocessor check in get_shm_size() to
only assume the old Linux sys_old_shmctl behavior on the specific
affected platforms.

Also, exclude the use of direct shmctl entirely on Linux x86, ppc,
mips o32, s390x in order to keep compatibility with pre-5.1 kernel
versions that did not yet have direct shmctl for those ABIs.
This currently only has actual effect on x86 as only it has __NR_shmctl
in our headers.

Fixes tests mremap4, mremap5, mremap6.

https://bugs.kde.org/show_bug.cgi?id=410743

Handle Iop_NegF16, Iop_AbsF16 and Iop_SqrtF16 as non-trapping.

Add Iop_NegF16, Iop_AbsF16 and Iop_SqrtF16 to VEX/priv/ir_defs.c
primopMightTrap. Also rewrite case statement slightly so GCC will warn
if an enumeration value is missed.
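The switch-statement style mentioned above can be sketched on a hypothetical miniature enum, not VEX's IROp. With no default label, GCC's -Wswitch (part of -Wall) warns whenever an enumerator is missing from the switch. The trapping classification here is illustrative only.

```c
#include <assert.h>

/* Hypothetical miniature of an IR opcode enum. */
typedef enum { OP_ADD_F16, OP_NEG_F16, OP_ABS_F16, OP_SQRT_F16, OP_DIV_I32 } demo_op;

/* No "default:" label, so -Wswitch warns if a new enumerator is added
   but not listed here. */
static int might_trap(demo_op op)
{
   switch (op) {
   case OP_ADD_F16:
   case OP_NEG_F16:
   case OP_ABS_F16:
   case OP_SQRT_F16:
      return 0;
   case OP_DIV_I32:
      return 1;  /* integer division by zero can trap */
   }
   return 1;  /* unreachable for valid enumerators; be conservative */
}
```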

Bug 432161 Addition of arm64 v8.2 FABS, FNEG and FSQRT

This patch adds FP half-precision support for the following:
FABS <Hd>, <Hn>
FABS <Vd>.<T>, <Vn>.<T>
FNEG <Hd>, <Hn>
FNEG <Vd>.<T>, <Vn>.<T>
FSQRT <Hd>, <Hn>
FSQRT <Vd>.<T>, <Vn>.<T>

Fixes https://bugs.kde.org/show_bug.cgi?id=432161

Add support for DWARF5 as produced by GCC11

Implement DWARF5 in readdwarf.c and readdwarf3.c

Since gcc11 will default to DWARF5, it is time for
valgrind to support it. The patch handles everything gcc11 produces
(except for the new DWARF expressions).

There is some duplication in the patch since we actually have two DWARF
readers which use slightly different abstractions (Slices vs Cursors).
It would be nice if we could merge these somehow. The reader in
readdwarf3.c is only used when --read-var-info=yes is used (which
drd uses to provide the allocation context).

The handling of DW_FORM_implicit_const is tricky with the current design.
An abbrev which contains an attribute encoded with DW_FORM_implicit_const
has its value also in the abbrev. The code in readdwarf3.c assumed it
always could simply get the data from the .debug_info/current Cursor.
For now I added a value field to the name_form struct that holds the
associated value. This is slightly wasteful since the extra field is
not necessary for other forms.
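The idea is sketched below using hypothetical demo_ names. The abbrev's (name, form) pair carries an extra value field. For DW_FORM_implicit_const, the reader takes the value from that field instead of from .debug_info. Only three forms are handled, and data2 is read little-endian. The form codes (0x0b, 0x05, 0x21) are the DWARF 5 values as I recall them.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DW_FORM_data2          0x05
#define DEMO_DW_FORM_data1          0x0b
#define DEMO_DW_FORM_implicit_const 0x21

/* An abbrev's (name, form) pair plus the extra value field. */
typedef struct {
   uint64_t at_name;
   uint64_t at_form;
   int64_t  at_val;   /* only meaningful for DW_FORM_implicit_const */
} demo_name_form;

/* Return the attribute's value, advancing *cur past any bytes consumed
   from .debug_info.  Only a few forms are handled in this sketch. */
static int64_t read_attr_value(const demo_name_form *nf, const uint8_t **cur)
{
   const uint8_t *p = *cur;
   switch (nf->at_form) {
   case DEMO_DW_FORM_implicit_const:
      return nf->at_val;            /* nothing is read from .debug_info */
   case DEMO_DW_FORM_data1:
      *cur = p + 1;
      return p[0];
   case DEMO_DW_FORM_data2:         /* little-endian data assumed */
      *cur = p + 2;
      return (int64_t)(p[0] | (p[1] << 8));
   default:
      assert(!"form not handled in this sketch");
      return 0;
   }
}
```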

Tested against GCC10 (defaulting to DWARF4) and GCC11 (defaulting to
DWARF5) on x86_64. No regressions in the regtests.

https://bugs.kde.org/show_bug.cgi?id=432102

Define AT as UChar in VEX/priv/guest_ppc_toIR.c (dis_vsx_accumulator_prefix)

GCC notices that AT is passed around as char, specifically as %u argument
to DIP. But ifieldAT returns an UChar and vsx_matrix_ger takes AT as UChar.
This causes lots of format string warnings when building with GCC11.

Simply declare AT as UChar instead of char.

Fix indentation in coregrind/m_debuginfo/readpdb.c (DEBUG_SnarfLinetab)

GCC warns:

readpdb.c:1631:16: warning: this 'if' clause does not guard...
  [-Wmisleading-indentation]
1631 |                if (debug)
      |                ^~
In file included from ./pub_core_basics.h:38,
                 from m_debuginfo/readpdb.c:38:
../include/pub_tool_basics.h:69:30: note: ...this statement, but the latter
  is misleadingly indented as if it were guarded by the 'if'
   69 | #define ML_(str)    VGAPPEND(vgModuleLocal_,    str)
      |                              ^~~~~~~~~~~~~~
../include/pub_tool_basics.h:66:29: note: in definition of macro 'VGAPPEND'
   66 | #define VGAPPEND(str1,str2) str1##str2
      |                             ^~~~
m_debuginfo/readpdb.c:1636:19: note: in expansion of macro 'ML_'
1636 |                   ML_(addLineInfo)(
      |                   ^~~

The warning message is slightly hard to read because of the macro expansion.
But GCC is right that the indentation is misleading. Fixed by reindenting.

Bug 431556 - Complete arm64 FADDP v8.2 instruction support started in 413547.

Patch from/by Assad Hashmi (assad.hashmi@linaro.org).

PPC64: Fix load store instructions

This patch fixes numerous errors in the ISA support.

The word and prefix versions of the instructions do not use the same mask
to extract the immediate values. The prefix instructions should all use
the DFOM_IMMASK.

The parsing of prefix instructions has been fixed to ensure the ISA 3.1
instructions all have the ISA_3_1_PREFIX_CHECK check.

Fixed the commenting to improve the comments for the instruction parsing.

Fixed the parsing of the plxv instruction.

General code cleanup.

PPC64: Fix EA calculation for prefixed instructions

The effective address (EA) calculation for the prefixed instructions
concatenates an 18-bit immediate value from the prefix word and a 16-bit
immediate value from the instruction word. This results in a 34-bit value.
The concatenated value must be stored into a long long int not a 32-bit
integer.
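The arithmetic can be sketched in C. prefixed_displacement is a hypothetical helper. d0 is the 18-bit prefix immediate and d1 the 16-bit suffix immediate. The concatenation is sign-extended from bit 33, and the result would then be added to the base register (or to the CIA, for PC-relative forms).

```c
#include <assert.h>
#include <stdint.h>

/* Concatenate the 18-bit prefix immediate (d0) with the 16-bit suffix
   immediate (d1) and sign-extend the 34-bit result.  A 32-bit int
   cannot hold it, hence the 64-bit types. */
static int64_t prefixed_displacement(uint32_t d0, uint32_t d1)
{
   int64_t v = (int64_t)(((uint64_t)(d0 & 0x3FFFF) << 16) | (d1 & 0xFFFF));
   if (v & (INT64_C(1) << 33))      /* sign bit of the 34-bit field */
      v -= INT64_C(1) << 34;
   return v;
}
```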

PPC64: Fix for VG_MAX_INSTR_SZB, max instruction size is now 8 bytes for prefixed instructions

The ISA 3.1 support has both word instructions of length 4 bytes and prefixed
instructions of length 8 bytes. This fix is needed when Valgrind
is compiled using an ISA 3.1 compiler.

Fix a couple of comment / crash-message typos. No functional change.

Bug 413547 - regression test does not check for Arm 64 features.

Patches from/by Assad Hashmi (assad.hashmi@linaro.org).

Bug 391853 - Makefile.all.am:L247 and @SOLARIS_UNDEF_LARGESOURCE@ being empty

arm64 isel: in a couple places, use `xzr` as a source rather than loading zero into a reg.

Reduces code size by 0.27% for /usr/bin/date.

arm64 insn selector: improved handling of Or1/And1 trees.

This is the exact analog of cadd90993504678607a4f95dfe5d1df5207c1eb0, to the
point of almost being a copy-n-paste.  That commit split (amd64) iselCondCode
into two functions, iselCondCode_C (existing) and iselCondCode_R (new).  The
latter computes an I1-typed expression into a register rather than a condition
code.  The two functions cooperate so as to minimise conversions between
a condition-code value and a value in a register.

More arm64 isel tuning: create {and,orr,eor,add,sub} reg,reg,reg-shifted-by-imm

Thus far the arm64 isel can't generate instructions of the form

   {and,or,xor,add,sub} reg,reg,reg-shifted-by-imm

and hence sometimes winds up generating pairs like

   lsl x2, x1, #13 ; orr x4, x3, x2

when instead it could just have generated

   orr x4, x3, x1, lsl #13

This commit fixes that, although only for the 64-bit case, not the 32-bit
case.  Specifically, it can transform the IR forms

  {Add,Sub,And,Or,Xor}(E1, {Shl,Shr,Sar}(E2, immediate))   and
  {Add,And,Or,Xor}({Shl,Shr,Sar}(E1, immediate), E2)

into a single arm64 instruction.  Note that `Sub` is not included in the
second line, because shifting the first operand requires inverting the arg
order in the arm64 instruction, which isn't allowable with `Sub`, since it's
not commutative and arm64 doesn't offer us a reverse-subtract instruction to
use instead.
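The operand-order restriction can be summarised by a small predicate. The enum and function are hypothetical, not the arm64 isel's actual code.

```c
#include <assert.h>

/* Hypothetical opcode set for this sketch. */
typedef enum { BIN_ADD, BIN_SUB, BIN_AND, BIN_OR, BIN_XOR } bin_op;

/* Can Op(lhs, rhs) become one "op xD, xN, xM, <shift> #imm" when one
   operand is a shift by an immediate?  arm64 only allows the shifted
   operand in the second source position, so a shifted LEFT operand
   requires swapping the operands, which is only valid for commutative
   operations.  Sub is not commutative, and there is no reverse-subtract. */
static int can_fuse_shifted_operand(bin_op op, int shift_is_left_operand)
{
   if (!shift_is_left_operand)
      return 1;                 /* Op(E1, Shift(E2, imm)) */
   return op != BIN_SUB;        /* Op(Shift(E1, imm), E2) needs a swap */
}
```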

This gives a 1.1% reduction in generated code size when running
/usr/bin/date on Memcheck.

A bit of tuning of the arm64 isel: do PUT(..) = 0x0:I64 in a single insn.

When running Memcheck, most blocks will do one and often two of `PUT(..) =
0x0:I64`, as a result of the way the front end models arm64 condition codes.
The arm64 isel would generate `mov xN, #0 ; str xN, [xBaseblock, #imm]`,
which is pretty stupid. This patch changes it to a single insn:
`str xzr, [xBaseblock, #imm]`.

This is a special-case for `PUT(..) = 0x0:I64`. General-case integer stores
of 0x0:I64 are unchanged.

This gives a 1.9% reduction in generated code size when running
/usr/bin/date on Memcheck.

Add an extra suppression.

On Fedora 33 with gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
it looks like fun:__static_initialization_and_destruction_0 is
now inlined which causes the existing suppression for the
same reachable to no longer match.

expr_is_guardable doesn't handle Iex_Qop

IRExpr_Qop uses the Iex_Qop tag, which expr_is_guardable didn't handle.

https://bugs.kde.org/show_bug.cgi?id=430485

arm64 front end: sfbm: handle sign-extends explicitly

This is a follow-on to 41504d33dec8773c591d45192d1dda6e9c670031.

For the cases of sfbm that are actually just sign-extensions to a wider width,
emit that directly and do disassembly-printing accordingly. No functional
change.

Fix 397605 - Add support for Linux FICLONE ioctl

Fix magic cookie reference in mc-manual.

The URL to the original C++ front-end for GCC internals document
disappeared. Replace it with a URL that still has a description of
the original magic cookie added by operator new [] by that frontend.

arm64 front end: ufbm/sfbm: handle plain shifts explicitly

The ufbm and sfbm instructions implement some kind of semi-magical rotate,
mask and sign/zero-extend functionality.  Boring old left and right shifts are
special cases of it.  The existing translation into IR is correct, but has the
disadvantage that the IR optimiser isn't clever enough to simplify the
resulting IR back into a single shift in the case where the instruction is
used simply to encode a shift.  This induces inefficiency and it also makes
the resulting disassembly pretty difficult to read, if you're into that kind
of thing.

This commit does the obvious thing: detects cases where the required behaviour
is just a single shift, and emits IR and disassembly-printing accordingly.
All other cases fall through to the existing general-case handling and so are
unchanged.

arm64 VEX frontend and backend support for Iop_M{Add,Sub}F{32,64}

The arm64 frontend used to translate the scalar fmadd, fmsub, fnmadd
and fnmsub instructions into separate addition/subtraction and
multiplication operations, which caused rounding issues.

This patch turns them into Iop_M{Add,Sub}F{32,64} instructions
(with some arguments negated). And the backend now emits fmadd or fmsub
instructions.

Alexandra Hajkova <ahajkova@redhat.com> added tests and fixed up the
implementation to make sure rounding (and sign) are correct now.

https://bugs.kde.org/show_bug.cgi?id=426014

ppc stxsibx and stxsihx instructions write too much data

stxsibx (Store VSX Scalar as Integer Byte Indexed X-form) is implemented
by first reading a whole word, merging in the new byte, and then writing
out the whole word, causing memcheck to warn when the destination might
have room for less than 8 bytes.

The stxsihx (Store VSX Scalar as Integer Halfword Indexed X-form)
instruction does something similar reading and then writing a full
word instead of a half word.

The code can be simplified (and made more correct) by storing the byte
(or half-word) directly, IRStmt_Store seems fine to store byte or half
word sized data, and so seems the ppc backend.

https://bugs.kde.org/show_bug.cgi?id=430354

Bug 414268 - Enable AArch64 feature detection and decoding for v8.x instructions (where x>0).

Patch from Assad Hashmi <assad.hashmi@linaro.org>.

Initial change for Bug 429952 didn't work well with older GCC. Use the __clang__ macro instead.

Fix dhat/tests/copy on Solaris

Bug 404076 - s390x: Implement z14 vector instructions

Implement the new instructions/features that were added to z/Architecture
with the vector-enhancements facility 1. Also cover the instructions from
the vector-packed-decimal facility that are defined outside the chapter
"Vector Decimal Instructions", but not the ones from that chapter itself.

For a detailed list of newly supported instructions see the updates to
`docs/internals/s390-opcodes.csv'.

Since the miscellaneous instruction extensions facility 2 was already
addressed by Bug 404406, this completes the support necessary to run
general programs built with `--march=z14' under Valgrind. The
vector-packed-decimal facility is currently not exploited by the standard
toolchain and libraries.

Bug 408663 - Patch: Suppression file for musl libc

Bug 429952 - Errors when building regtest with clang

dhat/tests/Makefile.am: Add filter_copy to dist_noinst_SCRIPTS

Make sure that make dist includes all needed test filters.

check_headers_and_includes: Add dhat/dhat.h to tool_export_header

dhat now has a public header dhat/dhat.h, this header may include
valgrind.h directly. Make sure check_headers_and_includes knows.

Bug 429864 - s390: Use Iop_CasCmp* to fix memcheck false positives

Compare-and-swap instructions can cause memcheck false positives when
operating on partially uninitialized data.  An example is where a 1-byte
lock is allocated on the stack and then manipulated using CS on the
surrounding word.  This is correct, and the uninitialized data has no
influence on the result, but memcheck still complains.

This is caused by logic in the s390 backend, where the expected and actual
memory values are compared using Iop_Sub32.  Fix this by using
Iop_CasCmpNE32 instead.

Add support for copy and ad hoc profiling to DHAT.

Bug 428909 - helgrind: need to intercept duplicate libc definitions for Fedora 33

Add a comment to previous commit.

Fix wcscpy wrapper.

wcscpy deals with wchar_t, which has a size of 4, so the adjustment in
the wrapper must be +4 instead of +1.
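The byte count involved is sketched below with a hypothetical helper name. It shows why the terminator accounts for sizeof(wchar_t) bytes rather than 1. That size is 4 on Linux but 2 on some other platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Bytes that wcscpy(dst, src) reads from src and writes to dst: every
   character plus the terminating L'\0', each sizeof(wchar_t) bytes. */
static size_t wcscpy_copy_bytes(const wchar_t *src)
{
   assert(src != NULL);
   return (wcslen(src) + 1) * sizeof(wchar_t);
}
```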

lmw, lswi and related PowerPC insns aren't allowed on ppc64le

Newer binutils produce an error when the assembly contains lmw, stmw,
lswi, lswx, stswi, or stswx instructions in little-endian mode.

Only build and run the lsw and ldst_multiple testcases on ppc64[be].

https://bugs.kde.org/show_bug.cgi?id=427870

Fix an obscure problem with peak finding.

Currently, if there are multiple equal global peaks, `intro_Block` and
`resize_Block` record the first one while `check_for_peak` records the
last one. This could lead to inconsistent output, though it's unlikely
in practice.

This commit fixes things so that all functions record the last peak.
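The tie-breaking rule amounts to using >= instead of > when scanning for the maximum. The helper below is a hypothetical illustration, not DHAT's code.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the global peak in HEAP_SIZES, choosing the LAST of several
   equal maxima.  Using ">=" rather than ">" makes every caller agree on
   which peak is recorded. */
static size_t last_peak_index(const size_t *heap_sizes, size_t n)
{
   size_t best = 0;
   assert(heap_sizes != NULL && n > 0);
   for (size_t i = 1; i < n; i++) {
      if (heap_sizes[i] >= heap_sizes[best])
         best = i;
   }
   return best;
}
```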

Hook up unhandled ppc64le-linux syscall: 147 (getsid)

https://bugs.kde.org/show_bug.cgi?id=429692

PowerPC, fix for conv_f16_to_double xscvhpdp assembler code

The previous commit:

  commit eb82a294573d15c1be663673d55b559a82ca29d3
  Author: Julian Seward <jseward@acm.org>
  Date:   Tue Nov 10 21:10:48 2020 +0100

      Add a missing ifdef, whose absence caused build breakage on non-POWER targets.

fixed the compile issue in conv_f16_to_double() where non-Power platforms
do not support the power xscvhpdp assembly instructions.  The instruction
is supported by ISA 3.0 platforms.  Older Power platforms still fail to
compile with the assembly instruction.  This patch fixes the #ifdef for
power systems that do not support ISA 3.0.

memcheck: on arm64, use expensive instrumentation for Cmp{EQ,NE}64 by default.

Add a missing ifdef, whose absence caused build breakage on non-POWER targets.

Reduced Precision Outer Product Operation tests

Fix, add ISA 3.1 check to set ISA 3.1 in Valgrind hwcaps value

ISA 3.1 Reduced-Precision: Outer Product Operations

Add support for:

pmxvf16ger2 Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update)
pmxvf16ger2nn Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) (Negative multiply, Negative accumulate)
pmxvf16ger2np Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) (Negative multiply, Positive accumulate)
pmxvf16ger2pn Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) (Positive multiply, Negative accumulate)
pmxvf16ger2pp Prefixed Masked VSX Vector 16-bit Floating-Point GER (rank-2 update) (Positive multiply, Positive accumulate)
pmxvf32ger Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update)
pmxvf32gernn Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) (Negative multiply, Negative accumulate)
pmxvf32gernp Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) (Negative multiply, Positive accumulate)
pmxvf32gerpn Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) (Positive multiply, Negative accumulate)
pmxvf32gerpp Prefixed Masked VSX Vector 32-bit Floating-Point GER (rank-1 update) (Positive multiply, Positive accumulate)
pmxvf64ger Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update)
pmxvf64gernn Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) (Negative multiply, Negative accumulate)
pmxvf64gernp Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) (Negative multiply, Positive accumulate)
pmxvf64gerpn Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) (Positive multiply, Negative accumulate)
pmxvf64gerpp Prefixed Masked VSX Vector 64-bit Floating-Point GER (rank-1 update) (Positive multiply, Positive accumulate)
pmxvi16ger2s Prefixed Masked VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation
pmxvi16ger2spp Prefixed Masked VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation (Positive multiply, Positive accumulate)
pmxvi4ger8 Prefixed Masked VSX Vector 4-bit Signed Integer GER (rank-8 update)
pmxvi4ger8pp Prefixed Masked VSX Vector 4-bit Signed Integer GER (rank-8 update) (Positive multiply, Positive accumulate)
pmxvi8ger4 Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
pmxvi8ger4pp Prefixed Masked VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) (Positive multiply, Positive accumulate)

xvf16ger2 VSX Vector 16-bit Floating-Point GER (rank-2 update)
xvf16ger2nn VSX Vector 16-bit Floating-Point GER (rank-2 update) (Negative multiply, Negative accumulate)
xvf16ger2np VSX Vector 16-bit Floating-Point GER (rank-2 update) (Negative multiply, Positive accumulate)
xvf16ger2pn VSX Vector 16-bit Floating-Point GER (rank-2 update) (Positive multiply, Negative accumulate)
xvf16ger2pp VSX Vector 16-bit Floating-Point GER (rank-2 update) (Positive multiply, Positive accumulate)
xvf32ger VSX Vector 32-bit Floating-Point GER (rank-1 update)
xvf32gernn VSX Vector 32-bit Floating-Point GER (rank-1 update) (Negative multiply, Negative accumulate)
xvf32gernp VSX Vector 32-bit Floating-Point GER (rank-1 update) (Negative multiply, Positive accumulate)
xvf32gerpn VSX Vector 32-bit Floating-Point GER (rank-1 update) (Positive multiply, Negative accumulate)
xvf32gerpp VSX Vector 32-bit Floating-Point GER (rank-1 update) (Positive multiply, Positive accumulate)
xvf64ger VSX Vector 64-bit Floating-Point GER (rank-1 update)
xvf64gernn VSX Vector 64-bit Floating-Point GER (rank-1 update) (Negative multiply, Negative accumulate)
xvf64gernp VSX Vector 64-bit Floating-Point GER (rank-1 update) (Negative multiply, Positive accumulate)
xvf64gerpn VSX Vector 64-bit Floating-Point GER (rank-1 update) (Positive multiply, Negative accumulate)
xvf64gerpp VSX Vector 64-bit Floating-Point GER (rank-1 update) (Positive multiply, Positive accumulate)
xvi16ger2s VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation
xvi16ger2spp VSX Vector 16-bit Signed Integer GER (rank-2 update) with Saturation (Positive multiply, Positive accumulate)
xvi4ger8 VSX Vector 4-bit Signed Integer GER (rank-8 update)
xvi4ger8pp VSX Vector 4-bit Signed Integer GER (rank-8 update) (Positive multiply, Positive accumulate)
xvi8ger4 VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update)
xvi8ger4pp VSX Vector 8-bit Signed/Unsigned Integer GER (rank-4 update) (Positive multiply, Positive accumulate)

xxmfacc VSX Move From ACC
xxmtacc VSX Move To ACC
xxsetaccz VSX Set ACC to Zero

Make memcheck/tests/sized_delete conditional upon the compiler having -fsized-deallocation, add 384729 to NEWS

VSX Load/Store rightmost element operation tests

Test LSB by Byte operation tests

String operation tests

ISA 3.1 VSX Load/Store Rightmost Element Operations

Add support for:

lxvrbx Load VSX Vector Rightmost Byte Indexed
lxvrdx Load VSX Vector Rightmost Doubleword Indexed
lxvrhx Load VSX Vector Rightmost Halfword Indexed
lxvrwx Load VSX Vector Rightmost Word Indexed
stxvrbx Store VSX Vector Rightmost Byte Indexed
stxvrdx Store VSX Vector Rightmost Doubleword Indexed
stxvrhx Store VSX Vector Rightmost Halfword Indexed
stxvrwx Store VSX Vector Rightmost Word Indexed

ISA 3.1 Test LSB by Byte Operation

Add support for:

xvtlsbb

ISA 3.1 String Operations

Add support for:

vclrlb Vector Clear Leftmost Bytes
vclrrb Vector Clear Rightmost Bytes
vstribl[.] Vector String Isolate Byte Left-Justified
vstribr[.] Vector String Isolate Byte Right-Justified
vstrihl[.] Vector String Isolate Halfword Left-Justified
vstrihr[.] Vector String Isolate Halfword Right-Justified

Bit Manipulation Operation tests

128-bit Binary Integer Operation tests