Mark Wielaard [Sat, 2 Oct 2021 10:03:46 +0000 (12:03 +0200)]
Adjust filter_gdb for arm64 with eglibc 2.19 and gdb 7.7.1
Older ubuntu arm64 setups used eglibc 2.19 and gdb 7.7.1. In that
case select.c could be under linux/generic and the select argument
list could be split up differently over several lines. Adjust
filter_gdb to catch those differences.
Also checked against a Debian arm64 with glibc 2.31 and gdb 10.1.
Mark Wielaard [Fri, 1 Oct 2021 20:25:40 +0000 (22:25 +0200)]
Add none/tests/scripts/shell.stderr.exp-dash4 for dash 0.5.11
dash 0.5.11 produces slightly different error messages.
The new exp file is similar to shell.stderr.exp-dash3 but
with the extra (second) "shell: " output removed.
Carl Love [Tue, 28 Sep 2021 15:49:10 +0000 (15:49 +0000)]
Fix tests for mfspr
Split out the mfspr tests into a separate test using command line option
"-M". The values in the LR and CTR registers changed. It appears the
changes are due to changes in the test program jm-insns.c. Splitting
these instructions out will help to minimize the size of future updates
when the test program changes.
Carl Love [Thu, 9 Sep 2021 23:10:07 +0000 (23:10 +0000)]
fix sraw, srawi, srad, sradi instructions
For ISA 3.0 and beyond, the instructions also write the XER register.
Split the instructions out to a new command line option so we can create
separate expect files for ISA 2.07, ISA 3.0 LE, and ISA 3.0 BE. The new
command line option is "-s" to run just these four instructions.
Carl Love [Thu, 9 Sep 2021 19:06:00 +0000 (19:06 +0000)]
Add support for the mcrxrx instruction.
The mcrxrx instruction was introduced in ISA 3.0. It was missed when the
ISA 3.0 support was added to Valgrind.
The mcrxr instruction is not supported on ISA 3.0 and beyond. Both
instructions do a move to the condition register; however, mcrxrx
moves [OV|OV32|CA|CA32], whereas mcrxr moves XER[32:35]
(SO, OV, and CA bits) to the CR.
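As a rough illustration of the mcrxrx bit layout named above, the following sketch packs the four XER bits into a 4-bit CR field value, with OV as the most significant bit. The function name and its scalar arguments are hypothetical and are not taken from the Valgrind sources.

```c
/* Hypothetical helper: build the 4-bit CR field that mcrxrx produces
   from the individual XER bits, in the order OV | OV32 | CA | CA32. */
static unsigned mcrxrx_field(unsigned ov, unsigned ov32,
                             unsigned ca, unsigned ca32)
{
    return ((ov   & 1u) << 3)   /* most significant bit of the field */
         | ((ov32 & 1u) << 2)
         | ((ca   & 1u) << 1)
         |  (ca32 & 1u);        /* least significant bit of the field */
}
```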
Carl Love [Wed, 8 Sep 2021 22:01:05 +0000 (22:01 +0000)]
Fix dfp tests.
Due to changes between the compiler and linker, we need to add .machine
arguments to the configure file to properly detect the availability of the
dfp instructions.
Add a print statement for the case where HAS_DFP is not enabled, to make
that case easier to identify.
powerpc: Add .machine directives for scv, copy, paste, cpabort instructions
GCC is no longer passing the "-many" flag to the assembler. So, the
inline assembly statements need to use the .machine directives
for the specific platform.
Andreas Arnez [Thu, 30 Sep 2021 12:10:29 +0000 (14:10 +0200)]
configure.ac: Avoid the use of "which"
The "which" command is not always installed, but configure.ac uses it in
the function AC_HWCAP_CONTAINS_FLAG to force invocation of the executable
"true" rather than the shell builtin with the same name. (The point here
is to get LD_SHOW_AUXV=1 evaluated by the dynamic loader.)
Another option might be to hard-wire the location /bin/true, because the
filesystem hierarchy standard requires it to be there. However, the FHS
doesn't apply to BSDs and at least some FreeBSD versions do not stick to
that specific rule.
On the other hand, the "env" command seems to be available on all relevant
platforms, so use that instead.
- prevent null dereferencing on dlang_type
- prevent buffer overflow when decoding user input
- Add support for demangling local D template declarations
- Add support for demangling D function literals as template
value parameters
- Add support for D `typeof(*null)' types
- Fix -Wundef warnings in ansidecl.h
- Fix endian bug in rust demangler
- Adjust mangling of __alignof__
- Avoid -Wstringop-truncation
Update libiberty demangler to support Rust v0 name mangling
Update the libiberty demangler using the auxprogs/update-demangler
script to the gcc git 01d92cfd79872e4cffc78bf233bb9b767336beb8.
Updates rust demangling to support the new v0 mangling scheme.
This includes the following changes:
- Update the update-demangler script to use gcc git instead of svn.
- The result of running the updated script to get an updated
demangler and resolving the merge conflicts.
- A change to long_namespace_xml.stderr.exp because two overly long
symbols aren't demangled anymore, but just returned as is.
- an update to the m_demangle/demangle.c source to deal with Rust
demangling in cp_demangle, which now directly demangles old and
new style rust symbols.
Mark Wielaard [Sun, 19 Sep 2021 12:30:19 +0000 (14:30 +0200)]
readdwarf3: Introduce abbv_state to read .debug_abbrev more lazily
With the inline parser often a lot of DIEs are skipped, so reading
all abbrevs up front wastes time and memory. A lot of time and memory
can be saved by reading the abbrevs on demand. Do this by introducing
an abbv_state that is used to keep track of the abbrevs already read.
This does technically make the CUConst struct not const.
Mark Wielaard [Sat, 18 Sep 2021 20:16:33 +0000 (22:16 +0200)]
readdwarf3: Reuse abbrev if possible between units
Instead of destroying the ht_abbrvs after processing a CU, save it
and the offset so it can be reused for the next CU if that happens
to have the same abbrev offset. dwz compressed DWARF often reuses
the same abbrev for multiple CUs.
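The reuse idea can be sketched as a one-entry cache keyed by the abbrev offset. This is only an illustration: the AbbrevCache type, build_abbrev_table, and get_abbrev_table are hypothetical names, and the "table" is a dummy allocation standing in for the real hash table.

```c
#include <stdint.h>
#include <stdlib.h>

/* One-entry cache for the most recently built abbrev table.
   All names here are hypothetical. */
typedef struct {
    uint64_t offset;   /* .debug_abbrev offset the table was built for */
    void    *table;    /* opaque table; NULL means nothing is cached   */
    unsigned builds;   /* how many times a table had to be built       */
} AbbrevCache;

/* Placeholder for the real parsing work: allocates a dummy object. */
static void *build_abbrev_table(uint64_t offset)
{
    uint64_t *t = malloc(sizeof *t);
    if (t != NULL)
        *t = offset;
    return t;
}

/* Return the table for 'offset', rebuilding only when the offset
   differs from the one used for the previous unit. */
static void *get_abbrev_table(AbbrevCache *c, uint64_t offset)
{
    if (c->table != NULL && c->offset == offset)
        return c->table;            /* same offset as last CU: reuse */
    free(c->table);                 /* free(NULL) is a no-op */
    c->table  = build_abbrev_table(offset);
    c->offset = offset;
    c->builds++;
    return c->table;
}
```

A caller would keep one AbbrevCache per debuginfo read pass and free c->table once at the end.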
Mark Wielaard [Fri, 17 Sep 2021 22:24:38 +0000 (00:24 +0200)]
readdwarf3: Reuse fndn_ix_Table as much as possible
Both the var parser and the inl parser kept a fndn_ix_Table.
Initialize only one per debuginfo read pass and reuse if the stmt offset
is the same as last time (CUs can share the same line table and alt
files do share one for all units).
Mark Wielaard [Thu, 16 Sep 2021 20:01:47 +0000 (22:01 +0200)]
readdwarf3: Skip units without addresses when looking for inlined functions
When a unit doesn't cover any addresses skip it because no actual code
will be inside. Also use skip_DIE instead of read_DIE when not parsing
(skipping) children.
Andreas Arnez [Fri, 17 Sep 2021 16:48:12 +0000 (18:48 +0200)]
s390x: Fix 64-bit shift in s390_irgen_VSTRS
The function s390_irgen_VSTRS in guest_s390_toIR.c contains a shift
operation that is intended to yield a 64-bit number but uses 1UL instead
of 1ULL. This doesn't work on systems where 'unsigned long' is only 32
bits wide. Fix by replacing 1UL by 1ULL.
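The difference between the two literals can be shown with a small stand-alone helper. This is a sketch, not the code from guest_s390_toIR.c; the name bit64 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: return a 64-bit value with only bit n set,
   for 0 <= n <= 63. */
static uint64_t bit64(unsigned n)
{
    /* 'unsigned long long' is at least 64 bits wide on every
       conforming C implementation, so this shift is well defined
       for n up to 63.  With 1UL the shift would be undefined for
       n >= 32 wherever 'unsigned long' is only 32 bits wide. */
    return 1ULL << n;
}
```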
Carl Love [Fri, 10 Sep 2021 22:15:57 +0000 (17:15 -0500)]
Remove deprecated regression tests for mftgpr and mffgpr.
The mftgpr and mffgpr instructions are deprecated. Added comments in
VEX/priv/guest_ppc_toIR.c for the instructions stating the instructions
are deprecated. Valgrind support can be removed if the opcodes get reused
in the future. For now, leaving the functional support in Valgrind
for the instructions.
Removed the regression test power6_mf_gpr.c, expect files and vgtest file
from none/tests/ppc64.
Carl Love [Fri, 3 Sep 2021 17:14:50 +0000 (17:14 +0000)]
Fix impossible constraint issue in P10 testcase.
This reworks the modulo operation as seen in
valgrind/none/tests/ppc64/test_isa_3_1_common.c:
initialize_source_registers().
Due to a GCC issue (PR101882), we will try to avoid a modulo operation with
both input and outputs set to a hard register. In this case, we can apply
the modulo operation to the args[] array value used to initialize the ra
value.
Andreas Arnez [Tue, 18 May 2021 17:59:32 +0000 (19:59 +0200)]
s390x: Wrap up misc-insn-3 and vec-enh-2 support
Wrap up support for the miscellaneous-instruction-extensions facility 3
and the vector-enhancements facility 2: Add 'case' statements for the
remaining unhandled arch13 instructions to 'guest_s390_toIR.c', document
the new support in 's390-opcodes.csv', adjust 's390-check-opcodes.pl', and
announce the new feature in 'NEWS'.
Andreas Arnez [Mon, 17 May 2021 13:34:15 +0000 (15:34 +0200)]
s390x: Vec-enh-2, test cases
Add test cases for verifying the new/enhanced instructions in the
vector-enhancements facility 2. For "vector string search" VSTRS add a
memcheck test case.
Andreas Arnez [Tue, 16 Feb 2021 16:52:09 +0000 (17:52 +0100)]
s390x: Mark arch13 features as supported
Make the STFLE instruction report the miscellaneous-instruction-extensions
facility 3 and the vector-enhancements facility 2 as supported. Indicate
support for the latter in the HWCAP vector as well.
Andreas Arnez [Wed, 10 Mar 2021 18:22:51 +0000 (19:22 +0100)]
s390x: Vec-enh-2, VSTRS
Support the new "vector string search" instruction VSTRS. The
implementation is a full emulation and follows a similar approach as for
the other vector string instructions.
Andreas Arnez [Tue, 16 Feb 2021 15:19:31 +0000 (16:19 +0100)]
s390x: Vec-enh-2, VLBR and friends
Add support for the new byte- and element-swapping vector load/store
instructions VLEBRH, VLEBRG, VLEBRF, VLLEBRZ, VLBRREP, VLBR, VLER,
VSTEBRH, VSTEBRG, VSTEBRF, VSTBR, and VSTER.
Andreas Arnez [Thu, 11 Feb 2021 19:02:03 +0000 (20:02 +0100)]
s390x: Vec-enh-2, extend VCDG, VCDLG, VCGD, and VCLGD
The vector-enhancements facility 2 extends the vector floating-point
conversion instructions VCDG, VCDLG, VCGD, and VCLGD. In addition to
64-bit elements, they now also handle 32-bit elements. Add support for
these new forms.
Andreas Arnez [Wed, 7 Apr 2021 10:29:32 +0000 (12:29 +0200)]
s390x: Vec-enh-2, extend VSL, VSRA, and VSRL
The vector-enhancements facility 2 extends the existing bitwise vector
shift instructions VSL, VSRA, and VSRL. Now they allow the shift
vector (the third operand) to contain different shift amounts for each
byte. Add support for these new forms.
Add support for the instructions NCRK, NCGRK, NNRK, NNGRK, NORK, NOGRK,
NXRK, NXGRK, OCRK, and OCGRK. Introduce a common helper and use it for
the existing instructions NRK, NGRK, XRK, XGRK, ORK, and OGRK as well.
Mark Wielaard [Fri, 6 Aug 2021 17:08:17 +0000 (19:08 +0200)]
unhandled ppc64le-linux syscall: 252 (statfs64) and 253 (fstatfs64)
glibc 2.34 consolidated all statfs implementations. All other arches
that have statfs64/fstatfs64 (including ppc32) already had those syscalls
hooked up; it was just ppc64 that was missing them.
Mark Wielaard [Wed, 21 Jul 2021 17:53:13 +0000 (19:53 +0200)]
Generate a ENOSYS (sys_ni_syscall) for clone3 on all linux arches
glibc 2.34 will try to use clone3 first before falling back to
the clone syscall. So implement clone3 as sys_ni_syscall, which
simply returns ENOSYS without producing a warning.
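The caller-side fallback that makes an ENOSYS result harmless can be sketched as follows. The two stub functions are hypothetical stand-ins for the real syscall wrappers and do not create any process; this is not glibc's actual code.

```c
#include <errno.h>

/* Hypothetical stand-in for a clone3 wrapper on a system (or under a
   tool) that does not implement the syscall. */
static long stub_clone3(void)
{
    errno = ENOSYS;
    return -1;
}

/* Hypothetical stand-in for the older clone wrapper; it pretends to
   return a child pid. */
static long stub_clone(void)
{
    return 1234;
}

/* Try the newer interface first and fall back only on ENOSYS. */
static long spawn_with_fallback(void)
{
    long ret = stub_clone3();
    if (ret == -1 && errno == ENOSYS)
        ret = stub_clone();
    return ret;
}
```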
Mark Wielaard [Fri, 16 Jul 2021 19:47:08 +0000 (15:47 -0400)]
Update helgrind and drd suppression libc and libpthread paths in glibc 2.34
glibc 2.34 moved all pthread functions into the main libc library.
And it changed the (in memory) path of the main libc library to
libc.so.6 (before it was libc-2.xx.so).
This breaks various standard suppressions for helgrind and drd.
Fix this by doing a configure check for whether we are using glibc
2.34 by checking whether pthread_create is in libc instead of in
libpthread. If we are using glibc then define GLIBC_LIBC_PATH and
GLIBC_LIBPTHREAD_PATH variables that point to the (regexp) path
of the library that contains all libc functions and pthread functions
(which will be the same path for glibc 2.34+).
Rename glibc-2.34567-NPTL-helgrind.supp to glibc-2.X-helgrind.supp.in
and glibc-2.X-drd.supp to glibc-2.X-drd.supp.in and replace the
GLIBC_LIBC_PATH and GLIBC_LIBPTHREAD_PATH at configure time.
The same could be done for the glibc-2.X.supp.in file, but hasn't
yet because it looks like most suppressions in that file are obsolete.
Mark Wielaard [Fri, 16 Jul 2021 19:37:21 +0000 (21:37 +0200)]
gdbserver_tests: update filters for newer glibc/gdb
With newer glibc/gdb we might see a __select call without anything
following on the line. Also when gdb cannot find a file it might
now print "Inappropriate ioctl for device" instead of the message
"No such file or directory"
amd64 front end: Make uses of 8- and 16-bit GPRs GET the entire containing register.
Until now, a read of a 32-bit GPR (eg, %ecx) in the amd64 front end actually
involved GETting the containing 64-bit reg (%rcx) and dropping off its top
32-bits, in the IR translation. This makes IR optimisation work well for code
that mixes 32 and 64 bit integer operations, which is very common. In
particular it helps guarantee that PUT-to-GET and redundant-GET optimisations
work, hence that constant propagation/folding across such boundaries works,
and indirectly helps to avoid generating code in the back end that suffers
from store-forwarding or partial-register-read stalls.
This commit partially extends those advantages to 8- and 16-bit GPR reads. In
particular, all 16-bit GPR fetches are now a GET of the whole 64-bit register
followed by an Iop_64to16 cast. The same scheme is used for 8-bit register
fetches, except for the "anomalous four" (%ah, %bh, %ch, %dh), whose handling
is left unchanged.
With this in place, a wider write followed by a smaller read will play
nicely with constant folding and propagation. For example (somewhat artificially):
movl $17, %ecx // 32-bit write of %rcx
shrl %cl, %r15 // 8-bit read of %rcx
The 17 will be propagated, in IR, up to the shift.
The commit also adds a couple more rewrite rules in ir_opt.c to remove some of
the resulting pointless conversion pairings.
Consistently set CC_NDEP when setting the flags thunk.
For most settings of the flags thunk (guest_CC_{OP,DEP1,DEP2,NDEP}), the value
of the NDEP field is irrelevant, because of the setting of the OP field, and
so it is usually not set in such cases, which are the vast majority. This
saves a store (a PUT) in the final generated code. But it has the bad effect
that the IR optimiser cannot know that preceding PUTs to the field are
possibly dead and can be removed. Most of the time that is not important, but
just occasionally it can cause a lot of pointless extra computation (calling
of amd64g_calculate_rflags_all) to happen. This was observed in a long basic
block involved in a hash calculation, like this:
rolq .. // sets CC_NDEP to the previous value of the flags,
// as calculated by amd64g_calculate_rflags_all
mulq ..
(rolq/mulq repeated several times)
addq .. // effect is, all of the flag computation done for the rol/mul
// sequence is irrelevant, but iropt can't see that
Setting CC_NDEP consistently to zero, even if it isn't needed, avoids the
problem.
amd64 front end: more spec rules: S/NS after LOGICW, S after SHRL, Z after SHRW, C after SUBW.
This adds a few more spec rules that seem useful for running Firefox built
with gcc-O3 and clang-O3. At least one of them removes a false Memcheck
error.
There is also some improved debug printing, currently #if 0'd.
Remove redundant assertions and conditionals in move_CEnt_to_top.
move_CEnt_to_top is on the hot path when reading large amounts of debug info,
especially Dwarf inlined-function info. It shows up in 'perf' profiles. This
commit removes assertions which are asserted elsewhere, and tries to avoid a
couple of conditional branches.
Reimplement h_generic_calc_GetMSBs8x16 to be more efficient.
h_generic_calc_GetMSBs8x16 concatenates the top bit of each 8-bit lane in a
128-bit value, producing a 16-bit scalar value. (It is PMOVMSKB, really).
The existing implementation is excessively inefficient and shows up sometimes
in 'perf' profiles of V. This commit replaces it with a logarithmic (4-stage)
algorithm which is hopefully much faster.
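One way to write such a logarithmic reduction is sketched below. It is an illustration of the technique only and is not claimed to match the committed code; the function names are hypothetical. Three mask-and-shift stages gather the eight top bits of each 64-bit half, and a fourth step joins the two halves.

```c
#include <stdint.h>

/* Gather the top bit of each of the 8 bytes in x into the low 8 bits
   of the result (bit i of the result is the top bit of byte i). */
static uint64_t msbs8x8(uint64_t x)
{
    x = (x >> 7) & 0x0101010101010101ULL;          /* top bits at 0, 8, ..., 56 */
    x = (x | (x >> 7))  & 0x0003000300030003ULL;   /* stage 1: pairs   */
    x = (x | (x >> 14)) & 0x0000000F0000000FULL;   /* stage 2: nibbles */
    x = (x | (x >> 28)) & 0x00000000000000FFULL;   /* stage 3: byte    */
    return x;
}

/* Stage 4: combine both halves of the 128-bit value into 16 bits. */
static uint32_t msbs8x16(uint64_t w64hi, uint64_t w64lo)
{
    return (uint32_t)((msbs8x8(w64hi) << 8) | msbs8x8(w64lo));
}
```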
Carl Love [Fri, 11 Jun 2021 15:59:53 +0000 (10:59 -0500)]
PPC64 Enable the MMA bit in the HWCAP.
The MMA bit should have been enabled when the last of the MMA instructions
were committed. Also, the header comments about filtering out the DARN
and SCV support should have been updated when DARN and SCV support was added.
Andreas Arnez [Mon, 7 Jun 2021 12:01:53 +0000 (14:01 +0200)]
s390x: Don't emit "vector or with complement" on z13
The z/Architecture instruction "vector or with complement" (VOC) can be
used as an optimization to combine "vector or" with "vector nor". This is
exploited in Valgrind since commit 6c1cb1a0128b00858b973e. However, VOC
requires the vector-enhancements facility 1, which is not installed on a
z13 CPU. Thus Valgrind can now run into SIGILL on z13 when trying to
execute vector string instructions.
Fix this by suppressing the VOC optimization unless the
vector-enhancements facility 1 is recognized on the host.
Andreas Arnez [Tue, 30 Mar 2021 15:45:20 +0000 (17:45 +0200)]
s390x: Fix/optimize Iop_64HLtoV128
In s390_vr_fill() in guest_s390_toIR.c, filling a vector with two copies
of a 64-bit value is realized with Iop_64HLtoV128, since there is no such
operator as Iop_Dup64x2. But the two args to Iop_64HLtoV128 use the same
expression, referenced twice. Although this hasn't been seen to cause
real trouble yet, it's problematic and potentially inefficient, so change
it: Assign to a temp and pass that twice instead.
In the instruction selector, if Iop_64HLtoV128 is found to be used for a
duplication as above, select "v-vdup" instead of "v-vinitfromgprs". This
mimics the behavior we'd get if there actually was an operator
Iop_Dup64x2.
Andreas Arnez [Fri, 26 Mar 2021 18:27:47 +0000 (19:27 +0100)]
s390x: Rework insn "v-vdup" and add "v-vrep"
So far the only s390x insn for filling a vector with copies of the same
element is "v-vdup" (S390_VEC_DUPLICATE), which replicates the first
element of its vector argument. This is fairly restrictive and can lead
to unnecessarily long code sequences.
Redefine "v-vdup" to replicate any scalar value instead. And add
"v-vrep" (S390_INSN_VEC_REPLICATE) for replicating any given element of a
vector. Select the latter for suitable expressions like
Iop_Dup8x16(Iop_GetElem8x16(vector_expr, i))
This improves the generated code for some vector string instructions,
where a lot of element replications are performed.
Andreas Arnez [Tue, 23 Mar 2021 13:55:09 +0000 (14:55 +0100)]
s390x: Improve handling of amodes without base register
Addressing modes without a base or index register represent constants.
They can occur in some special cases such as shift operations and when
accessing individual vector elements. Perform some minor improvements to
the handling of such amodes.
Andreas Arnez [Fri, 16 Apr 2021 10:44:44 +0000 (12:44 +0200)]
Bug 434296 - s390x: Add memcheck test cases for vector string insns
Bug 434296 addresses memcheck false positives with the vector string
instructions VISTR, VSTRC, VFAE, VFEE, and VFENE. Add test cases that
verify the fix for that bug. Without the fix, memcheck yields many
complaints with these tests, most of which are false positives.
Andreas Arnez [Tue, 27 Apr 2021 18:13:26 +0000 (20:13 +0200)]
Bug 434296 - s390x: Rework IR conversion of VISTR
The z/Architecture instruction VISTR is currently transformed to a dirty
helper that executes the instruction. This can cause false positives with
memcheck if the input string contains undefined characters after the
string terminator. Implement without a dirty helper and emulate the
instruction instead.
Andreas Arnez [Tue, 2 Mar 2021 13:12:29 +0000 (14:12 +0100)]
Bug 434296 - s390x: Rework IR conversion of VFENE
So far the z/Architecture instruction "vector find element not
equal" (VFENE) is transformed to a loop. This can cause spurious
"conditional jump or move depends on uninitialised value(s)" messages by
memcheck. Re-implement without a loop.
Andreas Arnez [Thu, 18 Mar 2021 17:01:10 +0000 (18:01 +0100)]
Bug 434296 - s390x: Rework IR conversion of VSTRC, VFAE, and VFEE
The z/Architecture instructions "vector string range compare" (VSTRC),
"vector find any element equal" (VFAE), and "vector find element
equal" (VFEE) are each implemented with a dirty helper that executes the
instruction. Unfortunately this approach leads to memcheck false
positives, because these instructions may yield a defined result even if
parts of the input vectors are undefined. There are multiple ways this
can happen: Wherever the flags in the fourth operand to VSTRC indicate
"match always" or "match never", the corresponding elements in the third
operand don't affect the result. The same is true for the elements
following the first zero-element in the second operand if the ZS flag is
set, or for the elements following the first matching element, if any.
Re-implement the instructions without dirty helpers and transform into
lengthy IR instead.
Andreas Arnez [Wed, 7 Apr 2021 14:48:29 +0000 (16:48 +0200)]
s390x: Support "expensive" comparisons Iop_ExpCmpNE32/64
Add support for Iop_ExpCmpNE32 and Iop_ExpCmpNE64 in the s390x instruction
selector. Handle them exactly like the "inexpensive" variants Iop_CmpNE32
and Iop_CmpNE64.
Andreas Arnez [Wed, 28 Apr 2021 16:52:30 +0000 (18:52 +0200)]
Bug 433863 - s390x: Remove memcheck test cases for cs, cds, and csg
The fix for bug 429864 - "s390x: C++ atomic test_and_set yields
false-positive memcheck diagnostics" changes the memcheck behavior at
various compare-and-swap instructions. The comparison between the old and
expected value now always yields a defined result, even if the input
values are (partially) undefined. However, some existing test cases
explicitly verify that memcheck complains about the use of uninitialised
values here. These test cases are no longer valid. Remove them.
Andreas Arnez [Tue, 30 Mar 2021 16:10:43 +0000 (18:10 +0200)]
s390x: Add missing UNOP insns to s390_insn_as_string
Some unary operator insns are not handled by s390_insn_as_string(). If
they are encountered while the appropriate trace flag is set, a vpanic
occurs. Fix this: add handling for the missing insns.
Yi Fan Yu [Thu, 1 Apr 2021 19:31:47 +0000 (15:31 -0400)]
drd/tests/swapcontext: Add SIGALRM handler to avoid stacktrace
During testing for oe-core build on QEMU,
SIGALRM can trigger during nanosleep.
This results in a different stderr output than expected.
```
==277== Process terminating with default action of signal 14 (SIGALRM)
==277== at 0x36C74C3943: clock_nanosleep@@GLIBC_2.17 (clock_nanosleep.c:43)
==277== by 0x36C74C8726: nanosleep (nanosleep.c:25)
```
This stacktrace printing will not occur
if we add a handler that simply exits.
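A handler of this kind might look like the sketch below. It is not the code added to drd/tests/swapcontext; the function names are hypothetical, and the exit status 0 is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: leave the process quietly instead of letting
   the default SIGALRM action terminate it with a stack trace. */
static void exit_on_sigalrm(int sig)
{
    (void)sig;
    _exit(0);   /* _exit is async-signal-safe, unlike exit */
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigalrm_exit_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = exit_on_sigalrm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}
```

The test program would call install_sigalrm_exit_handler() once, before arming the alarm.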
To improve its results, Callgrind does special handling for
the runtime linker entry point to resolve symbols. However,
it only used the exact symbol name "_dl_runtime_resolve",
as well as specific machine code templates (when the runtime
linker was stripped from symbol names) as basis.
Recent glibc added multiple similar symbol names as variants,
such as _dl_runtime_resolve_xsave.
The above-mentioned commit 86277041 solves this by extending
the check for machine code templates for specific Linux
distributions.
This patch extends this for more architectures and variants
by checking if a function starts with "_dl_runtime_resolve".
Furthermore, the original function names of the variants
still are visible in the output (and not forced to the prefix).
While the heuristic of treating every function symbol starting
with the prefix "_dl_runtime_resolve" as an entry point
into the runtime linker for resolving a function address may
be a bit rough, this prefix is not expected to be used often in
other source code for anything else.
The worst case is a slightly misleading call graph only
visible in a very specific situation: if the wrongly-detected
function does a tail call (i.e., it jumps to another function
instead of returning), it will be shown as 2 calls in a row
from the original caller.
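The prefix check described in this message can be sketched as follows. The helper name is hypothetical and the code is not taken from Callgrind.

```c
#include <string.h>

/* Hypothetical helper: nonzero if fn_name starts with the prefix
   "_dl_runtime_resolve", so variants such as
   "_dl_runtime_resolve_xsave" match as well. */
static int is_runtime_resolve_name(const char *fn_name)
{
    static const char prefix[] = "_dl_runtime_resolve";
    /* sizeof(prefix) - 1 excludes the terminating NUL. */
    return strncmp(fn_name, prefix, sizeof(prefix) - 1) == 0;
}
```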