This is one part of the fix for PR109128, along with a corresponding
binutils's linker change. Without this patch, what happens in the
linker, when an unused object in a .a file has offload data, is that
elf_link_is_defined_archive_symbol calls bfd_link_plugin_object_p,
which ends up calling the plugin's claim_file_handler, which then
records the object as one with offload data. That is, the linker never
decides to use the object in the first place, but use of this _p
interface (called as part of trying to decide whether to use the
object) results in the plugin deciding to use its offload data (and a
consequent mismatch in the offload data present at runtime).
The new hook allows the linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid recording the object as one
with offload data.
Thomas Schwinge [Wed, 10 May 2023 07:17:47 +0000 (09:17 +0200)]
Testsuite: Add 'torture-init-done', and use it to conditionalize implicit 'torture-init'
Recent commit d6654a4be3ba44c0d57be7c8a51d76d9721345e1
"Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'"
made 'torture-init' non-idempotent re 'LTO_TORTURE_OPTIONS', in order to catch
certain classes of errors. Now, most of all '*.exp' files have 'torture-init'
followed by 'set-torture-options' before 'gcc-dg-runtest' etc., and therefore
don't run into the latter's
"Some callers set torture options themselves; don't override those." code.
Some '*.exp' files however do 'torture-init' but not 'set-torture-options', and
therefore we can't any longer conditionalize the implicit 'torture-init' by
'![torture-options-exist]'.
gcc/testsuite/
* lib/torture-options.exp (torture-init-done): Add.
* lib/gcc-dg.exp (gcc-dg-runtest): Use it to conditionalize
implicit 'torture-init'.
* lib/gfortran-dg.exp (gfortran-dg-runtest): Likewise.
* lib/obj-c++-dg.exp (obj-c++-dg-runtest): Likewise.
* lib/objc-dg.exp (objc-dg-runtest): Likewise.
Thomas Schwinge [Tue, 9 May 2023 08:35:27 +0000 (10:35 +0200)]
Testsuite: Add missing 'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage
Recent commit d6654a4be3ba44c0d57be7c8a51d76d9721345e1
"Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'"
made it a requirement that 'LTO_TORTURE_OPTIONS' usage be within
'torture-init'/'torture-finish', and missed a few cases that didn't have that.
Roger Sayle [Thu, 11 May 2023 07:15:21 +0000 (08:15 +0100)]
match.pd: Simplify popcount(X&Y)+popcount(X|Y) as popcount(X)+popcount(Y)
This patch teaches match.pd to simplify popcount(X&Y)+popcount(X|Y) as
popcount(X)+popcount(Y), and the related simplifications that
popcount(X)+popcount(Y)-popcount(X&Y) is popcount(X|Y). As surprising
as it might seem, this idiom is common in cheminformatics codes
(for Tanimoto coefficient calculations).
2023-05-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd <popcount optimizations>: Simplify popcount(X|Y) +
popcount(X&Y) as popcount(X)+popcount(Y). Likewise, simplify
popcount(X)+popcount(Y)-popcount(X&Y) as popcount(X|Y), and
vice versa.
gcc/testsuite/ChangeLog
* gcc.dg/fold-popcount-8.c: New test case.
* gcc.dg/fold-popcount-9.c: Likewise.
* gcc.dg/fold-popcount-10.c: Likewise.
Roger Sayle [Thu, 11 May 2023 07:10:04 +0000 (08:10 +0100)]
match.pd: Simplify popcount/parity of bswap/rotate.
This is the latest iteration of my patch from August 2020
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552391.html
incorperating feedback and suggestions from reviewers.
This patch to match.pd optimizes away bit permutation operations,
specifically bswap and rotate, in calls to popcount and parity.
2023-05-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd <popcount optimizations>: Simplify popcount(bswap(x))
as popcount(x). Simplify popcount(rotate(x,y)) as popcount(x).
<parity optimizations>: Simplify parity(bswap(x)) as parity(x).
Simplify parity(rotate(x,y)) as parity(x).
This is because LoopVectorizer assume target is able to handle
series const vector when we enable binary operations.
Then it will be easily causing ICE like that.
gcc/ChangeLog:
* config/riscv/autovec.md (@vec_series<mode>): New pattern
* config/riscv/riscv-protos.h (expand_vec_series): New function.
* config/riscv/riscv-v.cc (emit_binop): Ditto.
(emit_index_op): Ditto.
(expand_vec_series): Ditto.
(expand_const_vector): Add series vector handling.
* config/riscv/riscv.cc (riscv_const_insns): Enable series vector for testing.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/series-1.c: New test.
* gcc.target/riscv/rvv/autovec/series_run-1.c: New test.
std::add_rvalue_reference is defined as "If T is a function type that
has no cv- or ref- qualifier or an object type, provides a member typedef
type which is T&&, otherwise type is T."
In our case, T is cv-qualified, so the result is T, so we end up with
int () const declval() noexcept;
which is invalid. In other words, this is pretty much like:
using T = int () const;
T fn1(); // bad, fn returning a fn
T& fn2(); // bad, cannot declare reference to qualified function type
T* fn3(); // bad, cannot declare pointer to qualified function type
using U = int ();
U fn4(); // bad, fn returning a fn
U& fn5(); // OK
U* fn6(); // OK
I think is_convertible_helper needs to simulate std::declval better.
To that end, I'm introducing build_trait_object, to be used where
a declval is needed.
PR c++/109680
gcc/cp/ChangeLog:
* method.cc (build_trait_object): New.
(assignable_expr): Use it.
(ref_xes_from_temporary): Likewise.
(is_convertible_helper): Likewise. Check FUNC_OR_METHOD_TYPE_P.
Jason Merrill [Thu, 23 Mar 2023 14:41:29 +0000 (10:41 -0400)]
c++: adjust conversion diagnostics
While looking at PR109247 I made this change to improve diagnostics. I
don't think I'm going ahead with that patch, but this still seems like a
worthy cleanup.
gcc/cp/ChangeLog:
* call.cc (convert_like_internal): Share ck_ref_bind handling
between all bad conversions.
PR fortran/109624
* dump-parse-tree.cc (debug): New function for gfc_namespace.
(gfc_debug_code): Delete forward declaration.
(show_attr): Make sure to print balanced braces.
Andrew Pinski [Wed, 10 May 2023 17:56:15 +0000 (17:56 +0000)]
Add another new testcase
While working on improving min/max detection, this
code (which is reduced from worse_state in ipa-pure-const.cc)
was being miscompiled. Since there was no testcase in the
testsuite yet for this, this patch adds one.
Committed as obvious after testing the testcase via:
make check-gcc RUNTESTFLAGS="execute.exp=20230510-1.c"
François Dumont [Wed, 3 May 2023 04:45:47 +0000 (06:45 +0200)]
libstdc++: [_Hashtable] Implement several small methods implicitly inline
Make implementation of 3 simple _Hashtable methods implicitly inline.
Avoid usage of const_iterator abstraction within _Hashtable implementation.
Replace several usages of __node_type* with expected __node_ptr.
libstdc++-v3/ChangeLog:
* include/bits/hashtable_policy.h
(_NodeBuilder<>::_S_build): Use __node_ptr.
(_ReuseOrAllocNode<>): Use __node_ptr in place of __node_type*.
(_AllocNode<>): Likewise.
(_Equality<>::_M_equal): Remove const_iterator usages. Only preserved
to call std::is_permutation in the non-unique key implementation.
* include/bits/hashtable.h (_Hashtable<>::_M_update_begin()): Capture
_M_begin() once.
(_Hashtable<>::_M_bucket_begin(size_type)): Implement implicitly inline.
(_Hashtable<>::_M_insert_bucket_begin): Likewise.
(_Hashtable<>::_M_remove_bucket_begin): Likewise.
(_Hashtable<>::_M_compute_hash_code): Use __node_ptr rather than
const_iterator.
(_Hashtable<>::find): Likewise.
(_Hashtable<>::_M_emplace): Likewise.
(_Hashtable<>::_M_insert_unique): Likewise.
Jason Merrill [Mon, 6 Feb 2023 23:08:17 +0000 (15:08 -0800)]
c++: be stricter about constinit [CWG2543]
DR 2543 clarifies that constinit variables should follow the language, and
diagnose non-constant initializers (according to [expr.const]) even if they
can actually initialize the variables statically.
Jason Merrill [Tue, 9 May 2023 22:48:11 +0000 (18:48 -0400)]
c++: always check consteval address
The restriction on the "permitted result of a constant expression" to not
refer to an immediate function applies regardless of context. The previous
code tried to only check in cases where we wouldn't get the check in
cp_fold_r, but with the next patch I would need to add another case and it
shouldn't be a problem to always check.
We also shouldn't talk about immediate evaluation when we aren't dealing
with one.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_outermost_constant_expr): Always check
for address of immediate fn.
(maybe_constant_init_1): Evaluate PTRMEM_CST.
Jeff Law [Wed, 10 May 2023 11:25:12 +0000 (05:25 -0600)]
Fix a couple constraints on the H8 in preparation for LRA conversion
So this is the 2nd patch on the way to LRA for the H8.
LRA is more sensitive to getting define_constraint vs define_memory_constraint
vs define_special_memory_constraint correct. than reload.
The H8 port has the "Q" constraint, which is used to indicate memory addresses
that can be used under certain circumstances in various ALU operations. So it
should be a memory constraint. Ideally it'd would be a simple memory
constraint, but it's used in contexts where MEMs are valid only for certain
parts in the H8 family. So it really needs to be a special_memory_constraint.
The "Zz" constraint accepts memory, but the forms are limited and can not be
reloaded into a register. It seems to be working, but I wouldn't be totally
surprised if this got stressed in the right way if it broke.
Anyway, this patch fixes "Q" and "Zz" to be special memory constraints.
Regression tested with gdbsim and pushed to the trunk.
gcc
* config/h8300/constraints.md (Q): Make this a special memory
constraint.
(Zz): Similarly.
Jakub Jelinek [Wed, 10 May 2023 11:20:39 +0000 (13:20 +0200)]
ipa-prop: Fix ipa_get_callee_param_type for calls with argument type mismatches
The PR contains a testcase where the Fortran FE creates FUNCTION_TYPE
which doesn't really match the passed in arguments (FUNCTION_TYPE has
5 arguments, call has 6). Now, I think that is a Fortran FE bug that
should be fixed there, but I think with function pointers one can
create something similar (of course invalid) in C/C++ too,so IMHO IPA
should be also more careful.
The ipa_get_callee_param_type function can return NULL if something goes
wrong and it does e.g. if asked for 7th argument type on a function
with just 5 arguments and similar. But, if a function isn't varargs,
when asked for 6th argument type on a function with just 5 arguments
it actually returns void_type_node because the argument list is in that
case terminated with void_list_node.
The following patch makes sure we don't treat void_list_node as something
holding another argument.
2023-05-10 Jakub Jelinek <jakub@redhat.com>
PR fortran/109788
* ipa-prop.cc (ipa_get_callee_param_type): Don't return TREE_VALUE (t)
if t is void_list_node.
Kyrylo Tkachov [Wed, 10 May 2023 11:04:47 +0000 (12:04 +0100)]
aarch64: Simplify sqmovun expander
This patch is a no-op as it removes the explicit vec-concat-zero patterns in favour of vczle/vczbe.
This allows us to delete the explicit expander too. Tests are added to ensure the optimisation required
still triggers.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Kyrylo Tkachov [Wed, 10 May 2023 11:00:17 +0000 (12:00 +0100)]
[PATCH] aarch64: PR target/99195 annotate simple permutation patterns for vec-concat-zero
Another straightforward patch annotating patterns for the zip1, zip2, uzp1, uzp2, rev* instructions, plus tests.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Kyrylo Tkachov [Wed, 10 May 2023 10:50:01 +0000 (11:50 +0100)]
aarch64: PR target/99195 annotate simple saturating add/sub patterns for vec-concat-zero
Moving onto the saturating instructions, this one goes through the simple add/sub ones.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Kyrylo Tkachov [Wed, 10 May 2023 09:44:30 +0000 (10:44 +0100)]
aarch64: Simplify QSHRN expanders and patterns
This patch deletes the explicit BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN patterns for the QSHRN instructions in favour
of annotating a single one with <vczle><vczbe>. This allows simplification of the expander too.
Tests are added to ensure that we still optimise away the concat-with-zero use case.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Kyrylo Tkachov [Wed, 10 May 2023 09:40:06 +0000 (10:40 +0100)]
aarch64: PR target/99195 annotate simple narrowing patterns for vec-concat-zero
This patch cleans up some almost-duplicate patterns for the XTN, SQXTN, UQXTN instructions.
Using the <vczle><vczbe> attributes we can remove the BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN cases,
as well as the intrinsic expanders that select between the two.
Tests are also added. Thankfully the diffstat comes out negative \O/.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_xtn<mode>_insn_le): Delete.
(aarch64_xtn<mode>_insn_be): Likewise.
(trunc<mode><Vnarrowq>2): Rename to...
(trunc<mode><Vnarrowq>2<vczle><vczbe>): ... This.
(aarch64_xtn<mode>): Move under the above. Just emit the truncate RTL.
(aarch64_<su>qmovn<mode>): Likewise.
(aarch64_<su>qmovn<mode><vczle><vczbe>): New define_insn.
(aarch64_<su>qmovn<mode>_insn_le): Delete.
(aarch64_<su>qmovn<mode>_insn_be): Likewise.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_4.c: Add tests for vmovn, vqmovn.
Jakub Jelinek [Wed, 10 May 2023 09:37:04 +0000 (11:37 +0200)]
c++: Reject attributes without arguments used as pack expansion [PR109756]
The following testcase shows we silently accept (and ignore) attributes without
arguments used as pack expansions. This is because we call
make_pack_expansion and that starts with
if (!arg || arg == error_mark_node)
return arg;
Now, an attribute without arguments like [[noreturn...]] is IMHO always
invalid, in this case for 2 reasons; one is that as it has no arguments,
no pack can be present and second is that the standard says that
attributes need to specially permit uses of parameter pack and doesn't
explicitly permit it for any of the standard attributes (except for alignas?
which has different syntax).
If an attribute has some arguments but doesn't contain packs in those
arguments, make_pack_expansion will already diagnose it.
The patch also changes cp_parser_std_attribute, such that for attributes unknown
to the compiler (or perhaps registered just for -Wno-attributes=) we differentiate
between the attribute having no arguments (in that case we want to diagnose them
when followed by ellipsis even if they are unknown, as they can't contain a pack
in that case) and the case where they do have arguments but we've just skipped over
those arguments because we don't know how to parse them (except that they are
a balanced token sequence) - in that case we really don't know if they contain
packs or not.
2023-05-10 Jakub Jelinek <jakub@redhat.com>
PR c++/109756
* parser.cc (cp_parser_std_attribute): For unknown attributes with
arguments set TREE_VALUE (attribute) to error_mark_node after skipping
the balanced tokens.
(cp_parser_std_attribute_list): If ... is used after attribute without
arguments, diagnose it and return error_mark_node. If
TREE_VALUE (attribute) is error_mark_node, don't call
make_pack_expansion nor return early error_mark_node.
Li Xu [Wed, 10 May 2023 04:02:13 +0000 (04:02 +0000)]
RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s instructions satisfying REG_P(operand[1]) in -O0.
This issue happens is because the operand1 of scalar move can be
REG_P (operand[1]) in the O0 case, which causes the VSETVL PASS to
not insert the vsetvl instruction correctly, and the compiler crashes.
Juzhe-Zhong [Tue, 9 May 2023 12:05:50 +0000 (20:05 +0800)]
RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
This incorrect codes blocks the scalable RVV auto-vectorization.
Take a look at this target hook implementation of aarch64.
They only have the similiar handling on TARGET_SIMD.
They let movmisalign<mode> to handle scalable vector of SVE.
For RVV, we should follow the same implementation of ARM SVE.
Typo spotted while doing CCmode improvements, as a missed
optimization. It's almost visible from the patch context;
there's not much done in terms of "mode-adjustment" when
replacing (reg:CC CRIS_CC0_REGNUM) with a copy!
This bug affects functions in the newlib printf-formatting
functions (nothing else in libgcc or newlib libc), with the
performance impact on coremark scores being less than 1e-6
(3/5078992 cycles, 6/48543 bytes).
* config/cris/cris.cc (cris_postdbr_cmpelim): Correct mode
of modeadjusted_dccr.
Jonathan Wakely [Tue, 9 May 2023 17:18:01 +0000 (18:18 +0100)]
libstdc++: Fix <chrono> pretty printers and add tests
This fixes a couple of errors in the printers for chrono types, and adds
tests to ensure they keep working.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (StdChronoDurationPrinter):
Print floating-point durations correctly.
(StdChronoTimePointPrinter): Support printing only the value,
not the type name. Uncomment handling for known clocks.
(StdChronoZonedTimePrinter): Remove type names from output.
(StdChronoCalendarPrinter): Fix hh_mm_ss member access.
(StdChronoTimeZonePrinter): Add equals sign to output.
* testsuite/libstdc++-prettyprinters/chrono.cc: New test.
Patrick Palka [Tue, 9 May 2023 19:07:00 +0000 (15:07 -0400)]
c++: error-recovery ICE with unstable satisfaction [PR109752]
After diagnosing and recovering from unstable satisfaction, it's
possible to evaluate an atom for the first time noisily rather than
quietly. The satisfaction cache tries to handle this situation
gracefully, but apparently not gracefully enough: we inserted an empty
slot into the cache, and left it empty, which later makes
hash_table::check_complete_insertion unhappy. This patch fixes this by
removing the empty slot in this case.
PR c++/109752
gcc/cp/ChangeLog:
* constraint.cc (satisfaction_cache::satisfaction_cache): In the
unexpected case of evaluating an atom for the first time noisily,
remove the cache slot that we inserted.
Patrick Palka [Tue, 9 May 2023 19:06:34 +0000 (15:06 -0400)]
c++: noexcept-spec from nested class confusion [PR109761]
When late processing a noexcept-spec from a nested class after completion
of the outer class (since it's a complete-class context), we pass the wrong
class context to noexcept_override_late_checks -- the outer class type
instead of the nested class type -- which leads to bogus errors in the
below test.
This patch fixes this by making noexcept_override_late_checks obtain the
class context directly via DECL_CONTEXT instead of via an additional
parameter.
PR c++/109761
gcc/cp/ChangeLog:
* parser.cc (cp_parser_class_specifier): Don't pass a class
context to noexcept_override_late_checks.
(noexcept_override_late_checks): Remove 'type' parameter
and use DECL_CONTEXT of 'fndecl' instead.
Christophe Lyon [Mon, 13 Feb 2023 21:05:37 +0000 (21:05 +0000)]
arm: [MVE intrinsics] add support for mve_q_p_f
We can call code_for_mve_q_p_f only once this function exists, which
is the case after we factorized vmaxnmavq, vmaxnmvq, vminnmavq and
vminnmvq in a previous patch.
aarch64: Improve register allocation for lane instructions
REG_ALLOC_ORDER is much less important than it used to be, but it
is still used as a tie-breaker when multiple registers in a class
are equally good.
Previously aarch64 used the default approach of allocating in order
of increasing register number. But as the comment in the patch says,
it's better to allocate FP and predicate registers in the opposite
order, so that we don't eat into smaller register classes unnecessarily.
This fixes some existing FIXMEs and improves the register allocation
for some Arm ACLE code.
Doing this also showed that *vcond_mask_<mode><vpred> (predicated MOV/SEL)
unnecessarily required p0-p7 rather than p0-p15 for the unpredicated
movprfx alternatives. Only the predicated movprfx alternative requires
p0-p7 (due to the movprfx itself, rather than due to the main instruction).
gcc/
* config/aarch64/aarch64-protos.h (aarch64_adjust_reg_alloc_order):
Declare.
* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
(ADJUST_REG_ALLOC_ORDER): Likewise.
* config/aarch64/aarch64.cc (aarch64_adjust_reg_alloc_order): New
function.
* config/aarch64/aarch64-sve.md (*vcond_mask_<mode><vpred>): Use
Upa rather than Upl for unpredicated movprfx alternatives.
aarch64: Fix cut-&-pasto in aarch64-sve2-acle-asm.exp
aarch64-sve2-acle-asm.exp tried to prevent --with-cpu/tune
from affecting the results, but it used sve_flags rather than
sve2_flags. This was a silent failure when running the full
testsuite, but was a fatal error when running the harness
individually.
gcc/testsuite/
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Use
sve2_flags instead of sve_flags.
Gaius Mulley [Tue, 9 May 2023 17:17:13 +0000 (18:17 +0100)]
PR modula2/109779 isolib SkipLine skips the first character of the successive line
This is a patch for the m2iso library to prevent SkipLine from consuming
the next character on the next line.
gcc/m2/ChangeLog:
PR modula2/109779
* gm2-libs-iso/RTgen.mod (doLook): Remove old.
Remove re-assignment of result.
* gm2-libs-iso/TextIO.mod (CanRead): Rename into ...
(CharAvailable): ... this.
(DumpState): New procedure.
(SetResult): Rename as SetNul.
(WasGoodChar): Rename into ...
(EofOrEoln): ... this.
(SkipLine): Skip over the newline.
(ReadString): Flip THEN ELSE statements after testing for
EofOrEoln.
(ReadRestLine): Flip THEN ELSE statements after testing for
EofOrEoln.
gcc/testsuite/ChangeLog:
PR modula2/109779
* gm2/isolib/run/pass/skiplinetest.mod: New test.
Jakub Jelinek [Tue, 9 May 2023 14:05:22 +0000 (16:05 +0200)]
c++: Reject pack expansion of assume attribute [PR109756]
http://eel.is/c++draft/dcl.attr#grammar-4 says
"In an attribute-list, an ellipsis may appear only if that attribute's
specification permits it."
and doesn't explicitly permit it on any standard attribute.
The https://wg21.link/p1774r8 paper which introduced assume attribute says
"We could therefore hypothetically permit the assume attribute to directly
support pack expansion:
template <int... args>
void f() {
[[assume(args >= 0)...]];
}
However, we do not propose this. It would require substantial additional work
for a very rare use case. Note that this can instead be expressed with a fold
expression, which is equivalent to the above and works out of the box without
any extra effort:
template <int... args>
void f() {
[[assume(((args >= 0) && ...))]];
}
", but as the testcase shows, GCC 13+ ICEs on assume attribute followed by
... if it contains packs.
The following patch rejects those instead of ICE and for C++17 or later
suggests using fold expressions instead (it doesn't make sense to suggest
it for C++14 and earlier when we'd error on the fold expressions).
Jeff Law [Tue, 9 May 2023 13:18:45 +0000 (07:18 -0600)]
Eliminate more comparisons on the H8 port
This patch fixes a minor code quality issue I found while testing LRA on the
H8. Specifically we have a peephole which converts a comparison of a memory
location against zero into a load + comparison which is actually more
efficient. This triggers when there are registers available at the right
point during peephole2.
If the load is not a mode dependent address we can actually do better by
realizing the load itself sets the proper flags and eliminate the comparison.
I may have expected this to happen when I wrote the original peephole2,
but cmpelim runs before peephole2, so clearly if we want to eliminate the
comparison we have to do it manually.
gcc/
* config/h8300/testcompare.md: Add peephole2 which uses a memory
load to set flags, thus eliminating a compare against zero.
Thomas Schwinge [Mon, 3 Nov 2014 08:58:38 +0000 (09:58 +0100)]
libgomp testsuite: Only use 'blddir' if set
(It is unclear to me why the current working directory needs to be in
'LD_LIBRARY_PATH'; leaving that alone for now.)
libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Only use 'blddir' if
set.
* testsuite/libgomp.c++/c++.exp: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
Christophe Lyon [Thu, 9 Feb 2023 19:11:31 +0000 (19:11 +0000)]
arm: [MVE intrinsics] factorize several unary operations
Factorize vabs vcls vclz vneg vqabs vqneg vrnda vrndm vrndn vrndp vrnd
vrndx so that they use the same pattern.
This patch introduces the mve_mnemo iterator because some of the
involved intrinsics have a different name from their mnenonic: for
instance vrndq vs vrintz.
Jakub Jelinek [Tue, 9 May 2023 10:14:18 +0000 (12:14 +0200)]
testsuite: Add further testcase for already fixed PR [PR109778]
I came up with a testcase which reproduces all the way to r10-7469.
LTO to avoid early inlining it, so that ccp handles rotates and not
shifts before they are turned into rotates.
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109778
* gcc.dg/lto/pr109778_0.c: New test.
* gcc.dg/lto/pr109778_1.c: New file.
Jakub Jelinek [Tue, 9 May 2023 10:10:07 +0000 (12:10 +0200)]
tree-ssa-ccp, wide-int: Fix up handling of [LR]ROTATE_EXPR in bitwise ccp [PR109778]
The following testcase is miscompiled, because bitwise ccp2 handles
a rotate with a signed type incorrectly.
Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
arguments, all other callers just rotate in the right precision and
I think work correctly. ccp works with widest_ints and so rotations
by the excessive precision certainly don't match what it wants
when it sees a rotate in some specific bitsize. Still, if it is
unsigned rotate and the widest_int is zero extended from width,
the functions perform left shift and logical right shift on the value
and then at the end zero extend the result of left shift and uselessly
also the result of logical right shift and return | of that.
On the testcase we the signed char rrotate by 4 argument is
CONSTANT -75 i.e. 0xffffffff....fffffb5 with mask 2.
The mask is correctly rotated to 0x20, but because the 8-bit constant
is sign extended to 192-bit one, the logical right shift by 4 doesn't
yield expected 0xb, but gives 0xfffffffffff....ffffb, and then
return wi::zext (left, width) | wi::zext (right, width); where left is
0xfffffff....fb50, so we return 0xfb instead of the expected
0x5b.
The following patch fixes that by doing the zero extension in case of
the right variable before doing wi::lrshift rather than after it.
Also, wi::[lr]rotate widht width < precision always zero extends
the result. I'm afraid it can't do better because it doesn't know
if it is done for an unsigned or signed type, but the caller in this
case knows that very well, so I've done the extension based on sgn
in the caller. E.g. 0x5b rotated right (or left) by 4 with width 8
previously gave 0xb5, but sgn == SIGNED in widest_int it should be
0xffffffff....fffb5 instead.
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109778
* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
wi::zext (x, width) rather than x if width != precision, rather
than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
of wi::lrotate or wi::rrotate.
get_out_file did not follow the coding conventions (mixing three-space
and two-space indentation, missing linebreak before function name).
Take that as an excuse to reimplement it in a more terse manner and
rename as 'choose_output', which is hopefully more descriptive.
gcc/ChangeLog:
* genmatch.cc (get_out_file): Make static and rename to ...
(choose_output): ... this. Reimplement. Update all uses ...
(decision_tree::gen): ... here and ...
(main): ... here.
Eliminate boolean parameters of emit_func. The first ('open') just
prints 'extern' to generated header, which is unnecessary. Introduce a
separate function to use when finishing a declaration in place of the
second ('close').
Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak
indentation in several places.
Reshuffle emitted line breaks in a few places to make generated
declarations less ugly.
gcc/ChangeLog:
* genmatch.cc (header_file): Make static.
(emit_func): Rename to...
(fp_decl): ... this. Adjust all uses.
(fp_decl_done): New function. Use it...
(decision_tree::gen): ... here and...
(write_predicate): ... here.
(main): Adjust.
aarch64: Avoid hard-coding specific register allocations
Some tests hard-coded specific allocations for temporary registers,
whereas the RA should be free to pick anything that doesn't force
unnecessary moves or spills.
Most governing predicate operands require p0-p7, but some
instructions also allow p8-p15. Non-gp uses of predicates
often also allow all of p0-p15.
This patch fixes up cases where we required p0-p7 unnecessarily.
In some cases we match the definition (typically a comparison,
PFALSE or PTRUE), sometimes we match the use (like a logic
instruction, MOV or SEL), and sometimes we match both.
aarch64: Relax ordering requirements in SVE dup tests
Some of the svdup tests expand to a SEL between two constant vectors.
This patch allows the constants to be formed in either order.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/dup_s16.c: When using SEL to select
between two constant vectors, allow the constant moves to appear in
either order.
* gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.
aarch64: Allow moves after tied-register intrinsics
Some ACLE intrinsics map to instructions that tie the output
operand to an input operand. If all the operands are allocated
to different registers, and if MOVPRFX can't be used, we will need
a move either before the instruction or after it. Many tests only
matched the "before" case; this patch makes them accept the "after"
case too.
Some of the SVE ACLE asm tests tried to be agnostic about the
instruction order, but only one of the alternatives was exercised
in practice. This patch fixes latent typos in the other versions.
gcc/testsuite/
* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Fix expected register
allocation in the case where a move occurs after the intrinsic
instruction.
* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn. Allocating the
same register to the input and the output then saves the cost of a move.
If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency. This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.
This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common. It is then impossible for the output
operand to reuse the dying input register.
This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output. (Matched earlyclobbers are fine.)
Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. A SPEC2017 run
on aarch64 showed no differences outside the noise. Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:
with other targets showing no sensitivity to the patch. The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.
gcc/
* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
earlyclobbers.
This is a repost/respin of a patch that was conditionally approved:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html
This patch adds a convenient post-reload splitter for setting/updating
the highpart of a TImode variable, using i386's previously added
split_double_concat infrastructure.
For the new test case below:
__int128 foo(__int128 x, unsigned long long y)
{
__int128 t = (__int128)y << 64;
__int128 r = (x & ~0ull) | t;
return r;
}
with this patch, GCC instead now generates the much better:
foo: movq %rdi, %rcx
movq %rcx, %rax
ret
It turns out that the -m32 equivalent of this testcase, already
avoids using explict orl/xor instructions, as it gets optimized
(in combine) by a completely different path. Given that this idiom
isn't seen in 32-bit code (so this pattern doesn't match with -m32),
and also that the shorter 32-bit AND bitmask is represented as a
CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
is implemented for just TARGET_64BIT rather than contort a "generic"
implementation using DWI mode iterators.
2023-05-08 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (any_or_plus): Move definition earlier.
(*insvti_highpart_1): New define_insn_and_split to overwrite
(insv) the highpart of a TImode register/memory.
gcc/testsuite/ChangeLog
* gcc.target/i386/insvti_highpart-1.c: New test case.
Eugene Rozenfeld [Tue, 28 Feb 2023 23:58:40 +0000 (15:58 -0800)]
Fix cfg maintenance after inlining in AutoFDO
Todo from early_inliner needs to be propagated so that
cleanup_tree_cfg () is called if necessary.
This bug was causing an assert in get_loop_body during
ipa-sra in autoprofiledbootstrap build since loops weren't
fixed up and one of the loops had num_nodes set to 0.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
* auto-profile.cc (auto_profile): Check todo from early_inline
to see if cleanup_tree_vfg needs to be called.
(early_inline): Return todo from early_inliner.
Andrew Pinski [Mon, 8 May 2023 17:58:06 +0000 (10:58 -0700)]
Fix pr81192.c for int16 targets
I had missed when converting this
testcase to Gimple that there was a define
for int/unsigned type specifically to get
an INT32 type. This means when using a
literal integer constant you need to use the
`_Literal (type)` to form the types correctly on the
constants.
This fixes the issue and has been both tested on
xstormy16-elf and x86_64-linux-gnu.
Committed as obvious.
gcc/testsuite/ChangeLog:
PR testsuite/109776
* gcc.dg/pr81192.c: Fix integer constants for int16 targets.
Kito Cheng [Mon, 8 May 2023 13:44:30 +0000 (21:44 +0800)]
RISC-V: Improve portability of testcases
stdint.h will require having corresponding multi-lib existing, so using
stdint-gcc.h instead, also added a riscv_vector.h wrapper to
gcc.target/riscv/rvv/autovec/.
Jeff Law [Mon, 8 May 2023 14:28:26 +0000 (08:28 -0600)]
Fix minor length computation on stormy16
Today's build of xstormy16-elf failed due to a branch to an out of range
target. Manual inspection of the assembly code for the affected function
(divdi3) showed that the zero-extension patterns were claiming a length
of 2, but clearly assembled into 4 bytes.
This patch adds an explicit length to the zero extension pattern and
appears to resolve the issue in my test builds.
That is, if 'c.exp' executes first, it does successfully evaluate
'dg-require-effective-target openacc_cublas' -- and does cache this result (so
it isn't reevaluated for 'c++.exp'). However, for 'c++.exp' alone (that is,
without the 'c.exp' result cached), we run into:
spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...]
In file included from /usr/include/cuda_fp16.h:3673,
from /usr/include/cublas_api.h:75,
from /usr/include/cublas_v2.h:65,
from openacc_cublas2311907.c:3:
/usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory
We're missing include paths to C++/libstdc++ build-tree headers.
Fix this by using the mechanism introduced for Fortran in
r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re
"libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}".
libgomp/
* testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead
of 'libstdcxx_includes'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
Thomas Schwinge [Tue, 2 May 2023 17:57:47 +0000 (19:57 +0200)]
Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'
Otherwise, for example for 'RUNTESTFLAGS' of '--target_board=unix\{-m64,-m32\}'
vs. '--target_board=unix\{-m32,-m64\}', both variants exercise testing with
always the first flag variant's 'LTO_OPTIONS'/'LTO_TORTURE_OPTIONS', which
results in unequal test results between the two 'RUNTESTFLAGS' variants if one
of the flag variants has 'check_linker_plugin_available' but the other doesn't.
Patrick Palka [Mon, 8 May 2023 13:03:35 +0000 (09:03 -0400)]
c++: list CTAD and resolve_nondeduced_context [PR106214]
This extends the PR93107 fix, which made us do resolve_nondeduced_context
on the elements of an initializer list during auto deduction, to happen for
CTAD as well.
PR c++/106214
PR c++/93107
gcc/cp/ChangeLog:
* pt.cc (do_auto_deduction): Move up resolve_nondeduced_context
calls to happen before do_class_deduction. Add some
error_mark_node tests.