Martin Liska [Tue, 17 Aug 2021 14:24:26 +0000 (16:24 +0200)]
gcov: fix output location for JSON mode.
PR gcov-profile/89961
gcc/ChangeLog:
* gcov.c (make_gcov_file_name): Rewrite using std::string.
(mangle_name): Simplify, do not used the second argument.
(strip_extention): New function.
(get_md5sum): Likewise.
(get_gcov_intermediate_filename): Handle properly -p and -x
options.
(output_gcov_file): Use string type.
(generate_results): Likewise.
(md5sum_to_hex): Remove.
A change to the way gas interprets the .fpu directive in binutils-2.34
means that issuing .fpu will clear any features set by .arch_extension
that apply to the floating point or simd units. This unfortunately
causes problems for more recent versions of the architecture because
we currently emit .arch, .arch_extension and .fpu directives at
different times and try to suppress redundant changes.
This change addresses this by firstly unifying all the places where we
emit these directives to a single block of code and secondly
(re)emitting all the directives if any changes have been made to the
target options. Whilst this is slightly more than the strict minimum
it should be enough to catch all cases where a change could have
happened. The new code also emits the directives in the order: .arch,
.fpu, .arch_extension. This ensures that the additional architectural
extensions are not removed by a later .fpu directive.
Whilst writing this patch I also noticed that in the corner case where
the last function to be compiled had a non-standard set of
architecture flags, the assembler would add an incorrect set of
derived attributes for the file as a whole. Instead of reflecting the
command-line options it would reflect the flags from the last file in
the function. To address this I've also added a call to re-emit the
flags from the asm_file_end callback so the assembler will be in the
correct state when it finishes processing the intput.
There's some slight churn to the testsuite as a consequence of this,
because previously we had a hack to suppress emitting a .fpu directive
for one specific case, but with the new order this is no-longer
necessary.
gcc/ChangeLog:
PR target/101723
* config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
writing .cpu directive in asm output.
* config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
(arm_last_printed_arch_string): Delete.
(arm_last-printed_fpu_string): Delete.
(arm_configure_build_target): If use of floating-point/SIMD is
disabled, remove all fp/simd related features from the target ISA.
(last_arm_targ_options): New variable.
(arm_print_asm_arch_directives): Add new parameters. Change order
of emitted directives and handle all cases here.
(arm_file_start): Always call arm_print_asm_arch_directives, move
all generation of .arch/.arch_extension here.
(arm_file_end): Call arm_print_asm_arch.
(arm_declare_function_name): Call arm_print_asm_arch_directives
instead of printing .arch/.fpu directives directly.
gcc/testsuite/ChangeLog:
PR target/101723
* gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected output.
* gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
* gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do assemble.
Add a non-no-op function body.
* gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
* gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
* gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
* gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
check-function-bodies.
* gcc.target/arm/attr-neon3.c: Likewise.
* gcc.target/arm/pr69245.c: Tighten scan-assembler match, but allow
multiple instances.
* gcc.target/arm/pragma_fpu_attribute.c: Likewise.
* gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
Richard Earnshaw [Tue, 27 Jul 2021 14:44:57 +0000 (15:44 +0100)]
arm: Don't reconfigure globals in arm_configure_build_target
arm_configure_build_target is usually used to reconfigure the
arm_active_target structure, which is then used to reconfigure a
number of other global variables describing the current target.
Occasionally, however, we need to use arm_configure_build_target to
construct a temporary target structure and in that case it is wrong to
try to reconfigure the global variables (although probably harmless,
since arm_option_reconfigure_globals() only looks at
arm_active_target). At the very least, however, this is wasted work,
so it is best not to do it unless needed. What's more, several
callers of arm_configure_build target call
arm_option_reconfigure_globals themselves within a few lines, making
the call from within arm_configure_build_target completely redundant.
So this patch moves the responsibility of calling of
arm_configure_build_target to its callers (only two places needed
updating).
gcc:
* config/arm/arm.c (arm_configure_build_target): Don't call
arm_option_reconfigure_globals.
(arm_option_restore): Call arm_option_reconfigure_globals after
reconfiguring the target.
* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
(cherry picked from commit 6a37d0331c25f23628d4308e5a75624005c223b2)
Richard Earnshaw [Mon, 26 Jul 2021 16:07:14 +0000 (17:07 +0100)]
arm: ensure the arch_name is always set for the build target
This should never happen now if GCC is invoked by the driver, but in
the unusual case of calling cc1 (or its ilk) directly from the command
line the build target's arch_name string can remain NULL. This can
complicate later processing meaning that we need to check for this
case explicitly in some circumstances. Nothing should rely on this
behaviour, so it's simpler to always set the arch_name when
configuring the build target and be done with it.
gcc:
* config/arm/arm.c (arm_configure_build_target): Ensure the target's
arch_name is always set.
Thomas Schwinge [Tue, 17 Aug 2021 15:58:30 +0000 (17:58 +0200)]
libstdc++: Avoid illegal argument to verbose in dg-test callback, continued
This is a follow-up to commit 697b94cfaef4a958132faf0cf4b35b15dfb29acc
"libstdc++: Avoid illegal argument to verbose in dg-test callback".
I'm confirming the original problem, but on one system, it's not
resolved by this change, because instead we get:
extra_tool_flags are:
ERROR: tcl error sourcing [...]/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp.
ERROR: usage: send [args] string
while executing
"send_log "$message\n""
(procedure "verbose" line 48)
invoked from within
"verbose -log -- $extra_tool_flags"
(procedure "libstdc++-dg-test" line 45)
invoked from within
"${tool}-dg-test $prog [lindex ${dg-do-what} 0] "$tool_flags ${dg-extra-tool-flags}""
(procedure "saved-dg-test" line 115)
invoked from within
[...]
That's Ubuntu's dejagnu 1.5-3ubuntu1 being so old that it doesn't include
DejaGnu commit 57c22601afe43d2c2b8819df4f2ecacb034516fd "Protect from leading
dash in message". (I suppose that's what'd make this work, but have not
verified.)
libstdc++-v3/
* testsuite/lib/libstdc++.exp: Avoid illegal argument to verbose,
continued.
Richard Biener [Tue, 17 Aug 2021 06:38:35 +0000 (08:38 +0200)]
tree-optimization/101868 - avoid PRE of trapping mems across calls
This backports a fix for the omission of a check of trapping mems
when hoisting them across calls that might not return. This was
originally done as part of a fix to handle const functions that throw
properly.
2021-08-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/101373
PR tree-optimization/101868
* tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
references when the BB may not return.
Harald Anlauf [Sun, 15 Aug 2021 18:13:11 +0000 (20:13 +0200)]
Fortran: fix checks for STAT= and ERRMSG= arguments of SYNC ALL/SYNC IMAGES
gcc/fortran/ChangeLog:
PR fortran/99351
* match.c (sync_statement): Replace %v code by %e in gfc_match to
allow for function references as STAT and ERRMSG arguments.
* resolve.c (resolve_sync): Adjust checks of STAT= and ERRMSG= to
being definable arguments. Function references with a data
pointer result are accepted.
* trans-stmt.c (gfc_trans_sync): Adjust assertion.
Eric Botcazou [Mon, 16 Aug 2021 13:26:22 +0000 (15:26 +0200)]
Fix regression in debug info for Ada with DWARF 5
add_scalar_info can directly generate a reference to an existing DIE for a
scalar attribute, e.g the upper bound of a VLA, but it does so only if this
existing DIE has a location or is a constant:
Now, in DWARF 5, members of a structure that are bitfields no longer have a
DW_AT_data_member_location but a DW_AT_data_bit_offset attribute instead, so
the condition is bypassed.
gcc/
* dwarf2out.c (add_scalar_info): Deal with DW_AT_data_bit_offset.
Jakub Jelinek [Thu, 12 Aug 2021 20:38:18 +0000 (22:38 +0200)]
libcpp: Fix ICE with -Wtraditional preprocessing [PR101638]
The following testcase ICEs in cpp_sys_macro_p, because cpp_sys_macro_p
is called for a builtin macro which doesn't use node->value.macro union
member but a different one and so dereferencing it ICEs.
As the testcase is distilled from contemporary glibc headers, it means
basically -Wtraditional now ICEs on almost everything.
The fix can be either the patch below, return true for builtin macros,
or we could instead return false for builtin macros, or the fix could
be also (untested):
--- libcpp/expr.c 2021-05-07 10:34:46.345122608 +0200
+++ libcpp/expr.c 2021-08-12 09:54:01.837556365 +0200
@@ -783,13 +783,13 @@ cpp_classify_number (cpp_reader *pfile,
/* Traditional C only accepted the 'L' suffix.
Suppress warning about 'LL' with -Wno-long-long. */
- if (CPP_WTRADITIONAL (pfile) && ! cpp_sys_macro_p (pfile))
+ if (CPP_WTRADITIONAL (pfile))
{
int u_or_i = (result & (CPP_N_UNSIGNED|CPP_N_IMAGINARY));
int large = (result & CPP_N_WIDTH) == CPP_N_LARGE
&& CPP_OPTION (pfile, cpp_warn_long_long);
- if (u_or_i || large)
+ if ((u_or_i || large) && ! cpp_sys_macro_p (pfile))
cpp_warning_with_line (pfile, large ? CPP_W_LONG_LONG : CPP_W_TRADITIONAL,
virtual_location, 0,
"traditional C rejects the \"%.*s\" suffix",
The builtin macros at least currently don't add any suffixes
or numbers -Wtraditional would like to warn about. For floating
point suffixes, -Wtraditional calls cpp_sys_macro_p only right
away before emitting the warning, but in the above case the ICE
is because cpp_sys_macro_p is called even if the number doesn't
have any suffixes (that is I think always for builtin macros
right now).
2021-08-12 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/101638
* macro.c (cpp_sys_macro_p): Return true instead of
crashing on builtin macros.
Jakub Jelinek [Wed, 11 Aug 2021 08:23:34 +0000 (10:23 +0200)]
sanitizer: Cherry-pick realpath fix
tsan in some cases starts ignoring interceptors and only calls the
intercepted functions. But for realpath the behavior for NULL second argument
was only handled in the interceptor and intercepted function was the one
found by dlsym which is often one that doesn't handle NULL as second argument.
Fixed by using dlvsym with "GLIBC_2.3" if possible for intercepted function
and don't emulate behavior in the wrapper.
Jakub Jelinek [Wed, 4 Aug 2021 09:53:48 +0000 (11:53 +0200)]
c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing
When parsing default arguments, we need to temporarily clear parser->omp_declare_simd
and parser->oacc_routine, otherwise it can clash with further declarations
inside of e.g. lambdas inside of those default arguments.
2021-08-04 Jakub Jelinek <jakub@redhat.com>
PR c++/101759
* parser.c (cp_parser_default_argument): Temporarily override
parser->omp_declare_simd and parser->oacc_routine to NULL.
* g++.dg/gomp/pr101759.C: New test.
* g++.dg/goacc/pr101759.C: New test.
Jakub Jelinek [Wed, 28 Jul 2021 16:43:15 +0000 (18:43 +0200)]
ubsan: Fix ICEs with DECL_REGISTER tests [PR101624]
The following testcase ICEs, because the base is a CONST_DECL for
the Fortran parameter, and ubsan/sanopt uses DECL_REGISTER macro on it.
/* In VAR_DECL and PARM_DECL nodes, nonzero means declared `register'. */
#define DECL_REGISTER(NODE) (DECL_WRTL_CHECK (NODE)->decl_common.decl_flag_0)
while CONST_DECL doesn't satisfy DECL_WRTL_CHECK.
The following patch checks explicitly for VAR_DECL/PARM_DECL/RESULT_DECL
only before using DECL_REGISTER, assumes other decls aren't DECL_REGISTER.
Not really sure about RESULT_DECL but it at least satisfies DECL_WRTL_CHECK...
2021-07-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101624
* ubsan.c (maybe_instrument_pointer_overflow,
instrument_object_size): Only test DECL_REGISTER on VAR_DECLs,
PARM_DECLs or RESULT_DECLs.
* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.
* gfortran.dg/ubsan/ubsan.exp: New file.
* gfortran.dg/ubsan/pr101624.f90: New test.
Patrick Palka [Thu, 12 Aug 2021 00:59:53 +0000 (20:59 -0400)]
c++: constexpr std::construct_at on empty field [PR101663]
Here during constexpr evaluation of
std::construct_at(&a._M_value)
we find ourselves in cxx_eval_store_expression where the target object
is 'a._M_value' and the initializer is {}. Since _M_value is an empty
[[no_unique_address]] member we don't create a sub-CONSTRUCTOR for it,
so we end up in the early exit code path for empty stores with mismatched
types and trip over the assert therein
because lval is true. The reason it's true is because the INIT_EXPR in
question is the LHS of a COMPOUND_EXPR, and evaluation of the LHS is
always performed with lval=true (to indicate there's no lvalue-to-rvalue
conversion).
This patch makes the code path in question handle the lval=true case
appropriately rather than asserting. In passing, it also consolidates
the duplicate implementations of std::construct_at/destroy_at in some
of the C++20 constexpr tests into a common header file.
PR c++/101663
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_store_expression): Handle the lval=true
case in the early exit code path for empty stores with mismatched
types.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/construct_at.h: New convenience header file that
defines minimal implementations of std::construct_at/destroy_at,
split out from ...
* g++.dg/cpp2a/constexpr-new5.C: ... here.
* g++.dg/cpp2a/constexpr-new6.C: Use the header.
* g++.dg/cpp2a/constexpr-new14.C: Likewise.
* g++.dg/cpp2a/constexpr-new20.C: New test.
Eric Botcazou [Thu, 12 Aug 2021 07:30:31 +0000 (09:30 +0200)]
Make -no-pie option work for native Windows
Binutils 2.36/2.37 generate PIE executables by default on native Windows
(because --dynamicbase is the default) so it makes sense to have a simple
way to counter that and -no-pie seems appropriate, all the more so that
it is automatically passed when building the compiler itself.
gcc/
* configure.ac (PE linker --disable-dynamicbase support): New check.
* configure: Regenerate.
* config.in: Likewise.
* config/i386/mingw32.h (LINK_SPEC_DISABLE_DYNAMICBASE): New define.
(LINK_SPEC): Use it.
* config/i386/mingw-w64.h (LINK_SPEC_DISABLE_DYNAMICBASE): Likewise.
(LINK_SPEC): Likewise.
Jonathan Wakely [Wed, 11 Aug 2021 21:11:19 +0000 (22:11 +0100)]
libstdc++: Fix test that fails randomly [PR101866]
This test assumes that the same sequence of three values cannot occur,
which is incorect. It's unlikely, but not impossible.
Perform the check in a loop, so that in the unlikely event of an
identical sequence, we retry. If the library code is buggy it will keep
producing the same sequence and the test will time out. If the code is
working correctly then we will usually break out of the loop after one
iteration, or very rarely after two or three.
libstdc++-v3/ChangeLog:
PR libstdc++/101866
* testsuite/experimental/random/randint.cc: Loop and retry if
reseed() produces the same sequence.
Patrick Palka [Wed, 11 Aug 2021 20:53:53 +0000 (16:53 -0400)]
c++: parameterized requires-expr as default argument [PR101725]
Here we're rejecting the default template argument
requires (T t) { x(t); }
because we consider the 't' in the requirement to be a local variable
(according to local_variable_p), and we generally forbid local variables
from appearing inside default arguments. We can perhaps fix this by
giving special treatment to parameters introduced by requires-expressions,
but DR 2082 relaxed the restriction about local variables appearing within
default arguments to permit them inside unevaluated operands thereof.
So this patch just implements DR 2082 which also fixes this PR since a
requires-expression is an unevaluated context.
PR c++/101725
DR 2082
gcc/cp/ChangeLog:
* cp-tree.h (unevaluated_p): Return true for REQUIRES_EXPR.
* decl.c (local_variable_p_walkfn): Don't walk into unevaluated
operands.
* parser.c (cp_parser_primary_expression) <case CPP_NAME>: Never
reject uses of local variables in unevaluated contexts.
* tree.c (cp_walk_subtrees) <case REQUIRES_EXPR>: Increment
cp_unevaluated_operand. Use cp_walk_tree directly instead of
WALK_SUBTREE to avoid the goto. Use REQUIRES_EXPR_REQS instead
of TREE_OPERAND directly.
gcc/testsuite/ChangeLog:
* g++.dg/DRs/dr2082.C: New test.
* g++.dg/cpp2a/concepts-uneval4.C: New test.
Harald Anlauf [Wed, 28 Jul 2021 17:11:27 +0000 (19:11 +0200)]
Fortran: ICE in resolve_allocate_deallocate for invalid STAT argument
gcc/fortran/ChangeLog:
PR fortran/101564
* expr.c (gfc_check_vardef_context): Add check for KIND and LEN
parameter inquiries.
* match.c (gfc_match): Fix comment for %v code.
(gfc_match_allocate, gfc_match_deallocate): Replace use of %v code
by %e in gfc_match to allow for function references as STAT and
ERRMSG arguments.
* resolve.c (resolve_allocate_deallocate): Avoid NULL pointer
dereferences and shortcut for bad STAT and ERRMSG argument to
(DE)ALLOCATE. Remove bogus parts of checks for STAT and ERRMSG.
Patrick Palka [Mon, 2 Aug 2021 13:59:56 +0000 (09:59 -0400)]
c++: Improve memory usage of subsumption [PR100828]
Constraint subsumption is implemented in two steps. The first step
computes the disjunctive (or conjunctive) normal form of one of the
constraints, and the second step verifies that each clause in the
decomposed form implies the other constraint. Performing these two
steps separately is problematic because in the first step the DNF/CNF
can be exponentially larger than the original constraint, and by
computing it ahead of time we'd have to keep all of it in memory.
This patch fixes this exponential blowup in memory usage by interleaving
the two steps, so that as soon as we decompose one clause we check
implication for it. In turn, memory usage during subsumption is now
worst case linear in the size of the constraints rather than
exponential, and so we can safely remove the hard limit of 16 clauses
without introducing runaway memory usage on some inputs. (Note the
_time_ complexity of subsumption is still exponential in the worst case.)
In order for this to work we need to make formula::branch() insert the
copy of the current clause directly after the current clause rather than
at the end of the list, so that we fully decompose a clause shortly
after creating it. Otherwise we'd end up accumulating exponentially
many (partially decomposed) clauses in memory anyway.
PR c++/100828
gcc/cp/ChangeLog:
* logic.cc (formula::formula): Use emplace_back instead of
push_back.
(formula::branch): Insert a copy of m_current directly after
m_current instead of at the end of the list.
(formula::erase): Define.
(decompose_formula): Remove.
(decompose_antecedents): Remove.
(decompose_consequents): Remove.
(derive_proofs): Remove.
(max_problem_size): Remove.
(diagnose_constraint_size): Remove.
(subsumes_constraints_nonnull): Rewrite directly in terms of
decompose_clause and derive_proof, interleaving decomposition
with implication checking. Remove limit on constraint complexity.
Use formula::erase to free the current clause before moving on to
the next one.
Jonathan Wakely [Tue, 20 Jul 2021 17:15:48 +0000 (18:15 +0100)]
libstdc++: Fix create_directories to resolve symlinks [PR101510]
When filesystem__create_directories checks to see if the path already
exists and resolves to a directory, it uses filesystem::symlink_status,
which means it reports an error if the path is a symlink. It should use
filesystem::status, so that the target directory is detected, and no
error is reported.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/101510
* src/c++17/fs_ops.cc (fs::create_directories): Use status
instead of symlink_status.
* src/filesystem/ops.cc (fs::create_directories): Likewise.
* testsuite/27_io/filesystem/operations/create_directories.cc:
Check symlink to existing directory.
* testsuite/27_io/filesystem/operations/create_directory.cc: Do
not test with symlinks on Windows.
* testsuite/experimental/filesystem/operations/create_directories.cc:
Check symlink to existing directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Do not test with symlinks on Windows.
Jonathan Wakely [Tue, 20 Jul 2021 11:35:37 +0000 (12:35 +0100)]
libstdc++: Add more tests for filesystem::create_directory [PR101510]
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/101510
* src/c++17/fs_ops.cc (create_dir): Adjust whitespace.
* testsuite/27_io/filesystem/operations/create_directory.cc:
Test creating directory with name of existing symlink to
directory.
* testsuite/experimental/filesystem/operations/create_directory.cc:
Likewise.
Jonathan Wakely [Mon, 14 Jun 2021 11:25:43 +0000 (12:25 +0100)]
libstdc++: Change [cmp.alg] assertions to constraints
This moves the same_as<decay_t<_Tp>, decay_t<_Up>> checks from the
[cmp.alg] function bodies into their constraints.
Also add a test for the compare_xxx_order_fallback algorithms.
libstdc++-v3/ChangeLog:
* libsupc++/compare (__decayed_same_as): New helper concept.
(strong_order, weak_order, partial_order): Constrain with new
concept instead of using static_assert.
(compare_strong_order_fallback, compare_weak_order_fallback)
(compare_partial_order_fallback): Likewise. Do not deduce return
types. Remove redundant if-constexpr checks.
* testsuite/18_support/comparisons/algorithms/fallback.cc: New test.
Jonathan Wakely [Wed, 30 Jun 2021 23:30:54 +0000 (00:30 +0100)]
libstdc++: Improvements to Doxygen markup
This attempts to improve the doxygen output to work around what seems to
be some bugs in doxygen (issues 8635 and 8638).
The @addtogroup command doesn't work for entities inside a nested
namespace (see 8635) so we need to close and reopen groups on entering
and elaving nested namespaces. This fixes the problem that
chrono::duration and chrono::time_point were not documented in the
"Time" documentation group. I am unable to make the path classes appear
as part of their relevant groups (File System and Filesystem TS), nor
the contents of <exception> or <system_error>. I have made some minor
improvements to the docs for those types, including starting to address
PR 97001 by adding @since to the doxygen comments.
This change also excludes the <experimental/bits/net.h> header from
Doxygen processing, so we don't get an unwanted "Networking-ts" group
in the documentation.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
Jonathan Wakely [Wed, 30 Jun 2021 15:00:58 +0000 (16:00 +0100)]
libstdc++: Improve Doxygen documentation groups [PR 101258]
This defines some new Doxygen groups for C++17 variable templates and
for the contents of <experimental/type_traits>. By documenting the group
as a whole and adding each template to a group we don't need to document
them individually.
Also mark more internals with "@cond undocumented" so that Doxygen
ignores them by default. Also make Doxygen process <experimental/simd>.
For some reason, many of the class templates in <type_traits> do not
appear in the "Metaprogramming" group. For example, add_cv,
remove_extent, and all the is_xxx_constructible and is_xxx_assignable
traits. For some reason, Doxygen doesn't include them in the group,
despite doing it correctly for other traits in the same header.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/101258
* doc/doxygen/user.cfg.in (INPUT): Add <experimental/simd>.
(COLS_IN_ALPHA_INDEX): Remove obsolete tag.
(PREDEFINED): Add/fix some more macros that need to be expanded.
* include/bits/random.h: Stop Doxygen from documenting internal
implementation details.
* include/bits/random.tcc: Likewise.
* include/bits/this_thread_sleep.h: Fix @file name.
* include/experimental/bits/simd.h: Add to Doxygen group. Do not
document internal implementation details.
* include/experimental/bits/simd_detail.h: Do not document
internal implementation details.
* include/experimental/simd: Define Doxygen groups.
* include/experimental/type_traits: Improve documentation for
the header file. Define groups. Use @since commands.
* include/std/scoped_allocator (scoped_allocator_adaptor): Move
declaration before undocumented region.
* include/std/type_traits (true_type, false_type): Use using
declaration instead of typedef.
(is_invocable_v, is_nothrow_invocable_v, is_invocable_r_v)
(is_nothrow_invocable_r_v): Move definitions next to other C++17
variable templates.
Do not document internal implementation details. Move misplaced
group-end command. Define group for variable templates.
* include/std/variant: Do not document internal implementation
details.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
This patch adds an option to tune for Neoverse cores that have
a total vector bandwidth of 512 bits (4x128 for Advanced SIMD
and a vector-length-dependent equivalent for SVE). This is intended
to be a compromise between tuning aggressively for a single core like
Neoverse V1 (which can be too narrow) and tuning for AArch64 cores
in general (which can be too wide).
-mcpu=neoverse-512tvb is equivalent to -mcpu=neoverse-v1
-mtune=neoverse-512tvb.
gcc/
* doc/invoke.texi: Document -mtune=neoverse-512tvb and
-mcpu=neoverse-512tvb.
* config/aarch64/aarch64-cores.def (neoverse-512tvb): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.c (neoverse512tvb_sve_vector_cost)
(neoverse512tvb_sve_issue_info, neoverse512tvb_vec_issue_info)
(neoverse512tvb_vector_cost, neoverse512tvb_tunings): New structures.
(aarch64_adjust_body_cost_sve): Handle -mtune=neoverse-512tvb.
(aarch64_adjust_body_cost): Likewise.
aarch64: Restrict issue heuristics to inner vector loop
The AArch64 vector costs try to take issue rates into account.
However, when vectorising an outer loop, we lumped the inner
and outer operations together, which is somewhat meaningless.
This patch restricts the heuristic to the inner loop.
gcc/
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Only
record issue information for operations that occur in the
innermost loop.
The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction. This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two values and adding
an invariant will end up being a move and an MLA.
The only target to use the issue-based vector costs is Neoverse V1,
which would generally prefer SVE in this case anyway. I therefore
don't have a self-contained testcase. However, the distinction
becomes more important with a later patch.
gcc/
* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
parameter. Detect cases in which an Advanced SIMD MLA would almost
certainly require a MOV.
(aarch64_count_ops): Update accordingly.
When the vectoriser scalarises a strided store, it counts one
scalar_store for each element plus one vec_to_scalar extraction
for each element. However, extracting element 0 is free on AArch64,
so it should have zero cost.
I don't have a testcase that requires this for existing -mtune
options, but it becomes more important with a later patch.
gcc/
* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
function, split out from...
(aarch64_detect_vector_stmt_subtype): ...here.
(aarch64_add_stmt_cost): Treat extracting element 0 as free.
This patch adds tuning fields for the total cost of a gather load
instruction. Until now, we've costed them as one scalar load
per element instead. Those scalar_load-based values are also
what the patch uses to fill in the new fields for existing
cost structures.
gcc/
* config/aarch64/aarch64-protos.h (sve_vec_cost):
Add gather_load_x32_cost and gather_load_x64_cost.
* config/aarch64/aarch64.c (generic_sve_vector_cost)
(a64fx_sve_vector_cost, neoversev1_sve_vector_cost): Update
accordingly, using the values given by the scalar_load * number
of elements calculation that we used previously.
(aarch64_detect_vector_stmt_subtype): Use the new fields.
This patch splits the SVE-specific part of aarch64_adjust_body_cost
out into its own subroutine, so that a future patch can call it
more than once. I wondered about using a lambda to avoid having
to pass all the arguments, but in the end this way seemed clearer.
gcc/
* config/aarch64/aarch64.c (aarch64_adjust_body_cost_sve): New
function, split out from...
(aarch64_adjust_body_cost): ...here.
aarch64: Add a simple fixed-point class for costing
This patch adds a simple fixed-point class for holding fractional
cost values. It can exactly represent the reciprocal of any
single-vector SVE element count (including the non-power-of-2 ones).
This means that it can also hold 1/N for all N in [1, 16], which should
be enough for the various *_per_cycle fields.
For now the assumption is that the number of possible reciprocals
is fixed at compile time and so the class should always be able
to hold an exact value.
The class uses a uint64_t to hold the fixed-point value, which means
that it can hold any scaled uint32_t cost. Normally we don't worry
about overflow when manipulating raw uint32_t costs, but just to be
on the safe side, the class uses saturating arithmetic for all
operations.
As far as the changes to the cost routines themselves go:
- The changes to aarch64_add_stmt_cost and its subroutines are
just laying groundwork for future patches; no functional change
intended.
- The changes to aarch64_adjust_body_cost mean that we now
take fractional differences into account.
gcc/
* config/aarch64/fractional-cost.h: New file.
* config/aarch64/aarch64.c: Include <algorithm> (indirectly)
and cost_fraction.h.
(vec_cost_fraction): New typedef.
(aarch64_detect_scalar_stmt_subtype): Use it for statement costs.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_sve_adjust_stmt_cost, aarch64_adjust_stmt_cost): Likewise.
(aarch64_estimate_min_cycles_per_iter): Use vec_cost_fraction
for cycle counts.
(aarch64_adjust_body_cost): Likewise.
(aarch64_test_cost_fraction): New function.
(aarch64_run_selftests): Call it.
aarch64: Turn sve_width tuning field into a bitmask
The tuning structures have an sve_width field that specifies the
number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not
applicable). This patch turns the field into a bitmask so that
it can specify multiple widths at the same time. For now we
always treat the mininum width as the likely width.
An alternative would have been to add extra fields, which would
have coped correctly with non-power-of-2 widths. However,
we're very far from supporting constant non-power-of-2 vectors
in GCC, so I think the non-power-of-2 case will in reality always
have to be hidden behind VLA.
gcc/
* config/aarch64/aarch64-protos.h (tune_params::sve_width): Turn
into a bitmask.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Update
accordingly.
(aarch64_estimated_poly_value): Likewise. Use the least significant
set bit for the minimum and likely values. Use the most significant
set bit for the maximum value.
Richard Biener [Mon, 19 Jul 2021 11:29:16 +0000 (13:29 +0200)]
tree-optimization/101505 - properly determine stmt precision for PHIs
Loop vectorization pattern recog fails to walk PHIs when determining
stmt precisions. This fails to recognize non-mask uses for bools
in PHIs and outer loop vectorization.
2021-07-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/101505
* tree-vect-patterns.c (vect_determine_precisions): Walk
PHIs also for loop vectorization.
Xi Ruoyao [Fri, 30 Jul 2021 15:44:14 +0000 (23:44 +0800)]
mips: Fix up mips_atomic_assign_expand_fenv [PR94780]
Commit message shamelessly copied from 1777beb6b129 by jakub:
This function, because it is sometimes called even outside of function
bodies, uses create_tmp_var_raw rather than create_tmp_var. But in order
for that to work, when first referenced, the VAR_DECLs need to appear in a
TARGET_EXPR so that during gimplification the var gets the right
DECL_CONTEXT and is added to local decls.
gcc/
PR target/94780
* config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR.
d: Return the correct value for C++ constructor calls (PR101664)
C++ constructors return void, even though the front-end semantic treats
them as implicitly returning `this'. To handle this correctly, the
object reference is cached and used as the result of the expression.
PR d/101664
gcc/d/ChangeLog:
* expr.cc (ExprVisitor::visit (CallExp *)): Use object expression as
result for C++ constructor calls.
gcc/testsuite/ChangeLog:
* gdc.dg/extern-c++/extern-c++.exp: New.
* gdc.dg/extern-c++/pr101664.d: New test.
* gdc.dg/extern-c++/pr101664_1.cc: New test.
Martin Uecker [Wed, 28 Jul 2021 06:41:38 +0000 (08:41 +0200)]
Correct a mistake in a warnung for -Wnonnull.
In the warning for -Wnonnull when warning about array parameters
with bounds > 0 and which are NULL the numbers referring to the
two arguments are switched. This patch corrects the mistake.
2021-07-28 Martin Uecker <muecker@gwdg.de>
gcc/
* calls.c (maybe_warn_rdwr_sizes): Correct argument
numbers in warning that were switched.
gcc/testsuite/
* gcc.dg/Wnonnull-4.c: Correct argument numbers in warnings.
Harald Anlauf [Wed, 21 Jul 2021 16:54:00 +0000 (18:54 +0200)]
Fortran: ICE, OOM while calculating sizes of derived type array components
gcc/fortran/ChangeLog:
PR fortran/101514
* target-memory.c (gfc_interpret_derived): Size of array component
of derived type can only be computed here for explicit shape.
* trans-types.c (gfc_get_nodesc_array_type): Do not dereference
NULL pointers.
gcc/testsuite/ChangeLog:
PR fortran/101514
* gfortran.dg/pr101514.f90: New test.
d: Wrong evaluation order of binary expressions (PR101640)
The use of fold_build2 can in some cases swap the order of its operands
if that is the more optimal thing to do. However this breaks semantic
guarantee of left-to-right evaluation in D.
PR d/101640
gcc/d/ChangeLog:
* expr.cc (binary_op): Use build2 instead of fold_build2.
gcc/testsuite/ChangeLog:
* gdc.dg/pr96429.d: Update test.
* gdc.dg/pr101640.d: New test.
d: Compile-time reflection for supported built-ins (PR101127)
In order to allow user-code to determine whether a back-end builtin is
available without error, LANG_HOOKS_BUILTIN_FUNCTION_EXT_SCOPE has been
defined to delay putting back-end builtin functions until the ISA that
defines them has been declared.
However in D, there is no global namespace. All builtins get pushed
into the `gcc.builtins' module, which is constructed during the semantic
analysis pass, which has already finished by the time target attributes
are evaluated. So builtins are not pushed by the new langhook because
they would be ultimately ignored. Builtins exposed to D code then can
now only be altered by the command-line.
d: Change in DotTemplateExp type semantics leading to regression (PR101619)
By giving dot templates a type, meant that properry resolving silently
started passing for code that should never have passed. The simple fix
is to provide implementations for checkType and checkValue that give an
error about dot templates having neither a value nor type.
Jakub Jelinek [Tue, 27 Jul 2021 07:59:37 +0000 (09:59 +0200)]
gimple-fold: Fix up __builtin_clear_padding on classes with virtual inheritence [PR101586]
For the following testcase, B is 16-byte type, containing 8-byte
virtual pointer and 1-byte A member, and C contains two FIELD_DECLs,
one with B type and size of just 8-byte and then a field with type
A and 1-byte size.
The __builtin_clear_padding code was upset about the B typed FIELD_DECL
containing FIELD_DECLs beyond the field size and triggered
assertion failure.
This patch makes it ignore all FIELD_DECLs that are (fully) beyond the sz
passed from the caller (except for the flexible array member
diagnostics that is kept).
2021-07-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101586
* gimple-fold.c (clear_padding_type): Ignore FIELD_DECLs with byte
positions above or equal to sz except for diagnostics of flexible
array members.
* g++.dg/torture/builtin-clear-padding-4.C: New test.
Jakub Jelinek [Fri, 23 Jul 2021 17:55:16 +0000 (19:55 +0200)]
expmed: Fix store_integral_bit_field [PR101562]
Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
This expression code is used in only one context: as the
destination operand of a 'set' expression. In addition, the
operand of this expression must be a non-paradoxical 'subreg'
expression.
but on the testcase below that triggers UB at runtime
store_integral_bit_field emits exactly that.
The following patch fixes it by ensuring the requirement is satisfied.
2021-07-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.
Alan Modra [Tue, 29 Jun 2021 04:01:45 +0000 (13:31 +0930)]
[POWER10] __morestack calls from pcrel code
Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
and linking with a non-pcrel libgcc results in crashes due to the
power10 pcrel code not having r2 set for the generic-morestack.c
functions called from __morestack. There is also a problem when
non-pcrel code calls a pcrel libgcc. See the patch comments.
A similar situation theoretically occurs with ELFv1 multi-toc
executables, when __morestack might be located in a different toc
group to its caller. This patch makes no attempt to fix that, since
the gold linker does not support multi-toc (gold is needed for proper
support of -fsplit-stack code) nor does gcc emit __morestack calls
that support multi-toc.
* config/rs6000/morestack.S (R2_SAVE): Define.
(__morestack): Save and restore r2. Set up r2 for called
functions.