Jonathan Wakely [Tue, 27 Jul 2021 13:50:28 +0000 (14:50 +0100)]
libstdc++: Simplify std::optional::value()
The structure of these functions likely dates from the time before G++
fully supported C++14 extended constexpr, so that the throw expression
had to be the operand of a conditional expression. That is not true now,
so we can use a more straightforward version of the code.
We can also simplify the declaration of __throw_bad_optional_access by
using the C++11-style [[noreturn]] attribute so that a separate
declaration isn't needed.
libstdc++-v3/ChangeLog:
* include/experimental/optional (__throw_bad_optional_access):
Replace GNU attribute with C++11 attribute.
(optional::value, optional::value_or): Use if statements
instead of conditional expressions.
* include/std/optional (__throw_bad_optional_access)
(optional::value, optional::value_or): Likewise.
This is a new interface which is easily implemented using the
already existing code for the handling of the OMP_DISPLAY_ENV
environment variable.
libgomp/
* env.c (wait_policy, stacksize): New static variables,
move out of handle_omp_display_env.
(omp_display_env): New function. The meat of the old
handle_omp_display_env function.
(handle_omp_display_env): Change to not take parameters
and instead use the global variables. Only perform
parsing, defer to omp_display_env for the implementation.
(initialize_env): Remove local variables wait_policy and
stacksize. Don't pass parameters to handle_omp_display_env.
* fortran.c: Add ialias_redirect for omp_display_env.
(omp_display_env_, omp_display_env_8_): New functions.
* libgomp.map (OMP_5.1): New version. Add omp_display_env,
omp_display_env_, and omp_display_env_8_.
* omp.h.in: Declare omp_display_env.
* omp_lib.f90.in: Likewise.
* omp_lib.h.in: Likewise.
Abstract out (forward) jump threader state handling.
The *forward* jump threader has multiple places where it pushes and
pops state, and where it sets context up for the jump threading
simplifier callback. Not only are the idioms repetitive, but the only
reason for passing const_and_copies, avail_exprs_stack, and the evrp
engine around are so we can set up context.
As part of my jump threading work, I will divorce the evrp engine from
the DOM jump threader, replacing it with a subset of the path solver I
have just contributed. Since this will entail passing even more
context around, I've abstracted out the state handling so it can be
passed around in one object. This cleans up the code, and also makes
it trivial to set up context with another engine in the future.
FWIW, I've used these cleanups and the path solver in a POC to improve
DOM's threaded edges by an additional 5%, and the overall threading
opportunities in the compiler by 1%. This is in addition to the gains
I have documented in the backwards threader rewrite.
There are no functional changes with this patch.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-dom.c (dom_jump_threader_simplifier):
Put avail_exprs_stack in the class, instead of passing it to
jump_threader_simplifier.
(dom_jump_threader_simplifier::simplify): Add state argument.
(dom_opt_dom_walker): Add state.
(pass_dominator::execute): Pass state to threader.
(dom_opt_dom_walker::before_dom_children): Use state.
* tree-ssa-threadedge.c (jump_threader::jump_threader): Replace
arguments by state.
(jump_threader::record_temporary_equivalences_from_phis):
Register equivalences through the state variable.
(jump_threader::record_temporary_equivalences_from_stmts_at_dest):
Record ranges in a statement through the state variable.
(jump_threader::simplify_control_stmt_condition): Pass state to
simplify.
(jump_threader::simplify_control_stmt_condition_1): Same.
(jump_threader::thread_around_empty_blocks): Remove obsolete
comment.
(jump_threader::thread_through_normal_block): Record equivalences
on edge through the state variable.
(jump_threader::thread_across_edge): Abstract state pushing.
(jt_state::jt_state): New.
(jt_state::push): New.
(jt_state::pop): New.
(jt_state::register_equiv): New.
(jt_state::record_ranges_from_stmt): New.
(jt_state::register_equivs_on_edge): New.
(jump_threader_simplifier::jump_threader_simplifier): Move from
header.
(jump_threader_simplifier::simplify): Add state argument.
* tree-ssa-threadedge.h (class jt_state): New.
(class jump_threader): Add state to constructor.
(class jump_threader_simplifier): Add state to simplify. Remove
avail_exprs_stack from class.
* tree-vrp.c (vrp_jump_threader_simplifier::simplify): Add state
argument.
(vrp_jump_threader::vrp_jump_threader): Add state.
(vrp_jump_threader::~vrp_jump_threader): Cleanup state.
Marek Polacek [Fri, 16 Jul 2021 19:58:01 +0000 (15:58 -0400)]
c++: Reject ordered comparison of null pointers [PR99701]
When implementing DR 1512 in r11-467 I neglected to reject ordered
comparison of two null pointers, like nullptr < nullptr. This patch
fixes that omission.
DR 1512
PR c++/99701
gcc/cp/ChangeLog:
* cp-gimplify.c (cp_fold): Remove {LE,LT,GE,GT_EXPR} from
a switch.
* typeck.c (cp_build_binary_op): Reject ordered comparison
of two null pointers.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nullptr11.C: Remove invalid tests.
* g++.dg/cpp0x/nullptr46.C: Add dg-error.
* g++.dg/cpp2a/spaceship-err7.C: New test.
* g++.dg/expr/ptr-comp4.C: New test.
libstdc++-v3/ChangeLog:
* testsuite/20_util/tuple/comparison_operators/overloaded.cc:
Move a line...
* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
...here. New test.
Jonathan Wakely [Mon, 26 Jul 2021 14:08:00 +0000 (15:08 +0100)]
libstdc++: Move COW string definitions to separate header
This moves the definitions of the COW string to a separate file, so that
they don't need to be preprocessed for the common case. We could also
move the SSO string definitions to a new file, so that they don't need
to be preprocessed for the old ABI case, but that would require more
shovel work because there are some parts of <bits/basic_string.h> and
<bits/basic_string.tcc> that are common to both definitions.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/basic_string.h [!_GLIBCXX_USE_CXX11_ABI]
(basic_string): Move definition of Copy-on-Write string to
new file.
* include/bits/basic_string.tcc: Likewise.
* include/bits/cow_string.h: New file.
Jonathan Wakely [Thu, 22 Jul 2021 13:48:27 +0000 (14:48 +0100)]
libstdc++: Remove unnecessary uses of <utility>
The <algorithm> header includes <utility>, with a comment referring to
UK-300, a National Body comment on the C++11 draft. That comment
proposed to move std::swap to <utility> and then require <algorithm> to
include <utility>. The comment was rejected, so we do not need to
implement the suggestion. For backwards compatibility with C++03 we do
want <algorithm> to define std::swap, but it does so anyway via
<bits/move.h>. We don't need the whole of <utility> to do that.
A few other headers that need std::swap can include <bits/move.h> to
get it, instead of <utility>.
There are several headers that include <utility> to get std::pair, but
they can use <bits/stl_pair.h> to get it without also including the
rel_ops namespace and other contents of <utility>.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/algorithm: Do not include <utility>.
* include/std/functional: Likewise.
* include/std/regex: Include <bits/stl_pair.h> instead of
<utility>.
* include/debug/map.h: Likewise.
* include/debug/multimap.h: Likewise.
* include/debug/multiset.h: Likewise.
* include/debug/set.h: Likewise.
* include/debug/vector: Likewise.
* include/bits/fs_path.h: Likewise.
* include/bits/unique_ptr.h: Do not include <utility>.
* include/experimental/any: Likewise.
* include/experimental/executor: Likewise.
* include/experimental/memory: Likewise.
* include/experimental/optional: Likewise.
* include/experimental/socket: Use __exchange instead
of std::exchange.
* src/filesystem/ops-common.h: Likewise.
* testsuite/20_util/default_delete/48631_neg.cc: Adjust expected
errors to not use a hardcoded line number.
* testsuite/20_util/default_delete/void_neg.cc: Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc:
Include <utility> for std::as_const.
* testsuite/20_util/specialized_algorithms/uninitialized_default_construct/constrained.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.
* testsuite/20_util/specialized_algorithms/uninitialized_value_construct/constrained.cc:
Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.
Jonathan Wakely [Thu, 22 Jul 2021 13:48:27 +0000 (14:48 +0100)]
libstdc++: Reduce header dependencies on <array> and <utility>
This refactoring reduces the memory usage and compilation time to parse
a number of headers that depend on std::pair, std::tuple or std::array.
Previously the headers for these class templates were all intertwined,
due to the common dependency on std::tuple_size, std::tuple_element and
their std::get overloads. This decouples the headers by moving some
parts of <utility> into a new <bits/utility.h> header. This means that
<array> and <tuple> no longer need to include the whole of <utility>,
and <tuple> no longer needs to include <array>.
This decoupling benefits headers such as <thread> and <scoped_allocator>
which only need std::tuple, and so no longer have to parse std::array.
Some other headers such as <any>, <optional> and <variant> no longer
need to include <utility> just for the std::in_place tag types, so
do not have to parse the std::pair definitions.
Removing direct uses of <utility> also means that the std::rel_ops
namespace is not transitively declared by other headers.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add bits/utility.h header.
* include/Makefile.in: Regenerate.
* include/bits/utility.h: New file.
* include/std/utility (tuple_size, tuple_element): Move
to new header.
* include/std/type_traits (__is_tuple_like_impl<tuple<T...>>):
Move to <tuple>.
(_Index_tuple, _Build_index_tuple, integer_sequence): Likewise.
(in_place_t, in_place_index_t, in_place_type_t): Likewise.
* include/bits/ranges_util.h: Include new header instead of
<utility>.
* include/bits/stl_pair.h (tuple_size, tuple_element): Move
partial specializations for std::pair here.
(get): Move overloads for std::pair here.
* include/std/any: Include new header instead of <utility>.
* include/std/array: Likewise.
* include/std/memory_resource: Likewise.
* include/std/optional: Likewise.
* include/std/variant: Likewise.
* include/std/tuple: Likewise.
(__is_tuple_like_impl<tuple<T...>>): Move here.
(get) Declare overloads for std::array.
* include/std/version (__cpp_lib_tuples_by_type): Change type
to long.
* testsuite/20_util/optional/84601.cc: Include <utility>.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/constrained.cc:
Likewise.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.
* testsuite/std/ranges/access/cbegin.cc: Include <utility>.
* testsuite/std/ranges/access/cend.cc: Likewise.
* testsuite/std/ranges/access/end.cc: Likewise.
* testsuite/std/ranges/single_view.cc: Likewise.
Aldy Hernandez [Tue, 15 Jun 2021 10:20:43 +0000 (12:20 +0200)]
Implement basic block path solver.
This is is the main basic block path solver for use in the ranger-based
backwards threader. Given a path of BBs, the class can solve the final
conditional or any SSA name used in calculating the final conditional.
gcc/ChangeLog:
* Makefile.in (OBJS): Add gimple-range-path.o.
* gimple-range-path.cc: New file.
* gimple-range-path.h: New file.
As a general principle, vec_duplicate should be as close to the root
of an expression as possible. Where unary operations have
vec_duplicate as an argument, these operations should be pushed
inside the vec_duplicate.
This patch modifies unary operation simplification to push
sign/zero-extension of a scalar inside vec_duplicate.
This patch also updates all RTL patterns in aarch64-simd.md to use
the new canonical form.
gcc/ChangeLog:
2021-07-19 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd.md: Push sign/zero-extension
inside vec_duplicate for all patterns.
* simplify-rtx.c (simplify_context::simplify_unary_operation_1):
Push sign/zero-extension inside vec_duplicate.
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.
An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Clarify
sequencing of 'async' data copying vs. profiling events.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Likewise.
Richard Biener [Thu, 22 Jul 2021 10:26:16 +0000 (12:26 +0200)]
tree-optimization/101573 - improve uninit warning at -O0
We can improve uninit warnings from the early pass by looking
at PHI arguments on fallthru edges that are uninitialized and
have uses that are before a possible loop exit. This catches
some cases earlier that we'd only warn in a more confusing
way after early inlining as seen by testcase adjustments.
It introduces
FAIL: gcc.dg/uninit-23.c (test for excess errors)
where we additionally warn
gcc.dg/uninit-23.c:21:13: warning: 't4' is used uninitialized [-Wuninitialized]
which I think is OK even if it's not obvious that the new
warning is an improvement when you look at the obvious source.
Somehow for all cases I never get the `'foo' was declared here`
notes, I didn't dig why that happens but it's odd.
2021-07-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/101573
* tree-ssa-uninit.c (warn_uninit_phi_uses): New function
looking at uninitialized PHI arg defs in some constrained cases.
(warn_uninitialized_vars): Call it.
(execute_early_warn_uninitialized): Calculate dominators.
Richard Biener [Tue, 27 Jul 2021 07:24:57 +0000 (09:24 +0200)]
tree-optimization/39821 - fix cost classification for widening arith
This adjusts the vectorizer to cost vector_stmt for widening
arithmetic instead of vec_promote_demote in the line of telling
the target that stmt_info->stmt is the meaningful piece we cost.
2021-07-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/39821
* tree-vect-stmts.c (vect_model_promotion_demotion_cost): Use
vector_stmt for widening arithmetic.
(vectorizable_conversion): Adjust.
Martin Jambor [Tue, 27 Jul 2021 08:02:38 +0000 (10:02 +0200)]
ipa: Adjust references to identify read-only globals
this patch has been motivated by SPEC 2017's 544.nab_r in which there is
a static variable which is never written to and so zero throughout the
run-time of the benchmark. However, it is passed by reference to a
function in which it is read and (after some multiplications) passed
into __builtin_exp which in turn unnecessarily consumes almost 10% of
the total benchmark run-time. The situation is illustrated by the added
testcase remref-3.c.
The patch adds a flag to ipa-prop descriptor of each parameter to mark
such parameters. IPA-CP and inling then take the effort to remove
IPA_REF_ADDR references in the caller and only add IPA_REF_LOAD
reference to the clone/overall inlined function. This is sufficient
for subsequent symbol table analysis code to identify the read-only
variable as such and optimize the code.
There are two changes from the RFC version posted to the list earlier.
First, three missing calls to get_base_address were added (there was
another one in an assert). Second, references are not stripped off
the callers if the cloned function cannot change the signature. The
second change reveals a real shortcoming stemming from the fact we
cannot adjust function prototypes with fnspecs. But that is a more
general problem.
gcc/ChangeLog:
2021-07-20 Martin Jambor <mjambor@suse.cz>
* cgraph.h (ipa_replace_map): New field force_load_ref.
* ipa-prop.h (ipa_param_descriptor): Reduce precision of move_cost,
aded new flag load_dereferenced, adjusted comments.
(ipa_get_param_dereferenced): New function.
(ipa_set_param_dereferenced): Likewise.
* cgraphclones.c (cgraph_node::create_virtual_clone): Follow it.
* ipa-cp.c: Include gimple.h.
(ipcp_discover_new_direct_edges): Take into account dereferenced flag.
(get_replacement_map): New parameter force_load_ref, set the
appropriate flag in ipa_replace_map if set.
(struct symbol_and_index_together): New type.
(adjust_refs_in_act_callers): New function.
(adjust_references_in_caller): Likewise.
(create_specialized_node): When appropriate, call
adjust_references_in_caller and force only load references.
* ipa-prop.c (load_from_dereferenced_name): New function.
(ipa_analyze_controlled_uses): Also detect loads from a
dereference, harden testing of call statements.
(ipa_write_node_info): Stream the dereferenced flag.
(ipa_read_node_info): Likewise.
(ipa_set_jf_constant): Also create refdesc when jump function
references a variable.
(cgraph_node_for_jfunc): Rename to symtab_node_for_jfunc, work
also on references of variables and return a symtab_node. Adjust
all callers.
(propagate_controlled_uses): Also remove references to VAR_DECLs.
Jakub Jelinek [Tue, 27 Jul 2021 07:59:37 +0000 (09:59 +0200)]
gimple-fold: Fix up __builtin_clear_padding on classes with virtual inheritence [PR101586]
For the following testcase, B is 16-byte type, containing 8-byte
virtual pointer and 1-byte A member, and C contains two FIELD_DECLs,
one with B type and size of just 8-byte and then a field with type
A and 1-byte size.
The __builtin_clear_padding code was upset about the B typed FIELD_DECL
containing FIELD_DECLs beyond the field size and triggered
assertion failure.
This patch makes it ignore all FIELD_DECLs that are (fully) beyond the sz
passed from the caller (except for the flexible array member
diagnostics that is kept).
2021-07-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/101586
* gimple-fold.c (clear_padding_type): Ignore FIELD_DECLs with byte
positions above or equal to sz except for diagnostics of flexible
array members.
* g++.dg/torture/builtin-clear-padding-4.C: New test.
Roger Sayle [Mon, 26 Jul 2021 16:30:26 +0000 (17:30 +0100)]
Fold bswap32(x) != 0 to x != 0 (and related transforms)
This patch to match.pd implements several closely related folding
simplifications at the tree-level, that make use of the property
that bit permutation functions, rotate and bswap have inverses.
[1] bswap(X) eq/ne C, for constant C, simplifies to X eq/ne C'
where C'=bswap(C), generalizing the transform in the subject.
[2] bswap(X) eq/ne bswap(Y) simplifies to X eq/ne Y.
[3] lrotate(X,C1) eq/ne C2 simplifies to X eq/ne C3, where
C3 = rrotate(C2,C1), i.e. apply the inverse rotation to C2.
[4] Likewise, rrotate(X,C1) eq/ne C2 simplifies to X eq/ne C3,
where C3 = lrotate(C2,C1).
[5] rotate(X,Z) eq/ne rotate(Y,Z) simplifies to X eq/ne Y, when
the bit-count Z (the same on both sides) has no side-effects.
[6] rotate(X,Y) eq/ne 0 simplifies to X eq/ne 0 if Y has no
side-effects.
[7] Likewise, rotate(X,Y) eq/ne -1 simplifies to X eq/ne -1,
if Y has no side-effects.
2010-07-26 Roger Sayle <roger@nextmovesoftware.com>
Marc Glisse <marc.glisse@inria.fr>
gcc/ChangeLog
* match.pd (rotate): Simplify equality/inequality of rotations.
(bswap): Simplify equality/inequality tests of byte swapping.
gcc/testsuite/ChangeLog
* gcc.dg/fold-eqrotate-1.c: New test case.
* gcc.dg/fold-eqbswap-1.c: New test case.
* trans-decl.c (convert_CFI_desc): Only copy out the descriptor
if necessary.
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Updated attribute
handling which reflect a previous intermediate version of the
standard. Only copy out the descriptor if necessary.
libgfortran/ChangeLog:
* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Add code
to verify the descriptor. Correct bounds calculation.
(gfc_desc_to_cfi_desc): Add code to verify the descriptor.
gcc/testsuite/ChangeLog:
* gfortran.dg/ISO_Fortran_binding_1.f90: Add pointer attribute,
this test is still erroneous but now it compiles.
* gfortran.dg/bind_c_array_params_2.f90: Update regex to match
code changes.
* gfortran.dg/PR93308.f90: New test.
* gfortran.dg/PR93963.f90: New test.
* gfortran.dg/PR94327.c: New test.
* gfortran.dg/PR94327.f90: New test.
* gfortran.dg/PR94331.c: New test.
* gfortran.dg/PR94331.f90: New test.
* gfortran.dg/PR97046.f90: New test.
Abstract out conditional simplification out of execute_vrp.
VRP simplifies conditionals involving casted values outside of the main
folding mechanism, because this optimization inhibits the VRP jump
threader from threading through the comparison.
As part of replacing VRP with an evrp instance, I am making sure we do
everything VRP does. Hence, I am abstracting this functionality out so
we can call it from from elsewhere.
ISTM that when the proposed ranger-based jump threader can handle
everything the forward threader does, there will be no need for this
optimization to be done outside of the evrp folder. Perhaps we can fold
this into the substitute_using_ranges class. But that's further down
the line.
Also, there is no need to pass a vr_values around, when the base
range_query class will do. I fixed this, at it makes it trivial to pass
down a ranger or evrp instance.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-vrp.c (vrp_simplify_cond_using_ranges): Rename vr_values
with range_query.
(execute_vrp): Abstract out simplification of conditionals...
(simplify_casted_conds): ...here.
I have changed the use of the array_bounds_checker in VRP to use a
ranger in my local tree to make sure there are no regressions when using
either VRP or the ranger. In doing so I noticed that the checker
does not pass context to get_value_range, which causes the ranger to miss a
few cases. This patch fixes the oversight.
Tested on x86-64 Linux using the array bounds checker both with VRP and
the ranger.
Tamar Christina [Mon, 26 Jul 2021 09:22:23 +0000 (10:22 +0100)]
AArch64: correct usdot vectorizer and intrinsics optabs
There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON. The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.
This means we need different patterns here. This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.
There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SUSS,
aarch64_types_ternop_suss_qualifiers): New.
* config/aarch64/aarch64-simd-builtins.def (usdot_prod): Use it.
* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Re-organize RTL.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use it.
Jakub Jelinek [Mon, 26 Jul 2021 07:13:47 +0000 (09:13 +0200)]
openmp: Add support for omp attributes section and scan directives
This patch adds support for expressing the section and scan directives
using the attribute syntax and additionally fixes some bugs in the attribute
syntax directive handling.
For now it requires that the scan and section directives appear as the only
attribute, not combined with other OpenMP or non-OpenMP attributes on the same
statement.
2021-07-26 Jakub Jelinek <jakub@redhat.com>
* parser.h (struct cp_lexer): Add orphan_p member.
* parser.c (cp_parser_statement): Don't change in_omp_attribute_pragma
upon restart from CPP_PRAGMA handling. Fix up condition when a lexer
should be destroyed and adjust saved_tokens if it records tokens from
the to be destroyed lexer.
(cp_parser_omp_section_scan): New function.
(cp_parser_omp_scan_loop_body): Use it. If
parser->lexer->in_omp_attribute_pragma, allow optional comma
after scan.
(cp_parser_omp_sections_scope): Use cp_parser_omp_section_scan.
* g++.dg/gomp/attrs-1.C: Use attribute syntax even for section
and scan directives.
* g++.dg/gomp/attrs-2.C: Likewise.
* g++.dg/gomp/attrs-6.C: New test.
* g++.dg/gomp/attrs-7.C: New test.
* g++.dg/gomp/attrs-8.C: New test.
Marek Polacek [Tue, 20 Jul 2021 20:26:28 +0000 (16:26 -0400)]
include: Fix -Wundef warnings in ansidecl.h
This quashes -Wundef warnings in ansidecl.h when compiled in C or C++.
In C, __cpp_constexpr and __cplusplus aren't defined so we evaluate
them to 0; conversely, __STDC_VERSION__ is not defined in C++.
This has caused grief when -Wundef is used with -Werror.
I've also tested -traditional-cpp.
include/ChangeLog:
* ansidecl.h: Check if __cplusplus is defined before checking
the value of __cpp_constexpr and __cplusplus. Don't check
__STDC_VERSION__ in C++.
Jakub Jelinek [Fri, 23 Jul 2021 17:55:16 +0000 (19:55 +0200)]
expmed: Fix store_integral_bit_field [PR101562]
Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
This expression code is used in only one context: as the
destination operand of a 'set' expression. In addition, the
operand of this expression must be a non-paradoxical 'subreg'
expression.
but on the testcase below that triggers UB at runtime
store_integral_bit_field emits exactly that.
The following patch fixes it by ensuring the requirement is satisfied.
2021-07-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/101562
* expmed.c (store_integral_bit_field): Only use movstrict_optab
if the operand isn't paradoxical.
Now that all dependencies of array_bounds_checker take a range_query, we
can sever the relationship with vr_values. Changing this will allow us
to use the array_bounds_checker with VRP, evrp, or the ranger.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-array-bounds.h (class array_bounds_checker): Change
ranges type to range_query.
Jonathan Wright [Fri, 23 Jul 2021 12:41:39 +0000 (13:41 +0100)]
aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x2
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x2 intrinsics.
gcc/ChangeLog:
2021-07-23 Jonathan Wright <jonathan.wright@arm.com>
Jonathan Wright [Fri, 23 Jul 2021 11:41:05 +0000 (12:41 +0100)]
aarch64: Use memcpy to copy vector tables in vst1[q]_x3 intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x3
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x3 intrinsics.
gcc/ChangeLog:
2021-07-23 Jonathan Wright <jonathan.wright@arm.com>
Jonathan Wright [Wed, 21 Jul 2021 11:37:01 +0000 (12:37 +0100)]
aarch64: Use memcpy to copy vector tables in vst2[q] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst2[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst2q intrinsics.
gcc/ChangeLog:
2021-07-21 Jonathan Wrightt <jonathan.wright@arm.com>
Jonathan Wright [Wed, 21 Jul 2021 09:55:00 +0000 (10:55 +0100)]
aarch64: Use memcpy to copy vector tables in vst3[q] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst3[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst3q intrinsics.
gcc/ChangeLog:
2021-07-21 Jonathan Wright <jonathan.wright@arm.com>
Jonathan Wright [Tue, 20 Jul 2021 09:28:34 +0000 (10:28 +0100)]
aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst4[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst4q intrinsics.
gcc/ChangeLog:
2021-07-20 Jonathan Wright <jonathan.wright@arm.com>
Jonathan Wright [Thu, 8 Jul 2021 22:27:54 +0000 (23:27 +0100)]
aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbx4
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
gcc/ChangeLog:
2021-07-19 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbx4_u8): Likewise.
(vtbx4_p8): Likewise.
Jonathan Wright [Thu, 8 Jul 2021 22:27:54 +0000 (23:27 +0100)]
aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbl[34]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
gcc/ChangeLog:
2021-07-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vtbl3_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vtbl3_u8): Likewise.
(vtbl3_p8): Likewise.
(vtbl4_s8): Likewise.
(vtbl4_u8): Likewise.
(vtbl4_p8): Likewise.
Jonathan Wright [Thu, 8 Jul 2021 11:32:45 +0000 (12:32 +0100)]
aarch64: Use memcpy to copy vector tables in vqtbx[234] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbx[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbx[234] intrinsics.
gcc/ChangeLog:
2021-07-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vqtbx2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbx2_u8): Likewise.
(vqtbx2_p8): Likewise.
(vqtbx2q_s8): Likewise.
(vqtbx2q_u8): Likewise.
(vqtbx2q_p8): Likewise.
(vqtbx3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbx3_u8): Likewise.
(vqtbx3_p8): Likewise.
(vqtbx3q_s8): Likewise.
(vqtbx3q_u8): Likewise.
(vqtbx3q_p8): Likewise.
(vqtbx4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbx4_u8): Likewise.
(vqtbx4_p8): Likewise.
(vqtbx4q_s8): Likewise.
(vqtbx4q_u8): Likewise.
(vqtbx4q_p8): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: New tests.
Jonathan Wright [Tue, 6 Jul 2021 15:20:02 +0000 (16:20 +0100)]
aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbl[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.
Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.
gcc/ChangeLog:
2021-07-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
instead of constructing __builtin_aarch64_simd_oi one vector
at a time.
(vqtbl2_u8): Likewise.
(vqtbl2_p8): Likewise.
(vqtbl2q_s8): Likewise.
(vqtbl2q_u8): Likewise.
(vqtbl2q_p8): Likewise.
(vqtbl3_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_ci one vector at a time.
(vqtbl3_u8): Likewise.
(vqtbl3_p8): Likewise.
(vqtbl3q_s8): Likewise.
(vqtbl3q_u8): Likewise.
(vqtbl3q_p8): Likewise.
(vqtbl4_s8): Use __builtin_memcpy instead of constructing
__builtin_aarch64_simd_xi one vector at a time.
(vqtbl4_u8): Likewise.
(vqtbl4_p8): Likewise.
(vqtbl4q_s8): Likewise.
(vqtbl4q_u8): Likewise.
(vqtbl4q_p8): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector_structure_intrinsics.c: New test.
Jonathan Wakely [Fri, 23 Jul 2021 10:03:23 +0000 (11:03 +0100)]
libstdc++: Update documentation comments for namespace rel_ops
The comments in <bits/stl_relops.h> describe problems that were solved
years ago (for GCC 3.1). The comparison operators in <iterator> are no
longer ambiguous with the rel_ops ones, so the linked mailing list
thread and FAQ entry aren't relevant now. The reference to std_utility.h
is also outdated as it's just called utility now, both in the source
tree and when installed.
The use of rel_ops is still frowned upon though, so replace the
discussion of ambiguities within libstdc++ headers with adminition about
using rel_ops in user code.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
Jakub Jelinek [Fri, 23 Jul 2021 07:50:15 +0000 (09:50 +0200)]
openmp: Add support for __has_attribute(omp::directive) and __has_attribute(omp::sequence)
Now that the C++ FE supports these attributes, but not through registering
them in the attributes tables (they work quite differently from other
attributes), this teaches c_common_has_attributes about those.
2021-07-23 Jakub Jelinek <jakub@redhat.com>
* c-lex.c (c_common_has_attribute): Call canonicalize_attr_name also
on attr_id. Return 1 for omp::directive or omp::sequence in C++11
and later.
* c-c++-common/gomp/attrs-1.c: New test.
* c-c++-common/gomp/attrs-2.c: New test.
* c-c++-common/gomp/attrs-3.c: New test.
Jakub Jelinek [Fri, 23 Jul 2021 07:37:36 +0000 (09:37 +0200)]
openmp: Diagnose invalid mixing of the attribute and pragma syntax directives
The OpenMP 5.1 spec says that the attribute and pragma syntax directives
should not be mixed on the same statement. The following patch adds diagnostic
for that,
[[omp::directive (...)]]
#pragma omp ...
is always an error and for the other order
#pragma omp ...
[[omp::directive (...)]]
it depends on whether the pragma directive is an OpenMP construct
(then it is an error because it needs a structured block or loop
or statement as body) or e.g. a standalone directive (then it is fine).
Only block scope is handled for now though, namespace scope and class scope
still needs implementing even the basic support.
2021-07-23 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP__START_ and
PRAGMA_OMP__LAST_ enumerators.
gcc/cp/
* parser.h (struct cp_parser): Add omp_attrs_forbidden_p member.
* parser.c (cp_parser_handle_statement_omp_attributes): Diagnose
mixing of attribute and pragma syntax directives when seeing
omp::directive if parser->omp_attrs_forbidden_p or if attribute syntax
directives are followed by OpenMP pragma.
(cp_parser_statement): Clear parser->omp_attrs_forbidden_p after
the cp_parser_handle_statement_omp_attributes call.
(cp_parser_omp_structured_block): Add disallow_omp_attrs argument,
if true, set parser->omp_attrs_forbidden_p.
(cp_parser_omp_scan_loop_body, cp_parser_omp_sections_scope): Pass
false as disallow_omp_attrs to cp_parser_omp_structured_block.
(cp_parser_omp_parallel, cp_parser_omp_task): Set
parser->omp_attrs_forbidden_p.
gcc/testsuite/
* g++.dg/gomp/attrs-4.C: New test.
* g++.dg/gomp/attrs-5.C: New test.
Bind(c): signed char is not a Fortran character type
CFI_allocate and CFI_select_part were incorrectly treating
CFI_type_signed_char as a Fortran character type for the purpose of
deciding whether or not to use the elem_len argument. It is a Fortran
integer type per table 18.2 in the 2018 Fortran standard.
Other functions in ISO_Fortran_binding.c appeared to handle this case
correctly already.
Jonathan Wakely [Thu, 22 Jul 2021 17:49:57 +0000 (18:49 +0100)]
libstdc++: Fix non-default constructors for hash containers [PR101583]
When I added the new mixin to _Hashtable, I forgot to explicitly
construct it in each non-default constructor. That means you can't
use any constructors unless all three of the hash function, equality
function, and allocator are all default constructible.
libstdc++-v3/ChangeLog:
PR libstdc++/101583
* include/bits/hashtable.h (_Hashtable): Replace mixin with
_Enable_default_ctor. Construct it explicitly in all
non-forwarding, non-defaulted constructors.
* testsuite/23_containers/unordered_map/cons/default.cc: Check
non-default constructors can be used.
* testsuite/23_containers/unordered_set/cons/default.cc:
Likewise.
Andrew Pinski [Tue, 20 Jul 2021 18:25:43 +0000 (11:25 -0700)]
Fix PR 10153: tail recusion for vector types.
The problem here is we try to an initialized value
from a scalar constant. For vectors we need to do
a vect_dup instead. This fixes that issue by using
build_{one,zero}_cst instead of integer_{one,zero}_node
when calling create_tailcall_accumulator.
Changes from v1:
* v2: Use build_{one,zero}_cst and get the correct type before.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/10153
* tree-tailcall.c (create_tailcall_accumulator):
Don't call fold_convert as the type should be correct already.
(tree_optimize_tail_calls_1): Use build_{one,zero}_cst instead
of integer_{one,zero}_node for the call of create_tailcall_accumulator.
gcc/testsuite/ChangeLog:
PR tree-optimization/10153
* gcc.c-torture/compile/pr10153-1.c: New test.
* gcc.c-torture/compile/pr10153-2.c: New test.
Allow non-null adjustments for pointers even when there is a known range.
Fix non_null_ref::adjust_range so it always adjust ranges, not just
varying ranges. This will allow pointers that have a range, but are not
necessarily non-null, to be adjusted.
gcc/ChangeLog:
* gimple-range-cache.cc (non_null_ref::adjust_range): Replace
varying_p check for null/non-null check.
David Edelsohn [Wed, 21 Jul 2021 18:06:45 +0000 (14:06 -0400)]
aix: Protect AIX math.h overloads with new macro.
AIX math.h provides C++ overloaded inlined math functions, which should
not be present for G++. The definitions have been guaded by
__COMPATMATH__, but that macro had other uses in IBM xlC++. A new
macro has been introduced with the sole purpose of guarding the functions.
This patch updates libstdc++ os_defines.h to define the additional macro.
The earlier macro definition is retained to guard the functions in the
math.h header of earlier AIX releases.
Jonathan Wakely [Thu, 22 Jul 2021 13:38:34 +0000 (14:38 +0100)]
libstdc++: Use __builtin_operator_new when available [PR94295]
Clang provides __builtin_operator_new and __builtin_operator_delete,
which have the same semantics as ::operator new and ::operator delete
except that the compiler is allowed to elide calls to them. This changes
std::allocator to use those built-in functions so that memory allocated
by std::allocator can be optimized away when using Clang. This avoids an
abstraction penalty for using std::allocator to allocate storage rather
than a new-expression.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/94295
* include/ext/new_allocator.h (_GLIBCXX_OPERATOR_NEW)
(_GLIBCXX_OPERATOR_DELETE, _GLIBCXX_SIZED_DEALLOC): Define.
(allocator::allocate, allocator::deallocate): Use new macros.
Jonathan Wakely [Thu, 22 Jul 2021 13:37:24 +0000 (14:37 +0100)]
libstdc++: Use std::addressof in ranges::uninitialized_xxx [PR101571]
Make the ranges::uninitialized_xxx algorithms use std::addressof to
protect against iterator types that overload operator&.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/101571
* include/bits/ranges_uninitialized.h (_DestroyGuard): Change
constructor parameter to reference and use addressof.
* testsuite/util/testsuite_iterators.h: Define deleted operator&
overloads for test iterators.
Jonathan Wakely [Thu, 22 Jul 2021 10:57:38 +0000 (11:57 +0100)]
libstdc++: Initialize all subobjects of std::function
The std::function::swap member swaps each data member unconditionally,
resulting in -Wmaybe-uninitialized warnings for a default constructed
object. This happens because the _M_invoker and _M_functor members are
only initialized if the function has a target.
This change ensures that all subobjects are zero-initialized on
construction.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/std_function.h (_Function_base): Add
default member initializers and define constructor as defaulted.
(function::_M_invoker): Add default member initializer.
Jonathan Wakely [Thu, 22 Jul 2021 10:45:32 +0000 (11:45 +0100)]
libstdc++: Restore __gnu_debug::array [PR100682]
As the PR points out, we removed the debug version of std::array without
any period of deprecation. Although std::array contains all the actual
debug checks now, removing the <debug/arrray> header breaks any code
that was using that explicitly. The manual still lists doing that as
supported.
This restores the <debug/array> header, but simply defines
__gnu_debug::array as an alias for std::array, and declares the alias
with the deprecated attribute. The docs are updated to match.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/100682
* doc/xml/manual/debug_mode.xml: Update documentation about
debug capability of std::array.
* doc/html/*: Regenerate.
* include/debug/array: New file.
Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.
By optimizing vector movement to broadcast in ix86_expand_vector_move
during pass_expand, pass_reload/LRA can automatically generate an avx512
embedded broadcast, pass_cpb is not needed.
Considering that in the absence of avx512f, broadcast from memory is
still slightly faster than loading the entire memory, so always enable
broadcast.
Support logic shift left/right for avx512 mask type.
gcc/ChangeLog:
* config/i386/constraints.md (Wb): New constraint.
(Ww): Ditto.
* config/i386/i386.md (*ashlhi3_1): Extend to avx512 mask
shift.
(*ashlqi3_1): Ditto.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshr<mode>3_1): and this, also extend this pattern to avx512
mask registers.
(*<insn><mode>3_1): Split to ..
(*ashr<mode>3_1): this, ...
(*lshrqi3_1): and this, also extend this pattern to avx512
mask registers.
(*lshrhi3_1): And this, also extend this pattern to avx512
mask registers.
* config/i386/sse.md (k<code><mode>): New define_split after
it to convert generic shift pattern to mask shift ones.
David Malcolm [Wed, 21 Jul 2021 21:24:08 +0000 (17:24 -0400)]
analyzer: fix issues with phi handling
The analyzer's state purging code was overzealously purging state
for ssa names that might be used within phi nodes, leading to
false positives from -Wanalyzer-use-of-uninitialized-value.
This patch updates phi handling in the analyzer to fix these issues.
gcc/analyzer/ChangeLog:
* region-model.cc (region_model::handle_phi): Add "old_state"
param and use it.
(region_model::update_for_phis): Update so that all of the phi
stmts are effectively handled simultaneously, rather than in
order.
* region-model.h (region_model::handle_phi): Add "old_state"
param.
* state-purge.cc (self_referential_phi_p): Replace with...
(name_used_by_phis_p): ...this new function.
(state_purge_per_ssa_name::process_point): Update to use the
above, so that all phi stmts at a basic block are effectively
considered simultaneously, and only consider the phi arguments for
the pertinent in-edge.
* supergraph.cc (cfg_superedge::get_phi_arg_idx): New.
(cfg_superedge::get_phi_arg): Use the above.
* supergraph.h (cfg_superedge::get_phi_arg_idx): New decl.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/explode-2.c: Remove xfail.
* gcc.dg/analyzer/explode-2a.c: Remove expected leak warning on
while stmt.
* gcc.dg/analyzer/phi-2.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 21 Jul 2021 21:22:45 +0000 (17:22 -0400)]
analyzer: fixes to -fdump-analyzer-state-purge for phi nodes
gcc/analyzer/ChangeLog:
* state-purge.cc (state_purge_annotator::add_node_annotations):
Rather than erroneously always using the NULL in-edge, determine
each relevant in-edge, and print the appropriate data for each
in-edge. Use print_needed to print the data as comma-separated
lists of SSA names.
(print_vec_of_names): Add "within_table" param and use it.
(state_purge_annotator::add_stmt_annotations): Factor out
collation and printing code into...
(state_purge_annotator::print_needed): ...this new function.
* state-purge.h (state_purge_annotator::print_needed): New decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 21 Jul 2021 21:19:31 +0000 (17:19 -0400)]
analyzer: tweak dumping of min_expr/max_expr
gcc/analyzer/ChangeLog:
* svalue.cc (infix_p): New.
(binop_svalue::dump_to_pp): Use it to print MIN_EXPR and MAX_EXPR
in prefix form, rather than infix.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Bill Schmidt [Wed, 21 Jul 2021 12:39:37 +0000 (08:39 -0400)]
rs6000: Parsing of overload input file
2021-06-07 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-gen-builtins.c (ovld_stanza): New struct.
(MAXOVLDSTANZAS): New macro.
(ovld_stanzas): New variable.
(curr_ovld_stanza): Likewise.
(MAXOVLDS): New macro.
(ovlddata): New struct.
(ovlds): New variable.
(curr_ovld): Likewise.
(max_ovld_args): Likewise.
(parse_ovld_entry): New function.
(parse_ovld_stanza): Likewise.
(parse_ovld): Implement.