Joel Phillips [Tue, 23 Aug 2022 15:11:00 +0000 (16:11 +0100)]
gccrs: Add Lexer for Rust front-end
The lexer is referred to as a ManagedTokenSource within the parser. This
lexer does not currently support Unicode, but serves as a starting point
to do so.
Co-authored-by: Philip Herron <philip.herron@embecosm.com> Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com> Co-authored-by: Mark Wielaard <mark@klomp.org> Signed-off-by: Joel Phillips <simplytheother@gmail.com>
Co-authored-by: Philip Herron <philip.herron@embecosm.com> Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com> Signed-off-by: Joel Phillips <simplytheother@gmail.com>
Joel Phillips [Fri, 21 Oct 2022 11:34:11 +0000 (13:34 +0200)]
gccrs: Add Rust front-end base AST data structures
This is a full C++11 class hierarchy representing the Rust AST. We do not
allow dynamic_cast and so the main mechanism to work with the AST is by
using the visitor interface. Slowly we are adding TREE_CODE style node
types to the AST which will allow for more ways to work with the AST but
for now this is it.
Co-authored-by: Philip Herron <philip.herron@embecosm.com> Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com> Signed-off-by: Joel Phillips <simplytheother@gmail.com>
Philip Herron [Tue, 23 Aug 2022 15:02:25 +0000 (16:02 +0100)]
gccrs: Add execution test cases
This is similar to the compile/torture/*.rs test cases, but all of these are
dg-execute testcases. They are compiled, linked and executed by default. These
testcases are also compiled with the matrix of torture options.
The only caveat here is that gccrs does not currently support the main shim,
so we have a C-style main function here returning zero which is not proper Rust
code.
Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com> Co-authored-by: Thomas Schwinge <thomas@codesourcery.com> Co-authored-by: Mark Wielaard <mark@klomp.org> Co-authored-by: Marc Poulhiès <dkm@kataplop.net>
Philip Herron [Tue, 23 Aug 2022 15:01:16 +0000 (16:01 +0100)]
gccrs: Add general compilation test cases
This suite of tests has two sections: compile/*.rs and compile/torture/*.rs.
The first section is comprised entirely of dg-compile tests, containing a
mixture of dg-warning and dg-error annotations and some with no annotations,
ensuring the creation of resulting asm output. The second section is the same,
but has tests which are ran with the full torture options, for coverage of test
cases that may have an issue with a specific optimization level.
Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com> Co-authored-by: Thomas Schwinge <thomas@codesourcery.com> Co-authored-by: Mark Wielaard <mark@klomp.org> Co-authored-by: Marc Poulhiès <dkm@kataplop.net>
Philip Herron [Tue, 23 Aug 2022 15:00:51 +0000 (16:00 +0100)]
gccrs: Add link cases testsuite
This testsuite is heavily inspired from the LTO testsuite that uses a
pattern where each file is compiled to an object file and finally linked
together. Since Rust does not have headers/prototypes, we rely on the
ordering here so that all files numbered greater than zero get compiled to
object files first. This leaves the _0 file free to test the 'extern crate' and
'use' keywords to force testing of the compiler to read metadata from the
other 'crates'.
Richard Biener [Mon, 12 Dec 2022 16:52:46 +0000 (17:52 +0100)]
tree-optimization/108076 - if-conversion and forced labels
When doing if-conversion we simply throw away labels without checking
whether they are possibly targets of non-local gotos or have their
address taken. The following rectifies this and refuses to if-convert
such loops.
PR tree-optimization/108076
* tree-if-conv.cc (if_convertible_loop_p_1): Reject blocks
with non-local or forced labels that we later remove
labels from.
Jakub Jelinek [Tue, 13 Dec 2022 09:30:36 +0000 (10:30 +0100)]
libsanitizer: Fix up libbacktrace build after r13-4547 [PR108072]
The r13-4547 commit added new non-static function to libbacktrace:
backtrace_uncompress_zstd but for the libsanitizer use we need to
rename it, so that it is in __asan_* namespace and doesn't clash
with other copies of libbacktrace.
Haochen Gui [Tue, 13 Dec 2022 08:45:10 +0000 (16:45 +0800)]
rs6000: enable cbranchcc4
This patch enables "have_cbranchcc4" on rs6000 by defining a
"cbranchcc4" expander. "have_cbrnachcc4" is a flag in ifcvt.cc to
indicate if branching by CC bits is valid or not. With this flag
enabled, some branches can be optimized to conditional moves.
2022-12-07 Haochen Gui <guihaoc@linux.ibm.com>
gcc/
* config/rs6000/rs6000.md (cbranchcc4): New expander.
Haochen Gui [Thu, 8 Dec 2022 05:22:29 +0000 (13:22 +0800)]
optabs: make prepare_cmp_insn goto fail when cbranchcc4 checks unsatisfied
prepare_cmp_insn is a help function to generate comparison rtx.
It should not assume that cbranchcc4 exists and all sub-CC modes
are supported on a target. When the check fails, it could go to
fail and return a NULL rtx as its callers check the return value
for CCmode.
The test case (gcc.target/powerpc/cbranchcc4-1.c) which covers
failure path will be committed with an rs6000 specific patch.
2022-12-05 Haochen Gui <guihaoc@linux.ibm.com>
gcc/
* optabs.cc (prepare_cmp_insn): Return a NULL rtx other than
assertion failure when targets don't have cbranch optab or
predicate check fails.
Ian Lance Taylor [Mon, 12 Dec 2022 20:46:40 +0000 (12:46 -0800)]
libgo: bump major version
PR go/108057
The current version is the same as for the previous GCC release,
but there have been minor changes like new type descriptors that
make it impossible to run Go programs built with the previous GCC
release with the current libgo.
Harald Anlauf [Sun, 11 Dec 2022 22:24:03 +0000 (23:24 +0100)]
Fortran: improve checking of assumed-size array spec [PR102180]
gcc/fortran/ChangeLog:
PR fortran/102180
* array.cc (match_array_element_spec): Add check for bad
assumed-implied-spec.
(gfc_match_array_spec): Reorder logic so that the first bad array
element spec may trigger an error.
gcc/testsuite/ChangeLog:
PR fortran/102180
* gfortran.dg/pr102180.f90: New test.
Iain Buclaw [Sat, 10 Dec 2022 21:11:41 +0000 (22:11 +0100)]
d: Fix undefined reference to nested lambda in template (PR108055)
Sometimes, nested lambdas of templated functions get no code generation
due to them being marked as instantianted outside of all modules being
compiled in the current compilation unit. This despite enclosing
template instances being marked as instantiated inside the current
compilation unit. To fix, all enclosing templates are now checked in
`function_defined_in_root_p'.
Because of this change, `function_needs_inline_definition_p' has also
been fixed up to only check whether the regular function definition
itself is to be emitted in the current compilation unit.
PR d/108055
gcc/d/ChangeLog:
* decl.cc (function_defined_in_root_p): Check all enclosing template
instances for definition in a root module.
(function_needs_inline_definition_p): Replace call to
function_defined_in_root_p with test for outer module `isRoot'.
Wilco Dijkstra [Mon, 12 Dec 2022 15:44:03 +0000 (15:44 +0000)]
AArch64: Enable TARGET_CONST_ANCHOR
Enable TARGET_CONST_ANCHOR to allow complex constants to be created via
immediate add/sub. Use a 24-bit range as that enables a 3 or 4-instruction
immediate to be replaced by 2 add/sub instructions. Fix the costing of
add/sub to support 24-bit and 12-bit shifted immediates.
The generated code for the testcase is now the same or better than LLVM.
It also results in a small codesize reduction on SPEC.
gcc/
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Add correct costs
for 24-bit and 12-bit shifted immediate add/sub.
(TARGET_CONST_ANCHOR): Define.
* config/aarch64/predicates.md (aarch64_pluslong_immediate):
Fix range check.
gcc/testsuite/
* gcc.target/aarch64/movk_3.c: New test.
Tamar Christina [Mon, 12 Dec 2022 15:20:30 +0000 (15:20 +0000)]
AArch64: Fix vector re-interpretation between partial SIMD modes
While writing a patch series I started getting incorrect codegen out from
VEC_PERM on partial struct types.
It turns out that this was happening because the TARGET_CAN_CHANGE_MODE_CLASS
implementation has a slight bug in it. The hook only checked for SIMD to
Partial but never Partial to SIMD. This resulted in incorrect subregs to be
generated from the fallback code in VEC_PERM_EXPR expansions.
I have unfortunately not been able to trigger it using a standalone testcase as
the mid-end optimizes away the permute every time I try to describe a permute
that would result in the bug.
The patch now rejects any conversion of partial SIMD struct types, unless they
are both partial structures of the same number of registers or one is a SIMD
type who's size is less than 8 bytes.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_can_change_mode_class): Restrict
conversions between partial struct types properly.
Tamar Christina [Mon, 12 Dec 2022 15:18:56 +0000 (15:18 +0000)]
AArch64: Support new tbranch optab.
This implements the new tbranch optab for AArch64.
we cannot emit one big RTL for the final instruction immediately.
The reason that all comparisons in the AArch64 backend expand to separate CC
compares, and separate testing of the operands is for ifcvt.
The separate CC compare is needed so ifcvt can produce csel, cset etc from the
compares. Unlike say combine, ifcvt can not do recog on a parallel with a
clobber. Should we emit the instruction directly then ifcvt will not be able
to say, make a csel, because we have no patterns which handle zero_extract and
compare. (unlike combine ifcvt cannot transform the extract into an AND).
While you could provide various patterns for this (and I did try) you end up
with broken patterns because you can't add the clobber to the CC register. If
you do, ifcvt recog fails.
i.e.
int
f1 (int x)
{
if (x & 1)
return 1;
return x;
}
We lose csel here.
Secondly the reason the compare with an explicit CC mode is needed is so that
ifcvt can transform the operation into a version that doesn't require the flags
to be set. But it only does so if it know the explicit usage of the CC reg.
For instance
int
foo (int a, int b)
{
return ((a & (1 << 25)) ? 5 : 4);
}
Doesn't require a comparison, the optimal form is:
and no compare is actually needed. If you represent the instruction using an
ANDS instead of a zero_extract then you get close, but you end up with an ands
followed by an add, which is a slower operation.
Tamar Christina [Mon, 12 Dec 2022 15:16:50 +0000 (15:16 +0000)]
middle-end: Add new tbranch optab to add support for bit-test-and-branch operations
This adds a new test-and-branch optab that can be used to do a conditional test
of a bit and branch. This is similar to the cbranch optab but instead can
test any arbitrary bit inside the register.
This patch recognizes boolean comparisons and single bit mask tests.
gcc/ChangeLog:
* dojump.cc (do_jump): Pass along value.
(do_jump_by_parts_greater_rtx): Likewise.
(do_jump_by_parts_zero_rtx): Likewise.
(do_jump_by_parts_equality_rtx): Likewise.
(do_compare_rtx_and_jump): Likewise.
(do_compare_and_jump): Likewise.
* dojump.h (do_compare_rtx_and_jump): New.
* optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
(validate_test_and_branch): New.
(emit_cmp_and_jump_insns): Optiobally take a value, and when value is
supplied then check if it's suitable for tbranch.
* optabs.def (tbranch_eq$a4, tbranch_ne$a4): New.
* doc/md.texi (tbranch_@var{op}@var{mode}4): Document it.
* optabs.h (emit_cmp_and_jump_insns): New.
* tree.h (tree_zero_one_valued_p): New.
Tamar Christina [Mon, 12 Dec 2022 15:15:07 +0000 (15:15 +0000)]
aarch64: Make existing V2HF be usable.
The backend has an existing V2HFmode that is used by pairwise operations.
This mode was however never made fully functional. Amongst other things it was
never declared as a vector type which made it unusable from the mid-end.
It's also lacking an implementation for load/stores so reload ICEs if this mode
is every used. This finishes the implementation by providing the above.
Note that I have created a new iterator VHSDF_P instead of extending VHSDF
because the previous iterator is used in far more things than just load/stores.
It's also used for instance in intrinsics and extending this would force me to
provide support for mangling the type while we never expose it through
intrinsics.
Jonathan Wakely [Mon, 12 Dec 2022 12:51:49 +0000 (12:51 +0000)]
libstdc++: Add a test checking for chrono::duration overflows
This test fails if chrono::days::rep or chrono::years::rep is a 32-bit
type, because a large days or years value silently overflows a 32-bit
integer when converted to seconds. It would be conforming to implement
chrono::days as chrono::duration<int32_t, ratio<86400>>, but would make
this overflow case more likely. Similarly for chrono::years,
chrono::months and chrono::weeks. This test is here to remind us not to
make that change lightly.
libstdc++-v3/ChangeLog:
* testsuite/20_util/duration/arithmetic/overflow_c++20.cc: New
test.
Jonathan Wakely [Mon, 12 Dec 2022 11:22:45 +0000 (11:22 +0000)]
libstdc++: Change names that clash with Win32 or Clang
Clang now defines an __is_unsigned built-in, and Windows defines an
_Out_ macro. Replace uses of those as identifiers.
There might also be a problem with __is_signed, which we use in several
places.
libstdc++-v3/ChangeLog:
* include/std/chrono (hh_mm_ss): Rename __is_unsigned member to
_S_is_unsigned.
* include/std/format (basic_format_context): Rename _Out_
template parameter to _Out2.
* testsuite/17_intro/names.cc: Add Windows SAL annotation
macros.
Kyrylo Tkachov [Mon, 12 Dec 2022 11:07:45 +0000 (11:07 +0000)]
aarch64: Add __ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI ACLE defines
Recent ACLE additions specified the __ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI macros [1] that the compiler
should define when the pointer authentication and BTI instructions are available (and don't act as NOPs).
We've received requests to enable them in GCC for aarch64, similar to clang [2].
It's a fairly simple patch and should be non-intrusive at this stage.
Pointer authentication has its own "pauth" feature flag, whereas BTI depends on an architecture level
of Armv8.5-a or later.
Bootstrapped and tested on aarch64-none-linux-gnu.
Richard Biener [Mon, 12 Dec 2022 07:56:41 +0000 (08:56 +0100)]
Revert parts of ADDR_EXPR/CONSTRUCTOR treatment change in match.pd
This reverts the part that substitutes from the definition of an
SSA name to the capture, thus ADDR_EXPR@0 eventually yielding
&y_1->a[i_2] instead of _3. That's because I didn't think of
how to deal with substituting @0 in the result pattern. So
the following re-instantiates the SSA def CONSTRUCTOR handling
and in the ADDR_EXPR helpers used by match.pd handles SSA names
defined to ADDR_EXPRs transparently.
* genmatch.cc (dt_simplify::gen): Revert last change.
* match.pd: Revert simplification of CONSTUCTOR leaf handling.
(&x cmp SSA_NAME): Handle ADDR_EXPR in SSA defs.
* fold-const.cc (split_address_to_core_and_offset): Handle
ADDR_EXPRs in SSA defs.
(address_compare): Likewise.
Richard Biener [Mon, 12 Dec 2022 07:13:33 +0000 (08:13 +0100)]
tree-optimization/89317 - another pattern for &p->x != p + 4
As seen in the original testcase for PR89317 we are missing
comparison simplification patterns for &p->x != p + 4. Fixed
by making an existing one apply. To make the pattern apply
during CCP we need to simplify ccp_fold to not use GENERIC
folding of conditions but also use GIMPLE folding.
PR tree-optimization/89317
* tree-ssa-ccp.cc (ccp_fold): Handle GIMPLE_COND via
gimple_fold_stmt_to_constant_1.
* match.pd (&a != &a + c): Apply to pointer_plus with non-ADDR_EXPR
base as well.
Iain Buclaw [Sat, 10 Dec 2022 18:12:43 +0000 (19:12 +0100)]
d: Fix internal compiler error: in visit, at d/imports.cc:72 (PR108050)
The visitor for lowering IMPORTED_DECLs did not have an override for
dealing with importing OverloadSet symbols. This has now been
implemented in the code generator.
PR d/108050
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (Import *)): Handle build_import_decl
returning a TREE_LIST.
* imports.cc (ImportVisitor::visit (OverloadSet *)): New override.
- Import dmd v2.101.0.
- Deprecate the ability to call `__traits(getAttributes)' on
overload sets.
- Deprecate non-empty `for' statement increment clause with no
effect.
- Array literals assigned to `scope' array variables can now be
allocated on the stack.
D runtime changes:
- Import druntime v2.101.0.
Phobos changes:
- Import phobos v2.101.0.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd c8ae4adb2e.
* typeinfo.cc (check_typeinfo_type): Update for new front-end
interface.
(TypeInfoVisitor::visit (TypeInfoStructDeclaration *)): Remove warning
that toHash() must be declared 'nothrow @safe`.
Iain Buclaw [Sat, 10 Dec 2022 16:17:35 +0000 (17:17 +0100)]
d: Expand bsr intrinsic as `clz(arg) ^ (argsize - 1)'
As well as removing unnecessary casts, this results in less temporaries
being generated during the initial gimple lowering pass. Otherwise the
code generated is identical to the former intrinsic expansion.
gcc/d/ChangeLog:
* intrinsics.cc (expand_intrinsic_bsf): Fix comment.
(expand_intrinsic_bsr): Use BIT_XOR_EXPR instead of MINUS_EXPR.
Richard Biener [Sun, 11 Dec 2022 11:32:49 +0000 (12:32 +0100)]
Treat ADDR_EXPR and CONSTRUCTOR as GIMPLE/GENERIC magically
The following allows to match ADDR_EXPR for both the invariant
&a.b case as well as the &p->d case in a separate definition
transparently. This also allows to remove the hack we employ
for CONSTRUCTOR which we handle for example with
Note CONSTUCTORs always appear as separate definition in GIMPLE,
but I continue to play safe and ADDR_EXPRs are now matched in
both places where previously ADDR_EXPR@0 would have missed
the &p->x case.
This is a prerequesite for the PR89317 fix.
* genmatch.cc (dt_node::gen_kids): Handle ADDR_EXPR in both
the GENERIC and GIMPLE op position.
(dt_simplify::gen): Capture both GENERIC and GIMPLE op
position for ADDR_EXPR and CONSTRUCTOR.
* match.pd: Simplify CONSTRUCTOR leaf handling.
Richard Biener [Wed, 7 Dec 2022 13:42:24 +0000 (14:42 +0100)]
tree-optimization/106904 - bogus -Wstringopt-overflow with vectors
The following avoids CSE of &ps->wp to &ps->wp.hwnd confusing
-Wstringopt-overflow by making sure to produce addresses to the
biggest container from vectorization. For this I introduce
strip_zero_offset_components which turns &ps->wp.hwnd into
&(*ps) and use that to base the vector data references on.
That will also work for addresses with variable components,
alternatively emitting pointer arithmetic via calling
get_inner_reference and gimplifying that would be possible
but likely more intrusive.
This is by no means a complete fix for all of those issues
(avoiding ADDR_EXPRs in favor of pointer arithmetic might be).
Other passes will have similar issues.
In theory that might now cause false negatives.
PR tree-optimization/106904
* tree.h (strip_zero_offset_components): Declare.
* tree.cc (strip_zero_offset_components): Define.
* tree-vect-data-refs.cc (vect_create_addr_base_for_vector_ref):
Strip zero offset components before building the address.
* gcc.dg/Wstringop-overflow-pr106904.c: New testcase.
Tobias Burnus [Sun, 11 Dec 2022 10:47:55 +0000 (11:47 +0100)]
fortran/openmp.cc: Remove 's' that slipped in during %<..%> replacement
Seemingly, 's' (in VI that's the 's'ubstitute command) appeared verbatim in
a gfc_error message when to doing the '...' to %<...%> replacements in commit r13-4590-g84f6f8a2a97f88be01e223c9c9dbab801a4f501f
gcc/fortran/
* openmp.cc (gfc_match_omp_context_selector_specification):
Remove spurious 's' in an error message.
Jakub Jelinek [Sat, 10 Dec 2022 15:50:39 +0000 (16:50 +0100)]
ivopts: Fix IP_END handling for asm goto [PR107997]
The following testcase ICEs, because the latch bb ends with
asm goto which has both fallthrough to the header and one or more labels
in the header too. In that case there is just a single edge out of the
latch block, but still the asm goto is stmt_ends_bb_p statement, yet
ivopts decides to emit an IV bump at the IP_END position and inserts
it into the same bb as the asm goto after it, which then fails verification
(control flow in the middle of bb).
The following patch fixes it by splitting the latch -> header edge in that
case and inserting into the newly created bb, where split_edge ->
redirect_edge_and_branch is able to deal with this case correctly.
2022-12-10 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107997
* tree-ssa-loop-ivopts.cc: Include cfganal.h.
(create_new_iv) <case IP_END>: If ip_end_pos bb is non-empty and ends
with a stmt which ends bb, instead of adding iv update after it split
the latch edge and insert iterator into the new latch bb.
Tobias Burnus [Sat, 10 Dec 2022 12:42:08 +0000 (13:42 +0100)]
libgomp: Handle OpenMP's reverse offloads
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.
The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.
For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
libgomp/ChangeLog:
* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.
Tobias Burnus [Sat, 10 Dec 2022 07:34:04 +0000 (08:34 +0100)]
Fortran: Replace simple '.' quotes by %<.%>
When using %qs instead of '%s' or %<=%> instead of '=' looks nicer
by having nicer quotes and bold text, if the terminal supports it;
otherwise, plain quotes are used.
gcc/fortran/ChangeLog:
* match.cc (gfc_match_member_sep): Use %<...%> in gfc_error.
* openmp.cc (gfc_match_oacc_routine, gfc_match_omp_context_selector,
gfc_match_omp_context_selector_specification,
gfc_match_omp_declare_variant, resolve_omp_clauses): Likewise;
use %qs instead of '%s'.
* primary.cc (match_real_constant, gfc_match_varspec): Likewise.
* resolve.cc (gfc_resolve_formal_arglist, resolve_operator,
resolve_ordinary_assign): Likewise.
* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
item about 'align'; mark clause as 'Y' and directive as 'N'.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
* testsuite/libgomp.fortran/allocate-3.f90: New test.
Jiufu Guo [Fri, 9 Dec 2022 05:50:37 +0000 (13:50 +0800)]
rs6000: Remove useless copy_rtx in rs6000_emit_set_{,long}_const
Function rs6000_emit_set_const/rs6000_emit_set_long_const are only invoked from
two "define_split"s where the target operand is limited to gpc_reg_operand or
int_reg_operand, then the operand must be REG_P.
And in rs6000_emit_set_const/rs6000_emit_set_long_const, to create temp rtx,
it is using code like "gen_reg_rtx({S|D}Imode)", it must also be REG_P.
So, copy_rtx is not needed for temp and dest.
This patch removes those "copy_rtx" for rs6000_emit_set_const and
rs6000_emit_set_long_const.
Similar story as PR103661, we again return a negative number
for __builtin_cpu_supports:
Documentation says:
int __builtin_cpu_supports(const char *feature)
This function returns a positive integer if the run-time CPU supports feature and returns 0 otherwise.
while we return -2147483648.
Moreover, I noticed "x86-64" is not a valid option for __builtin_cpu_is,
but for __builtin_cpu_supports.
PR target/107551
gcc/ChangeLog:
* config/i386/i386-builtins.cc (fold_builtin_cpu): Use same path
as for PR103661.
* doc/extend.texi: Fix "x86-64" use.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin_target.c: Add more checks.
Sebastian Huber [Fri, 9 Dec 2022 06:55:52 +0000 (07:55 +0100)]
Rename SUBTARGET_CC1_SPEC to OS_CC1_SPEC
This change resolves a naming conflict introduced by the recently added
SUBTARGET_CC1_SPEC to gcc.cc. Some targets (mips and loongarch) aready used
a SUBTARGET_CC1_SPEC define. Rename the define used by gcc.cc to OS_CC1_SPEC.
David Malcolm [Fri, 9 Dec 2022 02:19:23 +0000 (21:19 -0500)]
analyzer: fix ICE on region creation during get_referenced_base_regions [PR108003]
gcc/analyzer/ChangeLog:
PR analyzer/108003
* call-summary.cc
(call_summary_replay::convert_region_from_summary_1): Convert
heap_regs_in_use from auto_sbitmap to auto_bitmap.
* region-model-manager.cc
(region_model_manager::get_or_create_region_for_heap_alloc):
Convert from sbitmap to bitmap.
* region-model-manager.h: Likewise.
* region-model.cc
(region_model::get_or_create_region_for_heap_alloc): Convert from
auto_sbitmap to auto_bitmap.
(region_model::get_referenced_base_regions): Likewise.
* region-model.h: Include "bitmap.h" rather than "sbitmap.h".
(region_model::get_referenced_base_regions): Convert from
auto_sbitmap to auto_bitmap.
gcc/testsuite/ChangeLog:
PR analyzer/108003
* g++.dg/analyzer/pr108003.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 9 Dec 2022 02:19:23 +0000 (21:19 -0500)]
analyzer: handle memmove like memcpy
gcc/analyzer/ChangeLog:
* region-model-impl-calls.cc (class kf_memcpy): Rename to...
(class kf_memcpy_memmove): ...this.
(kf_memcpy::impl_call_pre): Rename to...
(kf_memcpy_memmove::impl_call_pre): ...this, and check the src for
poison.
(register_known_functions): Update for above renaming, and
register BUILT_IN_MEMMOVE and BUILT_IN_MEMMOVE_CHK.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/memcpy-1.c (test_8a, test_8b): New tests.
* gcc.dg/analyzer/memmove-1.c: New test, based on memcpy-1.c
* gcc.dg/analyzer/out-of-bounds-1.c (test7): Update expected
result for uninit srcBuf.
* gcc.dg/analyzer/out-of-bounds-5.c (test8, test9): Add
dg-warnings for memcpy from uninit src vla.
* gcc.dg/analyzer/pr104308.c (test_memmove_within_uninit):
Expect creation point note to be missing on riscv*-*-*.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jonathan Wakely [Fri, 2 Dec 2022 16:18:43 +0000 (16:18 +0000)]
libstdc++: Change class-key for duration and time_point to class
We define these with the 'struct' keyword, but the standard uses
'class'. This results in warnings if users try to refer to them using
elaborated type specifiers.
libstdc++-v3/ChangeLog:
* include/bits/chrono.h (duration, time_point): Change 'struct'
to 'class'.
Marek Polacek [Wed, 7 Dec 2022 20:27:27 +0000 (15:27 -0500)]
docs: Suggest options to improve ASAN stack traces
I got a complaint that while Clang docs suggest options that improve
the quality of the backtraces ASAN prints (cf.
<https://clang.llvm.org/docs/AddressSanitizer.html#usage>), our docs
don't say anything to that effect. This patch amends that with a new
paragraph. (It deliberately doesn't mention -fno-omit-frame-pointer.)
gcc/ChangeLog:
* doc/invoke.texi (-fsanitize=address): Suggest options to improve
stack traces.
The existing comparison was incorrect for non-PRECISE counts
(e.g., AFDO): we could end up with a 0 base_count, which could
lead to asserts, e.g., in good_cloning_opportunity_p.
David Faust [Thu, 8 Dec 2022 18:08:22 +0000 (10:08 -0800)]
bpf: add define_insn for bswap
The eBPF architecture provides 'end[be,le]' instructions for endianness
swapping. Add a define_insn for bswap<mode>2 to use them instaed of
falling back on a libcall.
gcc/
* config/bpf/bpf.md (bswap<mode>2): New define_insn.
Jason Merrill [Tue, 6 Dec 2022 23:10:48 +0000 (18:10 -0500)]
c++: build initializer_list<string> in a loop [PR105838]
The previous patch avoided building an initializer_list<string> at all when
building a vector<string>, but in situations where that isn't possible, we
could still build the initializer_list with a loop over a constant array.
This is represented using a VEC_INIT_EXPR, which required adjusting a couple
of places that expected the initializer array to have the same type as the
target array and fixing build_vec_init not to undo our efforts.
PR c++/105838
gcc/cp/ChangeLog:
* call.cc (convert_like_internal) [ck_list]: Use
maybe_init_list_as_array.
* constexpr.cc (cxx_eval_vec_init_1): Init might have
a different type.
* tree.cc (build_vec_init_elt): Likewise.
* init.cc (build_vec_init): Handle from_array from a
TARGET_EXPR. Retain TARGET_EXPR of a different type.
Jason Merrill [Tue, 6 Dec 2022 14:51:51 +0000 (09:51 -0500)]
c++: avoid initializer_list<string> [PR105838]
When constructing a vector<string> from { "strings" }, first is built an
initializer_list<string>, which is then copied into the strings in the
vector. But this is inefficient: better would be treat the { "strings" }
as a range and construct the strings in the vector directly from the
string-literals. We can do this transformation for standard library
classes because we know the design patterns they follow.
Jason Merrill [Mon, 5 Dec 2022 20:19:27 +0000 (15:19 -0500)]
c++: fewer allocator temps [PR105838]
In this PR, initializing the array of std::string to pass to the vector
initializer_list constructor gets very confusing to the optimizers as the
number of elements increases, primarily because of all the std::allocator
temporaries passed to all the string constructors. Instead of creating one
for each string, let's share an allocator between all the strings; we can do
this safely because we know that std::allocator is stateless and that string
doesn't care about the object identity of its allocator parameter.
PR c++/105838
gcc/cp/ChangeLog:
* cp-tree.h (is_std_allocator): Declare.
* constexpr.cc (is_std_allocator): Split out from...
(is_std_allocator_allocate): ...here.
* init.cc (find_temps_r): New.
(find_allocator_temp): New.
(build_vec_init): Use it.
Sebastian Pop [Wed, 30 Nov 2022 19:45:24 +0000 (19:45 +0000)]
AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]
Currently patchable area is at the wrong place on AArch64. It is placed
immediately after function label, before .cfi_startproc. This patch
adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
modifies aarch64_print_patchable_function_entry to avoid placing
patchable area before .cfi_startproc.
Jakub Jelinek [Thu, 8 Dec 2022 13:57:22 +0000 (14:57 +0100)]
cfgbuild: Fix DEBUG_INSN handling in find_bb_boundaries [PR106719]
The following testcase FAILs on aarch64-linux. We have some atomic
instruction followed by 2 DEBUG_INSNs (if -g only of course) followed
by NOTE_INSN_EPILOGUE_BEG followed by some USE insn.
Now, split3 pass replaces the atomic instruction with a code sequence
which ends with a conditional jump and the split3 pass calls
find_many_sub_basic_blocks.
For -g0, find_bb_boundaries sees the flow_transfer_insn (the new conditional
jump), then NOTE_INSN_EPILOGUE_BEG which can live in between basic blocks
and then the USE insn, so splits block after the NOTE_INSN_EPILOGUE_BEG
and puts the NOTE in between the blocks.
For -g, if sees a DEBUG_INSN after the flow_transfer_insn, so sets
debug_insn to it, then walks over another DEBUG_INSN, NOTE_INSN_EPILOGUE_BEG
until it finally sees the USE insn, and triggers the:
rtx_insn *prev = PREV_INSN (insn);
/* If the first non-debug inside_basic_block_p insn after a control
flow transfer is not a label, split the block before the debug
insn instead of before the non-debug insn, so that the debug
insns are not lost. */
if (debug_insn && code != CODE_LABEL && code != BARRIER)
prev = PREV_INSN (debug_insn);
code I've added for PR81325. If there are only DEBUG_INSNs, that is
the right thing to do, but if in between debug_insn and insn there are
notes which can stay in between basic blocks or simnilarly JUMP_TABLE_DATA
or their associated CODE_LABELs, it causes -fcompare-debug differences.
The following patch fixes it by clearing debug_insn if JUMP_TABLE_DATA
or associated CODE_LABEL is seen (I'm afraid there is no good answer
what to do with DEBUG_INSNs before those; the code then removes them:
/* Clean up the bb field for the insns between the blocks. */
for (x = NEXT_INSN (flow_transfer_insn);
x != BB_HEAD (fallthru->dest);
x = next)
{
next = NEXT_INSN (x);
/* Debug insns should not be in between basic blocks,
drop them on the floor. */
if (DEBUG_INSN_P (x))
delete_insn (x);
else if (!BARRIER_P (x))
set_block_for_insn (x, NULL);
}
but if there are NOTEs, the patch just reorders the NOTEs and DEBUG_INSNs,
such that the NOTEs come first (so that they stay in between basic blocks
like with -g0) and DEBUG_INSNs after those (so that bb is split before
them, so they will be in the basic block after NOTE_INSN_BASIC_BLOCK).
2022-12-08 Jakub Jelinek <jakub@redhat.com>
PR debug/106719
* cfgbuild.cc (find_bb_boundaries): If there are NOTEs in between
debug_insn (seen after flow_transfer_insn) and insn, move NOTEs
before all the DEBUG_INSNs and split after NOTEs. If there are
other insns like jump table data, clear debug_insn.
On Thu, Dec 01, 2022 at 09:09:51AM +0100, Jakub Jelinek via Gcc-patches wrote:
> BTW, I wonder if we couldn't add additional patterns which would catch
> the case where one of the operands is constant.
Andrew MacLeod [Tue, 6 Dec 2022 15:41:29 +0000 (10:41 -0500)]
Ensure arguments to range-op handler are supported.
PR tree-optimization/107985
gcc/
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Check if type
of the operands is supported.
* gimple-range.cc (gimple_ranger::prefill_stmt_dependencies): Do
not assert if here is no range-op handler.
Jiufu Guo [Wed, 30 Nov 2022 05:13:37 +0000 (13:13 +0800)]
rs6000: Update sign extension computation with sext_hwi
This patch just replaces the expression like:
((value & 0xf..f) ^ 0x80..0) - 0x80..0 to better code(e.g. sext_hwi) for
rs6000.cc, rs6000.md and predicates.md (files under rs6000/).
The following addresses a missed folding noticed in PR107699 that can
be fixed amending the existing &x + a != &x + b pattern to also handle
the case of only one side having a pointer plus. I'm moving the
patterns next to related simpifications showing there'd be an existing
pattern matching this if it were not gated with an explicit single_use
constraint. Note the new pattern also handles &x.a + a != &x.b, but
this hints at some unification / generalization opportunities here.
PR tree-optimization/107699
* match.pd (&a !=/== &a.b + c -> (&a - &a.b) !=/== c): New
pattern variant.
Alexandre Oliva [Thu, 8 Dec 2022 10:50:33 +0000 (07:50 -0300)]
[PR102706] [testsuite] -Wno-stringop-overflow vs Warray-bounds
The bogus Wstringop-overflow warnings conditionally issued for
Warray-bounds-48.c and -Wzero-length-array-bounds-2.c are expected
under conditions that depend on the availability of certain vector
patterns, but that don't seem to model the conditions under which the
warnings are expected.
On riscv64-elf and arm-eabi/-mcpu=cortex-r5, for example, though the
Warray-bounds-48.c condition passes, we don't issue warnings. On
riscv64-elf, we decide not to vectorize the assignments; on cortex-r5,
we do vectorize pairs of assignments, but that doesn't yield the
expected warning, even though assignments that should trigger the
bogus warning are vectorized and associated with the earlier line
where the bogus warning would be expected.
On riscv64, for Wzero-length-array-bounds-2.c, we issue the expected
warning in test_C_global_buf, but we also issue a warning for
test_C_local_buf under the same conditions, that would be expected on
other platforms but that is not issued on them. On
arm-eabi/-mcpu=cortex-r5, the condition passes so we'd expect the
warning in both functions, but we don't warn on either.
Instead of further extending the effective target tests, introduced to
temporarily tolerate these expected bogus warnings, so as to capture
the vectorizer analyses that lead to the mismatched decisions, I'm
disabling the undesired warnings for these two tests.
Alexandre Oliva [Thu, 8 Dec 2022 10:50:32 +0000 (07:50 -0300)]
[arm] xfail fp-uint64-convert-double tests
The FP emulation on ARM doesn't take rounding modes into account. The
tests require hard_float, but that only tests for calls when adding
doubles. There are arm targets that support hardware adds, but that
emulate conversions.
for gcc/testsuite/ChangeLog
* gcc.dg/torture/fp-uint64-convert-double-1.c: Expect fail on
arm-*-eabi*.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Likewise.
Alexandre Oliva [Thu, 8 Dec 2022 10:50:30 +0000 (07:50 -0300)]
[testsuite] [arm/aarch64] -fno-short-enums for auto-init-[12].c
On arm-eabi, and possibly on other platforms, -fshort-enums is enabled
by default, which breaks some tests' expectations as to enum sizes
with DEFERRED_INIT. Disable short enums so that the expectations are
met.
Jakub Jelinek [Thu, 8 Dec 2022 09:41:49 +0000 (10:41 +0100)]
range-op-float: frange_arithmetic tweaks for MODE_COMPOSITE_P
As mentioned in PR107967, ibm-ldouble-format documents that
+- has 1ulp accuracy, * 2ulps and / 3ulps.
So, even if the result is exact, we need to widen the range a little bit.
The following patch does that. I just wonder what it means for reverse
division (the op1_range case), which we implement through multiplication,
when division has 3ulps error and multiplication just 2ulps. In any case,
this format is a mess and for non-default rounding modes can't be trusted
at all, instead of +inf or something close to it it happily computes -inf.
2022-12-08 Jakub Jelinek <jakub@redhat.com>
* range-op-float.cc (frange_nextafter): For MODE_COMPOSITE_P from
denormal or zero, use real_nextafter on DFmode with conversions
around it.
(frange_arithmetic): For mode_composite, on top of rounding in the
right direction accept extra 1ulp error for PLUS/MINUS_EXPR, extra
2ulps error for MULT_EXPR and extra 3ulps error for RDIV_EXPR.
Jakub Jelinek [Thu, 8 Dec 2022 09:34:26 +0000 (10:34 +0100)]
range-op-float: Fix up frange_arithmetic [PR107967]
The addition of PLUS/MINUS/MULT/RDIV_EXPR frange handlers causes
miscompilation of some of the libm routines, resulting in lots of
glibc test failures. A part of them is purely PR107608 fold-overflow-1.c
etc. issues, say when the code does
return -0.5 / 0.0;
and expects division by zero to be emitted, but we propagate -Inf
and avoid the operation.
But there are also various tests where we end up with different computed
value from the expected ones. All those cases are like:
is: inf inf
should be: 1.18973149535723176502e+4932 0xf.fffffffffffffff0p+16380
is: inf inf
should be: 1.18973149535723176508575932662800701e+4932 0x1.ffffffffffffffffffffffffffffp+16383
is: inf inf
should be: 1.7976931348623157e+308 0x1.fffffffffffffp+1023
is: inf inf
should be: 3.40282346e+38 0x1.fffffep+127
and the corresponding source looks like:
static const double huge = 1.0e+300;
double whatever (...) {
...
return huge * huge;
...
}
which for rounding to nearest or +inf should and does return +inf, but
for rounding to -inf or 0 should instead return nextafter (inf, -inf);
The rules IEEE754 has are that operations on +-Inf operands are exact
and produce +-Inf (except for the invalid ones that produce NaN) regardless
of rounding mode, while overflows:
"a) roundTiesToEven and roundTiesToAway carry all overflows to ∞ with the
sign of the intermediate result.
b) roundTowardZero carries all overflows to the format’s largest finite
number with the sign of the intermediate result.
c) roundTowardNegative carries positive overflows to the format’s largest
finite number, and carries negative overflows to −∞.
d) roundTowardPositive carries negative overflows to the format’s most
negative finite number, and carries positive overflows to +∞."
The behavior around overflows to -Inf or nextafter (-inf, inf) was actually
handled correctly, we'd construct [-INF, -MAX] ranges in those cases
because !real_less (&value, &result) in that case - value is finite
but larger in magnitude than what the format can represent (but GCC
internal's format can), while result is -INF in that case.
But for the overflows to +Inf or nextafter (inf, -inf) was handled
incorrectly, it tested real_less (&result, &value) rather than
!real_less (&result, &value), the former test is true when already the
rounding value -> result rounded down and in that case we shouldn't
round again, we should round down when it didn't.
So, in theory this could be fixed just by adding one ! character,
- if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
+ if ((mode_composite || (real_isneg (&inf) ? !real_less (&result, &value)
: !real_less (&value, &result)))
but the following patch goes further. The distance between
nextafter (inf, -inf) and inf is large (infinite) and expressions like
1.0e+300 * 1.0e+300 always produce +inf in round to nearest mode by far,
so I think having low bound of nextafter (inf, -inf) in that case is
unnecessary. But if it isn't multiplication but say addition and we are
inexact and very close to the boundary between rounding to nearest
maximum representable vs. rounding to nearest +inf, still using [MAX, +INF]
etc. ranges seems safer because we don't know exactly what we lost in the
inexact computation.
The following patch implements that.
2022-12-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107967
* range-op-float.cc (frange_arithmetic): Fix a thinko - if
inf is negative, use nextafter if !real_less (&result, &value)
rather than if real_less (&result, &value). If result is +-INF
while value is finite and -fno-rounding-math, don't do rounding
if !inexact or if result is significantly above max representable
value or below min representable value.
* gcc.dg/pr107967-1.c: New test.
* gcc.dg/pr107967-2.c: New test.
* gcc.dg/pr107967-3.c: New test.
Joseph Myers [Wed, 7 Dec 2022 22:08:18 +0000 (22:08 +0000)]
c: Diagnose auto constexpr used with a type
The constraints on auto in C2x disallow use with other storage-class
specifiers unless the type is inferred from an initializer. That
includes constexpr; add the missing checks for this case (the
combination of auto, constexpr and a type specifier).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (declspecs_add_type, declspecs_add_scspec): Check for
auto, constexpr and a type used together.
gcc/testsuite/
* gcc.dg/c2x-constexpr-1.c: Do not use auto, constexpr and a type
together.
* gcc.dg/c2x-constexpr-3.c: Add tests of auto, constexpr and type
used together.