Tobias Burnus [Fri, 21 Aug 2020 15:49:50 +0000 (17:49 +0200)]
OpenMP: Support 'lastprivate (conditional:' in Fortran
gcc/fortran/ChangeLog:
* gfortran.h (gfc_omp_namelist): Add lastprivate_conditional.
* openmp.c (gfc_match_omp_clauses): Handle 'conditional:'
modifier of 'lastprivate'.
* trans-openmp.c (gfc_omp_clause_default_ctor): Don't assert
on OMP_CLAUSE__CONDTEMP_ and other OMP_*TEMP_.
(gfc_trans_omp_variable_list): Handle lastprivate_conditional.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/lastprivate-conditional-1.f90: New test.
* gfortran.dg/gomp/lastprivate-conditional-2.f90: New test.
* gfortran.dg/gomp/lastprivate-conditional-3.f90: New test.
* gfortran.dg/gomp/lastprivate-conditional-4.f90: New test.
* gfortran.dg/gomp/lastprivate-conditional-5.f90: New test.
Jonathan Wakely [Fri, 21 Aug 2020 11:01:05 +0000 (12:01 +0100)]
libstdc++: Skip PSTL tests when installed TBB is too old [PR 96718]
These tests do not actually require TBB, because they only inspect the
feature test macros present in the headers. However, if TBB is installed
then its headers will be included, and the version will be checked. If
the version is too old, compilation fails due to a #error directive.
This change disables the tests if TBB is not present, so that we skip
them instead of failing.
Tobias Burnus [Fri, 21 Aug 2020 07:12:44 +0000 (09:12 +0200)]
Move change-log items to ChangeLog.omp
Move entries from commit 7c10ae450b95495dda362cb66770bb78b546592e
from gcc/fortran/ChangeLog to gcc/fortran/ChangeLog.omp and
from gcc/testsuite/ChangeLog to gcc/testsuite/ChangeLog.omp.
Sandra Loosemore [Thu, 20 Aug 2020 02:24:43 +0000 (19:24 -0700)]
Annotate inner loops in "acc kernels loop" directives (Fortran).
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code. However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops. This patch implements this behavior in Fortran.
gcc/fortran/
* openmp.c (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.
Sandra Loosemore [Thu, 20 Aug 2020 02:18:57 +0000 (19:18 -0700)]
Annotate inner loops in "acc kernels loop" directives (C/C++).
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code. However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops. This patch implements this behavior for C and C++.
d: Field access in parentheses causes error: need 'this' for 'field' of type 'type'
1. Fixes an ICE in the front-end if a struct symbol were to appear twice
in the compilation unit.
2. Fixes a rejects-valid bug in the front-end where `(symbol)' was being
resolved as a `var' expression, instead of `this.var'.
gcc/d/ChangeLog:
PR d/96250
* dmd/dstruct.c (StructDeclaration::semantic): Error if redefinition
of struct exists in compilation.
* dmd/expressionsem.c (ExpressionSemanticVisitor::visit(TypeExp)):
Rewrite resolved field variables as 'this.var' before semantic.
* dmd/parse.c (Parser::parseUnaryExp): Mark '(type) una_exp' as a
parenthesized expression.
gcc/testsuite/ChangeLog:
PR d/96250
* gdc.test/fail_compilation/fail17492.d: New test.
* gdc.test/compilable/b9490.d: New test.
* gdc.test/compilable/ice14739.d: New test.
* gdc.test/fail_compilation/ice21060.d: New test.
* gdc.test/fail_compilation/imports/ice21060a/package.d: New file.
* gdc.test/fail_compilation/imports/ice21060b/package.d: New file.
* gdc.test/fail_compilation/imports/ice21060c/package.d: New file.
* gdc.test/fail_compilation/imports/ice21060d/package.d: New file.
* gdc.test/runnable/b16278.d: New test.
Iain Buclaw [Thu, 20 Aug 2020 16:18:40 +0000 (18:18 +0200)]
d: Fix ICE in setValue at dmd/dinterpret.c:7046
This was originally seen when running the testsuite for a 16-bit target,
however, it could be reproduced on 32-bit using long[] as well.
gcc/d/ChangeLog:
* dmd/ctfeexpr.c (isCtfeValueValid): Return true for array literals as
well as structs.
* dmd/dinterpret.c: Don't reinterpret static arrays into dynamic.
gcc/testsuite/ChangeLog:
* gdc.test/compilable/interpret3.d: Add test.
* gdc.test/fail_compilation/reg6769.d: New test.
Moves the no-frame-access error to its own function, and uses it both
when get_framedecl() cannot find a path to the outer function frame and
to guard get_decl_tree() against recursively calling itself.
gcc/d/ChangeLog:
PR d/96254
* d-codegen.cc (error_no_frame_access): New.
(get_frame_for_symbol): Use fdparent name in error message.
(get_framedecl): Replace call to assert with error.
* d-tree.h (error_no_frame_access): Declare.
* decl.cc (get_decl_tree): Detect recursion and error.
gcc/testsuite/ChangeLog:
PR d/96254
* gdc.dg/pr96254a.d: New test.
* gdc.dg/pr96254b.d: New test.
Change test for CUDA callback context in nvptx_free() from using
GOMP_PLUGIN_acc_thread () into checking for CUDA_ERROR_NOT_PERMITTED,
for the former only works for OpenACC, but not OpenMP offloading.
libgomp/
* plugin/plugin-nvptx.c (nvptx_free):
Change "GOMP_PLUGIN_acc_thread () == NULL" test into check of
CUDA_ERROR_NOT_PERMITTED status for cuMemGetAddressRange. Adjust
comments.
Jonathan Wakely [Wed, 19 Aug 2020 12:41:26 +0000 (13:41 +0100)]
libstdc++: Add deprecated attributes to old iostream members
Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.
This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.
libstdc++-v3/ChangeLog:
* include/bits/c++config (_GLIBCXX_DEPRECATED_SUGGEST): New
macro for "use 'foo' instead" message in deprecated warnings.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/type_traits (__is_nullptr_t): Use
_GLIBCXX_DEPRECATED_SUGGEST instead of _GLIBCXX_DEPRECATED.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.
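The effect of such a "use 'foo' instead" macro can be sketched as follows. This is a minimal illustration, not the libstdc++ definition: DEMO_SUGGEST_MESSAGE, DEMO_DEPRECATED_SUGGEST and the demo namespace are hypothetical names, and the message format is assumed from the ChangeLog wording above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a "use 'foo' instead" deprecation macro.
#define DEMO_SUGGEST_MESSAGE(ALT) "use '" ALT "' instead"
#define DEMO_DEPRECATED_SUGGEST(ALT) [[deprecated(DEMO_SUGGEST_MESSAGE(ALT))]]

namespace demo {
  typedef int iostate;  // the current name

  // The old name still compiles, but the attribute marks every use of it
  // as deprecated and names the replacement in the message.
  DEMO_DEPRECATED_SUGGEST("demo::iostate") typedef iostate io_state;

  // Returns the message text attached to the deprecated typedef.
  inline std::string suggestion() {
    return DEMO_SUGGEST_MESSAGE("demo::iostate");
  }

  // Small usage check: the three string literals are concatenated.
  inline void check_suggestion() {
    assert(suggestion() == "use 'demo::iostate' instead");
  }
}
```

Declaring a variable of type demo::io_state should then draw a deprecation warning carrying that text, while code using demo::iostate stays silent.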
Joe Ramsay [Wed, 29 Jul 2020 13:04:28 +0000 (14:04 +0100)]
arm: Enable no-writeback vldr.16/vstr.16.
There was previously no way to specify that a register operand cannot
have any writeback modifiers, and as a result the argument to vldr.16
and vstr.16 could be erroneously output with post-increment. This
change adds a constraint which forbids all writeback, and
selects it in the relevant case for vldr.16 and vstr.16.
gcc/ChangeLog:
PR target/96682
* config/arm/arm-protos.h (arm_coproc_mem_operand_no_writeback):
Declare prototype.
(arm_mve_mode_and_operands_type_check): Declare prototype.
* config/arm/arm.c (arm_coproc_mem_operand): Refactor to use
_arm_coproc_mem_operand.
(arm_coproc_mem_operand_wb): New function to cover full, limited
and no writeback.
(arm_coproc_mem_operand_no_writeback): New constraint for memory
operand with no writeback.
(arm_print_operand): Extend 'E' specifier for memory operand
that does not support writeback.
(arm_mve_mode_and_operands_type_check): New constraint check for
MVE memory operands.
* config/arm/constraints.md: Add Uj constraint for VFP vldr.16
and vstr.16.
* config/arm/vfp.md (*mov_load_vfp_hf16): New pattern for
vldr.16.
(*mov_store_vfp_hf16): New pattern for vstr.16.
(*mov<mode>_vfp_<mode>16): Remove MVE moves.
gcc/testsuite/ChangeLog:
PR target/96682
* gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: New test.
Peter Bergner [Tue, 18 Aug 2020 21:16:11 +0000 (16:16 -0500)]
rs6000: Rename instruction xvcvbf16sp to xvcvbf16spn
The xvcvbf16sp mnemonic, which was just added in ISA 3.1, has been renamed
to xvcvbf16spn, to make it consistent with the other non-signaling conversion
instructions which all end with "n". The only use of this instruction is in
an MMA conversion built-in function, so there is little to no compatibility
issue.
Peter Bergner [Thu, 13 Aug 2020 18:40:39 +0000 (13:40 -0500)]
rs6000: ICE when using an MMA type as a function param or return value [PR96506]
PR96506 shows a problem where we ICE on illegal usage, namely using MMA
types for function arguments and return values. The solution is to flag
these illegal usages as errors early, before we ICE.
2020-08-13 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/96506
* config/rs6000/rs6000-call.c (rs6000_promote_function_mode): Disallow
MMA types as return values.
(rs6000_function_arg): Disallow MMA types as function arguments.
gcc/testsuite/
PR target/96506
* gcc.target/powerpc/pr96506.c: New test.
Jason Merrill [Thu, 6 Aug 2020 06:40:10 +0000 (02:40 -0400)]
c++: Handle enumerator in C++20 alias CTAD. [PR96199]
To form a deduction guide for an alias template, we substitute the template
arguments from the pattern into the deduction guide for the underlying
class. In the case of B(A1<X>), that produces B(A1<B<T,1>::X>) -> B<T,1>.
But since an enumerator doesn't have its own template info, and B<T,1> is a
dependent scope, trying to look up B<T,1>::X fails and we crash. So we need
to produce a SCOPE_REF instead.
And trying to use the members of the template class is wrong for other
members, as well, as it gives a nonsensical result if the class is
specialized.
gcc/cp/ChangeLog:
PR c++/96199
* pt.c (maybe_dependent_member_ref): New.
(tsubst_copy) [CONST_DECL]: Use it.
[VAR_DECL]: Likewise.
(tsubst_aggr_type): Handle nested type.
gcc/testsuite/ChangeLog:
PR c++/96199
* g++.dg/cpp2a/class-deduction-alias4.C: New test.
If the walk_body on the various sequences of reduction, lastprivate and/or linear
clauses needs to create a temporary variable, we should declare that variable
in that sequence rather than outside, where it would need to be privatized inside of
the construct.
2020-08-08 Jakub Jelinek <jakub@redhat.com>
PR fortran/93553
* tree-nested.c (convert_nonlocal_omp_clauses): For
OMP_CLAUSE_REDUCTION, OMP_CLAUSE_LASTPRIVATE and OMP_CLAUSE_LINEAR
save info->new_local_var_chain around walks of the clause gimple
sequences and declare_vars if needed into the sequence.
* testsuite/libgomp.c-c++-common/critical-hint-1.c: New; moved from
gcc/testsuite/c-c++-common/gomp/.
* testsuite/libgomp.c-c++-common/critical-hint-2.c: Likewise.
* testsuite/libgomp.fortran/critical-hint-1.f90: New; moved
from gcc/testsuite/gfortran.dg/gomp/.
* testsuite/libgomp.fortran/critical-hint-2.f90: Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/critical-hint-1.c: Moved to libgomp/.
* c-c++-common/gomp/critical-hint-2.c: Moved to libgomp/.
* gfortran.dg/gomp/critical-hint-1.f90: Moved to libgomp/.
* gfortran.dg/gomp/critical-hint-2.f90: Moved to libgomp/.
* c-omp.c (c_finish_omp_critical): Check for no name but
nonzero hint provided.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_clause_hint): Require nonnegative hint clause.
(c_parser_omp_critical): Permit hint(0) clause without named critical.
(c_parser_omp_construct): Don't assert if error_mark_node is returned.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_clause_hint): Require nonnegative hint.
(cp_parser_omp_critical): Permit hint(0) clause without named critical.
* pt.c (tsubst_expr): Re-check the latter for templates.
gcc/fortran/ChangeLog:
* openmp.c (gfc_match_omp_critical): Fix handling hints; permit
hint clause without named critical.
(resolve_omp_clauses): Require nonnegative constant integer
for the hint clause.
(gfc_resolve_omp_directive): Check for no name but
nonzero value for hint clause.
* parse.c (parse_omp_structured_block): Fix same-name check
for critical.
* trans-openmp.c (gfc_trans_omp_critical): Handle hint clause properly.
libgomp/ChangeLog:
* omp_lib.f90.in: Add omp_sync_hint_* and omp_sync_hint_kind.
* omp_lib.h.in: Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/critical-3.C: Add nameless critical with hint testcase.
* c-c++-common/gomp/critical-hint-1.c: New test.
* c-c++-common/gomp/critical-hint-2.c: New test.
* gfortran.dg/gomp/critical-hint-1.f90: New test.
* gfortran.dg/gomp/critical-hint-2.f90: New test.
liuhongt [Wed, 12 Aug 2020 02:48:17 +0000 (10:48 +0800)]
Don't use pinsr/pextr for struct initialization/extraction.
gcc/
PR target/96562
PR target/93897
* config/i386/i386-expand.c (ix86_expand_pinsr): Don't use
pinsr for TImode.
(ix86_expand_pextr): Don't use pextr for TImode.
gcc/testsuite/
* gcc.target/i386/pr96562-1.c: New test.
Jakub Jelinek [Mon, 8 Jun 2020 08:30:48 +0000 (10:30 +0200)]
testsuite: Fix up pr95548.C testcase.
2020-06-08 Jakub Jelinek <jakub@redhat.com>
PR lto/95548
* g++.dg/torture/pr95548.C: Change from dg-do compile to dg-do link,
add return type for main, for __SIZEOF_INT128__ test with __uint128_t
enumerator constants and add a test with unsigned long long
enumerators for all targets.
PR fortran/94690
* openmp.c (OMP_DISTRIBUTE_CLAUSES): Add OMP_CLAUSE_LASTPRIVATE.
(gfc_resolve_do_iterator): Skip the private handling for SIMD as
that is handled by ME code.
* trans-openmp.c (gfc_trans_omp_do): Don't add private/lastprivate
for dovar_found == 0, unless !simple.
Jan Hubicka [Fri, 29 May 2020 10:25:48 +0000 (12:25 +0200)]
Fix streamer desynchronization caused by streamer debugging patch
It turns out I lost one hunk in the patch disabling extra streaming,
which causes the streamer to go out of sync when a non-trivial SCC
containing the node being streamed appears in a local stream (which seems
quite rare since it does not happen during bootstrap).
Patrick Palka [Mon, 10 Aug 2020 13:39:29 +0000 (09:39 -0400)]
c++: constraints and address of template-id
When resolving the address of a template-id, we need to drop functions
whose associated constraints are not satisfied, as per [over.over]. We
do so in resolve_address_of_overloaded_function, but not in
resolve_overloaded_unification or resolve_nondeduced_context, which
seems like an oversight.
gcc/cp/ChangeLog:
* pt.c (resolve_overloaded_unification): Drop functions with
unsatisfied constraints.
(resolve_nondeduced_context): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-fn5.C: New test.
* g++.dg/concepts/fn8.C: Generalize dg-error directive to accept
"no matching function ..." diagnostic.
* g++.dg/cpp2a/concepts-fn1.C: Likewise.
* g++.dg/cpp2a/concepts-ts2.C: Likewise.
* g++.dg/cpp2a/concepts-ts3.C: Likewise.
Jan Hubicka [Wed, 3 Jun 2020 19:16:43 +0000 (21:16 +0200)]
Optimize ODR enum streaming
It turns out that half of the global decl stream of a cc1 LTO build consists of
TREE_LISTs, identifiers and integer constants representing TYPE_VALUES of enums.
Those are streamed only to produce ODR warnings and are unused otherwise, so this
patch moves the info to a separate section that is represented and streamed
more effectively.
This also adds a place for more info that may be used for ODR diagnostics
(e.g. at the moment we do not warn when the declarations differ by the
associated member functions and their types) and for the type inheritance graph,
rather than polluting the global stream.
I was a bit unsure which enums we want to store in the section. All parsed
enums is probably too expensive, and only those enums streamed to represent IL are
a bit hard to get, so I went for those seen by free lang data.
As a plus we now get a slightly more precise warning, because the location of
the mismatched enum CONST_DECL is also streamed.
It changes:
[WPA] read 4608466 unshared trees
[WPA] read 2942094 mergeable SCCs of average size 1.365328
[WPA] 8625389 tree bodies read in total
[WPA] tree SCC table: size 524287, 247652 elements, collision ratio: 0.383702
[WPA] tree SCC max chain length 2 (size 1)
[WPA] Compared 2694442 SCCs, 228 collisions (0.000085)
[WPA] Merged 2694419 SCCs
[WPA] Merged 3731982 tree bodies
[WPA] Merged 633335 types
[WPA] 122077 types prevailed (155548 associated trees)
...
[WPA] Compression: 110593119 input bytes, 287696614 uncompressed bytes (ratio: 2.601397)
[WPA] Size of mmap'd section decls: 85628556 bytes
[WPA] Size of mmap'd section function_body: 13842928 bytes
to:
[WPA] read 1720989 unshared trees
[WPA] read 1252217 mergeable SCCs of average size 1.858507
[WPA] 4048243 tree bodies read in total
[WPA] tree SCC table: size 524287, 226524 elements, collision ratio: 0.491759
[WPA] tree SCC max chain length 2 (size 1)
[WPA] Compared 1025693 SCCs, 196 collisions (0.000191)
[WPA] Merged 1025670 SCCs
[WPA] Merged 2063373 tree bodies
[WPA] Merged 633497 types
[WPA] 122299 types prevailed (155827 associated trees)
...
[WPA] Compression: 103428770 input bytes, 281151423 uncompressed bytes (ratio: 2.718310)
[WPA] Size of mmap'd section decls: 49390917 bytes
[WPA] Size of mmap'd section function_body: 13858258 bytes
...
[WPA] Size of mmap'd section odr_types: 29054816 bytes
So the number of SCCs streamed drops to 38% and the number of unshared trees (which
are a bit misnamed, since they are mostly integer_cst) to 37%.
Things speed up correspondingly, but I did not save the time report from the previous
build.
The enum values are still quite surprisingly large. I may take a look into
ways of getting them smaller incrementally, but it streams reasonably fast:
Jan Hubicka [Fri, 22 May 2020 14:37:06 +0000 (16:37 +0200)]
Avoid streaming stray references.
This patch avoids streaming completely useless stray references to the global decl
stream. I am re-testing the patch (rebased to current tree) on x86_64-linux
and intend to commit once testing finishes.
gcc/ChangeLog:
2020-05-22 Jan Hubicka <hubicka@ucw.cz>
* lto-streamer-out.c (lto_output_tree): Do not stream final ref if
it is not needed.
gcc/lto/ChangeLog:
2020-05-22 Jan Hubicka <hubicka@ucw.cz>
* lto-common.c (lto_read_decls): Do not skip stray refs.
Jan Hubicka [Fri, 22 May 2020 10:29:19 +0000 (12:29 +0200)]
Simplify streaming of SCC components
This patch saves a few bytes from SCC streaming. First, we stream end markers
that are fully ignored at stream in.
Second, I missed the streaming of entry_len in the previous change, so it is
pointlessly streamed for LTO_trees. Moreover, entry_len is almost always 1
(always during GCC bootstrap) and thus it makes sense to avoid streaming it
in the majority of cases.
gcc/ChangeLog:
2020-05-21 Jan Hubicka <hubicka@ucw.cz>
* lto-streamer-in.c (lto_read_tree): Do not stream end markers.
(lto_input_scc): Optimize streaming of entry lengths.
* lto-streamer-out.c (lto_write_tree): Do not stream end markers.
(DFS::DFS): Optimize streaming of entry lengths.
Jan Hubicka [Wed, 20 May 2020 13:58:22 +0000 (15:58 +0200)]
Avoid SCC hashing on unmergeable trees
This is a new incarnation of the patch to identify unmergeable trees at streaming out
time rather than at streaming in, and to avoid pickling them into SCCs with hash
codes.
Building cc1 plus this patch reduces:
[WPA] read 4452927 SCCs of average size 1.986030
[WPA] 8843646 tree bodies read in total
[WPA] tree SCC table: size 524287, 205158 elements, collision ratio: 0.505204
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 947551 SCCs, 780270 collisions (0.823460)
[WPA] Merged 944038 SCCs
[WPA] Merged 5253521 tree bodies
[WPA] Merged 590027 types
...
[WPA] Size of mmap'd section decls: 99229066 bytes
[WPA] Size of mmap'd section function_body: 18398837 bytes
[WPA] Size of mmap'd section refs: 733678 bytes
[WPA] Size of mmap'd section jmpfuncs: 2965981 bytes
[WPA] Size of mmap'd section pureconst: 170248 bytes
[WPA] Size of mmap'd section profile: 17985 bytes
[WPA] Size of mmap'd section symbol_nodes: 3392736 bytes
[WPA] Size of mmap'd section inline: 2693920 bytes
[WPA] Size of mmap'd section icf: 435557 bytes
[WPA] Size of mmap'd section offload_table: 0 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651660 bytes
... to ...
[WPA] read 3312246 unshared trees
[WPA] read 1144381 mergeable SCCs of average size 4.833785
[WPA] 8843938 tree bodies read in total
[WPA] tree SCC table: size 524287, 197767 elements, collision ratio: 0.506446
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 946614 SCCs, 775077 collisions (0.818789)
[WPA] Merged 943798 SCCs
[WPA] Merged 5253336 tree bodies
[WPA] Merged 590105 types
....
[WPA] Size of mmap'd section decls: 81262144 bytes
[WPA] Size of mmap'd section function_body: 14702611 bytes
[WPA] Size of mmap'd section ext_symtab: 0 bytes
[WPA] Size of mmap'd section refs: 733695 bytes
[WPA] Size of mmap'd section jmpfuncs: 2332150 bytes
[WPA] Size of mmap'd section pureconst: 170292 bytes
[WPA] Size of mmap'd section profile: 17986 bytes
[WPA] Size of mmap'd section symbol_nodes: 3393358 bytes
[WPA] Size of mmap'd section inline: 2567939 bytes
[WPA] Size of mmap'd section icf: 435633 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651824 bytes
so results in about 22% reduction in global decl stream and 24% reduction on
function bodies stream (which is read mostly by ICF)
Martin, the zstd compression breaks the compression statistics (it works when
GCC is configured for zlib)
At first ltrans I get:
[LTRANS] Size of mmap'd section decls: 3734248 bytes
[LTRANS] Size of mmap'd section function_body: 4895962 bytes
... to ...
[LTRANS] Size of mmap'd section decls: 3479850 bytes
[LTRANS] Size of mmap'd section function_body: 3722935 bytes
So 7% reduction of global stream and 31% reduction of function bodies.
Stream in seems to get about 3% faster and stream out about 5% but it is
close to noise factor of my experiment. I expect bigger speedups on
Firefox but I did not test it today since my Firefox setup broke again.
GCC is not a very good example of the problem with anonymous namespace
types since we do not have so many of them.
Size of object files in the gcc directory is reduced by 11% (because hash
numbers do not compress well, I guess).
The patch makes the DFS walk recognize trees that are not merged (anonymous
namespace, local function/variable decls, anonymous types etc). As discussed
on IRC this is now done during the SCC walk rather than during the hash
computation. When a local tree is discovered we know that the SCC components of
everything that is on the stack refer to it and thus are also local. Moreover we
mark trees in a hash set in the output block, so if we get a cross edge referring
to a local tree it gets marked too.
Patch also takes care of avoiding SCC wrappers around some trees. In particular
1) singleton unmergeable SCCs are now streamed inline in global decl stream
This includes INTEGER_CSTs and IDENTIFIER_NODEs that are shared by different
code than rest of tree merging.
2) We use LTO_trees instead of LTO_tree_scc to wrap unmergeable SCC components.
It is still necessary to mark them because of forward references. LTO_trees
has simple header with number of trees and then things are streamed same way
as for LTO_tree_scc. That is tree headers first followed by pickled references
so things may point to future.
Of course it is not necessary for LTO_tree_scc to be single component and
streamer out may group more components together, but I decided to not snowball
the patch even more
3) In local streams when lto_output_tree is called and the topmost SCC component
turns out to be a singleton we stream the tree directly
instead of LTO_tree_scc, hash code, pickled tree, reference to the just streamed tree.
LTO_trees is used to wrap all trees needed to represent the tree being streamed.
It would make sense again to use only one LTO_trees rather than one per SCC
but I think this can be done incrementally.
In general local trees are now recognized by the new predicate local_tree_p.
A bit subtle is the handling of TRANSLATION_UNIT_DECL, INTEGER_CST and
IDENTIFIER_NODE.
TRANSLATION_UNIT_DECL is a local tree, but references to it do not make
other trees local (because we also understand local decls now).
So I check for it later, after localness propagation is done.
INTEGER_CSTs and IDENTIFIER_NODEs are merged, but not via the tree merging
machinery. So it makes sense to stream them as unmergeable trees, but we
still need to compute their hashes so that SCCs referring to them do not get too
large collision chains. For this reason they are checked just prior to
stream out.
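The localness rule described above can be modelled on a toy graph. The sketch below is not the DFS-based code from the patch: it swaps in a plain worklist over reversed edges, and Node, propagate_local and the taints_referrers flag are hypothetical names. It only illustrates that a tree referring to a local tree becomes local, except when the referenced tree is of a kind (like TRANSLATION_UNIT_DECL) whose referrers stay mergeable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a tree node; all names here are illustrative only.
struct Node {
  bool local;                     // intrinsically local (e.g. anonymous type)
  bool taints_referrers;          // false for a TRANSLATION_UNIT_DECL-like node
  std::vector<std::size_t> refs;  // indices of the nodes this node refers to
};

// Returns, for every node, whether it ends up local after propagation.
std::vector<bool> propagate_local(const std::vector<Node>& nodes) {
  // Build reversed edges: referrers[i] lists the nodes that refer to i.
  std::vector<std::vector<std::size_t>> referrers(nodes.size());
  for (std::size_t i = 0; i < nodes.size(); ++i)
    for (std::size_t r : nodes[i].refs)
      referrers[r].push_back(i);

  std::vector<bool> is_local(nodes.size(), false);
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    if (nodes[i].local) {
      is_local[i] = true;
      worklist.push_back(i);
    }

  while (!worklist.empty()) {
    const std::size_t n = worklist.back();
    worklist.pop_back();
    // A non-tainting node never makes its referrers local.
    if (!nodes[n].taints_referrers)
      continue;
    for (std::size_t p : referrers[n])
      if (!is_local[p]) {
        is_local[p] = true;
        worklist.push_back(p);
      }
  }
  return is_local;
}

// Usage check on a five-node graph that contains a cycle (1 <-> 4).
void check_propagation() {
  std::vector<Node> g(5);
  g[0] = Node{true, true, {}};       // anonymous-namespace type
  g[1] = Node{false, true, {0, 4}};  // refers to the local type
  g[2] = Node{true, false, {}};      // TRANSLATION_UNIT_DECL-like
  g[3] = Node{false, true, {2}};     // refers only to the non-tainting node
  g[4] = Node{false, true, {1, 3}};  // reaches node 0 through node 1
  const std::vector<bool> r = propagate_local(g);
  assert(r[0] && r[1] && r[2] && !r[3] && r[4]);
}
```

Node 3 stays mergeable even though it refers to a local node, which mirrors the special case made for TRANSLATION_UNIT_DECL.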
lto-bootstrapped/regtested x86_64-linux, OK?
gcc/ChangeLog:
2020-05-19 Jan Hubicka <hubicka@ucw.cz>
* lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter.
(lto_input_tree_1): Strengthen sanity check.
(lto_input_tree): Update call of lto_input_scc.
* lto-streamer-out.c: Include ipa-utils.h.
(create_output_block): Initialize local_trees if merging is going
to happen.
(destroy_output_block): Destroy local_trees.
(DFS): Add max_local_entry.
(local_tree_p): New function.
(DFS::DFS): Initialize and maintain it.
(DFS::DFS_write_tree): Decide on streaming format.
(lto_output_tree): Stream inline singleton SCCs.
* lto-streamer.h (enum LTO_tags): Add LTO_trees.
(struct output_block): Add local_trees.
(lto_input_scc): Update prototype.
gcc/lto/ChangeLog:
2020-05-19 Jan Hubicka <hubicka@ucw.cz>
* lto-common.c (compare_tree_sccs_1): Sanity check that we never
read TRANSLATION_UNIT_DECL.
(process_dref): Break out from ...
(unify_scc): ... here.
(process_new_tree): Break out from ...
(lto_read_decls): ... here; handle streaming of singleton trees.
(print_lto_report_1): Update statistics.
Alan Modra [Thu, 6 Aug 2020 04:42:21 +0000 (14:12 +0930)]
PR96493, powerpc local call linkage failure
This corrects current_file_function_operand, an operand predicate used
to determine whether a symbol_ref is safe to use with the local_call
patterns. Calls between pcrel and non-pcrel code need to go via
linker stubs. In the case of non-pcrel code to pcrel the stub saves
r2 but there needs to be a nop after the branch for the r2 restore.
So the local_call patterns can't be used there. For pcrel code to
non-pcrel the local_call patterns could still be used, but I thought
it better to not use them since the call isn't direct. Code generated
by the corresponding call_nonlocal_aix for pcrel is identical anyway.
Incidentally, without the TREE_CODE () == FUNCTION_DECL test,
gcc.c-torture/compile/pr37433.c and pr37433-1.c ICE. Also, if you
make the test more strict by disallowing an op without a
SYMBOL_REF_DECL then a bunch of go and split-stack tests fail. That's
because a prologue call to __morestack can't have a following nop.
(__morestack calls its caller at a fixed offset from the __morestack
call!)
gcc/
PR target/96493
* config/rs6000/predicates.md (current_file_function_operand): Don't
accept functions that differ in r2 usage.
gcc/testsuite/
* gcc.target/powerpc/pr96493.c: New file.
This adds support for __sync_val_compare_and_swap and
__sync_bool_compare_and_swap for 1-byte and 2-byte long
values, which are not natively supported on nvptx.
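A narrow compare-and-swap is commonly emulated with a retry loop around a wider one. The sketch below shows that general technique on a 32-bit word using std::atomic. It is an assumption made for illustration, not the code this patch adds for nvptx, and cas_byte_in_word is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Compare-and-swap byte number byte_index (0 = least significant) of word.
// Returns the byte value observed before the operation, in the style of
// __sync_val_compare_and_swap.
std::uint8_t cas_byte_in_word(std::atomic<std::uint32_t>& word,
                              unsigned byte_index,
                              std::uint8_t expected, std::uint8_t desired) {
  const unsigned shift = byte_index * 8;
  const std::uint32_t mask = std::uint32_t{0xff} << shift;
  std::uint32_t old_word = word.load();
  for (;;) {
    const std::uint8_t old_byte =
        static_cast<std::uint8_t>((old_word & mask) >> shift);
    if (old_byte != expected)
      return old_byte;  // comparison failed, nothing is written
    const std::uint32_t new_word =
        (old_word & ~mask) | (std::uint32_t{desired} << shift);
    // On failure compare_exchange_weak reloads old_word, so the loop retries
    // with fresh contents (another byte of the word may have changed).
    if (word.compare_exchange_weak(old_word, new_word))
      return old_byte;
  }
}

// Usage check: one successful and one failing swap.
void check_cas() {
  std::atomic<std::uint32_t> w{0x11223344u};
  assert(cas_byte_in_word(w, 1, 0x33, 0xAA) == 0x33);
  assert(w.load() == 0x1122AA44u);
  assert(cas_byte_in_word(w, 0, 0x00, 0xFF) == 0x44);
  assert(w.load() == 0x1122AA44u);
}
```

The neighbouring bytes of the word are carried over unchanged on every attempt, which is why the loop must retry when any part of the word changes underneath it.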
When we AND masks in get_default_value we end up with 6 & 3 = 2 (0b010).
That means only the second least significant bit is unknown, and
value (5 = 0b101) & ~mask allows only 7 (0b111) or 5 (0b101).
That's why if (arg_2(D) == 3) gets optimized to false.
gcc/ChangeLog:
PR ipa/96482
* ipa-cp.c (ipcp_bits_lattice::meet_with_1): Drop value bits
for bits that are unknown.
(ipcp_bits_lattice::set_to_constant): Likewise.
* tree-ssa-ccp.c (get_default_value): Add sanity check that
IPA CP bit info has all bits set to zero in bits that
are unknown.
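The arithmetic above can be checked with a small model of a known-bits lattice, where a set mask bit means "unknown" and value carries the known bits. This is a sketch under that assumption: BitsInfo, may_equal, narrow_mask and drop_unknown_value_bits are hypothetical names, not the ipa-cp.c interfaces, and the origin of the second mask (3) is not modelled.

```cpp
#include <cassert>
#include <cstdint>

struct BitsInfo {
  std::uint64_t value;  // values of the known bits
  std::uint64_t mask;   // 1 = this bit is unknown
};

// Can a variable described by info be equal to the constant c?
bool may_equal(BitsInfo info, std::uint64_t c) {
  return ((c ^ info.value) & ~info.mask) == 0;
}

// AND the mask with a second mask, as in the "6 & 3 = 2" step.
BitsInfo narrow_mask(BitsInfo info, std::uint64_t other_mask) {
  info.mask &= other_mask;
  return info;
}

// Clear the value bits wherever the mask says the bit is unknown.
BitsInfo drop_unknown_value_bits(BitsInfo info) {
  info.value &= ~info.mask;
  return info;
}

// Usage check with the numbers from the commit message.
void check_bits() {
  const BitsInfo raw{5, 6};  // value has a bit set in an unknown position
  assert(may_equal(raw, 3));

  // Narrowing the mask alone leaves {5, 7}, so 3 is wrongly excluded.
  const BitsInfo stale = narrow_mask(raw, 3);
  assert(stale.mask == 2);
  assert(may_equal(stale, 5) && may_equal(stale, 7));
  assert(!may_equal(stale, 3));

  // Dropping the unknown value bits first keeps 3 possible.
  const BitsInfo fixed = narrow_mask(drop_unknown_value_bits(raw), 3);
  assert(fixed.value == 1 && fixed.mask == 2);
  assert(may_equal(fixed, 3));
}
```

In this model the stale value bit is what turns "arg == 3" into a comparison that can never hold, which matches the miscompilation described above.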
ipa/96291: don't crash on unoptimized lto functions
In PR ipa/96291 the test contained an SCC with one
unoptimized function. This tricked ipa-cp into a NULL dereference:
has_undead_caller_from_outside_scc_p() did not take into account
that unoptimized functions don't have IPA summary analysis, and
dereferenced a NULL pointer, causing an ICE.
Jose E. Marchesi [Wed, 12 Aug 2020 14:55:49 +0000 (16:55 +0200)]
bpf: more flexible support for kernel helpers
This patch changes the existing support for BPF kernel helpers to be
more flexible, in two main ways.
First, there is no longer a hardcoded list of kernel helpers defined
in the compiler internals. This is replaced by a new target-specific
attribute `kernel_helper' that the user can use to define her own
helpers, annotating function prototypes.
Second, following feedback from the kernel hackers, the pre-defined
helpers in the distributed bpf-helpers.h are no longer available
conditionally depending on the kernel version used in -mkernel. The
command-line option stays for now, as it may be useful for other
things.
Jose E. Marchesi [Wed, 12 Aug 2020 14:55:30 +0000 (16:55 +0200)]
bpf: do not save/restore callee-saved registers in function prolog/epilog
BPF considers that every call to a function allocates a fresh set of
registers that are available to the callee, of which the first five
may have been initialized with the function arguments. This is
implemented by both interpreter and JIT in the Linux kernel.
This is enforced by the kernel BPF verifier, which will reject any
code in which non-initialized registers are accessed before being
written. Consequently, the spill instructions generated in function
prologue were causing the verifier to reject our compiled programs.
This patch makes GCC not save/restore callee-saved registers in
function prologue/epilogue, unless xBPF mode is enabled.
2020-05-19 Jose E. Marchesi <jose.marchesi@oracle.com>
gcc/
* config/bpf/bpf.c (bpf_compute_frame_layout): Include space for
callee saved registers only in xBPF.
(bpf_expand_prologue): Save callee saved registers only in xBPF.
(bpf_expand_epilogue): Likewise for restoring.
* doc/invoke.texi (eBPF Options): Document this is activated by
-mxbpf.
gcc/testsuite/
* gcc.target/bpf/xbpf-callee-saved-regs-1.c: New test.
* gcc.target/bpf/xbpf-callee-saved-regs-2.c: Likewise.
Jose E. Marchesi [Wed, 12 Aug 2020 14:54:53 +0000 (16:54 +0200)]
bpf: add support for the -mxbpf option
This patch adds support for a new option -mxbpf. This tells GCC to
generate code for an expanded version of BPF that relaxes some of the
restrictions imposed by BPF.
Christophe Lyon [Wed, 12 Aug 2020 09:22:38 +0000 (09:22 +0000)]
testsuite: Fix gcc.target/arm/stack-protector-1.c for Cortex-M
The stack-protector-1.c test fails when compiled for Cortex-M:
- for Cortex-M0/M1, str r0, [sp, #-8]! is not supported
- for Cortex-M3/M4..., the assembler complains that "use of r13 is
deprecated"
This patch replaces the str instruction with
    sub sp, sp, #8
    str r0, [sp]
and removes the check for r13, which is unlikely to leak the canary
value.
Patrick Palka [Thu, 30 Jul 2020 02:06:36 +0000 (22:06 -0400)]
c++: abbreviated function template friend matching [PR96106]
In the below testcase, duplicate_decls wasn't merging the tsubsted
friend declaration for 'void add(auto)' with its definition, because
reduce_template_parm_level (during tsubst_friend_function) lost the
DECL_VIRTUAL_P flag on the auto's invented template parameter, which
caused template_heads_equivalent_p to deem the two template heads as not
equivalent in C++20 mode.
This patch makes reduce_template_parm_level carry over the
DECL_VIRTUAL_P flag from the original TEMPLATE_PARM_DECL.
gcc/cp/ChangeLog:
PR c++/96106
* pt.c (reduce_template_parm_level): Propagate DECL_VIRTUAL_P
from the original TEMPLATE_PARM_DECL to the new lowered one.
gcc/testsuite/ChangeLog:
PR c++/96106
* g++.dg/concepts/abbrev7.C: New test.
Patrick Palka [Thu, 30 Jul 2020 02:06:33 +0000 (22:06 -0400)]
c++: constraints and explicit instantiation [PR96164]
When deciding whether to instantiate a member of a class template as part of
an explicit instantiation of the class template, we need to first check
the member's constraints before proceeding with the instantiation of the
member.
gcc/cp/ChangeLog:
PR c++/96164
* constraint.cc (constraints_satisfied_p): Return true if
!flag_concepts.
* pt.c (do_type_instantiation): Update a paragraph taken from
[temp.explicit] to reflect the latest specification. Don't
instantiate a member with unsatisfied constraints.
gcc/testsuite/ChangeLog:
PR c++/96164
* g++.dg/cpp2a/concepts-explicit-inst5.C: New test.
For an rtx like (eq:HI (V8SI 90) (V8SI 91)), cse will treat it as a
boolean value and try to optimize it. That is not valid for a vector
compare, and other places in the RTL passes hold the same
assumption.
Peter Bergner [Sat, 8 Aug 2020 16:54:48 +0000 (11:54 -0500)]
rs6000: MMA built-ins reject typedefs of MMA types
We do not allow conversions between the MMA types and other types.
However, we are being too strict in not matching MMA types with
typedefs of those types. Use TYPE_CANONICAL to see through the
types to their canonical types before comparing them.
2020-08-08 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/96530
* config/rs6000/rs6000.c (rs6000_invalid_conversion): Use canonical
types for type comparisons. Refactor code to simplify it.
gcc/testsuite/
PR target/96530
* gcc.target/powerpc/pr96530.c: New test.
Peter Bergner [Thu, 6 Aug 2020 15:03:03 +0000 (10:03 -0500)]
rs6000: Don't ICE when spilling an MMA accumulator
When we spill an accumulator that has a known zero value, LRA will emit
a new (set (reg:PXI ...) 0) insn, but it does not use the mma_xxsetaccz
pattern to do it, leading to an unrecognized insn ICE. The solution here
is to move the xxsetaccz instruction into the movpxi pattern and have the
xxsetaccz pattern call the move pattern.
2020-08-06 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/96446
* config/rs6000/mma.md (*movpxi): Add xxsetaccz generation.
Disable split for zero constant source operand.
(mma_xxsetaccz): Change to define_expand. Call gen_movpxi.
gcc/testsuite/
PR target/96446
* gcc.target/powerpc/pr96446.c: New test.
Jonathan Wakely [Fri, 7 Aug 2020 19:29:11 +0000 (20:29 +0100)]
libstdc++: Fix ambiguous comparisons in __gnu_debug::bitset [PR 96303]
With -pedantic the debug mode bitset has an ambiguous equality
comparison operator, because it tries to compare the non-debug base to
the debug object. The base object can be converted to another debug
bitset, making the same operator== a candidate again.
The fix is to do the comparison on both base objects, so the operator
for the derived type isn't a candidate.
For the inequality operator the same change should be done, but that
operator can be removed entirely for C++20 because it can be synthesized
by the compiler.
I don't think either equality or inequality operators are really needed,
because the public _GLIBCXX_STD_C::bitset base class can always be
compared using its own comparison operators. I'm not changing that here
though.
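The ambiguity and the fix described above can be illustrated with a minimal
sketch. The types base_bits and debug_bits are hypothetical stand-ins, not the
real libstdc++ classes; the sketch only assumes a derived type that is
implicitly constructible from its base, as the message describes. Comparing
both base objects means the derived type's operator== is not a candidate
(not even as a reversed candidate in C++20).

```cpp
#include <cassert>

// Hypothetical stand-in for the non-debug base class.
struct base_bits
{
  unsigned value;

  friend bool operator==(const base_bits& a, const base_bits& b)
  { return a.value == b.value; }
};

// Hypothetical stand-in for the debug-mode wrapper.
struct debug_bits : base_bits
{
  explicit debug_bits(unsigned v) : base_bits{v} { }

  // Implicit conversion from the base: this is what makes a mixed
  // "base == debug" comparison ambiguous, because the base operand
  // could be converted back to a debug object.
  debug_bits(const base_bits& b) : base_bits(b) { }

  const base_bits& base() const { return *this; }

  // Compare base with base, so only the base operator is viable here.
  bool operator==(const debug_bits& rhs) const
  { return base() == rhs.base(); }
};

// Small usage check for the sketch.
inline void example()
{
  debug_bits a(5), b(5), c(6);
  assert(a == b);
  assert(!(a == c));
}
```

Writing the body as base() == rhs (mixing a base and a derived operand) is the
shape that the message reports as ambiguous with -pedantic.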
libstdc++-v3/ChangeLog:
PR libstdc++/96303
* include/debug/bitset (bitset::operator==): Call _M_base() on
right operand.
(bitset::operator!=): Likewise, but don't define it at all when
default comparisons are supported by the compiler.
* testsuite/23_containers/bitset/operations/96303.cc: New test.
Tamar Christina [Mon, 3 Aug 2020 11:03:17 +0000 (12:03 +0100)]
AArch64: Fix hwasan failure in readline.
My previous fix added an unchecked call to fgets in the new function readline.
fgets can fail when there's an error reading the file in which case it returns
NULL. It also returns NULL when the next character is EOF.
The EOF case is already covered by the existing code but the error case isn't.
This fixes it by returning the empty string on error.
Also I now use strnlen instead of strlen to make sure we never read outside the
buffer.
This was flagged by Matthew Malcomson during his hwasan work.
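As a rough illustration of the checks described above, here is a minimal
sketch of a line reader that tests the fgets return value and bounds the
length scan with strnlen. It is not the code in driver-aarch64.c: the name
read_line_sketch and the 16-byte chunk size are assumptions made for the
example, and strnlen is assumed to be available (it is a POSIX function).

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string.h>   // strnlen (POSIX)

// Hypothetical helper: read one full line, including the trailing '\n'
// if present.  Returns the empty string on a read error or at EOF.
std::string read_line_sketch(FILE* f)
{
  assert(f != nullptr);

  char chunk[16];     // deliberately small so long lines need several reads
  std::string line;

  for (;;)
    {
      // fgets returns NULL on a read error, and also at EOF when no
      // characters were read.
      if (!std::fgets(chunk, sizeof chunk, f))
        {
          if (std::ferror(f))
            return std::string();   // error: report an empty line
          break;                    // EOF: return what was collected
        }

      // Bounded scan: never reads past the end of the chunk buffer.
      const std::size_t n = strnlen(chunk, sizeof chunk);
      line.append(chunk, n);

      if (n > 0 && chunk[n - 1] == '\n')
        break;                      // reached the end of the line
    }

  return line;
}
```

Because the result is accumulated in a std::string, a line longer than the
chunk buffer is returned whole rather than truncated.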
gcc/ChangeLog:
* config/aarch64/driver-aarch64.c (readline): Check return value of fgets.
Tamar Christina [Fri, 17 Jul 2020 12:13:12 +0000 (13:13 +0100)]
AArch64: Add test for -mcpu=native
This adds some tests to the GCC testsuite for testing the
-mcpu=native code.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/aarch64-cpunative.exp: New test.
* gcc.target/aarch64/cpunative/info_0: New test.
* gcc.target/aarch64/cpunative/info_1: New test.
* gcc.target/aarch64/cpunative/info_10: New test.
* gcc.target/aarch64/cpunative/info_11: New test.
* gcc.target/aarch64/cpunative/info_12: New test.
* gcc.target/aarch64/cpunative/info_13: New test.
* gcc.target/aarch64/cpunative/info_14: New test.
* gcc.target/aarch64/cpunative/info_15: New test.
* gcc.target/aarch64/cpunative/info_2: New test.
* gcc.target/aarch64/cpunative/info_3: New test.
* gcc.target/aarch64/cpunative/info_4: New test.
* gcc.target/aarch64/cpunative/info_5: New test.
* gcc.target/aarch64/cpunative/info_6: New test.
* gcc.target/aarch64/cpunative/info_7: New test.
* gcc.target/aarch64/cpunative/info_8: New test.
* gcc.target/aarch64/cpunative/info_9: New test.
* gcc.target/aarch64/cpunative/native_cpu_0.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_1.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_10.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_11.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_12.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_13.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_14.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_15.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_2.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_3.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_4.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_5.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_6.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_7.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_8.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_9.c: New test.
Tamar Christina [Fri, 17 Jul 2020 12:10:28 +0000 (13:10 +0100)]
AArch64: Fix bugs in -mcpu=native detection.
This patch fixes a couple of issues in AArch64's -mcpu=native processing:
The buffer used to read the lines from /proc/cpuinfo is 128 bytes long. While
this was enough in the past, with the increase in architecture extensions it is
no longer enough. It results in two bugs:
1) No option string longer than 127 characters is correctly parsed. Features
that are supported are silently ignored.
2) It incorrectly enables features that are not present on the machine:
a) It checks for substring matching instead of full word matching. This makes
it incorrectly detect sb support when ssbs is provided instead.
b) Due to the truncation at the 127 char border it also incorrectly enables
features due to the full feature being cut off and the part that is left
accidentally enables something else.
This breaks -mcpu=native detection on some of our newer systems.
The patch fixes these issues by reading full lines up to the \n in a string.
This gives us the full feature line. Secondly it creates a set from this string
to:
1) Reduce matching complexity from O(n*m) to O(n*logm).
2) Perform whole word matching instead of substring matching.
To make this code somewhat cleaner I also changed from using char* to using
std::string and std::set.
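The difference between substring matching and whole-word matching can be
sketched as follows. This is an illustration only, not the code from the
patch: the helper names split_features, has_feature_substring and
has_feature_word are hypothetical, and the feature line is a made-up example.
Like the patch, it avoids stringstream and splits the line by hand.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: split a whitespace-separated feature line into a
// set of whole words.
std::set<std::string> split_features(const std::string& line)
{
  const char* const ws = " \t\n";
  std::set<std::string> features;
  std::string::size_type pos = 0;

  while (pos < line.size())
    {
      const std::string::size_type start = line.find_first_not_of(ws, pos);
      if (start == std::string::npos)
        break;                              // only whitespace left
      std::string::size_type end = line.find_first_of(ws, start);
      if (end == std::string::npos)
        end = line.size();                  // last word, no trailing space
      features.insert(line.substr(start, end - start));
      pos = end;
    }

  return features;
}

// Substring matching: "sb" is wrongly found inside "ssbs".
bool has_feature_substring(const std::string& line, const std::string& name)
{ return line.find(name) != std::string::npos; }

// Whole-word matching: an O(log m) lookup in a set of m features.
bool has_feature_word(const std::set<std::string>& features,
                      const std::string& name)
{ return features.count(name) != 0; }

// Small usage check for the sketch.
inline void example()
{
  const std::string line = "fp asimd ssbs\n";
  assert(has_feature_substring(line, "sb"));            // false positive
  assert(!has_feature_word(split_features(line), "sb"));
}
```

Checking n candidate features against the set costs O(n*logm) in total,
instead of the O(n*m) of scanning the line once per feature.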
Note that I have intentionally avoided the use of ifstream and stringstream
to make it easier to backport. I have also not changed the substring matching
for the initial line classification, as I cannot find a documented cpuinfo
format. This leads me to believe there may be kernels out there that require
it, which may be why the original code does this.
I also do not want this to break if the kernel adds a new line that is long and
indents the file by two tabs to keep everything aligned. In short I think an
imprecise match is the right thing here.
Test for this is added as the last thing in this series as it requires some
changes to be made to be able to test this.
arm: Clear canary value after stack_protect_test [PR96191]
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/arm/arm.md (arm_stack_protect_test_insn): Zero out
operand 2 after use.
* config/arm/thumb1.md (thumb1_stack_protect_test_insn): Likewise.
gcc/testsuite/
* gcc.target/arm/stack-protector-1.c: New test.
* gcc.target/arm/stack-protector-2.c: Likewise.
aarch64: Clear canary value after stack_protect_test [PR96191]
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/aarch64/aarch64.md (stack_protect_test_<mode>): Set the
CC register directly, instead of a GPR. Replace the original GPR
destination with an extra scratch register. Zero out operand 3
after use.
(stack_protect_test): Update accordingly.
gcc/testsuite/
PR target/96191
* gcc.target/aarch64/stack-protector-1.c: New test.
* gcc.target/aarch64/stack-protector-2.c: Likewise.