git.ipfire.org Git - thirdparty/gcc.git/log
amdgcn: Fix expansion of builtin for vector fabs operation
Kwok Cheung Yeung [Tue, 1 Nov 2022 23:05:44 +0000 (23:05 +0000)]

2022-11-01  Kwok Cheung Yeung  <kcy@codesourcery.com>

* config/gcn/gcn.cc (gcn_expand_builtin_1): Fix expansion of
GCN_BUILTIN_FABSV.

openmp: Bugfix in omp_expand_metadirective for same blocks/edges to be deleted.
Marcel Vollweiler [Tue, 1 Nov 2022 13:05:18 +0000 (13:05 +0000)]

This patch fixes an ICE in omp_expand_metadirective that is triggered when the
basic_block for a metadirective label is deleted more than once.  To avoid
this, processed labels are added to the existing list of labels that must not
be deleted.

The issue occurred in the attached test case.

gcc/ChangeLog:

* omp-expand-metadirective.cc (omp_expand_metadirective): Add already
processed labels to "labels" (the list of labels not to be deleted).

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/metadirective-8.c: New test.
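
The idea behind this fix can be sketched in isolation.  The following is a
minimal model and not the code in omp-expand-metadirective.cc: Cfg,
delete_block and remove_unused_labels are hypothetical names, and labels are
modelled as plain strings.  It only illustrates why adding each processed
label to the list of labels to keep prevents a second deletion of the same
block.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a CFG: one live basic block per label.
struct Cfg {
  std::set<std::string> live_blocks;
  int deletions = 0;

  void delete_block(const std::string &label) {
    // Deleting a block that is already gone is the failure being modelled.
    assert(live_blocks.count(label) == 1 && "block deleted twice");
    live_blocks.erase(label);
    ++deletions;
  }
};

// Delete the block of every candidate label that is not in 'keep'.
// Each processed label is added to 'keep', so a label that appears
// several times in 'candidates' is deleted only once.
int remove_unused_labels(Cfg &cfg,
                         const std::vector<std::string> &candidates,
                         std::set<std::string> keep) {
  for (const std::string &label : candidates) {
    if (keep.count(label) != 0)
      continue;            // must not be deleted, or already processed
    cfg.delete_block(label);
    keep.insert(label);    // remember it so it is not deleted again
  }
  return cfg.deletions;
}
```

Without the keep.insert call, a repeated label would reach delete_block a
second time and trip the assertion.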

amdgcn: add fmin/fmax patterns
Andrew Stubbs [Fri, 28 Oct 2022 12:09:20 +0000 (13:09 +0100)]

Add fmin/fmax for scalar, vector, and reductions.  The smin/smax patterns are
already using the IEEE compliant hardware instructions anyway, so we can just
expand to use those insns.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (fminmaxop): New iterator.
(<fexpander><mode>3): New define_expand.
(<fexpander><mode>3<exec>): Likewise.
(reduc_<fexpander>_scal_<mode>): Likewise.
* config/gcn/gcn.md (fexpander): New attribute.

(cherry picked from commit 10aa0356118f44e5f4d720a2a4c731b173baa298)
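
For readers unfamiliar with the source-level semantics involved, here is a
small sketch of what fmin means in standard C++: when exactly one operand is
a NaN, the other operand is returned.  This is ordinary host code using
std::fmin, not the GCN patterns themselves, and reduce_fmin is a hypothetical
helper name chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Reduce a non-empty buffer with std::fmin.  A NaN element is ignored as
// long as at least one element is a number.
double reduce_fmin(const double *v, int n) {
  assert(n > 0);
  double acc = v[0];
  for (int i = 1; i < n; ++i)
    acc = std::fmin(acc, v[i]);
  return acc;
}
```

A plain "a < b ? a : b" reduction would instead let the result depend on
where the NaN sits, which is why the distinction matters for reductions.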

amdgcn: multi-size vector reductions
Andrew Stubbs [Fri, 28 Oct 2022 11:38:43 +0000 (12:38 +0100)]

Add support for vector reductions for any vector width by switching iterators
and generalising the code slightly.  There's no one-instruction way to move an
item from lane 31 to lane 0 (63, 15, 7, 3, and 1 are all fine though), and
vec_extract is probably fewer cycles anyway, so now we always reduce to an
SGPR.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (V64_SI): Delete iterator.
(V64_DI): Likewise.
(V64_1REG): Likewise.
(V64_INT_1REG): Likewise.
(V64_2REG): Likewise.
(V64_ALL): Likewise.
(V64_FP): Likewise.
(reduc_<reduc_op>_scal_<mode>): Use V_ALL. Use gen_vec_extract.
(fold_left_plus_<mode>): Use V_FP.
(*<reduc_op>_dpp_shr_<mode>): Use V_1REG.
(*<reduc_op>_dpp_shr_<mode>): Use V_DI.
(*plus_carry_dpp_shr_<mode>): Use V_INT_1REG.
(*plus_carry_in_dpp_shr_<mode>): Use V_SI.
(*plus_carry_dpp_shr_<mode>): Use V_DI.
(mov_from_lane63_<mode>): Delete.
(mov_from_lane63_<mode>): Delete.
* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Support partial vectors.
* config/gcn/gcn.md (unspec): Remove UNSPEC_MOV_FROM_LANE63.

(cherry picked from commit f539029c1ce6fb9163422d1a8b6ac12a2554eaa2)

openmp: Allow optional comma after directive-specifier in C/C++
Jakub Jelinek [Fri, 28 Oct 2022 09:03:56 +0000 (11:03 +0200)]

Previously we've been allowing that comma only in C++ when in attribute
form (which was the reason why it has been allowed), but 5.1 allows that
even in pragma form in C/C++ (with clarifications in 5.2) and 5.2
also in Fortran (which this patch doesn't implement).

Note, for directives which take an argument (== unnamed clause),
comma is not allowed in between the directive name and the argument,
like the directive-1.c testcase shows.

2022-10-28  Jakub Jelinek  <jakub@redhat.com>

gcc/c/
* c-parser.cc (c_parser_omp_all_clauses): Allow optional
comma before the first clause.
(c_parser_omp_allocate, c_parser_omp_atomic, c_parser_omp_depobj,
c_parser_omp_flush, c_parser_omp_scan_loop_body,
c_parser_omp_ordered, c_finish_omp_declare_variant,
c_parser_omp_declare_target, c_parser_omp_declare_reduction,
c_parser_omp_requires, c_parser_omp_error,
c_parser_omp_assumption_clauses): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_all_clauses): Allow optional comma
before the first clause even in pragma syntax.
(cp_parser_omp_allocate, cp_parser_omp_atomic, cp_parser_omp_depobj,
cp_parser_omp_flush, cp_parser_omp_scan_loop_body,
cp_parser_omp_ordered, cp_parser_omp_assumption_clauses,
cp_finish_omp_declare_variant, cp_parser_omp_declare_target,
cp_parser_omp_declare_reduction_exprs, cp_parser_omp_requires,
cp_parser_omp_error): Likewise.
gcc/testsuite/
* c-c++-common/gomp/directive-1.c: New test.
* c-c++-common/gomp/clauses-6.c: New test.
* c-c++-common/gomp/declare-variant-2.c (f75a): Declare.
(f75): Use f75a as variant instead of f1 and don't expect error.
* g++.dg/gomp/clause-4.C (foo): Don't expect error on comma
before first clause.
* gcc.dg/gomp/clause-2.c (foo): Likewise.

(cherry picked from commit 89999f2358724fa4e71c7c3b4de340582c0e43da)

Merge commit '9b116c51a451995f1bae8fdac0748fcf3f06aafe'
Thomas Schwinge [Fri, 28 Oct 2022 08:46:23 +0000 (10:46 +0200)]

[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks: ChangeLog
Thomas Schwinge [Fri, 28 Oct 2022 08:43:24 +0000 (10:43 +0200)]

... forgotten in og12 commit 9a50d282f03f7f1e1ad00de917143a2a8e0c0ee0
"[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks".

OpenACC: Don't gang-privatize artificial variables [PR90115]
Julian Brown [Wed, 12 Oct 2022 20:44:57 +0000 (20:44 +0000)]

This patch prevents compiler-generated artificial variables from being
treated as privatization candidates for OpenACC.

The rationale is that e.g. "gang-private" variables actually must be
shared by each worker and vector spawned within a particular gang, but
that sharing is not necessary for any compiler-generated variable (at
least at present, but no such need is anticipated either).  Variables on
the stack (and machine registers) are already private per-"thread"
(gang, worker and/or vector), and that's fine for artificial variables.

We're restricting this to blocks, as we still need to understand what it
means for a 'DECL_ARTIFICIAL' to appear in a 'private' clause.

Several tests need their scan output patterns adjusted to compensate.

2022-10-14  Julian Brown  <julian@codesourcery.com>

PR middle-end/90115
gcc/
* omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not
privatization candidates.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/print-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
(cherry picked from commit 11e811d8e2f63667f60f73731bb934273f5882b8)
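
The rule described in this commit message can be written down as a small
predicate.  This is only a sketch under simplifying assumptions: VarSketch
and is_privatization_candidate_sketch are hypothetical names, the two
booleans stand in for information GCC derives from tree nodes, and every
other condition the real oacc_privatization_candidate_p checks is ignored.

```cpp
#include <cassert>

// Hypothetical flattened view of a declaration.
struct VarSketch {
  bool artificial;         // compiler-generated, cf. 'DECL_ARTIFICIAL'
  bool declared_in_block;  // found in a block, not named in a 'private' clause
};

// Artificial variables found in blocks are not privatization candidates.
// Artificial variables reached through a 'private' clause keep the previous
// behavior, as the commit message says that case is not yet understood.
bool is_privatization_candidate_sketch(const VarSketch &v) {
  if (v.artificial && v.declared_in_block)
    return false;
  return true;
}
```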

[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks
Thomas Schwinge [Tue, 18 Oct 2022 14:59:54 +0000 (16:59 +0200)]

Follow-up to og12 commit d4504346d2a1d6ffecb8b2d8e3e04ab8ea259785
"[og12] OpenACC: Don't gang-privatize artificial variables", to restore
the previous behavior, until we understand what it means for a
'DECL_ARTIFICIAL' to appear in a 'private' clause.

gcc/
* omp-low.cc (oacc_privatization_candidate_p) <DECL_ARTIFICIAL>:
Restrict to 'block's.
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.

Resolve '-Wsign-compare' issue in 'gcc/omp-low.cc:lower_rec_simd_input_clauses': ChangeLog
Thomas Schwinge [Fri, 28 Oct 2022 07:55:22 +0000 (09:55 +0200)]

... forgotten in og12 commit 4e32d1582a137d5f34248fdd3e93d35a798f5221
"Resolve '-Wsign-compare' issue in 'gcc/omp-low.cc:lower_rec_simd_input_clauses'".

Resolve '-Wsign-compare' issue in 'gcc/omp-low.cc:lower_rec_simd_input_clauses'
Thomas Schwinge [Tue, 25 Oct 2022 07:45:31 +0000 (09:45 +0200)]

..., introduced in og12 commit 55722a87dd223149dcd41ca9c8eba16ad5b3eddc
"openmp: fix max_vf setting for amdgcn offloading":

    In file included from [...]/source-gcc/gcc/coretypes.h:482,
                     from [...]/source-gcc/gcc/omp-low.cc:27:
    [...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly<Ca, bool>::type maybe_lt(const Ca&, const poly_int_pod<N, Cb>&) [with unsigned int N = 1; Ca = int; Cb = long unsigned int; typename if_nonpoly<Ca, bool>::type = bool]’:
    [...]/source-gcc/gcc/poly-int.h:1510:7:   required from ‘poly_int<N, typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type> ordered_max(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type = long unsigned int; typename if_nonpoly<Cb>::type = int]’
    [...]/source-gcc/gcc/omp-low.cc:5180:33:   required from here
    [...]/source-gcc/gcc/poly-int.h:1384:12: error: comparison of integer expressions of different signedness: ‘const int’ and ‘const long unsigned int’ [-Werror=sign-compare]
     1384 |   return a < b.coeffs[0];
          |          ~~^~~~~~~~~~~
    [...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly<Cb, bool>::type maybe_lt(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]’:
    [...]/source-gcc/gcc/poly-int.h:1515:2:   required from ‘poly_int<N, typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type> ordered_max(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type = long unsigned int; typename if_nonpoly<Cb>::type = int]’
    [...]/source-gcc/gcc/omp-low.cc:5180:33:   required from here
    [...]/source-gcc/gcc/poly-int.h:1373:22: error: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const int’ [-Werror=sign-compare]
     1373 |   return a.coeffs[0] < b;
          |          ~~~~~~~~~~~~^~~

gcc/
* omp-low.cc (lower_rec_simd_input_clauses): For 'ordered_max',
cast 'omp_max_simt_vf ()', 'omp_max_simd_vf ()' to 'unsigned'.
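
The class of problem and the shape of the fix can be reproduced outside GCC.
The sketch below does not use poly_int or ordered_max; max_simd_vf_sketch and
widen_max_vf are hypothetical stand-ins, with std::max playing the role of the
comparison.  It shows a signed helper result being cast to 'unsigned' before
it meets an unsigned value, so that no mixed-signedness comparison is
instantiated.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a helper that returns a signed 'int'.
int max_simd_vf_sketch() { return 64; }

// Take the maximum of an unsigned running value and the helper result.
unsigned long widen_max_vf(unsigned long current) {
  int limit = max_simd_vf_sketch();
  assert(limit >= 0);  // the cast below is only meaningful for non-negative values
  // Comparing 'current' with 'limit' directly would mix 'unsigned long' and
  // 'int' and trigger -Wsign-compare; cast to 'unsigned' first.
  unsigned long ulimit = static_cast<unsigned>(limit);
  return std::max(current, ulimit);
}
```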

Fix target selector syntax in 'gcc.dg/vect/bb-slp-cond-1.c'
Thomas Schwinge [Tue, 25 Oct 2022 11:10:52 +0000 (13:10 +0200)]

... to restore testing lost in recent
commit r13-3225-gbd9a05594d227cde79a67dc715bd9d82e9c464e9
"amdgcn: vector testsuite tweaks" (for example, x86_64-pc-linux-gnu):

    PASS: gcc.dg/vect/bb-slp-cond-1.c (test for excess errors)
    PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  scan-tree-dump vect "(no need for alias check [^\\n]* when VF is 1|no alias between [^\\n]* when [^\\n]* is outside \\(-16, 16\\))"
    [-PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "loop vectorized" 1-]
    PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects (test for excess errors)
    PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects execution test
    PASS: gcc.dg/vect/bb-slp-cond-1.c execution test
    PASS: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump vect "(no need for alias check [^\\n]* when VF is 1|no alias between [^\\n]* when [^\\n]* is outside \\(-16, 16\\))"
    [-PASS: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1-]

gcc/testsuite/
* gcc.dg/vect/bb-slp-cond-1.c: Fix target selector syntax.

(cherry picked from commit 0607307768b66a90e27c5bc91a247acc938f070e)

Daily bump.
GCC Administrator [Fri, 28 Oct 2022 00:22:49 +0000 (00:22 +0000)]

openacc: Revert erroneous gang reduction changes
Marcel Vollweiler [Thu, 27 Oct 2022 12:43:00 +0000 (12:43 +0000)]

This patch reverts some changes related to "gang reduction on an orphan loop"
of commit 3a5e525489f2f808093ae1f12b5d2b406f571ec7 "Various OpenACC reduction
enhancements - FE change" similar to the mainline commit
77d24d43644909852998043335b5a0e09d1e8f02.

gcc/c/ChangeLog:

* c-typeck.cc (c_finish_omp_clauses): Remove "gang reduction on an
orphan loop" checking.

gcc/cp/ChangeLog:

* semantics.cc (finish_omp_clauses): Remove "gang reduction on an
orphan loop" checking.

gcc/fortran/ChangeLog:

* openmp.cc (oacc_is_parallel): Remove.
(resolve_oacc_loop_blocks): Remove "gang reduction on an orphan loop"
checking.

openacc: Revert erroneous gang reduction change in OpenAcc test
Marcel Vollweiler [Wed, 26 Oct 2022 14:45:41 +0000 (14:45 +0000)]

This patch reverts an erroneous modification of an OpenAcc test case in
d27d6c9e1e3bc18ba0113757b743b306ea69f825 "Various OpenACC reduction
enhancements - test cases".

The same reversion was already done on mainline with commit
77d24d43644909852998043335b5a0e09d1e8f02.

gcc/testsuite/ChangeLog:

        * gfortran.dg/goacc/orphan-reductions-1.f90: Adjust.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12
Tobias Burnus [Thu, 27 Oct 2022 11:44:50 +0000 (13:44 +0200)]

Merge up to r12-8872-gca0220d42e075194ca1341c98a0a9a9f3fd1c719 (27th Oct 2022)

lto: do not load LTO stream for aliases [PR107418]
Martin Liska [Thu, 27 Oct 2022 08:29:17 +0000 (10:29 +0200)]

PR lto/107418

gcc/lto/ChangeLog:

* lto-dump.cc (lto_main): Do not load LTO stream for aliases.

(cherry picked from commit be6c75547385c69706370f4e792b04295f708a5a)

IRA: Make sure array is big enough
Torbjörn SVENSSON [Tue, 25 Oct 2022 09:45:40 +0000 (11:45 +0200)]

In commit 081c96621da, the call to resize_reg_info() was moved before
the call to remove_scratches().  The latter can increase the number of
regs, which would cause an out-of-bounds access on the global
reg_renumber array.

Without this patch, the following testcase randomly fails with:
during RTL pass: ira
In file included from /src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:13:
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c: In function 'checkgSf13':
/src/gcc/testsuite/gcc.dg/compat/fp-struct-test-by-value-y.h:28:1: internal compiler error: Segmentation fault
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:22:1: note: in expansion of macro 'TEST'

gcc/ChangeLog:

* ira.cc: Resize array after reg number increased.

Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 4e1d704243a4f3c4ded47cd0d02427bb7efef069)
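
The ordering problem can be modelled in a few lines.  This is a sketch under
assumptions, not IRA code: RegState, resize_info, add_regs and prepare are
hypothetical names, and a std::vector stands in for the per-register array.
The point is only that the array must be sized after the last step that can
create new registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a register count plus one array slot per register.
struct RegState {
  int num_regs = 0;
  std::vector<int> renumber;
};

// Make the array match the current register count; new slots get -1.
void resize_info(RegState &s) {
  s.renumber.resize(static_cast<std::size_t>(s.num_regs), -1);
}

// Models a step that may create additional registers.
void add_regs(RegState &s, int extra) { s.num_regs += extra; }

// Resize only after the step that can grow the register count, so every
// register number below num_regs is a valid index.
void prepare(RegState &s, int extra) {
  add_regs(s, extra);
  resize_info(s);
  assert(s.renumber.size() == static_cast<std::size_t>(s.num_regs));
}
```

Calling resize_info before add_regs would leave the last 'extra' register
numbers without a slot, which is the out-of-bounds access described above.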

Daily bump.
GCC Administrator [Thu, 27 Oct 2022 00:23:02 +0000 (00:23 +0000)]

aarch64: update Ampere-1 core definition
Philipp Tomsich [Sun, 7 Aug 2022 22:30:52 +0000 (00:30 +0200)]

This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.

Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
Ampere-1 core entry.

(cherry picked from commit db2f5d661239737157cf131de7d4df1c17d8d88d)

aarch64: fix off-by-one in reading cpuinfo
Philipp Tomsich [Mon, 3 Oct 2022 19:59:50 +0000 (21:59 +0200)]

Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string.  This issue would
cause individual characters (where the 128 byte buffers are stitched
together) to be lost.

gcc/ChangeLog:

* config/aarch64/driver-aarch64.cc (readline): Fix off-by-one.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_18: New test.
* gcc.target/aarch64/cpunative/native_cpu_18.c: New test.

(cherry picked from commit b1cfbccc41de6aec950c0f662e7e85ab34bfff8a)
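
The off-by-one is easy to see in a standalone sketch.  The code below is not
the readline function from driver-aarch64.cc; append_chunk is a hypothetical
helper that stitches one more chunk onto a NUL-terminated buffer.  It assumes
a platform that provides the POSIX strnlen function.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h>

// Append 'chunk' to the NUL-terminated string held in 'buf', whose total
// capacity is 'cap' bytes.  Returns false if the result would not fit.
bool append_chunk(char *buf, std::size_t cap, const char *chunk) {
  std::size_t used = strnlen(buf, cap);
  std::size_t add = strlen(chunk);
  if (used + add + 1 > cap)
    return false;
  // 'buf + used' is the first byte after the current text (its NUL).
  // Using 'buf + used - 1' would overwrite the last character already
  // stored, losing one character at every stitch point.
  char *end = buf + used;
  memcpy(end, chunk, add + 1);  // copy the chunk including its NUL
  return true;
}
```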

Handle operator new with alignment in usm transform.
Hafiz Abid Qadeer [Wed, 26 Oct 2022 08:51:01 +0000 (09:51 +0100)]

Since C++17, there is a variant of operator new with alignment. This
patch converts it to omp_aligned_alloc when unified shared memory is
being used.

gcc/ChangeLog:

* omp-low.cc (usm_transform): Handle operator new with alignment.

libgomp/ChangeLog:

* testsuite/libgomp.c++/usm-2.C: New test.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/usm-4.C: New test.
* g++.dg/gomp/usm-5.C: New test.
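
As background, this is what the aligned variant of operator new looks like at
the source level.  The sketch is plain C++17 host code and does not involve
the usm transform or omp_aligned_alloc; Tile, is_aligned and
aligned_new_roundtrip are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// An over-aligned type: 'new Tile' selects the align_val_t overload in C++17.
struct alignas(64) Tile {
  float data[16];
};

// True if 'p' is a multiple of 'alignment'.
bool is_aligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

bool aligned_new_roundtrip() {
  // Implicit use through a new-expression on an over-aligned type.
  Tile *t = new Tile();
  bool ok = is_aligned(t, alignof(Tile));
  delete t;

  // Explicit call of operator new(std::size_t, std::align_val_t).
  void *raw = ::operator new(256, std::align_val_t(128));
  ok = ok && is_aligned(raw, 128);
  ::operator delete(raw, std::align_val_t(128));
  return ok;
}
```

Both forms carry the alignment as a separate argument, which is the extra
piece of information an aligned allocator needs.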

Daily bump.
GCC Administrator [Wed, 26 Oct 2022 00:21:29 +0000 (00:21 +0000)]

OpenAcc: Correction of reduction enhancement
Marcel Vollweiler [Tue, 25 Oct 2022 15:12:42 +0000 (08:12 -0700)]

Commit bce2c92cfec2ae1eb9d79e36dff5a220b688bfa1 "Various OpenACC reduction
enhancements - ME and nvptx changes" introduced several regressions:

        gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c
        gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c
        gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
        gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90
        gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90
        gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

This fixes the above regressions.

gcc/ChangeLog:

        * omp-offload.cc (oacc_loop_auto_partitions): Removed OLF reduction
        handling.

Relax assertion in profiler
Eric Botcazou [Tue, 25 Oct 2022 10:20:33 +0000 (12:20 +0200)]

This assertion in branch_prob:

  if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
    {
      location_t loc = DECL_SOURCE_LOCATION (current_function_decl);
      gcc_checking_assert (!RESERVED_LOCATION_P (loc));

had been correct until the fix for PR debug/101598 was installed.

gcc/
* profile.cc (branch_prob): Be prepared for ignored functions with
DECL_SOURCE_LOCATION set to UNKNOWN_LOCATION.

gcc/testsuite:
* gnat.dg/specs/coverage1.ads: New test.
* gnat.dg/specs/variant_part.ads: Minor tweak.
* gnat.dg/specs/weak1.ads: Add dg directive.

IBM zSystems: Fix function_ok_for_sibcall [PR106355]
Stefan Schulze Frielinghaus [Wed, 19 Oct 2022 12:28:22 +0000 (14:28 +0200)]

For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers.  Streamlined this with
the implementation of s390_function_arg.

Fix some indentation whitespace, too.

gcc/ChangeLog:

PR target/106355
* config/s390/s390.cc (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.

gcc/testsuite/ChangeLog:

* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.

(cherry picked from commit cb994acc08b67f26a54e7c5dc1f4995a2ce24d98)

i386: fix pedantic warning
Martin Liska [Tue, 25 Oct 2022 04:16:03 +0000 (06:16 +0200)]

PR target/107364

gcc/ChangeLog:

* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Fix pedantic warning.

(cherry picked from commit f3f000b7689ce9eb6364808072025672af1e4e1b)

x86: fix VENDOR_MAX enum value
Martin Liska [Mon, 24 Oct 2022 13:34:39 +0000 (15:34 +0200)]

PR target/107364

gcc/ChangeLog:

* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
  Reorder enum values as BUILTIN_VENDOR_MAX should not point
  in the middle of the valid enum values.

(cherry picked from commit f751bf4c5d1aaa1aacfcbdec62881c5ea1175dfb)
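
The reason for the reordering can be shown with a toy enum.  The names below
(vendor_sketch, VENDOR_SKETCH_A and so on) are hypothetical and are not the
enumerators from i386-cpuinfo.h; the sketch only shows why a MAX sentinel is
useful as a range bound when it follows every valid value.

```cpp
#include <cassert>

// Hypothetical vendor enum with the sentinel placed last.
enum vendor_sketch {
  VENDOR_SKETCH_A = 1,
  VENDOR_SKETCH_B,
  VENDOR_SKETCH_C,
  VENDOR_SKETCH_MAX  // one past the last valid value
};

// A range check against the sentinel accepts every valid enumerator.  If the
// sentinel sat between two valid values, the later ones would be rejected.
bool is_valid_vendor(int v) {
  return v >= VENDOR_SKETCH_A && v < VENDOR_SKETCH_MAX;
}
```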

Daily bump.
GCC Administrator [Tue, 25 Oct 2022 00:22:05 +0000 (00:22 +0000)]

libgomp/nvptx: Prepare for reverse-offload callback handling, resolve spurious SIGSEGVs
Thomas Schwinge [Mon, 24 Oct 2022 19:59:37 +0000 (21:59 +0200)]

Per commit r13-3460-g131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
I'm seeing a lot of libgomp execution test regressions.  Random
example, 'libgomp.c-c++-common/error-1.c':

    [...]
      GOMP_OFFLOAD_run: kernel main$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1]

    Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
    0x00007ffff793b87d in GOMP_OFFLOAD_run (ord=<optimized out>, tgt_fn=<optimized out>, tgt_vars=<optimized out>, args=<optimized out>) at [...]/source-gcc/libgomp/plugin/plugin-nvptx.c:2127
    2127            if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0)
    (gdb) print ptx_dev
    $1 = (struct ptx_device *) 0x6a55a0
    (gdb) print ptx_dev->rev_data
    $2 = (struct rev_offload *) 0xffffffff00000000
    (gdb) print ptx_dev->rev_data->fn
    Cannot access memory at address 0xffffffff00000000

libgomp/
* plugin/plugin-nvptx.c (nvptx_open_device): Initialize
'ptx_dev->rev_data'.

(cherry picked from commit 205538832b7033699047900cf25928f5920d8b93)

vect: WORKAROUND vectorizer bug
Andrew Stubbs [Fri, 21 Oct 2022 13:19:31 +0000 (14:19 +0100)]

This patch disables vectorization of memory accesses to non-default address
spaces where the pointer size is different to the usual pointer size.  This
condition typically occurs in OpenACC programs on amdgcn, where LDS memory is
used for broadcasting gang-private variables between threads. In particular,
see libgomp.oacc-c-c++-common/private-variables.c

The problem is that the address space information is dropped from the various
types in the middle-end and eventually it triggers an ICE trying to do an
address conversion.  That ICE can be avoided by defining
POINTERS_EXTEND_UNSIGNED, but that just produces wrong RTL code later on.

A correct solution would ensure that all the vectypes have the correct address
spaces, but I don't have time for that right now.

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_analyze_data_refs): Workaround an
address-space bug.

amdgcn: disallow USM on gfx908
Andrew Stubbs [Tue, 18 Oct 2022 15:22:53 +0000 (16:22 +0100)]

It does work, but not well and only with the amdgpu.noreply=0 kernel boot
option.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_init_cumulative_args): Disallow gfx908.

amdgcn, libgomp: USM allocation update
Andrew Stubbs [Sat, 15 Oct 2022 22:38:50 +0000 (23:38 +0100)]

Allocate Unified Shared Memory via malloc and hsa_amd_svm_attributes_set,
instead of hsa_allocate_memory.  This scheme should be more efficient for
memory that is first accessed by the CPU.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED): New.
(HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): New.
(HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG): New.
(HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED): New.
(hsa_amd_svm_attribute_pair_t): New.
(struct hsa_runtime_fn_info): Add hsa_amd_svm_attributes_set_fn.
(dump_hsa_system_info): Dump HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED and
HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT.
(DLSYM_OPT_FN): New.
(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_set.
(GOMP_OFFLOAD_usm_alloc): Use malloc and hsa_amd_svm_attributes_set.
(GOMP_OFFLOAD_usm_free): Use regular free.
* testsuite/libgomp.c/usm-1.c: Add -mxnack=on for amdgcn.
* testsuite/libgomp.c/usm-2.c: Likewise.
* testsuite/libgomp.c/usm-3.c: Likewise.
* testsuite/libgomp.c/usm-4.c: Likewise.

libgomp/nvptx: Prepare for reverse-offload callback handling
Tobias Burnus [Mon, 24 Oct 2022 15:16:29 +0000 (17:16 +0200)]

This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.

include/ChangeLog:

* cuda/cuda.h (enum CUdevice_attribute): Add
CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
(CU_MEMHOSTALLOC_DEVICEMAP): Define.
(cuMemHostAlloc): Add prototype.

libgomp/ChangeLog:

* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
'static' for this variable.
* config/nvptx/libgomp-nvptx.h: New file.
* config/nvptx/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare extern var.
(GOMP_REV_OFFLOAD_VAR): Declare var.
(GOMP_target_ext): Handle reverse offload.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
* target.c (gomp_target_rev): ... this new stub function.
* libgomp.h (gomp_target_rev): Declare.
* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
* plugin/cuda-lib.def (cuMemHostAlloc): Add.
* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
(struct ptx_device): Add rev_data member.
(nvptx_open_device): Remove async_engines query, last used in
r10-304-g1f4c5b9b; add unified-address assert check.
(GOMP_OFFLOAD_get_num_devices): Claim unified address
support.
(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
offload functions exist. Make offload var available
on host and device.
(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
(GOMP_OFFLOAD_run): Handle reverse offload.

(cherry picked from commit 052dfa279f5de90b324d60cf787e821b18cf496c)
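
The forwarding scheme described in this commit (values set in a struct on the
device, queried on the host) can be sketched as a tiny mailbox.  Everything
below is a single-process model under assumptions: RevOffloadSketch,
device_request, host_poll and add_handler_sketch are hypothetical names, the
struct layout is invented for illustration, and real device/host memory
sharing is replaced by an ordinary object.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical mailbox shared between "device" and "host".
struct RevOffloadSketch {
  std::atomic<std::uint64_t> fn{0};  // nonzero means a request is pending
  std::uint64_t arg = 0;
  std::uint64_t result = 0;
};

// "Device" side: fill in the payload first, then publish the request.
void device_request(RevOffloadSketch &box, std::uint64_t fn,
                    std::uint64_t arg) {
  box.arg = arg;
  box.fn.store(fn, std::memory_order_release);
}

// "Host" side: one polling step.  Returns true if a request was handled.
bool host_poll(RevOffloadSketch &box,
               std::uint64_t (*handler)(std::uint64_t, std::uint64_t)) {
  std::uint64_t fn = box.fn.load(std::memory_order_acquire);
  if (fn == 0)
    return false;                     // nothing pending
  box.result = handler(fn, box.arg);  // stands in for the host-side callback
  box.fn.store(0, std::memory_order_release);  // mark the request as done
  return true;
}

// Example handler used only to make the sketch observable.
std::uint64_t add_handler_sketch(std::uint64_t fn, std::uint64_t arg) {
  return fn + arg;
}
```

The release store paired with the acquire load ensures the host sees the
payload that was written before the request became visible.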

gcc/testsuite: Change 'cunrolli' to 'cunrolli1' in dump scan + options
Tobias Burnus [Mon, 24 Oct 2022 14:50:25 +0000 (16:50 +0200)]

The OG12 commit
  3e8b51d143e openacc: Move pass_oacc_device_lower after pass_graphite
adds a new pass, which also re-invokes some previous passes. This
seems to have the effect that the pass names and, hence, the dump
names now have a trailing number.

In that commit, 'cunrolli' was changed to 'cunrolli1' for some testcases.
This commit does likewise for some more testcases. In particular,
. in scan-tree-dump the trailing '1' is crucial to change UNRESOLVED
  to PASS.
. for "-fdisable-tree-cunrolli1" option, it changes FAIL (excess errors)
  to PASS
. even without the change, "-fdump-tree-cunrolli{-details,-optimized}"
  has PASS, but I believe the trailing 1 ensures that only the first
  'cunrolli' dumps.

gcc/testsuite
* g++.dg/ext/unroll-1.C: Change 'cunrolli' to 'cunrolli1' in
dg-options and scan-tree-dump.
* g++.dg/ext/unroll-2.C: Likewise.
* g++.dg/ext/unroll-3.C: Likewise.
* g++.dg/vect/pr36648.cc: Likewise.
* gcc.dg/tree-prof/init-array.c: Likewise.
* gcc.dg/tree-ssa/pr100359.c: Likewise.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gfortran.dg/directive_unroll_1.f90: Likewise.
* gfortran.dg/directive_unroll_4.f90: Likewise.
* gnat.dg/unroll1.adb: Likewise.
* gnat.dg/unroll2.adb: Likewise.

OpenMP: Fix reverse offload GOMP_TARGET_REV IFN corner cases [PR107236]
Tobias Burnus [Mon, 24 Oct 2022 13:23:43 +0000 (15:23 +0200)]

For 'target parallel' and similarly nested directives, cgraph_node's
calls_declare_variant_alt was not set in the parent region node but in
cfun->decl. Hence, pass_omp_device_lower did not handle the
internal function GOMP_TARGET_REV. The solution is to set it in the
DECL_CONTEXT, which is set in adjust_context_and_scope.

The cgraph_node::create_clone issue is exposed with -O2 for the existing
libgomp.fortran/reverse-offload-1.f90.

PR middle-end/107236

gcc/ChangeLog:
* omp-expand.cc (expand_omp_target): Set calls_declare_variant_alt
in DECL_CONTEXT and not to cfun->decl.
* cgraphclones.cc (cgraph_node::create_clone): Copy also the
node's calls_declare_variant_alt value.

gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/target-device-ancestor-6.f90: New test.

(cherry picked from commit 178ac530fe67e4f2fc439cc4ce89bc19d571ca31)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12
Tobias Burnus [Mon, 24 Oct 2022 10:09:01 +0000 (12:09 +0200)]

Merge up to r12-8861-g1ccec25cf0c3c9cfd5882c83fd8cc56ea2987bad (24th Oct 2022)

Missing pr104517.c change from: 'Add a restriction on allocate clause (OpenMP 5.0)'
Tobias Burnus [Mon, 24 Oct 2022 08:53:36 +0000 (10:53 +0200)]

OG12 commit df47c25110474565f521508a1545232550052a75 included everything
of r13-150-g1a8c4d9ed36556a95bd7d53c04d2ec4c95594061 but the change to
gcc/testsuite/gcc.dg/gomp/pr104517.c

This commit cherry-picks the missing changes to that file.
Note: The OG12 commit already contained the ChangeLog.omp entry for
that file.

gcc/testsuite/
* gcc.dg/gomp/pr104517.c: Update.

(cherry picked from commit 1a8c4d9ed36556a95bd7d53c04d2ec4c95594061)

Daily bump.
GCC Administrator [Mon, 24 Oct 2022 00:20:43 +0000 (00:20 +0000)]

Fortran: error recovery with references of bad array constructors [PR105633]
Harald Anlauf [Wed, 19 Oct 2022 20:37:56 +0000 (22:37 +0200)]

gcc/fortran/ChangeLog:

PR fortran/105633
* expr.cc (find_array_section): Move check for NULL pointers so
that both subscript triplets and vector subscripts are covered.

gcc/testsuite/ChangeLog:

PR fortran/105633
* gfortran.dg/pr105633.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
(cherry picked from commit ecb20df4fa6d99daa635c7fb662dc0554610777e)

Daily bump.
GCC Administrator [Sun, 23 Oct 2022 00:21:09 +0000 (00:21 +0000)]

Daily bump.
GCC Administrator [Sat, 22 Oct 2022 00:21:07 +0000 (00:21 +0000)]

omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUG
Tobias Burnus [Fri, 21 Oct 2022 13:31:25 +0000 (15:31 +0200)]

GIMPLE_DEBUG statements were put in a parallel region of their own, which
is not only pointless but also breaks -fcompare-debug. With this commit,
they are handled like simple assignments: they are placed
into the same body as the loop such that only one parallel region
remains, as without debugging. This fixes the existing testcase
libgomp.oacc-c-c++-common/kernels-loop-g.c.

Note: GIMPLE_DEBUG are only accepted with -fcompare-debug; if they
appear otherwise, decompose_kernels_region_body rejects them with
a sorry (unchanged).

gcc/
* omp-oacc-kernels-decompose.cc (top_level_omp_for_in_stmt,
decompose_kernels_region_body): Handle GIMPLE_DEBUG like
simple assignment.

Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421]
Thomas Schwinge [Mon, 17 Oct 2022 22:13:47 +0000 (00:13 +0200)]

After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]",
"big" private data now works for GCN offloading, too.

PR target/105421
libgomp/
* testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.

(cherry picked from commit c7ebee2378426eeca425ca5406af213a926f154c)

2 years agoamdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]
Julian Brown [Fri, 14 Oct 2022 11:06:07 +0000 (11:06 +0000)] 
amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]

The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.

I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong.  For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables.  That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.
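
As a rough illustration of the change in heuristic (not the actual code in gcn_detect_incoming_pointer_arg, which works on GCC's tree types), the difference can be sketched with a toy parameter classification. All names below (toy_param_kind, toy_needs_flat_old, toy_needs_flat_new) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy classification of a parameter type; a stand-in for GCC's tree
   types, introduced only for this sketch.  */
enum toy_param_kind
{
  TOY_SCALAR,           /* e.g. int, double */
  TOY_PTR_TO_SCALAR,    /* e.g. int * */
  TOY_PTR_TO_AGGREGATE  /* e.g. struct s * */
};

/* Old heuristic: only a pointer-to-non-aggregate parameter forces
   FLAT addressing.  */
static bool
toy_needs_flat_old (const enum toy_param_kind *params, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (params[i] == TOY_PTR_TO_SCALAR)
      return true;
  return false;
}

/* New heuristic: any pointer parameter forces FLAT addressing.  */
static bool
toy_needs_flat_new (const enum toy_param_kind *params, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (params[i] == TOY_PTR_TO_SCALAR
        || params[i] == TOY_PTR_TO_AGGREGATE)
      return true;
  return false;
}
```

A function taking only a pointer-to-struct parameter is the case where the two versions disagree: the old check returns false, the new one returns true.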

2022-10-14  Julian Brown  <julian@codesourcery.com>

PR target/105421
gcc/
* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
argument forces FLAT addressing mode, not just
pointer-to-non-aggregate.

(cherry picked from commit 7c55755d4c760de326809636531478fd7419e1e5)

2 years agoAdded "noclone" to scan-tree-dump for several OpenACC tests.
Marcel Vollweiler [Fri, 21 Oct 2022 09:56:46 +0000 (02:56 -0700)] 
Added "noclone" to scan-tree-dump for several OpenACC tests.

This fixes multiple tests in addition to
b0256655fb402f87c921cd782b873dd301760ebd.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
Add "noclone" in scan-tree-dump.
* c-c++-common/goacc/kernels-acc-loop-reduction.c: Likewise.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Likewise.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Likewise.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-parloops.f95: Likewise.

2 years agotree-optimization/107323 - loop distribution partition ordering issue
Richard Biener [Fri, 21 Oct 2022 07:45:44 +0000 (09:45 +0200)] 
tree-optimization/107323 - loop distribution partition ordering issue

The following reverts part of the PR94125 fix which causes us to
use a bogus partition ordering after applying versioning for
alias to the testcase in PR107323.  Instead PR94125 is fixed by
appropriately considering to be merged SCCs when skipping edges
we want to ignore because of the alias versioning.

PR tree-optimization/107323
* tree-loop-distribution.cc (pg_unmark_merged_alias_ddrs):
New function.
(loop_distribution::break_alias_scc_partitions): Revert
postorder save/restore from the PR94125 fix.  Instead
make sure to not ignore edges from SCCs we are going to
merge.

* gcc.dg/tree-ssa/pr107323.c: New testcase.

(cherry picked from commit 09f9814dc02c161ed78604c6df70b19b596f7524)

2 years agoDaily bump.
GCC Administrator [Fri, 21 Oct 2022 00:22:01 +0000 (00:22 +0000)] 
Daily bump.

2 years agoMake 'c-c++-common/goacc/kernels-decompose-pr100400-1-*.c' behave consistently, regar...
Thomas Schwinge [Mon, 2 May 2022 13:15:26 +0000 (15:15 +0200)] 
Make 'c-c++-common/goacc/kernels-decompose-pr100400-1-*.c' behave consistently, regardless of checking level

Fix-up for commit c14ea6a72fb1ae66e3d32ac8329558497c6e4403
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]".

For C++ compilation of 'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c',
we first emit a 'sorry' diagnostic, and then a 'gcc_unreachable' (or
'internal_error', see below) diagnostic, but for example, for
'--enable-checking=release' (thus, '!CHECKING_P'), the second one may actually
be turned into a 'confused by earlier errors, bailing out' diagnostic.  (See
'gcc/diagnostic.cc:diagnostic_report_diagnostic': "When not checking, ICEs are
converted to fatal errors when an error has already occurred.")  Thus, make
'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c' behave consistently via
'-Wfatal-errors', and thus only matching the 'sorry' diagnostic.

For example, for '--enable-checking=no' (thus, '!ENABLE_ASSERT_CHECKING'), a
call to 'gcc_unreachable' cannot be assumed to emit an 'internal_error'-like
diagnostic, so explicitly call 'internal_error' in
'gcc/omp-oacc-kernels-decompose.cc:visit_loops_in_gang_single_region', in the
'GIMPLE_OMP_FOR' case, to avoid regressing
'c-c++-common/goacc/kernels-decompose-pr100400-1-3.c', and
'c-c++-common/goacc/kernels-decompose-pr100400-1-4.c'.

PR middle-end/100400
gcc/
* omp-oacc-kernels-decompose.cc
(visit_loops_in_gang_single_region) <GIMPLE_OMP_FOR>: Explicitly
call 'internal_error'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Specify
'-Wfatal-errors'.

(cherry picked from commit da6305558bab9e24943848e4fc5bd8738d7e8f9b)

2 years agoaarch64: Prevent generation of /M BRKAS and BRKBS
Richard Sandiford [Thu, 20 Oct 2022 14:34:09 +0000 (15:34 +0100)] 
aarch64: Prevent generation of /M BRKAS and BRKBS

Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS.  Those
instructions only support zero predication (although
BRKA and BRKB support both).

gcc/
* config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove
merging alternative.
(*aarch64_brk<brk_op>_ptest): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate
PTEST instruction.
* gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise.

(cherry picked from commit 57675c7f92a3bd3ca8dae1faac7f2f51d40e0f9e)

2 years agoaarch64: Fix matching of BRKNS
Richard Sandiford [Thu, 20 Oct 2022 14:34:09 +0000 (15:34 +0100)] 
aarch64: Fix matching of BRKNS

Unlike other flag-setting SVE instructions, BRKNS sets the flags
based on an all-true governing predicate, rather than the GP operand.

gcc/
* config/aarch64/iterators.md (SVE_BRKP): New iterator.
* config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern.
(*aarch64_brkn_ptest): Likewise.
(*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP.
(*aarch64_brk<brk_op>_ptest): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate
PTEST instructions.
* gcc.target/aarch64/sve/acle/general/brkn_2.c: New test.

(cherry picked from commit 6bec66640597e2604f51fc1642c7d279164cd442)

2 years agoaarch64: Define __ARM_FEATURE_RCPC
Richard Sandiford [Thu, 20 Oct 2022 14:34:08 +0000 (15:34 +0100)] 
aarch64: Define __ARM_FEATURE_RCPC

https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly.  This patch
adds the associated support to GCC.
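
A minimal sketch of how user code might test the macro before using an RCPC instruction in inline assembly. The helper name load_acquire_int is hypothetical, and the fallback through the __atomic builtin is an assumption of this sketch, not part of the patch; note that the fallback gives an ordinary acquire load, which is at least as strong as LDAPR.

```c
/* Acquire-load *p.  Uses LDAPR when the compiler advertises RCPC via
   __ARM_FEATURE_RCPC; otherwise falls back to the generic builtin.
   The function name is made up for this example.  */
static inline int
load_acquire_int (int *p)
{
#if defined (__aarch64__) && defined (__ARM_FEATURE_RCPC)
  int v;
  __asm__ volatile ("ldapr %w0, [%1]" : "=r" (v) : "r" (p) : "memory");
  return v;
#else
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
#endif
}
```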

Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it.  This was probably harmless in practice
since GCC simply ignored the extension until now.  (The GAS
definition is OK.)

gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.

gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_1.c: Add RCPC tests.

2 years agolibgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=
Tobias Burnus [Thu, 20 Oct 2022 11:25:25 +0000 (13:25 +0200)] 
libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=

The USM implementation uses -foffload-memory=... which allocates variables
in a special memory. This does not support static variables. Hence, XFAIL
this test on nvptx/gcn. The requires-4a.c testcase tests the same but uses
heap memory instead.

libgomp/
* testsuite/libgomp.c-c++-common/requires-4.c: dg-xfail-run-if on
nvptx and gcn.

2 years agolibgomp: Add offload_device_gcn check, add requires-4a.c test
Tobias Burnus [Thu, 20 Oct 2022 11:07:37 +0000 (13:07 +0200)] 
libgomp: Add offload_device_gcn check, add requires-4a.c test

Duplicate libgomp.c-c++-common/requires-4.c (as ...-4a.c) but
using heap-allocated instead of static memory for a variable.

This change and the added offload_device_gcn check prepare for
pseudo-USM, where the device hardware cannot access all host
memory but only managed and pinned memory; for those, requires-4.c
will fail and the new check permits to add
  target { ! { offload_device_nvptx || offload_device_gcn } }
to requires-4.c; however, it has not been added yet as pseudo-USM
support is not yet on mainline. (Review is pending for the USM
patches.)

include/ChangeLog:

* gomp-constants.h (GOMP_DEVICE_HSA): Comment out unused define.

libgomp/ChangeLog:

* testsuite/lib/libgomp.exp (check_effective_target_offload_device_gcn):
New.
* testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_gcn,
on_device_arch_gcn): New.
* testsuite/libgomp.c-c++-common/requires-4a.c: New test; copied from
requires-4.c but using heap-allocated memory.

(cherry picked from commit 12d9f5afbd2660862045acd41cb65a77e35bea4d)

2 years agoDaily bump.
GCC Administrator [Thu, 20 Oct 2022 00:23:39 +0000 (00:23 +0000)] 
Daily bump.

2 years agolibstdc++: eh_globals: gthreads: reset _S_init before deleting key
Alexandre Oliva [Wed, 22 Jun 2022 02:11:02 +0000 (23:11 -0300)] 
libstdc++: eh_globals: gthreads: reset _S_init before deleting key

Clear __eh_globals_init's _S_init in the dtor before deleting the
gthread key.

This ensures that, in case any code involved in deleting the key
interacts with eh_globals, the key that is being deleted won't be
used, and the non-thread-specific eh_globals fallback will.
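
The ordering can be pictured with a small model. This is only a sketch: every toy_* name is hypothetical, toy_key_valid merely stands in for _S_init, and the real code uses the gthread key API rather than these helpers.

```c
#include <stdbool.h>

static bool toy_key_valid;        /* stands in for _S_init */
static int toy_keyed_globals;     /* per-thread storage reached via the key */
static int toy_fallback_globals;  /* non-thread-specific fallback */
static int *toy_seen_during_delete;

/* Pick the storage the same way the description above implies: use the
   key only while it is marked as initialized.  */
static int *
toy_get_globals (void)
{
  return toy_key_valid ? &toy_keyed_globals : &toy_fallback_globals;
}

/* Models key deletion that happens to interact with eh_globals.  */
static void
toy_key_delete (void)
{
  toy_seen_during_delete = toy_get_globals ();
}

/* Destructor order after the fix: clear the flag first, then delete.  */
static void
toy_shutdown (void)
{
  toy_key_valid = false;
  toy_key_delete ();
}
```

With this order, anything that calls toy_get_globals while the key is being deleted is handed the fallback storage rather than storage tied to the dying key.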

for  libstdc++-v3/ChangeLog

* libsupc++/eh_globals.cc [!_GLIBCXX_HAVE_TLS]
(__eh_globals_init::~__eh_globals_init): Clear _S_init first.

(cherry picked from commit a33dda016e5acf9c6325ce8a72a1b0238130374e)

2 years agoFix omp-expand.cc's expand_omp_target for OpenACC
Tobias Burnus [Wed, 19 Oct 2022 15:31:14 +0000 (17:31 +0200)] 
Fix omp-expand.cc's expand_omp_target for OpenACC

In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c,
"Fortran/OpenMP: Support mapping of DT with allocatable components"
the size of the addr/sizes/kind arrays was passed as 4th argument.
However, OpenACC uses >3 arguments for its own purpose, e.g. to
handle noncontiguous arrays by passing an array descriptor there.

This patch restores the previous behaviour for OpenACC, fixing
testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c.

gcc/
* omp-expand.cc (expand_omp_target): Fix OpenACC in case there
are more than 3 arguments to the builtin function.

2 years agoChangeLog for "Fortran: Fix delinearization regression"
Tobias Burnus [Wed, 19 Oct 2022 15:26:34 +0000 (17:26 +0200)] 
ChangeLog for "Fortran: Fix delinearization regression"

Forgot to update gcc/fortran/ChangeLog.omp and to include the
following in the previous commit, i.e.
commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be.

gcc/fortran/ChangeLog:

* trans-array.cc (non_negative_strides_array_p): Fix handling
of GFC_DECL_SAVED_DESCRIPTOR.
(gfc_conv_array_ref): Use ARRAY_REF again when possible.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/affinity-clause-1.f90: Revert to upstream version,
update one scan-tree item.
* gfortran.dg/gomp/depend-4.f90: Revert to upstream version.
* gfortran.dg/gomp/depend-5.f90: Likewise.
* gfortran.dg/gomp/depend-6.f90: Likewise.

2 years agoFortran: Fix delinearization regression
Tobias Burnus [Wed, 19 Oct 2022 13:53:25 +0000 (15:53 +0200)] 
Fortran: Fix delinearization regression

The delinearization patch "Fortran: delinearize multi-dimensional array
accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3 uses
gfc_build_array_ref for the non-delinearization path. The generated
code depends on whether there can be negative strides or not, an
addition to that function in r12-8230-g7964ab6c364 - adding a Boolean
argument.

The follow-up OG12 commit "Fix Fortran array-access regressions",
9fb0076b11eb2774b620bcf2171d55c7d1fb899f also added this argument
to the call in gfc_conv_array_ref, but always evaluating as false.

This commit changes it to a call to non_negative_strides_array_p
(Note: for 'se->expr' not 'base'; the former could be 'arraydesc'
while the latter is then 'arraydesc.data' whose TREE_TYPE does not
contain information about the array type.)

However, doing so revealed a bug in non_negative_strides_array_p,
fixed in this commit but also submitted as "Fortran: Fix
non_negative_strides_array_p" to mainline,
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html

As a side effect of this commit, several testcases now pass and the
OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90
could be undone, except that the latter now uses the delinearized
array syntax in one case, which is an improvement (as honored in
the scan-dump-tree). Hence, this commit (partially) reverts the
commits:

21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump
014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90}
2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update
d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update

The main testcase for non_negative_strides_array_p is
gfortran.dg/array_reference_3.f90, which now passes as well.

Additionally, this change prevents some unintended implicit
mapping that caused libgomp.fortran/map-alloc-comp-{4,6}.f90 to fail
before; they now pass again.

2 years agoRemove undefined behaviour from testcase.
Andrew MacLeod [Tue, 27 Sep 2022 22:42:33 +0000 (18:42 -0400)] 
Remove undefined behaviour from testcase.

There was a patch posted to remove the undefined behaviour from this
testcase, but it appears to never have been applied.

gcc/testsuite/
PR tree-optimization/102892
* gcc.dg/pr102892-1.c: Remove undefined behaviour.

2 years agors6000: Fix the condition with frame_pointer_needed_indeed [PR96072]
Kewen Lin [Mon, 26 Sep 2022 05:33:18 +0000 (00:33 -0500)] 
rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note
makes one assumption that we have emitted one insn which
restores the frame pointer previously.  That part of code
was guarded with flag frame_pointer_needed before, it was
consistent, but it was replaced with flag
frame_pointer_needed_indeed since commit r10-7981.  It
caused ICE due to unexpected NULL insn.

PR target/96072

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr96072.c: New test.

(cherry picked from commit 5be0950d22209f5ba69d244387228e12389a8470)

2 years agors6000: Fix condition of define_expand vec_shr_<mode> [PR100645]
Kewen Lin [Mon, 26 Sep 2022 03:01:50 +0000 (22:01 -0500)] 
rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]

PR100645 exposes one latent bug in define_expand vec_shr_<mode>
that the current condition TARGET_ALTIVEC is too loose.  The
mode iterator VEC_L contains a few modes, they are not always
supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
be used like some other VEC_L usages.

PR target/100645

gcc/ChangeLog:

* config/rs6000/vector.md (vec_shr_<mode>): Replace condition
TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr100645.c: New test.

(cherry picked from commit bfad7069b74c97000b698191c1945f07a6192db5)

2 years agoDaily bump.
GCC Administrator [Wed, 19 Oct 2022 00:22:58 +0000 (00:22 +0000)] 
Daily bump.

2 years agoMerge branch 'releases/gcc-12' into devel/omp/gcc-12
Tobias Burnus [Tue, 18 Oct 2022 08:00:17 +0000 (10:00 +0200)] 
Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8843-g912bdd5cfb92f6dd58accd755ad14f47c0df619e (18th Oct 2022)

2 years agoDaily bump.
GCC Administrator [Tue, 18 Oct 2022 00:21:01 +0000 (00:21 +0000)] 
Daily bump.

2 years agoFix register count when not splitting Complex IEEE 128-bit args.
Pat Haugen [Tue, 17 May 2022 20:53:24 +0000 (15:53 -0500)] 
Fix register count when not splitting Complex IEEE 128-bit args.

For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.
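
A simplified model of the accounting, for illustration only. The helper toy_vsx_regs_for_arg and its boolean parameters are hypothetical and are not the interface of rs6000_function_arg_advance_1.

```c
#include <stdbool.h>

/* Number of VSX registers the argument counter should advance by for
   one argument, in this toy model.  */
static int
toy_vsx_regs_for_arg (bool is_complex_ieee128, bool abi_splits_complex)
{
  /* An unsplit complex IEEE 128-bit value occupies two VSX registers
     (real and imaginary part), so the counter must advance by two.  */
  if (is_complex_ieee128 && !abi_splits_complex)
    return 2;

  /* Otherwise each part, or each non-complex IEEE 128-bit value, is
     counted as one register per argument in this model.  */
  return 1;
}
```

The bug described above corresponds to returning 1 in the first case, which would let the next argument be assigned a register that is already in use.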

PR target/99685

gcc/
* config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
register count when not splitting IEEE 128-bit Complex.

(cherry picked from commit 2ee68beee709e48fce85b8892ff9985acc6a91a8)

2 years agoFortran: Fixes for kind=4 characters strings [PR107266]
Tobias Burnus [Mon, 17 Oct 2022 15:00:20 +0000 (17:00 +0200)] 
Fortran: Fixes for kind=4 characters strings [PR107266]

PR fortran/107266

gcc/fortran/
* trans-expr.cc (gfc_conv_string_parameter): Use passed
type to honor character kind.
* trans-types.cc (gfc_sym_type): Honor character kind.
* trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4
character strings.

gcc/testsuite/
* gfortran.dg/char4_decl.f90: New test.
* gfortran.dg/char4_decl-2.f90: New test.

(cherry picked from commit c610cf20ebb3444ef4224d789aca670a12f5da40)

2 years agolibgomp: Add Fortran testcases for omp_in_explicit_task
Tobias Burnus [Mon, 17 Oct 2022 14:58:21 +0000 (16:58 +0200)] 
libgomp: Add Fortran testcases for omp_in_explicit_task

Fortranized testcases of commits r13-3257-ga58a965eb73
and r13-3258-g0ec4e93fb9f.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/task-7.f90: New test.
* testsuite/libgomp.fortran/task-8.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
* testsuite/libgomp.fortran/task-reduction-17.f90: New test.
* testsuite/libgomp.fortran/task-reduction-18.f90: New test.

(cherry picked from commit ab8477af9949a7e6fcaf89c5f1dcf32788accf88)

2 years agolibgomp: Fix up OpenMP 5.2 feature bullet
Jakub Jelinek [Mon, 17 Oct 2022 14:57:09 +0000 (16:57 +0200)] 
libgomp: Fix up OpenMP 5.2 feature bullet

The previous bullet correctly mentions 5.2 added for Fortran
allocators directive which is a replacement of allocate directive
associated with ALLOCATE statement to differentiate it at parse time
from allocate directive as declarative one not associated with ALLOCATE
statement, but the deprecation bullet talks about non-existing allocator
directive.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

* libgomp.texi (OpenMP 5.2): Fix up allocator -> allocate directive
in deprecation bullet.

(cherry picked from commit caf9db5a7f99fae8b6088328b9b48ee79fa5e5f0)

2 years agolibgomp: Add omp_in_explicit_task support
Jakub Jelinek [Mon, 17 Oct 2022 14:56:33 +0000 (16:56 +0200)] 
libgomp: Add omp_in_explicit_task support

This is pretty straightforward, if gomp_thread ()->task is NULL,
it can't be explicit task, otherwise if
gomp_thread ()->task->kind == GOMP_TASK_IMPLICIT, it is an implicit
task, otherwise explicit task.
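
The decision described above can be sketched as follows. The toy_* types are simplified stand-ins for libgomp's internal thread and task structures and are not their real definitions; in particular, the single TOY_TASK_OTHER value stands for every non-implicit task kind.

```c
#include <stddef.h>

/* Simplified stand-ins for libgomp internals (hypothetical names).  */
enum toy_task_kind
{
  TOY_TASK_IMPLICIT,
  TOY_TASK_OTHER      /* any non-implicit kind */
};

struct toy_task
{
  enum toy_task_kind kind;
};

struct toy_thread
{
  struct toy_task *task;  /* NULL when no task is associated */
};

/* Returns nonzero only when the thread's current task is explicit.  */
static int
toy_in_explicit_task (const struct toy_thread *thr)
{
  if (thr->task == NULL)
    return 0;                                   /* cannot be explicit */
  return thr->task->kind != TOY_TASK_IMPLICIT;  /* implicit -> 0 */
}
```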

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

* omp.h.in (omp_in_explicit_task): Declare.
* omp_lib.h.in (omp_in_explicit_task): Likewise.
* omp_lib.f90.in (omp_in_explicit_task): New interface.
* libgomp.map (OMP_5.2): New symbol version, export
omp_in_explicit_task and omp_in_explicit_task_.
* task.c (omp_in_explicit_task): New function.
* fortran.c (omp_in_explicit_task): Add ialias_redirect.
(omp_in_explicit_task_): New function.
* libgomp.texi (OpenMP 5.2): Mark omp_in_explicit_task as implemented.
* testsuite/libgomp.c-c++-common/task-in-explicit-1.c: New test.
* testsuite/libgomp.c-c++-common/task-in-explicit-2.c: New test.
* testsuite/libgomp.c-c++-common/task-in-explicit-3.c: New test.

(cherry picked from commit 0ec4e93fb9fa5e9d2424683c5fab1310c8ae2f76)

2 years agolibgomp: Fix up creation of artificial teams
Jakub Jelinek [Mon, 17 Oct 2022 14:55:46 +0000 (16:55 +0200)] 
libgomp: Fix up creation of artificial teams

When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes).  In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s).  The code would destroy/free the previous
task and replace it with the new implicit task.  If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.

(cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)

2 years agotree-optimization/107254 - check and support live lanes from permutes
Richard Biener [Fri, 14 Oct 2022 09:14:59 +0000 (11:14 +0200)] 
tree-optimization/107254 - check and support live lanes from permutes

The following fixes an omission from when SLP permute nodes were
added: live lanes originating from those nodes.  We have to check that
we can extract the lane and have to actually code generate them.

PR tree-optimization/107254
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
For permutes also analyze live lanes.
(vect_schedule_slp_node): For permutes also code generate
live lane extracts.

* gfortran.dg/vect/pr107254.f90: New testcase.

(cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)

2 years agotree-optimization/107212 - SLP reduction of reduction paths
Richard Biener [Tue, 11 Oct 2022 09:34:55 +0000 (11:34 +0200)] 
tree-optimization/107212 - SLP reduction of reduction paths

The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical".  We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain.  Also the
previous attempt likely to mitigate such issue in
vectorizable_live_operation is misguided and has to be removed.

PR tree-optimization/107212
* tree-vect-loop.cc (vectorizable_reduction): Make sure to
set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
reduction.
(vectorizable_live_operation): Do not pun to the SLP
node representative for reduction epilogue generation.

* gcc.dg/vect/pr107212-1.c: New testcase.
* gcc.dg/vect/pr107212-2.c: Likewise.

(cherry picked from commit ee467644c53ee2f7d633a8e1f53603feafab4351)

2 years agotree-optimization/107160 - avoid reusing multiple accumulators
Richard Biener [Thu, 13 Oct 2022 12:24:05 +0000 (14:24 +0200)] 
tree-optimization/107160 - avoid reusing multiple accumulators

Epilogue vectorization is not set up to re-use a vectorized
accumulator consisting of more than one vector.  For non-SLP
we always reduce to a single but for SLP that isn't happening.
In such case we currently miscompile the epilog, so avoid this.

PR tree-optimization/107160
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Do not register accumulator if we failed to reduce it
to a single vector.

* gcc.dg/vect/pr107160.c: New testcase.

(cherry picked from commit 5cbaf84c191b9a3e3cb26545c808d208bdbf2ab5)

2 years agotree-optimization/107107 - tail-merging VN wrong-code
Richard Biener [Thu, 6 Oct 2022 09:20:16 +0000 (11:20 +0200)] 
tree-optimization/107107 - tail-merging VN wrong-code

The following fixes an unintended(?) side-effect of the special
MODIFY_EXPR expression entries we add for tail-merging during VN.
We shouldn't value-number the virtual operand differently here.

PR tree-optimization/107107
* tree-ssa-sccvn.cc (visit_reference_op_store): Do not
affect value-numbering when doing the tail merging
MODIFY_EXPR lookup.

* gcc.dg/pr107107.c: New testcase.

(cherry picked from commit 85333b9265720fc4e49397301cb16324d2b89aa7)

2 years agotree-optimization/106922 - extend same-val clobber FRE
Richard Biener [Fri, 23 Sep 2022 12:28:52 +0000 (14:28 +0200)] 
tree-optimization/106922 - extend same-val clobber FRE

The following extends the skipping of same valued stores to
handle an arbitrary number of them as long as they are from the
same value (which we now record).  That's an obvious extension
which allows to optimize the m_engaged member of std::optional
more reliably.

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow
an arbitrary number of same valued skipped stores.

* g++.dg/torture/pr106922.C: New testcase.

(cherry picked from commit af611afe5fcc908a6678b5b205fb5af7d64fbcb2)

2 years agotestsuite: Fix up pr106922.C test
Jakub Jelinek [Fri, 23 Sep 2022 07:46:59 +0000 (09:46 +0200)] 
testsuite: Fix up pr106922.C test

On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote:
>       * g++.dg/tree-ssa/pr106922.C: Adjust.

> --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> @@ -87,5 +87,4 @@ void testfunctionfoo() {
>    }
>  }
>
> -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
> -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
> +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }

I've noticed
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
with this change, both on x86_64 and i686.
The dump is still cddce3, additionally as the last reference to the pre
dump is gone, not sure it is worth creating that dump.

With the following patch, there aren't FAILs nor UNRESOLVED tests with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"

2022-09-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106922
* g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
dce3.  Remove -fdump-tree-pre-details from dg-options.

(cherry picked from commit a0de11d0d22054b6fd76a0730a3ec807542379d0)

2 years agotree-optimization/106922 - missed FRE/PRE
Richard Biener [Wed, 21 Sep 2022 11:52:56 +0000 (13:52 +0200)] 
tree-optimization/106922 - missed FRE/PRE

The following enhances the store-with-same-value trick in
vn_reference_lookup_3 by not only looking for

  a = val;
  *ptr = val;
  .. = a;

but also

  *ptr = val;
  other = x;
  .. = a;

where the earlier store is more than one hop away.  It does this
by queueing the actual value to compare until after the walk but
as disadvantage only allows a single such skipped store from a
constant value.

Unfortunately we cannot handle defs from non-constants this way
since we're prone to pick up values from the past loop iteration
this way and we have no good way to identify values that are
invariant in the currently iterated cycle.  That's why we keep
the single-hop lookup for those cases.  gcc.dg/tree-ssa/pr87126.c
would be a testcase that's un-XFAILed when we'd handle those
as well.
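
For illustration, a source-level shape matching the second pattern might look like the function below. The names are made up for this sketch, and no claim is made about what a particular GCC version does with it; the point is only that the load of 'a' yields the constant whether or not 'ptr' aliases 'a'.

```c
static int a;
static int other;

/* The store through 'ptr' may alias 'a' but writes the same constant,
   and one more store sits between it and the load of 'a'.  */
static int
load_after_same_value_store (int *ptr, int x)
{
  a = 1;      /* a = val;                                  */
  *ptr = 1;   /* *ptr = val;  may alias 'a', same value    */
  other = x;  /* other = x;   extra store before the load  */
  return a;   /* .. = a;      1 whether or not ptr == &a   */
}
```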

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member.
(vn_walk_cb_data::finish): Perform delayed verification of
a skipped may-alias.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise.
(vn_reference_lookup_3): When skipping stores of the same
value also handle constant stores that are more than a
single VDEF away by delaying the verification.

* gcc.dg/tree-ssa/ssa-fre-100.c: New testcase.
* g++.dg/tree-ssa/pr106922.C: Adjust.

(cherry picked from commit 9baee6181b4e427e0b5ba417e51424c15858dce7)

2 years agoGCN: Restore build with GCC 4.8
Thomas Schwinge [Fri, 14 Oct 2022 22:10:29 +0000 (00:10 +0200)] 
GCN: Restore build with GCC 4.8

For example, for "g++-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4", the recent
commit r13-3220-g45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07
"amdgcn: add multiple vector sizes" broke the build:

    In file included from [...]/source-gcc/gcc/coretypes.h:458:0,
                     from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
    [...]/source-gcc/gcc/config/gcn/gcn.cc: In function ‘machine_mode VnMODE(int, machine_mode)’:
    ./insn-modes.h:42:71: error: temporary of non-literal type ‘scalar_int_mode’ in a constant expression
     #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
                                                                           ^
    [...]/source-gcc/gcc/config/gcn/gcn.cc:405:10: note: in expansion of macro ‘QImode’
         case QImode:
              ^
    In file included from [...]/source-gcc/gcc/coretypes.h:478:0,
                     from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
    [...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not literal because:
     class scalar_int_mode
           ^
    [...]/source-gcc/gcc/machmode.h:410:7: note:   ‘scalar_int_mode’ is not an aggregate, does not have a trivial default constructor, and has no constexpr constructor that is not a copy or move constructor
    [...]

Address this like similar issues have been addressed in the past.

gcc/
* config/gcn/gcn.cc (VnMODE): Use 'case E_QImode:' instead of
'case QImode:', etc.

(cherry picked from commit 612de72b0d2904b5a5a2b487ce4cb907c768a947)

2 years agoFix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c'
Thomas Schwinge [Fri, 23 Sep 2022 09:29:50 +0000 (11:29 +0200)] 
Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c'

That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

libgomp/
* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
'-foffload-options' syntax.

(cherry picked from commit b61796663ba1fe8fb83203829398f3f89ec212b7)

2 years agoDaily bump.
GCC Administrator [Mon, 17 Oct 2022 00:20:47 +0000 (00:20 +0000)] 
Daily bump.

2 years agoDaily bump.
GCC Administrator [Sun, 16 Oct 2022 00:19:58 +0000 (00:19 +0000)] 
Daily bump.

2 years agoDaily bump.
GCC Administrator [Sat, 15 Oct 2022 00:21:34 +0000 (00:21 +0000)] 
Daily bump.

2 years ago[og12] OpenACC: Don't gang-privatize artificial variables
Julian Brown [Wed, 12 Oct 2022 20:44:57 +0000 (20:44 +0000)] 
[og12] OpenACC: Don't gang-privatize artificial variables

This patch prevents compiler-generated artificial variables from being
treated as privatization candidates for OpenACC.

The rationale is that e.g. "gang-private" variables actually must be
shared by each worker and vector spawned within a particular gang, but
that sharing is not necessary for any compiler-generated variable (at
least at present, but no such need is anticipated either).  Variables on
the stack (and machine registers) are already private per-"thread"
(gang, worker and/or vector), and that's fine for artificial variables.

Several tests need their scan output patterns adjusted to compensate.

2022-10-14  Julian Brown  <julian@codesourcery.com>

gcc/
* omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not
privatization candidates.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/print-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.

2 years ago [og12] amdgcn: Use FLAT addressing for all functions with pointer arguments
Julian Brown [Fri, 14 Oct 2022 11:06:07 +0000 (11:06 +0000)] 
[og12] amdgcn: Use FLAT addressing for all functions with pointer arguments

The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.

I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong.  For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables.  That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.

2022-10-14  Julian Brown  <julian@codesourcery.com>

gcc/
* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
argument forces FLAT addressing mode, not just
pointer-to-non-aggregate.
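
The change in heuristic can be sketched as two predicates over a function's parameter list. This is only an illustration under stated assumptions: param_kind, needs_flat_old and needs_flat_new are hypothetical names, and the real gcn_detect_incoming_pointer_arg works on GCC's internal type representation rather than on an enum like this.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical classification of a function parameter; not a GCC type.
enum class param_kind { scalar, pointer_to_scalar, pointer_to_aggregate };

// Earlier heuristic: only pointer-to-scalar parameters force FLAT addressing.
bool needs_flat_old(const std::vector<param_kind>& params) {
  return std::any_of(params.begin(), params.end(), [](param_kind k) {
    return k == param_kind::pointer_to_scalar;
  });
}

// Heuristic after the change: any pointer parameter forces FLAT addressing.
bool needs_flat_new(const std::vector<param_kind>& params) {
  return std::any_of(params.begin(), params.end(), [](param_kind k) {
    return k != param_kind::scalar;
  });
}

// Usage: a function taking only a pointer-to-struct is where the two differ.
void example() {
  std::vector<param_kind> params = {param_kind::pointer_to_aggregate};
  assert(!needs_flat_old(params));
  assert(needs_flat_new(params));
}
```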

2 years ago Fix PR target/107248
Eric Botcazou [Fri, 14 Oct 2022 09:52:04 +0000 (11:52 +0200)] 
Fix PR target/107248

This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later...  Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.

gcc/
PR target/107248
* config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.

2 years ago Daily bump.
GCC Administrator [Fri, 14 Oct 2022 00:20:52 +0000 (00:20 +0000)] 
Daily bump.

2 years ago c++: ICE with VEC_INIT_EXPR and defarg [PR106925]
Marek Polacek [Tue, 11 Oct 2022 18:16:54 +0000 (14:16 -0400)] 
c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test.  At this point
start_preparsed_function hasn't yet set current_function_decl.
expand_vec_init_expr then leads to maybe_splice_retval_cleanup which
checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that
c_f_d is non-null first.  It seems correct that c_f_d is null here, so
it seems to me that maybe_splice_retval_cleanup should check c_f_d as
in the following patch.

PR c++/106925

gcc/cp/ChangeLog:

* except.cc (maybe_splice_retval_cleanup): Check current_function_decl.
Make the bool const.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-defarg3.C: New test.

(cherry picked from commit 3130e70dab1e64a7b014391fe941090d5f3b6b7d)
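
The shape of the guard described above can be sketched as follows. This is a simplified model, not the code in except.cc: fn_decl and in_constructor_p are hypothetical names standing in for GCC's tree node and the DECL_CONSTRUCTOR_P check.

```cpp
#include <cassert>

// Hypothetical stand-in for a function declaration; not GCC's tree type.
struct fn_decl {
  bool is_constructor;
};

// Guarded query: a null "current function" simply means "not in a
// constructor", so the pointer is tested before it is dereferenced.
bool in_constructor_p(const fn_decl* current_fn) {
  return current_fn != nullptr && current_fn->is_constructor;
}

// Usage: the null case models a default argument being processed before any
// current function has been set.
void example() {
  fn_decl ctor{true};
  assert(in_constructor_p(&ctor));
  assert(!in_constructor_p(nullptr));
}
```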

2 years ago install.texi: gcn - update llvm requirements, gcn/nvptx - newlib use version
Tobias Burnus [Tue, 4 Oct 2022 09:49:18 +0000 (11:49 +0200)] 
install.texi: gcn - update llvm requirements, gcn/nvptx - newlib use version

gcc/
* doc/install.texi (Specific): Add missing items to bullet list.
(amdgcn): Update LLVM requirements, use version not date for newlib.
(nvptx): Use version not git hash for newlib.

(cherry picked from commit e886ebd17965d78f609b62479f4f48085108389c)

2 years ago Daily bump.
GCC Administrator [Thu, 13 Oct 2022 00:22:46 +0000 (00:22 +0000)] 
Daily bump.

2 years ago fortran: Move clobbers after evaluation of all arguments [PR106817]
Mikael Morin [Sat, 3 Sep 2022 09:58:47 +0000 (11:58 +0200)] 
fortran: Move clobbers after evaluation of all arguments [PR106817]

For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call.  This was wrong if the value expression of a
later argument depended on the value of the just-clobbered argument:
in that case we passed an undefined value.

With this change, clobbers are collected separately and appended
to the procedure call preliminary code after all the arguments have been
evaluated.

PR fortran/106817

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Collect all clobbers
to their own separate block.  Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_4.f90: New test.

(cherry picked from commit 29919bf3b6449bafd02e795abbb1966e3990c1fc)
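
The ordering problem can be modelled outside Fortran with a small C++ sketch, assuming a call whose first argument is the INTENT(OUT) variable 'a' and whose later argument is the expression 'a + 1'. The names value, plus_one, later_arg_clobber_first and later_arg_clobber_last are hypothetical; an empty std::optional stands in for a clobbered (undefined) variable.

```cpp
#include <cassert>
#include <optional>

// nullopt models a variable whose value is undefined (clobbered).
using value = std::optional<int>;

// Evaluates the expression 'a + 1'; the result is undefined if 'a' is.
value plus_one(const value& a) {
  if (!a) return std::nullopt;
  return *a + 1;
}

// Old ordering: 'a' is clobbered while its own argument reference is set up,
// before the later argument 'a + 1' is evaluated.
value later_arg_clobber_first(value a) {
  a.reset();
  return plus_one(a);
}

// New ordering: every argument is evaluated first, clobbers come afterwards.
value later_arg_clobber_last(value a) {
  value later = plus_one(a);
  a.reset();
  return later;
}

// Usage: only the new ordering passes a defined value for the later argument.
void example() {
  assert(!later_arg_clobber_first(5).has_value());
  assert(later_arg_clobber_last(5) == 6);
}
```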

2 years ago fortran: Fix invalid function decl clobber ICE [PR105012]
Mikael Morin [Mon, 29 Aug 2022 09:19:29 +0000 (11:19 +0200)] 
fortran: Fix invalid function decl clobber ICE [PR105012]

For a function without a declared result symbol, the Fortran frontend
uses the function symbol itself as the result symbol.  This caused
an invalid clobber of a function decl to be emitted, leading to an
ICE, whereas the intended behaviour was to clobber the function result
variable.  This change fixes the problem by getting the decl from the
just-retrieved variable reference after the call to
gfc_conv_expr_reference, instead of copying it from the frontend symbol.

PR fortran/105012

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_15.f90: New test.

(cherry picked from commit edaf1e005c90b311c39b46d85cea17befbece112)

2 years ago fortran: Move the clobber generation code
Mikael Morin [Wed, 31 Aug 2022 09:00:45 +0000 (11:00 +0200)] 
fortran: Move the clobber generation code

This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.

What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation.  Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.

Behaviour remains unchanged.

gcc/fortran/ChangeLog:

* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.cc (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.

(cherry picked from commit 2b393f6f83903cb836676bbd042c1b99a6e7e6f7)

2 years ago [OG12] amdgcn: Fixup "Add builtin for vectorized DFmode fabs operation"
Andrew Stubbs [Tue, 11 Oct 2022 14:14:41 +0000 (15:14 +0100)] 
[OG12] amdgcn: Fixup "Add builtin for vectorized DFmode fabs operation"

The function was taken away by the "add multiple vector sizes" patch.

2022-10-11  Andrew Stubbs  <ams@codesourcery.com>

gcc/
* config/gcn/gcn.cc (gcn_expand_builtin_1): Change gcn_full_exec_reg
to get_exec.

2 years ago amdgcn: vector testsuite tweaks
Andrew Stubbs [Sat, 10 Sep 2022 22:47:19 +0000 (23:47 +0100)] 
amdgcn: vector testsuite tweaks

The testsuite needs a few tweaks following my patches to add multiple vector
sizes for amdgcn.

gcc/testsuite/ChangeLog:

* gcc.dg/pr104464.c: Xfail on amdgcn.
* gcc.dg/signbit-2.c: Likewise.
* gcc.dg/signbit-5.c: Likewise.
* gcc.dg/vect/bb-slp-68.c: Likewise.
* gcc.dg/vect/bb-slp-cond-1.c: Change expectations on amdgcn.
* gcc.dg/vect/bb-slp-subgroups-3.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Change expectations for multiple
vector sizes.
* gcc.dg/vect/pr33953.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-13.c: Likewise.
* gcc.dg/vect/pr80631-2.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/trapv-vect-reduc-4.c: Likewise.
* lib/target-supports.exp (available_vector_sizes): Add more sizes
for amdgcn.

2 years ago amdgcn: Add vector integer negate insn
Andrew Stubbs [Thu, 22 Sep 2022 11:48:30 +0000 (12:48 +0100)] 
amdgcn: Add vector integer negate insn

Another example of the vectorizer needing explicit insns where the scalar
expander just works.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (neg<mode>2): New define_expand.

2 years ago amdgcn: vec_init for multiple vector sizes
Andrew Stubbs [Wed, 11 Mar 2020 16:39:54 +0000 (16:39 +0000)] 
amdgcn: vec_init for multiple vector sizes

Implements vec_init when the input is a vector of smaller vectors, or of
vector MEM types, or a smaller vector duplicated several times.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (vec_init<V_ALL:mode><V_ALL_ALT:mode>): New.
* config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3.
(GEN_VNM): Add gathervNm_expr.
(GEN_VN_NOEXEC): Add vec_seriesvNsi.
(gcn_expand_vector_init): Add initialization of vectors from smaller
vectors.

2 years ago amdgcn: Add vec_extract for partial vectors
Andrew Stubbs [Mon, 29 Jun 2020 14:20:09 +0000 (15:20 +0100)] 
amdgcn: Add vec_extract for partial vectors

Add vec_extract expanders for all valid pairs of vector types.

gcc/ChangeLog:

* config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants.
* config/gcn/gcn-valu.md
(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): New define_expand.
* config/gcn/gcn.cc (get_exec): Export the existing function. Add a
new overload variant.

2 years ago amdgcn: Resolve insn conditions at compile time
Andrew Stubbs [Thu, 26 Mar 2020 21:22:45 +0000 (21:22 +0000)] 
amdgcn: Resolve insn conditions at compile time

GET_MODE_NUNITS isn't a compile time constant, so we end up with many
impossible insns in the machine description.  Adding MODE_VF allows the insns
to be eliminated completely.

gcc/ChangeLog:

* config/gcn/gcn-valu.md
(<cvt_name><VCVT_MODE:mode><VCVT_FMODE:mode>2<exec>): Use MODE_VF.
(<cvt_name><VCVT_FMODE:mode><VCVT_IMODE:mode>2<exec>): Likewise.
* config/gcn/gcn.h (MODE_VF): New macro.

2 years ago amdgcn: add multiple vector sizes
Andrew Stubbs [Mon, 3 Aug 2020 20:09:36 +0000 (21:09 +0100)] 
amdgcn: add multiple vector sizes

The vector sizes are simulated using implicit masking, but they make life
easier for the autovectorizer and SLP passes.
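
As a rough illustration of what "simulated using implicit masking" can mean, the sketch below runs a 64-lane operation with only the low N lanes enabled. It is a model under assumptions, not the backend's code: hw_lanes, exec_mask_for and masked_add are hypothetical names, and enabling the low lanes is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int hw_lanes = 64;  // assumed hardware vector width

// Mask with the low 'n' lanes enabled (assumes 0 <= n <= hw_lanes).
constexpr std::uint64_t exec_mask_for(int n) {
  return n >= hw_lanes ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

using lanes = std::array<int, hw_lanes>;

// Adds a and b in enabled lanes only; disabled lanes keep dest's old value.
lanes masked_add(const lanes& a, const lanes& b, lanes dest,
                 std::uint64_t mask) {
  for (int i = 0; i < hw_lanes; ++i)
    if ((mask >> i) & 1)
      dest[i] = a[i] + b[i];
  return dest;
}

// Usage: a 4-element add only touches lanes 0..3 of the 64-lane register.
void example() {
  lanes a, b, dest;
  a.fill(1);
  b.fill(2);
  dest.fill(0);
  lanes r = masked_add(a, b, dest, exec_mask_for(4));
  assert(r[3] == 3);
  assert(r[4] == 0);
}
```

In this model a smaller vector mode is just the full-width operation with a narrower mask, which is why no separate hardware instructions are needed for each size.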

gcc/ChangeLog:

* config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes
V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF,
V16QI, V16HI, V16SI, V16DI, V16TI, V16HF, V16SF, V16DF,
V8QI, V8HI, V8SI, V8DI, V8TI, V8HF, V8SF, V8DF,
V4QI, V4HI, V4SI, V4DI, V4TI, V4HF, V4SF, V4DF,
V2QI, V2HI, V2SI, V2DI, V2TI, V2HF, V2SF, V2DF.
(ADJUST_ALIGNMENT): Likewise.
* config/gcn/gcn-protos.h (gcn_full_exec): Delete.
(gcn_full_exec_reg): Delete.
(gcn_scalar_exec): Delete.
(gcn_scalar_exec_reg): Delete.
(vgpr_1reg_mode_p): Use inner mode to identify vector registers.
(vgpr_2reg_mode_p): Likewise.
(vgpr_vector_mode_p): Use VECTOR_MODE_P.
* config/gcn/gcn-valu.md (V_QI, V_HI, V_HF, V_SI, V_SF, V_DI, V_DF,
V_QIHI, V_1REG, V_INT_1REG, V_INT_1REG_ALT, V_FP_1REG, V_2REG, V_noQI,
V_noHI, V_INT_noQI, V_INT_noHI, V_ALL, V_ALL_ALT, V_INT, V_FP):
Add additional vector modes.
(V64_SI, V64_DI, V64_ALL, V64_FP): New iterators.
(scalar_mode, SCALAR_MODE, vnsi, VnSI, vndi, VnDI, sdwa):
Add additional vector mode mappings.
(mov<mode>): Implement vector length conversions.
(ldexp<mode>3<exec>): Use VnSI.
(frexp<mode>_exp2<exec>): Likewise.
(VCVT_MODE, VCVT_FMODE, VCVT_IMODE): Add additional vector modes.
(reduc_<reduc_op>_scal_<mode>): Use V64_ALL.
(fold_left_plus_<mode>): Use V64_FP.
(*<reduc_op>_dpp_shr_<mode>): Use V64_1REG.
(*<reduc_op>_dpp_shr_<mode>): Use V64_DI.
(*plus_carry_dpp_shr_<mode>): Use V64_INT_1REG.
(*plus_carry_in_dpp_shr_<mode>): Use V64_SI.
(*plus_carry_dpp_shr_<mode>): Use V64_DI.
(mov_from_lane63_<mode>): Use V64_2REG.
* config/gcn/gcn.cc (VnMODE): New function.
(gcn_can_change_mode_class): Support multiple vector sizes.
(gcn_modes_tieable_p): Likewise.
(gcn_operand_part): Likewise.
(gcn_scalar_exec): Delete function.
(gcn_scalar_exec_reg): Delete function.
(gcn_full_exec): Delete function.
(gcn_full_exec_reg): Delete function.
(gcn_inline_fp_constant_p): Support multiple vector sizes.
(gcn_fp_constant_p): Likewise.
(A): New macro.
(GEN_VN_NOEXEC): New macro.
(GEN_VNM_NOEXEC): New macro.
(GEN_VN): New macro.
(GEN_VNM): New macro.
(GET_VN_FN): New macro.
(CODE_FOR): New macro.
(CODE_FOR_OP): New macro.
(gen_mov_with_exec): Delete function.
(gen_duplicate_load): Delete function.
(gcn_expand_vector_init): Support multiple vector sizes.
(strided_constant): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(gcn_expand_scaled_offsets): Likewise.
(gcn_secondary_reload): Likewise.
(gcn_valid_cvt_p): Likewise.
(gcn_expand_builtin_1): Likewise.
(gcn_make_vec_perm_address): Likewise.
(gcn_vectorize_vec_perm_const): Likewise.
(gcn_vector_mode_supported_p): Likewise.
(gcn_autovectorize_vector_modes): New hook.
(gcn_related_vector_mode): Support multiple vector sizes.
(gcn_expand_dpp_shr_insn): Add FIXME comment.
(gcn_md_reorg): Support multiple vector sizes.
(print_reg): Likewise.
(print_operand): Likewise.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): New hook.