git.ipfire.org Git - thirdparty/gcc.git/log

Fortran: Fix delinearization regression

The delinearization patch "Fortran: delinearize multi-dimensional array
accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3 uses
gfc_build_array_ref for the non-delinearization path. The generated
code depends on whether there can be negative strides or not, an
addition to that function in r12-8230-g7964ab6c364 - adding a Boolean
argument.

The follow-up OG12 commit "Fix Fortran array-access regressions",
9fb0076b11eb2774b620bcf2171d55c7d1fb899f also added this argument
to the call in gfc_conv_array_ref, but always evaluating as false.

This commit changes it to a call to non_negative_strides_array_p
(Note: for 'se->expr' not 'base'; the former could be 'arraydesc'
while the latter is then 'arraydesc.data' whose TREE_TYPE does not
contain information about the array type.)

However, doing so revealed a bug in non_negative_strides_array_p,
fixed in this commit but also submitted as "Fortran: Fix
non_negative_strides_array_p" to mainline,
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html

As a side effect of this commit, several testcases now pass and the
OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90
could be undone, except that the latter now uses the delinearized
array syntax in one case, which is an improvement (as honored in
the scan-dump-tree). Hence, this commit (partially) reverts the
commits:

21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump
014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90}
2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update
d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update

The main testcase for non_negative_strides_array_p is
gfortran.dg/array_reference_3.f90, which now passes as well.

Additionally, this change prevents some unintended implicit
mapping; because of that mapping, libgomp.fortran/map-alloc-comp-{4,6}.f90
failed before, and they now pass again.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8843-g912bdd5cfb92f6dd58accd755ad14f47c0df619e (18th Oct 2022)

Daily bump.

Fix register count when not splitting Complex IEEE 128-bit args.

For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only advancing the
function arg counter by one VSX register. Fixed with this patch.

PR target/99685

gcc/
* config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
register count when not splitting IEEE 128-bit Complex.

(cherry picked from commit 2ee68beee709e48fce85b8892ff9985acc6a91a8)

Fortran: Fixes for kind=4 characters strings [PR107266]

PR fortran/107266

gcc/fortran/
* trans-expr.cc (gfc_conv_string_parameter): Use passed
type to honor character kind.
* trans-types.cc (gfc_sym_type): Honor character kind.
* trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4
character strings.

gcc/testsuite/
* gfortran.dg/char4_decl.f90: New test.
* gfortran.dg/char4_decl-2.f90: New test.

(cherry picked from commit c610cf20ebb3444ef4224d789aca670a12f5da40)

libgomp: Add Fortran testcases for omp_in_explicit_task

Fortranized testcases of commits r13-3257-ga58a965eb73
and r13-3258-g0ec4e93fb9f.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/task-7.f90: New test.
* testsuite/libgomp.fortran/task-8.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
* testsuite/libgomp.fortran/task-reduction-17.f90: New test.
* testsuite/libgomp.fortran/task-reduction-18.f90: New test.

(cherry picked from commit ab8477af9949a7e6fcaf89c5f1dcf32788accf88)

libgomp: Fix up OpenMP 5.2 feature bullet

The previous bullet correctly mentions that 5.2 added the allocators
directive for Fortran. It replaces the allocate directive associated
with an ALLOCATE statement, so that it can be told apart at parse time
from the declarative allocate directive that is not associated with an
ALLOCATE statement. The deprecation bullet, however, talks about a
non-existing allocator directive.

2022-10-12 Jakub Jelinek <jakub@redhat.com>

* libgomp.texi (OpenMP 5.2): Fix up allocator -> allocate directive
in deprecation bullet.

(cherry picked from commit caf9db5a7f99fae8b6088328b9b48ee79fa5e5f0)

libgomp: Add omp_in_explicit_task support

This is pretty straightforward, if gomp_thread ()->task is NULL,
it can't be explicit task, otherwise if
gomp_thread ()->task->kind == GOMP_TASK_IMPLICIT, it is an implicit
task, otherwise explicit task.
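
The three cases above can be sketched in plain C. This is only a model:
the mock_* types and names are hypothetical stand-ins, not libgomp's
internal declarations; only the 'task' and 'kind' fields and the
implicit/explicit distinction are taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for libgomp's internal task and thread types.  */
enum mock_task_kind { MOCK_TASK_IMPLICIT, MOCK_TASK_EXPLICIT };

struct mock_task { enum mock_task_kind kind; };
struct mock_thread { struct mock_task *task; };

/* Mirrors the decision described in the commit message:
   no task -> not explicit; implicit task -> not explicit;
   anything else -> explicit.  */
static bool
mock_in_explicit_task (const struct mock_thread *thr)
{
  assert (thr != NULL);
  if (thr->task == NULL)
    return false;
  return thr->task->kind != MOCK_TASK_IMPLICIT;
}
```

A thread whose 'task' is NULL or implicit yields false; any other kind
yields true.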

2022-10-12 Jakub Jelinek <jakub@redhat.com>

* omp.h.in (omp_in_explicit_task): Declare.
* omp_lib.h.in (omp_in_explicit_task): Likewise.
* omp_lib.f90.in (omp_in_explicit_task): New interface.
* libgomp.map (OMP_5.2): New symbol version, export
omp_in_explicit_task and omp_in_explicit_task_.
* task.c (omp_in_explicit_task): New function.
* fortran.c (omp_in_explicit_task): Add ialias_redirect.
(omp_in_explicit_task_): New function.
* libgomp.texi (OpenMP 5.2): Mark omp_in_explicit_task as implemented.
* testsuite/libgomp.c-c++-common/task-in-explicit-1.c: New test.
* testsuite/libgomp.c-c++-common/task-in-explicit-2.c: New test.
* testsuite/libgomp.c-c++-common/task-in-explicit-3.c: New test.

(cherry picked from commit 0ec4e93fb9fa5e9d2424683c5fab1310c8ae2f76)

libgomp: Fix up creation of artificial teams

When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes).  In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s).  The code would destroy/free the previous
task and replace it with the new implicit task.  If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.

(cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)

tree-optimization/107254 - check and support live lanes from permutes

The following fixes an omission made when SLP permute nodes were
added: live lanes originating from those nodes. We have to check that
we can extract the lane and have to actually code generate them.

PR tree-optimization/107254
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
For permutes also analyze live lanes.
(vect_schedule_slp_node): For permutes also code generate
live lane extracts.

* gfortran.dg/vect/pr107254.f90: New testcase.

(cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)

tree-optimization/107212 - SLP reduction of reduction paths

The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical". We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain. Also the
previous attempt, likely meant to mitigate such an issue in
vectorizable_live_operation, is misguided and has to be removed.

PR tree-optimization/107212
* tree-vect-loop.cc (vectorizable_reduction): Make sure to
set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
reduction.
(vectorizable_live_operation): Do not pun to the SLP
node representative for reduction epilogue generation.

* gcc.dg/vect/pr107212-1.c: New testcase.
* gcc.dg/vect/pr107212-2.c: Likewise.

(cherry picked from commit ee467644c53ee2f7d633a8e1f53603feafab4351)

tree-optimization/107160 - avoid reusing multiple accumulators

Epilogue vectorization is not set up to re-use a vectorized
accumulator consisting of more than one vector. For non-SLP
we always reduce to a single vector, but for SLP that isn't happening.
In such a case we currently miscompile the epilog, so avoid this.

PR tree-optimization/107160
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Do not register accumulator if we failed to reduce it
to a single vector.

* gcc.dg/vect/pr107160.c: New testcase.

(cherry picked from commit 5cbaf84c191b9a3e3cb26545c808d208bdbf2ab5)

tree-optimization/107107 - tail-merging VN wrong-code

The following fixes an unintended(?) side-effect of the special
MODIFY_EXPR expression entries we add for tail-merging during VN.
We shouldn't value-number the virtual operand differently here.

PR tree-optimization/107107
* tree-ssa-sccvn.cc (visit_reference_op_store): Do not
affect value-numbering when doing the tail merging
MODIFY_EXPR lookup.

* gcc.dg/pr107107.c: New testcase.

(cherry picked from commit 85333b9265720fc4e49397301cb16324d2b89aa7)

tree-optimization/106922 - extend same-val clobber FRE

The following extends the skipping of same valued stores to
handle an arbitrary number of them as long as they are from the
same value (which we now record). That's an obvious extension
which allows to optimize the m_engaged member of std::optional
more reliably.

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow
an arbitrary number of same valued skipped stores.

* g++.dg/torture/pr106922.C: New testcase.

(cherry picked from commit af611afe5fcc908a6678b5b205fb5af7d64fbcb2)

testsuite: Fix up pr106922.C test

On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote:
>       * g++.dg/tree-ssa/pr106922.C: Adjust.

> --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> @@ -87,5 +87,4 @@ void testfunctionfoo() {
>    }
>  }
>
> -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
> -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
> +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }

I've noticed
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
with this change, both on x86_64 and i686.
The dump is still cddce3, additionally as the last reference to the pre
dump is gone, not sure it is worth creating that dump.

With the following patch, there aren't FAILs nor UNRESOLVED tests with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"

2022-09-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/106922
* g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
dce3.  Remove -fdump-tree-pre-details from dg-options.

(cherry picked from commit a0de11d0d22054b6fd76a0730a3ec807542379d0)

tree-optimization/106922 - missed FRE/PRE

The following enhances the store-with-same-value trick in
vn_reference_lookup_3 by not only looking for

  a = val;
  *ptr = val;
  .. = a;

but also

  *ptr = val;
  other = x;
  .. = a;

where the earlier store is more than one hop away.  It does this
by queueing the actual value to compare until after the walk but
as a disadvantage only allows a single such skipped store from a
constant value.

Unfortunately we cannot handle defs from non-constants this way
since we're prone to pick up values from the past loop iteration
this way and we have no good way to identify values that are
invariant in the currently iterated cycle.  That's why we keep
the single-hop lookup for those cases.  gcc.dg/tree-ssa/pr87126.c
would be a testcase that's un-XFAILed when we'd handle those
as well.
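
As a compilable illustration of the two shapes above, here is a minimal
sketch. The function and variable names are made up for this example,
and whether a given GCC version actually folds the final load depends on
version and options; the functions return 1 either way.

```c
#include <assert.h>
#include <stddef.h>

int a;      /* global that *ptr may alias */
int other;  /* a second global, distinct from 'a' */

/* Single hop: the possibly aliasing store writes the same value,
   so the following load of 'a' still sees 1.  */
int
load_after_same_value_store (int *ptr)
{
  assert (ptr != NULL);
  a = 1;
  *ptr = 1;
  return a;
}

/* More than one hop: another store sits between the same-valued
   store and the load.  'other' is a different object than 'a',
   so the load of 'a' still sees the constant 1.  */
int
load_after_two_hops (int *ptr, int x)
{
  assert (ptr != NULL);
  a = 1;
  *ptr = 1;
  other = x;
  return a;
}
```

Both functions return 1 whether 'ptr' points at 'a' or at some other int.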

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member.
(vn_walk_cb_data::finish): Perform delayed verification of
a skipped may-alias.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise.
(vn_reference_lookup_3): When skipping stores of the same
value also handle constant stores that are more than a
single VDEF away by delaying the verification.

* gcc.dg/tree-ssa/ssa-fre-100.c: New testcase.
* g++.dg/tree-ssa/pr106922.C: Adjust.

(cherry picked from commit 9baee6181b4e427e0b5ba417e51424c15858dce7)

GCN: Restore build with GCC 4.8

For example, for "g++-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4", the recent
commit r13-3220-g45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07
"amdgcn: add multiple vector sizes" broke the build:

    In file included from [...]/source-gcc/gcc/coretypes.h:458:0,
                     from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
    [...]/source-gcc/gcc/config/gcn/gcn.cc: In function ‘machine_mode VnMODE(int, machine_mode)’:
    ./insn-modes.h:42:71: error: temporary of non-literal type ‘scalar_int_mode’ in a constant expression
     #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
                                                                           ^
    [...]/source-gcc/gcc/config/gcn/gcn.cc:405:10: note: in expansion of macro ‘QImode’
         case QImode:
              ^
    In file included from [...]/source-gcc/gcc/coretypes.h:478:0,
                     from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
    [...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not literal because:
     class scalar_int_mode
           ^
    [...]/source-gcc/gcc/machmode.h:410:7: note:   ‘scalar_int_mode’ is not an aggregate, does not have a trivial default constructor, and has no constexpr constructor that is not a copy or move constructor
    [...]

Addressing this like similar issues have been addressed in the past.

gcc/
* config/gcn/gcn.cc (VnMODE): Use 'case E_QImode:' instead of
'case QImode:', etc.

(cherry picked from commit 612de72b0d2904b5a5a2b487ce4cb907c768a947)

Fix nvptx-specific '-foffload-options' syntax in 'libgomp.c/reverse-offload-sm30.c'

That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.

Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".

libgomp/
* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
'-foffload-options' syntax.

(cherry picked from commit b61796663ba1fe8fb83203829398f3f89ec212b7)

Daily bump.

[og12] OpenACC: Don't gang-privatize artificial variables

This patch prevents compiler-generated artificial variables from being
treated as privatization candidates for OpenACC.

The rationale is that e.g. "gang-private" variables actually must be
shared by each worker and vector spawned within a particular gang, but
that sharing is not necessary for any compiler-generated variable (at
least at present, but no such need is anticipated either). Variables on
the stack (and machine registers) are already private per-"thread"
(gang, worker and/or vector), and that's fine for artificial variables.

Several tests need their scan output patterns adjusted to compensate.

2022-10-14 Julian Brown <julian@codesourcery.com>

gcc/
* omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not
privatization candidates.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output.
* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/print-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.

[og12] amdgcn: Use FLAT addressing for all functions with pointer arguments

The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.

I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong.  For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables.  That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.
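
The difference between the earlier heuristic and the widened rule can be
modelled as follows. This is a sketch only: 'struct mock_param' and both
functions are hypothetical and do not reflect how
gcn_detect_incoming_pointer_arg inspects real argument types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one incoming parameter.  */
struct mock_param
{
  bool is_pointer;
  bool pointee_is_aggregate;
};

/* Earlier heuristic: only a pointer to a non-aggregate forces FLAT.  */
static bool
needs_flat_old (const struct mock_param *params, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (params[i].is_pointer && !params[i].pointee_is_aggregate)
      return true;
  return false;
}

/* Widened rule: any pointer parameter forces FLAT.  */
static bool
needs_flat_new (const struct mock_param *params, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (params[i].is_pointer)
      return true;
  return false;
}
```

A function taking only a pointer-to-struct parameter is the case where
the two predicates disagree.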

2022-10-14  Julian Brown  <julian@codesourcery.com>

gcc/
* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
argument forces FLAT addressing mode, not just
pointer-to-non-aggregate.

Fix PR target/107248

This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later... Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.

gcc/
PR target/107248
* config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.

Daily bump.

c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test. At this point
start_preparsed_function hasn't yet set current_function_decl.
expand_vec_init_expr then leads to maybe_splice_retval_cleanup which
checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that
c_f_d is non-null first. It seems correct that c_f_d is null here, so
it seems to me that maybe_splice_retval_cleanup should check c_f_d as
in the following patch.

PR c++/106925

gcc/cp/ChangeLog:

* except.cc (maybe_splice_retval_cleanup): Check current_function_decl.
Make the bool const.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-defarg3.C: New test.

(cherry picked from commit 3130e70dab1e64a7b014391fe941090d5f3b6b7d)

install.texi: gcn - update llvm requirements, gcn/nvptx - newlib use version

gcc/
* doc/install.texi (Specific): Add missing items to bullet list.
(amdgcn): Update LLVM requirements, use version not date for newlib.
(nvptx): Use version not git hash for newlib.

(cherry picked from commit e886ebd17965d78f609b62479f4f48085108389c)

Daily bump.

fortran: Move clobbers after evaluation of all arguments [PR106817]

For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call. This was wrong if for an argument coming
later, the value expression was depending on the value of the just-
clobbered argument, and we passed an undefined value in that case.

With this change, clobbers are collected separately and appended
to the procedure call preliminary code after all the arguments have been
evaluated.
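
The ordering problem can be modelled in C. This is an analogy, not
generated code: 'CLOBBERED' is a made-up sentinel standing in for an
undefined value, and 'set_from' stands in for a procedure whose first
dummy argument is INTENT(OUT), called as 'call set_from (a, a + 1)'.

```c
/* Sentinel that stands in for "undefined after clobber".  */
enum { CLOBBERED = -9999 };

/* Stand-in for a procedure whose first dummy is INTENT(OUT).  */
static void
set_from (int *out, int in)
{
  *out = in;
}

/* Old ordering: the clobber is emitted together with the reference to
   argument 1, before argument 2 has been evaluated.  */
static int
call_clobber_early (int a)
{
  a = CLOBBERED;     /* clobber for argument 1 */
  int arg2 = a + 1;  /* argument 2 reads the clobbered value */
  set_from (&a, arg2);
  return a;
}

/* New ordering: all arguments are evaluated first, then the collected
   clobbers are applied, then the call happens.  */
static int
call_clobber_late (int a)
{
  int arg2 = a + 1;  /* argument 2 reads the original value */
  a = CLOBBERED;     /* clobber applied after evaluation */
  set_from (&a, arg2);
  return a;
}
```

With a = 41, the late variant returns 42, while the early variant
returns a value derived from the sentinel.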

PR fortran/106817

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Collect all clobbers
to their own separate block. Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_4.f90: New test.

(cherry picked from commit 29919bf3b6449bafd02e795abbb1966e3990c1fc)

fortran: Fix invalid function decl clobber ICE [PR105012]

The fortran frontend, as result symbol for a function without
declared result symbol, uses the function symbol itself. This caused
an invalid clobber of a function decl to be emitted, leading to an
ICE, whereas the intended behaviour was to clobber the function result
variable. This change fixes the problem by getting the decl from the
just-retrieved variable reference after the call to
gfc_conv_expr_reference, instead of copying it from the frontend symbol.

PR fortran/105012

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_15.f90: New test.

(cherry picked from commit edaf1e005c90b311c39b46d85cea17befbece112)

fortran: Move the clobber generation code

This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.

What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation. Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.

Behaviour remains unchanged.

gcc/fortran/ChangeLog:

* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.cc (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.

(cherry picked from commit 2b393f6f83903cb836676bbd042c1b99a6e7e6f7)

[OG12] amdgcn: Fixup "Add builtin for vectorized DFmode fabs operation"

The function was taken away by the "add multiple vector sizes" patch.

2022-10-11 Andrew Stubbs <ams@codesourcery.com>

gcc/
* config/gcn/gcn.cc (gcn_expand_builtin_1): Change gcn_full_exec_reg
to get_exec.

amdgcn: vector testsuite tweaks

The testsuite needs a few tweaks following my patches to add multiple vector
sizes for amdgcn.

gcc/testsuite/ChangeLog:

* gcc.dg/pr104464.c: Xfail on amdgcn.
* gcc.dg/signbit-2.c: Likewise.
* gcc.dg/signbit-5.c: Likewise.
* gcc.dg/vect/bb-slp-68.c: Likewise.
* gcc.dg/vect/bb-slp-cond-1.c: Change expectations on amdgcn.
* gcc.dg/vect/bb-slp-subgroups-3.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Change expectations for multiple
vector sizes.
* gcc.dg/vect/pr33953.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-13.c: Likewise.
* gcc.dg/vect/pr80631-2.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/trapv-vect-reduc-4.c: Likewise.
* lib/target-supports.exp (available_vector_sizes): Add more sizes
for amdgcn.

amdgcn: Add vector integer negate insn

Another example of the vectorizer needing explicit insns where the scalar
expander just works.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (neg<mode>2): New define_expand.

amdgcn: vec_init for multiple vector sizes

Implements vec_init when the input is a vector of smaller vectors, or of
vector MEM types, or a smaller vector duplicated several times.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (vec_init<V_ALL:mode><V_ALL_ALT:mode>): New.
* config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3.
(GEN_VNM): Add gathervNm_expr.
(GEN_VN_NOEXEC): Add vec_seriesvNsi.
(gcn_expand_vector_init): Add initialization of vectors from smaller
vectors.

amdgcn: Add vec_extract for partial vectors

Add vec_extract expanders for all valid pairs of vector types.

gcc/ChangeLog:

* config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants.
* config/gcn/gcn-valu.md
(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): New define_expand.
* config/gcn/gcn.cc (get_exec): Export the existing function. Add a
new overload variant.

amdgcn: Resolve insn conditions at compile time

GET_MODE_NUNITS isn't a compile time constant, so we end up with many
impossible insns in the machine description. Adding MODE_VF allows the insns
to be eliminated completely.

gcc/ChangeLog:

* config/gcn/gcn-valu.md
(<cvt_name><VCVT_MODE:mode><VCVT_FMODE:mode>2<exec>): Use MODE_VF.
(<cvt_name><VCVT_FMODE:mode><VCVT_IMODE:mode>2<exec>): Likewise.
* config/gcn/gcn.h (MODE_VF): New macro.

amdgcn: add multiple vector sizes

The vector sizes are simulated using implicit masking, but they make life
easier for the autovectorizer and SLP passes.

gcc/ChangeLog:

* config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes
V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF,
V16QI, V16HI, V16SI, V16DI, V16TI, V16HF, V16SF, V16DF,
V8QI, V8HI, V8SI, V8DI, V8TI, V8HF, V8SF, V8DF,
V4QI, V4HI, V4SI, V4DI, V4TI, V4HF, V4SF, V4DF,
V2QI, V2HI, V2SI, V2DI, V2TI, V2HF, V2SF, V2DF.
(ADJUST_ALIGNMENT): Likewise.
* config/gcn/gcn-protos.h (gcn_full_exec): Delete.
(gcn_full_exec_reg): Delete.
(gcn_scalar_exec): Delete.
(gcn_scalar_exec_reg): Delete.
(vgpr_1reg_mode_p): Use inner mode to identify vector registers.
(vgpr_2reg_mode_p): Likewise.
(vgpr_vector_mode_p): Use VECTOR_MODE_P.
* config/gcn/gcn-valu.md (V_QI, V_HI, V_HF, V_SI, V_SF, V_DI, V_DF,
V_QIHI, V_1REG, V_INT_1REG, V_INT_1REG_ALT, V_FP_1REG, V_2REG, V_noQI,
V_noHI, V_INT_noQI, V_INT_noHI, V_ALL, V_ALL_ALT, V_INT, V_FP):
Add additional vector modes.
(V64_SI, V64_DI, V64_ALL, V64_FP): New iterators.
(scalar_mode, SCALAR_MODE, vnsi, VnSI, vndi, VnDI, sdwa):
Add additional vector mode mappings.
(mov<mode>): Implement vector length conversions.
(ldexp<mode>3<exec>): Use VnSI.
(frexp<mode>_exp2<exec>): Likewise.
(VCVT_MODE, VCVT_FMODE, VCVT_IMODE): Add additional vector modes.
(reduc_<reduc_op>_scal_<mode>): Use V64_ALL.
(fold_left_plus_<mode>): Use V64_FP.
(*<reduc_op>_dpp_shr_<mode>): Use V64_1REG.
(*<reduc_op>_dpp_shr_<mode>): Use V64_DI.
(*plus_carry_dpp_shr_<mode>): Use V64_INT_1REG.
(*plus_carry_in_dpp_shr_<mode>): Use V64_SI.
(*plus_carry_dpp_shr_<mode>): Use V64_DI.
(mov_from_lane63_<mode>): Use V64_2REG.
* config/gcn/gcn.cc (VnMODE): New function.
(gcn_can_change_mode_class): Support multiple vector sizes.
(gcn_modes_tieable_p): Likewise.
(gcn_operand_part): Likewise.
(gcn_scalar_exec): Delete function.
(gcn_scalar_exec_reg): Delete function.
(gcn_full_exec): Delete function.
(gcn_full_exec_reg): Delete function.
(gcn_inline_fp_constant_p): Support multiple vector sizes.
(gcn_fp_constant_p): Likewise.
(A): New macro.
(GEN_VN_NOEXEC): New macro.
(GEN_VNM_NOEXEC): New macro.
(GEN_VN): New macro.
(GEN_VNM): New macro.
(GET_VN_FN): New macro.
(CODE_FOR): New macro.
(CODE_FOR_OP): New macro.
(gen_mov_with_exec): Delete function.
(gen_duplicate_load): Delete function.
(gcn_expand_vector_init): Support multiple vector sizes.
(strided_constant): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(gcn_expand_scaled_offsets): Likewise.
(gcn_secondary_reload): Likewise.
(gcn_valid_cvt_p): Likewise.
(gcn_expand_builtin_1): Likewise.
(gcn_make_vec_perm_address): Likewise.
(gcn_vectorize_vec_perm_const): Likewise.
(gcn_vector_mode_supported_p): Likewise.
(gcn_autovectorize_vector_modes): New hook.
(gcn_related_vector_mode): Support multiple vector sizes.
(gcn_expand_dpp_shr_insn): Add FIXME comment.
(gcn_md_reorg): Support multiple vector sizes.
(print_reg): Likewise.
(print_operand): Likewise.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): New hook.

vect: while_ult for integer masks

Add a vector length parameter needed by amdgcn without breaking aarch64.

All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode. (Even if we used
different integer modes there's no mode small enough to differentiate a 2 or
4 lane mask). Without knowing the intended length we end up using a mask with
too many lanes enabled, which leads to undefined behaviour.

The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.
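
A scalar model of why the length is needed, assuming a 64-bit integer
mask and 32-bit bounds as suggested by the while_ultsidi name. The
function 'while_ult_mask' is hypothetical and only illustrates the idea
of limiting the enabled lanes; it is not the amdgcn expander.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the result is set while START + i is (unsigned) less than
   END, for lanes 0 .. NLANES-1 only.  NLANES plays the role of the
   extra length operand.  */
static uint64_t
while_ult_mask (uint32_t start, uint32_t end, unsigned nlanes)
{
  assert (nlanes <= 64);
  uint64_t mask = 0;
  for (unsigned i = 0; i < nlanes; i++)
    {
      if ((uint32_t) (start + i) >= end)
        break;  /* stop at the first lane that fails the test */
      mask |= (uint64_t) 1 << i;
    }
  return mask;
}
```

With bounds 0 and 100, a 64-lane request enables every bit of the mask,
whereas a 4-lane request enables only the low four bits. The mask mode
is the same in both cases, so only the length argument distinguishes them.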

gcc/ChangeLog:

* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
operand 3.
* doc/md.texi (while_ult): Document new operand 3 usage.
* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
maps to a non-vector mode.

Daily bump.

arm: Fix constant immediates predicates and constraints for some MVE builtins

Several MVE builtins incorrectly use the same predicate/constraint
pair for several modes, which does not match the specification.
This patch uses the appropriate iterator instead.

2022-09-06 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/mve.md (mve_vqshluq_n_s<mode>): Use
MVE_pred/MVE_constraint instead of mve_imm_7/Ra.
(mve_vqshluq_m_n_s<mode>): Likewise.
(mve_vqrshrnbq_n_<supf><mode>): Use MVE_pred3/MVE_constraint3
instead of mve_imm_8/Rb.
(mve_vqrshrunbq_n_s<mode>): Likewise.
(mve_vqrshrntq_n_<supf><mode>): Likewise.
(mve_vqrshruntq_n_s<mode>): Likewise.
(mve_vrshrnbq_n_<supf><mode>): Likewise.
(mve_vrshrntq_n_<supf><mode>): Likewise.
(mve_vqrshrnbq_m_n_<supf><mode>): Likewise.
(mve_vqrshrntq_m_n_<supf><mode>): Likewise.
(mve_vrshrnbq_m_n_<supf><mode>): Likewise.
(mve_vrshrntq_m_n_<supf><mode>): Likewise.
(mve_vqrshrunbq_m_n_s<mode>): Likewise.
(mve_vsriq_n_<supf><mode>): Use MVE_pred2/MVE_constraint2 instead
of mve_imm_selective_upto_8/Rg.
(mve_vsriq_m_n_<supf><mode>): Likewise.

(cherry-picked from c3fb6658c7670e446f2fd00984404d971e416b3c)

tree-optimization/106934 - avoid BIT_FIELD_REF of bitfields

The following avoids creating BIT_FIELD_REF of bitfields in
update-address-taken. The patch doesn't implement punning to
a full precision integer type but leaves a comment according to
that.

PR tree-optimization/106934
* tree-ssa.cc (non_rewritable_mem_ref_base): Avoid BIT_FIELD_REFs
of bitfields.
(maybe_rewrite_mem_ref_base): Likewise.

* gfortran.dg/pr106934.f90: New testcase.

(cherry picked from commit 05f5c42cb42c5088187d44cc45a5f671d19ad8c5)

tree-optimization/106922 - PRE and virtual operand translation

PRE implicitly keeps virtual operands at the block's incoming version
but the explicit updating point during PHI translation fails to trigger
when there are no PHIs at all in a block.  Later lazy updating then
fails because of a too loose block check.  A similar issue plagues
reference invalidation when checking the ANTIC_OUT to ANTIC_IN
translation.  The following fixes both and makes the lazy updating
work.

The diagnostic testcase unfortunately requires boost so the
testcase is the one I reduced for a missed optimization in PRE.
The testcase fails with -m32 on x86_64 because we optimize too
much before PRE which causes PRE to not trigger so we fail to
eliminate a full redundancy.  I'm going to open a separate bug
for this.  Hopefully the !lp64 selector is good enough.

PR tree-optimization/106922
* tree-ssa-pre.cc (translate_vuse_through_block): Only
keep the VUSE if its def dominates PHIBLOCK.
(prune_clobbered_mems): Rewrite logic so we check whether
a value dies in a block when the VUSE def doesn't dominate it.

* g++.dg/tree-ssa/pr106922.C: New testcase.

(cherry picked from commit 5edf02ed2b6de024f83a023d046a6a18f645bc83)

tree-optimization/106892 - avoid invalid pointer association in predcom

When predictive commoning builds a reference for iteration N it
prematurely associates a constant offset into the MEM_REF offset
operand which can be invalid if the base pointer then points
outside of an object which alias-analysis does not consider valid.

PR tree-optimization/106892
* tree-predcom.cc (ref_at_iteration): Do not associate the
constant part of the offset into the MEM_REF offset
operand, across a non-zero offset.

* gcc.dg/torture/pr106892.c: New testcase.

(cherry picked from commit a8b0b13da7379feb31950a9d2ad74b98a29c547f)

tree-optimization/105937 - avoid uninit diagnostics crossing iterations

The following avoids adding PHIs to the worklist for uninit processing
if we reach them following backedges.  That confuses predicate analysis
because it assumes the use is happening in the same iteration as the
definition.  For the testcase in the PR the situation is like

void foo (int val)
{
  int uninit;
  # val = PHI <..> (B)
  for (..)
    {
      if (..)
        {
          .. = val; (C)
          val = uninit;
        }
      # val = PHI <..> (A)
    }
}

and starting from (A) with 'uninit' as argument we arrive at (B)
and from there at (C).  Predicate analysis then tries to prove
that the predicate of (B) (not the backedge) implies that the
path from (B) to (C) is unreachable, which isn't really what is
necessary - that's what we'd need to do if the preheader
edge of the loop were the edge with the uninitialized def.

So the following makes those cases intentionally false negatives.

PR tree-optimization/105937
* tree-ssa-uninit.cc (find_uninit_use): Do not queue PHIs
on backedges.
(execute_late_warn_uninitialized): Mark backedges.

* g++.dg/uninit-pr105937.C: New testcase.

(cherry picked from commit c77fae1ca796d6ea06d5cd437909905c3d3d771c)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8817-g97374f25e1ee7ea45293c244f29425c9f9abcf5a (11th Oct 2022)

Daily bump.

Add cpplib ro.po

* ro.po: New.

Daily bump.

Fortran: Fix ICE and wrong code for assumed-rank arrays [PR100029, PR100040]

gcc/fortran/ChangeLog:

PR fortran/100040
PR fortran/100029
* trans-expr.cc (gfc_conv_class_to_class): Add code to have
assumed-rank arrays recognized as full arrays and fix the type
of the array assignment.
(gfc_conv_procedure_call): Change order of code blocks such that
the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
first.

gcc/testsuite/ChangeLog:

PR fortran/100029
* gfortran.dg/PR100029.f90: New test.

PR fortran/100040
* gfortran.dg/PR100040.f90: New test.

(cherry picked from commit 5299155bb80e90df822e1eebc9f9a0c8e4505a46)

Daily bump.

Reverted: c-c++-common/gomp/map-6.c: Fix dg-error due to mapping changes

Revert commit a24748da5b48f23ac83fa9f2d128f766e80567ed
that was added to OG11 as dg-error fix for the cherry-pick of
"Fortran/OpenMP: Add support for 'close' in map clause"
r12-944-gcdcec2f8505ea12c2236cf0184d77dd2f5de4832

2022-10-06  Tobias Burnus  <tobias@codesourcery.com>

        Revert:
        2021-05-14  Tobias Burnus  <tobias@codesourcery.com>

        * c-c++-common/gomp/map-6.c: Remove two dg-error.

gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump

gcc/testsuite/
* gfortran.dg/gomp/depend-5.f90: Update scan-tree-dump.
* gfortran.dg/gomp/scope-6.f90: Likewise.

Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90}

gcc/testsuite/
* gfortran.dg/gomp/affinity-clause-1.f90: Update pattern for
different array-reference handling in OG12.
* gfortran.dg/gomp/uses_allocators-3.f90: Fix dg-error string.

Fortran: Add OpenMP's assume(s) directives

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.1 Impl. Status): Mark 'assume' as 'Y'.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_assumes): New.
(show_omp_clauses, show_namespace): Call it.
(show_omp_node, show_code_node): Handle OpenMP ASSUME.
* gfortran.h (enum gfc_statement): Add ST_OMP_ASSUME,
ST_OMP_END_ASSUME, ST_OMP_ASSUMES and ST_NOTHING.
(gfc_exec_op): Add EXEC_OMP_ASSUME.
(gfc_omp_assumptions): New struct.
(gfc_get_omp_assumptions): New XCNEW #define.
(gfc_omp_clauses, gfc_namespace): Add assume member.
(gfc_resolve_omp_assumptions): New prototype.
* match.h (gfc_match_omp_assume, gfc_match_omp_assumes): New.
* openmp.cc (omp_code_to_statement): Forward declare.
(enum gfc_omp_directive_kind, struct gfc_omp_directive): New.
(gfc_free_omp_clauses): Free assume member and its struct data.
(enum omp_mask2): Add OMP_CLAUSE_ASSUMPTIONS.
(gfc_omp_absent_contains_clause): New.
(gfc_match_omp_clauses): Call it; optionally use passed
omp_clauses argument.
(omp_verify_merge_absent_contains, gfc_match_omp_assume,
gfc_match_omp_assumes, gfc_resolve_omp_assumptions): New.
(resolve_omp_clauses): Call the latter.
(gfc_resolve_omp_directive, omp_code_to_statement): Handle
EXEC_OMP_ASSUME.
* parse.cc (decode_omp_directive): Parse OpenMP ASSUME(S).
(next_statement, parse_executable, parse_omp_structured_block):
Handle ST_OMP_ASSUME.
(case_omp_decl): Add ST_OMP_ASSUMES.
(gfc_ascii_statement): Handle Assumes, optional return
string without '!$OMP '/'!$ACC ' prefix.
* parse.h (gfc_ascii_statement): Add optional bool arg to prototype.
* resolve.cc (gfc_resolve_blocks, gfc_resolve_code): Add
EXEC_OMP_ASSUME.
(gfc_resolve): Resolve ASSUMES directive.
* symbol.cc (gfc_free_namespace): Free omp_assumes member.
* st.cc (gfc_free_statement): Handle EXEC_OMP_ASSUME.
* trans-openmp.cc (gfc_trans_omp_directive): Likewise.
* trans.cc (trans_code): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/assume-1.f90: New test.
* gfortran.dg/gomp/assume-2.f90: New test.
* gfortran.dg/gomp/assumes-1.f90: New test.
* gfortran.dg/gomp/assumes-2.f90: New test.

(cherry picked from commit e2a228438919d846995bf2c839c9b657442224b2)

OpenMP: Update invoke.texi and fix fortran/parse.cc for -fopenmp-simd

Split off from the 'Fortran: Add OpenMP's assume(s) directives' patch.

gcc/
* doc/invoke.texi (-fopenmp): Mention C++ attribute syntax.
(-fopenmp-simd): Likewise; update permitted directives.

gcc/fortran/
* parse.cc (decode_omp_directive): Handle '(end) loop' and 'scan'
also with -fopenmp-simd.

gcc/testsuite/
* gfortran.dg/gomp/openmp-simd-7.f90: New test.

(cherry picked from commit 8792047470073df0da4a5b91997d6058193d7676)

install.texi: gcn - update llvm requirements, gcn/nvptx - newlib use version

gcc/
* doc/install.texi (Specific): Add missing items to bullet list.
(amdgcn): Update LLVM requirements, use version not date for newlib.
(nvptx): Use version not git hash for newlib.

(cherry picked from commit e886ebd17965d78f609b62479f4f48085108389c)

Daily bump.

openmp: Add begin declare target support

The following patch adds support for the begin declare target construct,
which is another spelling for declare target construct without clauses
(where it needs paired end declare target), but unlike that one accepts
clauses.

This is an OpenMP 5.1 feature, implemented with 5.2 clarification because
in 5.1 we had a restriction in the declare target chapter shared by
declare target and begin declare target that if there are any clauses
specified at least one of them needs to be to or link.  But that
was of course meant just for declare target and not begin declare target,
because begin declare target doesn't even allow to/link/enter clauses.
In addition to that, the patch also makes device_type clause duplication
an error (as stated in 5.1) and similarly makes declare target with
just device_type clause an error rather than warning.
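
As a rough illustration of the new spelling, the following minimal C++
sketch wraps a function in begin/end declare target with a device_type
clause.  The names twice and run_on_target are hypothetical and not part
of the patch or its testcases; the sketch assumes a compiler that already
accepts begin declare target.  Without -fopenmp the pragmas are ignored
and everything runs on the host.

```cpp
#include <cassert>

// Hypothetical helper.  Unlike plain "declare target" without clauses,
// "begin declare target" may carry clauses such as device_type.
#pragma omp begin declare target device_type(any)
int twice(int x) { return 2 * x; }
#pragma omp end declare target

// Calls the helper from a target region.  When OpenMP offloading is not
// enabled, the pragma is ignored and the call simply runs on the host.
int run_on_target(int x) {
  int result = 0;
  #pragma omp target map(tofrom: result)
  result = twice(x);
  assert(result == 2 * x);
  return result;
}
```

Note that, per the description above, the to/link/enter clauses are not
accepted on begin declare target, so they are deliberately not used here.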

What this patch doesn't do is:
1) OpenMP 5.1 also added an indirect clause; we support that
   neither on declare target nor on begin declare target
   and I couldn't find it in our features pages (neither libgomp.texi
   nor web)
2) I think device_type(nohost)/device_type(host) support can't work for
   variables (in 5.0 it only talked about procedures so this could
   also be thought of as a 5.1 feature that we should just add to the
   list and implement)
3) I don't see any use of the "omp declare target nohost" attribute, so
   I'm not sure if device_type(nohost) works at all

2022-10-04  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-omp.cc (c_omp_directives): Uncomment begin declare target
entry.
gcc/c/
* c-lang.h (struct c_omp_declare_target_attr): New type.
(current_omp_declare_target_attribute): Change type from
int to vec<c_omp_declare_target_attr, va_gc> *.
* c-parser.cc (c_parser_translation_unit): Adjust for that change.
If last pushed directive was begin declare target, use different
wording and simplify format strings for easier translations.
(c_parser_omp_clause_device_type): Uncomment
check_no_duplicate_clause call.
(c_parser_omp_declare_target): Adjust for the
current_omp_declare_target_attribute type change, push { -1 }.
Use error_at rather than warning_at for declare target with
only device_type clauses.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Define.
(c_parser_omp_begin): Add begin declare target support.
(c_parser_omp_end): Adjust for the
current_omp_declare_target_attribute type change, adjust
diagnostics wording and simplify format strings for easier
translations.
* c-decl.cc (current_omp_declare_target_attribute): Change type from
int to vec<c_omp_declare_target_attr, va_gc> *.
(c_decl_attributes): Adjust for the
current_omp_declare_target_attribute type change.  If device_type
was present on begin declare target, add "omp declare target host"
and/or "omp declare target nohost" attributes.
gcc/cp/
* cp-tree.h (struct omp_declare_target_attr): Rename to ...
(cp_omp_declare_target_attr): ... this.  Add device_type member.
(omp_begin_assumes_data): Rename to ...
(cp_omp_begin_assumes_data): ... this.
(struct saved_scope): Change types of omp_declare_target_attribute
and omp_begin_assumes.
* parser.cc (cp_parser_omp_clause_device_type): Uncomment
check_no_duplicate_clause call.
(cp_parser_omp_all_clauses): Fix up pasto, c_name for OMP_CLAUSE_LINK
should be "link" rather than "to".
(cp_parser_omp_declare_target): Adjust for omp_declare_target_attr
to cp_omp_declare_target_attr changes, push -1 as device_type.  Use
error_at rather than warning_at for declare target with only
device_type clauses.
(OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Define.
(cp_parser_omp_begin): Add begin declare target support.  Adjust
for omp_begin_assumes_data to cp_omp_begin_assumes_data change.
(cp_parser_omp_end): Adjust for the
omp_declare_target_attr to cp_omp_declare_target_attr and
omp_begin_assumes_data to cp_omp_begin_assumes_data type changes,
adjust diagnostics wording and simplify format strings for easier
translations.
* semantics.cc (finish_translation_unit): Likewise.
* decl2.cc (cplus_decl_attributes): If device_type was present on
begin declare target, add "omp declare target host" and/or
"omp declare target nohost" attributes.
gcc/testsuite/
* c-c++-common/gomp/declare-target-4.c: Move tests that are now
rejected into declare-target-7.c.
* c-c++-common/gomp/declare-target-6.c: Adjust expected diagnostics.
* c-c++-common/gomp/declare-target-7.c: New test.
* c-c++-common/gomp/begin-declare-target-1.c: New test.
* c-c++-common/gomp/begin-declare-target-2.c: New test.
* c-c++-common/gomp/begin-declare-target-3.c: New test.
* c-c++-common/gomp/begin-declare-target-4.c: New test.
* g++.dg/gomp/attrs-9.C: Add begin declare target tests.
* g++.dg/gomp/attrs-18.C: New test.
libgomp/
* libgomp.texi (Support begin/end declare target syntax in C/C++):
Mark as implemented.

(cherry picked from commit b6d5d72bd0b71ac96a8b2ee537367c46107dcb73)

gcc/config/t-i386: add build dependencies on i386-builtin-types.inc

i386-builtin-types.inc is included indirectly via i386-builtins.h
into 4 files: i386.cc i386-builtins.cc i386-expand.cc i386-features.cc

Only the i386.cc dependency was present in the gcc/config/t-i386 makefile.

As a result parallel builds occasionally fail as:

    g++ ... -o i386-builtins.o ... ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc
    In file included from ../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc:92:
    ../../gcc-13-20220911/gcc/config/i386/i386-builtins.h:25:10:
     fatal error: i386-builtin-types.inc: No such file or directory
       25 | #include "i386-builtin-types.inc"
          |          ^~~~~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    make[3]: *** [../../gcc-13-20220911/gcc/config/i386/t-i386:54: i386-builtins.o]
      Error 1 shuffle=1663349189

gcc/
PR target/107064
* config/i386/t-i386: Add build-time dependencies against
i386-builtin-types.inc to i386-builtins.o, i386-expand.o,
i386-features.o.

(cherry picked from commit ef3165736d9daafba88adb2db65b2e8ebf0024ca)

Update gcc sv.po

* sv.po: Update.

Daily bump.

Fortran: Fix automatic reallocation inside select rank [PR100103]

gcc/fortran/ChangeLog:

PR fortran/100103
* trans-array.cc (gfc_is_reallocatable_lhs): Add select rank
temporary associate names as possible targets of automatic
reallocation.

gcc/testsuite/ChangeLog:

PR fortran/100103
* gfortran.dg/PR100103.f90: New test.

(cherry picked from commit 12b537b9b7fd50f4b2fbfcb7ccf45f8d66085577)

Fortran: Fix function attributes [PR100132]

gcc/fortran/ChangeLog:

PR fortran/100132
* trans-types.cc (create_fn_spec): Fix function attributes when
passing polymorphic pointers.

gcc/testsuite/ChangeLog:

PR fortran/100132
* gfortran.dg/PR100132.f90: New test.

(cherry picked from commit be60aa5b608b5f09fadfeff852a46589ac311a42)

Daily bump.

Fortran: Update use_device_ptr for OpenMP 5.1 [PR105318]

OpenMP 5.1 added has_device_addr and relaxed the restrictions for
is_device_ptr, including processing non-type(c_ptr) arguments as
if has_device_addr was used. (There is a semantic difference.)

For completeness, the analogous change was made for 'use_device_ptr',
where non-type(c_ptr) arguments now use use_device_addr.

Finally, a warning for 'device(omp_{initial,invalid}_device)' was
silenced along the way, as it affected the new testcase.

PR fortran/105318

gcc/fortran/ChangeLog:
* openmp.cc (resolve_omp_clauses): Update is_device_ptr restrictions
for OpenMP 5.1 and map to has_device_addr where applicable; map
use_device_ptr to use_device_addr where applicable.
Silence integer-range warning for device(omp_{initial,invalid}_device).

libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-2.f90: New test.

gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/is_device_ptr-1.f90: Remove dg-error.
* gfortran.dg/gomp/is_device_ptr-2.f90: Likewise.
* gfortran.dg/gomp/is_device_ptr-3.f90: Update tree-scan-dump.

(cherry picked from commit 10a116104969b3ecc9ea4abdd5436c66fd78d537)

Daily bump.

c++: fix triviality of class with unsatisfied op=

cxx20_pair is trivially copyable because it has a trivial copy constructor
and only a deleted copy assignment operator; the non-triviality of the
unsatisfied copy assignment overload is not considered.
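
A minimal sketch of the kind of class described above, for illustration
only: pair_like and the helper functions are hypothetical names, and the
code is not claimed to match cond-triv3.C.  The constrained overload needs
C++20; a pre-C++20 fallback with only the deleted operator is included so
the sketch compiles either way, since that is how the class is meant to
behave once the unsatisfied overload is ignored.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// Hypothetical stand-in for a C++20 pair with a const member.
template <class T>
struct pair_like {
  T first{};
  // Deleted copy assignment, always present.
  pair_like& operator=(const pair_like&) = delete;
  // Non-trivial copy assignment, only available when T is assignable.
  pair_like& operator=(const pair_like&)
    requires requires(T& t) { t = t; }
  { return *this; }
};
using cxx20_pair = pair_like<const int>;  // constraint is unsatisfied
#else
// Fallback without concepts: only the deleted operator can be written.
struct cxx20_pair {
  const int first{};
  cxx20_pair& operator=(const cxx20_pair&) = delete;
};
#endif

// The implicit copy constructor is trivial.
inline bool pair_has_trivial_copy_ctor() {
  return std::is_trivially_copy_constructible<cxx20_pair>::value;
}

// Overload resolution picks the deleted operator, so this is false.
inline bool pair_is_copy_assignable() {
  return std::is_copy_assignable<cxx20_pair>::value;
}

// Should be true once the unsatisfied overload is no longer considered;
// a compiler without this fix may report false for the C++20 variant.
inline bool pair_is_trivially_copyable() {
  return std::is_trivially_copyable<cxx20_pair>::value;
}
```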

gcc/cp/ChangeLog:

* class.cc (check_methods): Call constraints_satisfied_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/cond-triv3.C: New test.

Fortran: error recovery while simplifying intrinsic UNPACK [PR107054]

gcc/fortran/ChangeLog:

PR fortran/107054
* simplify.cc (gfc_simplify_unpack): Replace assert by condition
that terminates simplification when there are not enough elements
in the constructor of argument VECTOR.

gcc/testsuite/ChangeLog:

PR fortran/107054
* gfortran.dg/pr107054.f90: New test.

(cherry picked from commit 78bc6497fc61bbdacfb416ee0246a775360d9af6)

Fortran: fix ICE in generate_coarray_sym_init [PR82868]

gcc/fortran/ChangeLog:

PR fortran/82868
* trans-decl.cc (generate_coarray_sym_init): Skip symbol
if attr.associate_var.

gcc/testsuite/ChangeLog:

PR fortran/82868
* gfortran.dg/associate_26a.f90: New test.

(cherry picked from commit bc71318a91286b5f00e88f07aab818ac82510692)

Fortran: NULL pointer dereference in invalid simplification [PR106985]

gcc/fortran/ChangeLog:

PR fortran/106985
* expr.cc (gfc_simplify_expr): Avoid NULL pointer dereference.

gcc/testsuite/ChangeLog:

PR fortran/106985
* gfortran.dg/pr106985.f90: New test.

(cherry picked from commit 8dbb15bc2d019488240c1e69d93121b0347ac092)

i386: Mark XMM4-XMM6 as clobbered by encodekey128/encodekey256

encodekey128 and encodekey256 operations clear XMM4-XMM6. But it is
documented that XMM4-XMM6 are reserved for future usages and software
should not rely upon them being zeroed. Change encodekey128 and
encodekey256 to clobber XMM4-XMM6.

gcc/

PR target/107061
* config/i386/predicates.md (encodekey128_operation): Check
XMM4-XMM6 as clobbered.
(encodekey256_operation): Likewise.
* config/i386/sse.md (encodekey128u32): Clobber XMM4-XMM6.
(encodekey256u32): Likewise.

gcc/testsuite/

PR target/107061
* gcc.target/i386/keylocker-encodekey128.c: Don't check
XMM4-XMM6.
* gcc.target/i386/keylocker-encodekey256.c: Likewise.

(cherry picked from commit db288230db55dc1ff626f46c708b555847013a41)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merged up to r12-8794-g85adc2ec2b0736d07c0df35ad9a450f97ff59a7c (29th Sept 2022)

This includes r12-8793-gafea1ae84f0 (cherry-picked from r13-2868-gd3df98807b5)
"OpenACC: Fix reduction tree-sharing issue [PR106982]". However, due to
omp-low.cc changes, it neither applies cleanly nor is it required to make the
testcases pass. This merge adds the testcases - but due to conflicts under a
different filename: gcc/testsuite/c-c++-common/goacc/reduction-7.c added as
...-9.c and ...-8.c added as ...-10.c.

libstdc++: Disable volatile-qualified std::bind for C++20

LWG 2487 added a precondition to std::bind for C++17, making
volatile-qualified uses undefined. We still support it, but with a
deprecated warning.

P1065R2 made it explicitly ill-formed for C++20, so we should no longer
accept it as deprecated. This implements that change.
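
A small sketch of what is affected, using hypothetical names (add_ints,
call_bound).  The ordinary, non-volatile call keeps working in every
language mode; the volatile-qualified call is left commented out because,
as described above, it is expected to warn in C++17 and to be rejected in
C++20 and later.

```cpp
#include <cassert>
#include <functional>

int add_ints(int a, int b) { return a + b; }

int call_bound() {
  // Bind the first argument; the second is supplied at call time.
  auto bound = std::bind(add_ints, 1, std::placeholders::_1);
  int r = bound(2);  // non-volatile call wrapper: unaffected

  // volatile auto vbound = bound;
  // vbound(2);  // deprecated in C++17, ill-formed in C++20 with this change

  assert(r == 3);
  return r;
}
```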

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document std::bind API
changes.
* doc/xml/manual/intro.xml: Document LWG 2487 status.
* doc/xml/manual/using.xml: Clarify default value of
_GLIBCXX_USE_DEPRECATED.
* doc/html/*: Regenerate.
* include/std/functional (_Bind::operator()(Args&&...) volatile)
(_Bind::operator()(Args&&...) const volatile)
(_Bind_result::operator()(Args&&...) volatile)
(_Bind_result::operator()(Args&&...) const volatile): Replace
with deleted overload for C++20 and later.
* testsuite/20_util/bind/cv_quals.cc: Check for deprecated
warnings in C++17.
* testsuite/20_util/bind/cv_quals_2.cc: Likewise, and check for
ill-formed in C++20.

(cherry picked from commit d01f112de4a54db6d2abef836e6dff3a08167389)

OpenACC: Fix reduction tree-sharing issue [PR106982]

The tree for var == incoming == outgoing was
'MEM <double[5]> [(double *)&reduced]' which caused the ICE
"incorrect sharing of tree nodes".

PR middle-end/106982

gcc/ChangeLog:

* omp-low.cc (lower_oacc_reductions): Add some unshare_expr.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/reduction-7.c: New test.
* c-c++-common/goacc/reduction-8.c: New test.

(cherry picked from commit d3df98807b58df186061ad52ff87cc09ba593e9b)

Daily bump.

OpenMP: Fix ICE with OMP metadirectives

Problem: ending an OpenMP metadirective block with an OMP end statement
results in an internal compiler error.
Solution: reject invalid end statements and issue a proper diagnostic.

This revision also fixes a couple of minor metadirective issues and adds
related test cases.

gcc/fortran/ChangeLog:

* parse.cc (gfc_ascii_statement): Missing $ in !$OMP END METADIRECTIVE.
(parse_omp_structured_block): Fix handling of OMP end metadirective.
(parse_omp_metadirective_body): Reject OMP end statements
at the end of an OMP metadirective.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/metadirective-1.f90: Match !$OMP END METADIRECTIVE.
* gfortran.dg/gomp/metadirective-10.f90: New test.
* gfortran.dg/gomp/metadirective-11.f90: New xfail test.
* gfortran.dg/gomp/metadirective-9.f90: New test.

aarch64: Add Arm Neoverse V2 support

This patch adds -mcpu/-mtune support for the Arm Neoverse V2 core.
This updates the internal references to "demeter", but leaves "demeter" as an
accepted value to -mcpu/-mtune as it appears in the released GCC 12 series.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (neoverse-v2): New entry.
(demeter): Update tunings to neoversev2.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (demeter_addrcost_table): Rename to
neoversev2_addrcost_table.
(demeter_regmove_cost): Rename to neoversev2_regmove_cost.
(demeter_advsimd_vector_cost): Rename to neoversev2_advsimd_vector_cost.
(demeter_sve_vector_cost): Rename to neoversev2_sve_vector_cost.
(demeter_scalar_issue_info): Rename to neoversev2_scalar_issue_info.
(demeter_advsimd_issue_info): Rename to neoversev2_advsimd_issue_info.
(demeter_sve_issue_info): Rename to neoversev2_sve_issue_info.
(demeter_vec_issue_info): Rename to neoversev2_vec_issue_info.
Update references to above.
(demeter_vector_cost): Rename to neoversev2_vector_cost.
(demeter_tunings): Rename to neoversev2_tunings.
(aarch64_vec_op_count::rename_cycles_per_iter): Use
neoversev2_sve_issue_info instead of demeter_sve_issue_info.
* doc/invoke.texi (AArch64 Options): Document neoverse-v2.

(cherry picked from commit 14d4b4fb12041dde1511262b926662929196c3fe)

libgomp.texi: Status 'P' for 'assume', remove duplicated line

libgomp/
* libgomp.texi (OpenMP 5.1): Mark 'assume' as implemented
for C/C++. Remove duplicated 'begin declare target' entry.

(cherry picked from commit 175a89d12392acd9cb09e56acafee6fcf2366392)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8790-g8dbde52fbcd0ad5749398216064637414d639d89 (28th Sep 2022)

Daily bump.

amdgcn: Add builtin for vectorized DFmode fabs operation

2022-09-27 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* config/gcn/gcn-builtins.def (FABSV): New builtin.
* config/gcn/gcn.cc (gcn_expand_builtin_1): Generate
builtin for GCN_BUILTIN_FABSV.

amdgcn: Fix instruction generation for exp2 and log2 operations

The GCN instructions for the exp2 and log2 operations are v_exp_* and v_log_*
respectively, which unfortunately do not line up with the RTL naming
convention. To deal with this, a new set of int attributes is now used when
generating the assembly for these instructions.

2022-09-27 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* config/gcn/gcn-valu.md (math_unop_insn): New attribute.
(<math_unop><mode>2, <math_unop><mode>2<exec>, <math_unop><mode>2,
<math_unop><mode>2<exec>, *<math_unop><mode>2_insn,
*<math_unop><mode>2<exec>_insn): Use math_unop_insn to generate
assembler output.

OpenMP: Generate SIMD clones for functions with "declare target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution. The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled at -O2 and higher.
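
For illustration, a function of roughly the shape the filter is described
as accepting: it only computes on its argument, writes no memory and calls
nothing else.  The names scale and sum_scaled are hypothetical, and
whether clones are actually generated depends on the target, on -fopenmp
and on the optimization level, so this is a sketch rather than a
guaranteed trigger.

```cpp
#include <cassert>
#include <cstddef>

#pragma omp declare target
// Pure arithmetic on the argument: no stores, no calls.
float scale(float x) { return 2.0f * x + 1.0f; }
#pragma omp end declare target

// A SIMD loop calling the function; a vectorizer could use a SIMD clone
// of scale here if one was generated.
float sum_scaled(const float *v, std::size_t n) {
  assert(v != nullptr || n == 0);
  float sum = 0.0f;
  #pragma omp simd reduction(+ : sum)
  for (std::size_t i = 0; i < n; ++i)
    sum += scale(v[i]);
  return sum;
}
```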

This is a backport of the proposed mainline patch.
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601972.html

gcc/ChangeLog:

* common.opt (fopenmp-target-simd-clone): New option.
* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
* omp-simd-clone.cc (auto_simd_check_stmt): New function.
(mark_auto_simd_clone): New function.
(simd_clone_create): Add force_local argument, make the symbol
have internal linkage if it is true.
(expand_simd_clones): Also check for cloneable functions with
"omp declare target". Pass explicit_p argument to
simd_clone.compute_vecsize_and_simdlen target hook.
* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
Add bool explicit_p argument.
* doc/tm.texi: Regenerated.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
* config/gcn/gcn.cc
(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
* config/i386/i386.cc
(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/target-simd-clone-1.c: New.
* gcc.dg/gomp/target-simd-clone-2.c: New.
* gcc.dg/gomp/target-simd-clone-3.c: New.
* gcc.dg/gomp/target-simd-clone-4.c: New.
* gcc.dg/gomp/target-simd-clone-5.c: New.
* gcc.dg/gomp/target-simd-clone-6.c: New.

c-family: Drop nothrow from c_keywords

As discussed in
<https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602337.html>.

gcc/c-family/ChangeLog:

* c-format.cc (c_keywords): Drop nothrow.

openmp: Add OpenMP assume, assumes and begin/end assumes support

The following patch implements OpenMP 5.1
#pragma omp assume
#pragma omp assumes
and
#pragma omp begin assumes
#pragma omp end assumes
directive support for C and C++.  Currently it doesn't remember
anything from the assumption clauses for later, so is mainly
to support the directives and diagnose errors in their use.
If the recently posted C++23 [[assume (cond)]]; support makes it
in, the intent is that this can be easily adjusted at least for
the #pragma omp assume directive with holds clause(s) to use
the same infrastructure.  Now, C++23 portable assumptions are slightly
different from OpenMP 5.1 assumptions' holds clause in that C++23
assumption holds just where it appears, while OpenMP 5.1 assumptions
hold everywhere in the scope of the directive.  For assumes
directive which can appear at file or namespace scope it is the whole
TU and everything that functions from there call at runtime, for
begin assumes/end assumes pair all the functions in between those
directives and everything they call and for assume directive the
associated (currently structured) block.  I have no idea how to
represent such holds to be usable for optimizers, except to
make
#pragma omp assume holds (cond)
block;
expand essentially to
[[assume (cond)]];
block;
or
[[assume (cond)]];
block;
[[assume (cond)]];
for now.  Except for holds clause, the other assumptions are
OpenMP related, I'd say we should brainstorm where it would be
useful to optimize based on such information (I guess e.g. in target
regions it easily could) and only when we come up with something
like that think about how to propagate the assumptions to the optimizers.
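
A minimal usage sketch of the assume directive with a holds clause on a
structured block.  The function name sum_block and the particular
condition are made up for illustration.  As noted above, the assumption
is currently only parsed and diagnosed, so it is not expected to change
the generated code; without -fopenmp the pragma is ignored entirely.

```cpp
#include <cassert>

int sum_block(const int *a, int n) {
  // The caller must honor the assumption; check it in debug builds.
  assert(n > 0 && n % 4 == 0);
  int sum = 0;
  // The condition is promised to hold throughout the associated block.
  #pragma omp assume holds(n > 0 && n % 4 == 0)
  {
    for (int i = 0; i < n; ++i)
      sum += a[i];
  }
  return sum;
}
```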

2022-09-27  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_ASSUME,
PRAGMA_OMP_ASSUMES and PRAGMA_OMP_BEGIN.  Rename
PRAGMA_OMP_END_DECLARE_TARGET to PRAGMA_OMP_END.
* c-pragma.cc (omp_pragmas): Add assumes and begin.
For end rename PRAGMA_OMP_END_DECLARE_TARGET to PRAGMA_OMP_END.
(omp_pragmas_simd): Add assume.
* c-common.h (c_omp_directives): Declare.
* c-omp.cc (omp_directives): Rename to ...
(c_omp_directives): ... this.  No longer static.  Uncomment
assume, assumes, begin assumes and end assumes entries.
In end declare target entry rename PRAGMA_OMP_END_DECLARE_TARGET
to PRAGMA_OMP_END.
(c_omp_categorize_directive): Adjust for omp_directives to
c_omp_directives renaming.
gcc/c/
* c-lang.h (current_omp_begin_assumes): Declare.
* c-parser.cc: Include bitmap.h.
(c_parser_omp_end_declare_target): Rename to ...
(c_parser_omp_end): ... this.  Handle also end assumes.
(c_parser_omp_begin, c_parser_omp_assumption_clauses,
c_parser_omp_assumes, c_parser_omp_assume): New functions.
(c_parser_translation_unit): Also diagnose #pragma omp begin assumes
without corresponding #pragma omp end assumes.
(c_parser_pragma): Use %s in may only be used at file scope
diagnostics to decrease number of translatable messages.  Handle
PRAGMA_OMP_BEGIN and PRAGMA_OMP_ASSUMES.  Handle PRAGMA_OMP_END
rather than PRAGMA_OMP_END_DECLARE_TARGET and call c_parser_omp_end
for it rather than c_parser_omp_end_declare_target.
(c_parser_omp_construct): Handle PRAGMA_OMP_ASSUME.
* c-decl.cc (current_omp_begin_assumes): Define.
gcc/cp/
* cp-tree.h (struct omp_begin_assumes_data): New type.
(struct saved_scope): Add omp_begin_assumes member.
* parser.cc: Include bitmap.h.
(cp_parser_omp_assumption_clauses, cp_parser_omp_assume,
cp_parser_omp_assumes, cp_parser_omp_begin): New functions.
(cp_parser_omp_end_declare_target): Rename to ...
(cp_parser_omp_end): ... this.  Handle also end assumes.
(cp_parser_omp_construct): Handle PRAGMA_OMP_ASSUME.
(cp_parser_pragma): Handle PRAGMA_OMP_ASSUME, PRAGMA_OMP_ASSUMES
and PRAGMA_OMP_BEGIN.  Handle PRAGMA_OMP_END rather than
PRAGMA_OMP_END_DECLARE_TARGET and call cp_parser_omp_end
for it rather than cp_parser_omp_end_declare_target.
* pt.cc (apply_late_template_attributes): Also temporarily clear
omp_begin_assumes.
* semantics.cc (finish_translation_unit): Also diagnose
#pragma omp begin assumes without corresponding
#pragma omp end assumes.
gcc/testsuite/
* c-c++-common/gomp/assume-1.c: New test.
* c-c++-common/gomp/assume-2.c: New test.
* c-c++-common/gomp/assume-3.c: New test.
* c-c++-common/gomp/assumes-1.c: New test.
* c-c++-common/gomp/assumes-2.c: New test.
* c-c++-common/gomp/assumes-3.c: New test.
* c-c++-common/gomp/assumes-4.c: New test.
* c-c++-common/gomp/begin-assumes-1.c: New test.
* c-c++-common/gomp/begin-assumes-2.c: New test.
* c-c++-common/gomp/begin-assumes-3.c: New test.
* c-c++-common/gomp/begin-assumes-4.c: New test.
* c-c++-common/gomp/declare-target-6.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Add n1 and n2 arguments, add
tests for assume directive.
* g++.dg/gomp/attrs-2.C (bar): Likewise.
* g++.dg/gomp/attrs-9.C: Add n1 and n2 variables, add tests for
begin assumes directive.
* g++.dg/gomp/attrs-15.C: New test.
* g++.dg/gomp/attrs-16.C: New test.
* g++.dg/gomp/attrs-17.C: New test.

(cherry picked from commit 4790fe99f236c7f1b617722403e682ba2f82485f)

Daily bump.

nvptx: Allow '--with-arch' to override the default '-misa'

gcc/
* config.gcc (with_arch) [nvptx]: Allow '--with-arch' to override
the default.
* config/nvptx/gen-multilib-matches.sh: New.
* config/nvptx/t-nvptx (MULTILIB_OPTIONS, MULTILIB_MATCHES)
(MULTILIB_EXCEPTIONS): Handle this.
* doc/install.texi (Specific) <nvptx-*-none>: Document this.
* doc/invoke.texi (Nvidia PTX Options): Likewise.

(cherry picked from commit e9019085e17554c209ca8531022f116b2d7f94fe)

nvptx: Introduce dummy multilib option for default '-misa=sm_30'

... primarily in preparation for later changes.

gcc/
* config.gcc (TM_MULTILIB_CONFIG) [nvptx]: Set to '$with_arch'.
* config/nvptx/t-nvptx (MULTILIB_OPTIONS, MULTILIB_MATCHES)
(MULTILIB_EXCEPTIONS): Handle it.

(cherry picked from commit 4d94582e0dcbf5fed9d61213715bfff877bf5ecf)

nvptx: Make default '-misa=sm_30' explicit

... primarily in preparation for later changes.

gcc/
* config.gcc (with_arch) [nvptx]: Set to 'sm_30'.
* config/nvptx/nvptx.cc (nvptx_option_override): Assert that
'-misa' appeared.
* config/nvptx/nvptx.h (OPTION_DEFAULT_SPECS): Define.
* config/nvptx/nvptx.opt (misa=): Remove 'Init'.

(cherry picked from commit 108b99b6c45ed8fbad6776539a639244b63191f5)

nvptx: forward '-v' command-line option to assembler

For example, for offloading compilation with '-save-temps -v', before vs. after
word-diff then looks like:

    [...]
     [...]/build-gcc-offload-nvptx-none/gcc/as {+-v -v+} -o ./a.xnvptx-none.mkoffload.o ./a.xnvptx-none.mkoffload.s
    {+Verifying sm_30 code with sm_35 code generation.+}
    {+ ptxas -c -o /dev/null ./a.xnvptx-none.mkoffload.o --gpu-name sm_35 -O0+}
    [...]

(This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/37>
"Put '-v' verbose output onto stderr instead of stdout".)

gcc/
* config/nvptx/nvptx.h (ASM_SPEC): Define.

(cherry picked from commit 84072a2615ec1f5f35e994128a6dc22af5bf1322)

Daily bump.

openmp: Fix ICE with taskgroup at -O0 -fexceptions [PR107001]

The following testcase ICEs because with -O0 -fexceptions GOMP_taskgroup_end
call isn't directly followed by GOMP_RETURN statement, but there are some
conditionals to handle exceptions and we fail to find the correct GOMP_RETURN.

The fix is to treat taskgroup similarly to target data; both of these constructs
emit a try { body } finally { end_call } around the construct's body during
gimplification and we need to see proper construct nesting during gimplification
and omp lowering (including nesting of regions checks), but during omp expansion
we don't really need their nesting anymore, all we need is emit something at
the start of the region and the end of the region is the end API call we've
already emitted during gimplification.  For target data, we weren't adding
GOMP_RETURN statement during omp lowering, so after that pass it is treated
merely like stand-alone omp directives.  This patch does the same for
taskgroup too.
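
For orientation, a small sketch of a taskgroup whose body calls a
function, which is the kind of construct that gets the try/finally
wrapping described above.  It is not the testcase from the PR; work and
sum_squares are hypothetical names.

```cpp
#include <cassert>

// Hypothetical callee; with -fexceptions the compiler has to assume a
// call inside the construct may throw.
int work(int i) { return i * i; }

int sum_squares(int n) {
  assert(n >= 0);
  int sum = 0;
  // The end-of-taskgroup runtime call is emitted in a finally block
  // around this body during gimplification.
  #pragma omp taskgroup
  {
    for (int i = 0; i < n; ++i) {
      #pragma omp task shared(sum) firstprivate(i)
      {
        #pragma omp atomic
        sum += work(i);
      }
    }
  }
  return sum;
}
```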

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

PR c/107001
* omp-low.cc (lower_omp_taskgroup): Don't add GOMP_RETURN statement
at the end.
* omp-expand.cc (build_omp_regions_1): Clarify GF_OMP_TARGET_KIND_DATA
is not stand-alone directive.  For GIMPLE_OMP_TASKGROUP, also don't
update parent.
(omp_make_gimple_edges) <case GIMPLE_OMP_TASKGROUP>: Reset
cur_region back after new_omp_region.

* c-c++-common/gomp/pr107001.c: New test.

(cherry picked from commit ad2aab5c816a6fd56b46210c0a4a4c6243da1de9)

openmp, c: Tighten up c_tree_equal [PR106981]

This patch changes c_tree_equal to work more like cp_tree_equal, be
more strict in what it accepts.  The ICE on the first testcase was
due to INTEGER_CST wi::wide (t1) == wi::wide (t2) comparison which
ICEs if the two constants have different precision, but as the second
testcase shows, being too lenient in it can also lead to miscompilation
of valid OpenMP programs where we think a certain expression is the same
even when it isn't and can be guaranteed at runtime to represent a different
memory location.  So, the patch looks through only NON_LVALUE_EXPRs
and for constants as well as casts requires that the types match before
actually comparing the constant values or recursing on the cast operands.

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

PR c/106981
gcc/c/
* c-typeck.cc (c_tree_equal): Only strip NON_LVALUE_EXPRs at the
start.  For CONSTANT_CLASS_P or CASE_CONVERT: return false if t1 and
t2 have different types.
gcc/testsuite/
* c-c++-common/gomp/pr106981.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/pr106981.c: New test.

(cherry picked from commit 3c5bccb608c665ac3f62adb1817c42c845812428)