git.ipfire.org Git - thirdparty/gcc.git/log

libgomp, amdgcn: Switch USM to 128-byte alignment

This should optimize cache-lines on the AMD GPUs somewhat.

libgomp/ChangeLog:

* usm-allocator.c (ALIGN): Use 128-byte alignment.

amdgcn, libgomp: custom USM allocator

There were problems with critical driver data sharing pages with USM data, so
this new allocator implementation moves USM to entirely different pages.

libgomp/ChangeLog:

* plugin/plugin-gcn.c: Include sys/mman.h and unistd.h.
(usm_heap_create): New function.
(struct usm_splay_tree_key_s): Delete function.
(usm_splay_compare): Delete function.
(splay_tree_prefix): Delete define.
(GOMP_OFFLOAD_usm_alloc): Use new allocator.
(GOMP_OFFLOAD_usm_free): Likewise.
(GOMP_OFFLOAD_is_usm_ptr): Likewise.
(gomp_fatal): Delete macro.
(splay_tree_c): Delete.
* usm-allocator.c: New file.

openmp: Fix up finish_omp_target_clauses [PR108286]

The comment in the loop says that we shouldn't add a map clause if such
a clause exists already, but the loop was actually using OMP_CLAUSE_DECL
on any clause. Target construct can have various clauses which don't
have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clause
where it means something different (e.g. privatization clauses, allocate,
depend).

So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.

2023-01-05 Jakub Jelinek <jakub@redhat.com>

PR c++/108286
* semantics.cc (finish_omp_target_clauses): Ignore clauses other than
OMP_CLAUSE_MAP.

* testsuite/libgomp.c++/pr108286.C: New test.

(cherry picked from commit 29c3218618ef6177dc33871b26c8fbd9b21eabe1)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9034-g4494965932fc5d005e2482bbe58cf9e138c830bd (9th Jan 2023)

Daily bump.

libstdc++: Fix std::chrono::hh_mm_ss with unsigned rep [PR108265]

libstdc++-v3/ChangeLog:

PR libstdc++/108265
* include/std/chrono (hh_mm_ss): Do not use chrono::abs if
duration rep is unsigned. Remove incorrect noexcept-specifier.
* testsuite/std/time/hh_mm_ss/1.cc: Check unsigned rep. Check
floating-point representations. Check default construction.

(cherry picked from commit e36e57b032b2d70eaa1294d5921e4fd8ce12a74d)

rs6000: Raise error for __vector_{quad,pair} uses without MMA enabled [PR106736]

As PR106736 shows, it's unexpected to use __vector_quad and
__vector_pair types without MMA support, it would cause ICE
when expanding the corresponding assignment.  We can't guard
these built-in types registering under MMA support as Peter
pointed out in that PR, because the registering is global,
it doesn't work for target pragma/attribute support with MMA
enabled.  The existing verify_type_context mentioned in [2]
can help to make the diagnostics invalid built-in type uses
better, but as Richard pointed out in [4], it can't deal with
all cases.  As the discussions in [1][3], this patch is to
check the invalid use of built-in types __vector_quad and
__vector_pair in mov pattern of OOmode and XOmode, on the
currently being expanded gimple assignment statement.  It
still puts an assertion in else arm rather than just makes
it go through, it's to ensure we can catch any other possible
unexpected cases in time if there are.

[1] https://gcc.gnu.org/pipermail/gcc/2022-December/240218.html
[2] https://gcc.gnu.org/pipermail/gcc/2022-December/240220.html
[3] https://gcc.gnu.org/pipermail/gcc/2022-December/240223.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608083.html

PR target/106736

gcc/ChangeLog:

* config/rs6000/mma.md (define_expand movoo): Call function
rs6000_opaque_type_invalid_use_p to check and emit error message for
the invalid use of opaque type.
(define_expand movxo): Likewise.
* config/rs6000/rs6000-protos.h
(rs6000_opaque_type_invalid_use_p): New function declaration.
(currently_expanding_gimple_stmt): New extern declaration.
* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): New
function.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106736-1.c: New test.
* gcc.target/powerpc/pr106736-2.c: Likewise.
* gcc.target/powerpc/pr106736-3.c: Likewise.
* gcc.target/powerpc/pr106736-4.c: Likewise.
* gcc.target/powerpc/pr106736-5.c: Likewise.

Daily bump.

c++: mark_single_function and SFINAE [PR108282]

We typically ignore mark_used failure when in a non-SFINAE context for
sake of better error recovery. But in mark_single_function we're
instead ignoring mark_used failure in a SFINAE context, which ends up
causing the second static_assert here to incorrectly fail.

PR c++/108282

gcc/cp/ChangeLog:

* decl2.cc (mark_single_function): Ignore mark_used failure
only in a non-SFINAE context rather than in a SFINAE one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires34.C: New test.

(cherry picked from commit 238e292cf5d822f3bd12d9b58eb04cf377758b2a)

libiberty: Fix C89-isms in configure tests

libiberty/

* acinclude.m4 (ac_cv_func_strncmp_works): Add missing
int return type and parameter list to the definition of main.
Include <stdlib.h> and <string.h> for prototypes.
(ac_cv_c_stack_direction): Add missing
int return type and parameter list to the definitions of
main, find_stack_direction. Include <stdlib.h> for exit
prototype.
* configure: Regenerate.

(cherry picked from commit 885b6660c17fb91980b5682514ef54668e544b02)

libsanitizer: Avoid implicit function declaration in configure test

libsanitizer/

* configure.ac (sanitizer_supported): Include <unistd.h> for
syscall prototype.
* configure: Regenerate.

(cherry picked from commit 6be2672e4ee41c566a9e072088cccca263bab5f7)

OpenMP: GC unused SIMD clones

SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them. Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.

gcc/ChangeLog
* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
default constructor to initialize it.
* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
for last and iterate to handle recursive calls. Delete leftover
candidates at the end.
* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
on local clones.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
gc_candidate bit when a clone is used.

gcc/testsuite/ChangeLog
* g++.dg/gomp/target-simd-clone-1.C: Tweak to test
that the unused clone is GC'ed.
* gcc.dg/gomp/target-simd-clone-1.c: Likewise.

(cherry picked from commit 0425ae780fb2b055d985b5719af5edfaaad5e980)

Daily bump.

Fortran: incorrect array bounds when bound intrinsic used in decl [PR108131]

gcc/fortran/ChangeLog:

PR fortran/108131
* array.cc (match_array_element_spec): Avoid too early simplification
of matched array element specs that can lead to a misinterpretation
when used as array bounds in array declarations.

gcc/testsuite/ChangeLog:

PR fortran/108131
* gfortran.dg/pr103505.f90: Adjust expected patterns.
* gfortran.dg/pr108131.f90: New test.

(cherry picked from commit 6a95f0e0a06d78d94138d4c3dd64d41591197281)

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9015-ga3fbfc1027e9edcd14bb290b5702504d80d9e8fe (27th Dec 2022)

Daily bump.

Skip guality tests on hppa-hpux.

The guality check command hangs. This causes TCL errors in
other tests and slows testsuite execution.

2022-11-13 John David Anglin <danglin@gcc.gnu.org>

gcc/testsuite/ChangeLog:

* g++.dg/guality/guality.exp: Skip on hppa*-*-hpux*.
* gcc.dg/guality/guality.exp: Likewise.
* gfortran.dg/guality/guality.exp: Likewise.

Daily bump.

bootstrap/106482 - document minimal GCC version

There's no explicit mention of what GCC compiler supports C++11
and the cross compiler build requirement mentions GCC 4.8 but not
GCC 4.8.3 which is the earliest known version to not run into
C++11 implementation bugs. The following adds explicit wording.

PR bootstrap/106482
* doc/install.texi (ISO C++11 Compiler): Document GCC version
known to work.

(cherry picked from commit b97c33fbd28e74a206c81c96a9b0b9fa3c8545d6)

lto: support --jobserver-style=fifo for recent GNU make

gcc/ChangeLog:

* opts-jobserver.h: Add one member.
* opts-common.cc (jobserver_info::jobserver_info): Parse FIFO
format of --jobserver-auth.

(cherry picked from commit 53e3b2bf16a486c15c20991c6095f7be09012b55)

Factor out jobserver_active_p.

gcc/ChangeLog:

* gcc.cc (driver::detect_jobserver): Remove and move to
jobserver.h.
* lto-wrapper.cc (jobserver_active_p): Likewise.
(run_gcc): Likewise.
* opts-jobserver.h: New file.
* opts-common.cc (jobserver_info::jobserver_info): New function.

(cherry picked from commit 1270ccda70ca09f7d4fe76b5156dca8992bd77a6)

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-9005-g52daccd82cd71bd065826784ebb6eb04fa9b42af (21st Dec 2022)

nvptx: reimplement libgomp barriers [PR99555]

Instead of trying to have the GPU do CPU-with-OS-like things, this new barriers
implementation for NVPTX uses simplistic bar.* synchronization instructions.
Tasks are processed after threads have joined, and only if team->task_count != 0

It is noted that: there might be a little bit of performance forfeited for
cases where earlier arriving threads could've been used to process tasks ahead
of other threads, but that has the requirement of implementing complex
futex-wait/wake like behavior, which is what we're try to avoid with this patch.
It is deemed that task processing is not what GPU target offloading is usually
used for.

Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
   the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() now is implemented using "bar.arrive"

4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction. This reduces
   across all threads the condition (team->task_count != 0), to enable the task
   processing down below if any thread created a task.
   (this bar.red usage means that this patch is dependent on the prior NVPTX
   bar.red GCC patch)

PR target/99555

libgomp/ChangeLog:

* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_barrier_t): Remove waiters, lock fields.
(gomp_barrier_init): Remove init of waiters, lock fields.
(gomp_team_barrier_wake): Remove prototype, add new static inline
function.

(cherry picked from commit fdc7469cf597ec11229ddfc3e9c7a06f3d0fba9d)

nvptx: support bar.red instruction

This patch adds support for the PTX 'bar.red' (i.e. "barrier reduction")
instruction, in the form of nvptx-specific __builtin_nvptx_bar_red_[and/or/popc]
built-in functions.

gcc/ChangeLog:

* config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p' case, adjust
comments.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
(nvptx_expand_bar_red): New function.
(nvptx_init_builtins):
Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
(nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.

* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
(BARRED): New int iterator.
(barred_op,barred_mode,barred_ptxtype): New int attrs.
(nvptx_barred_<barred_op>): New define_insn.

(cherry picked from commit 623daaf8a229fcb58f84448d954f8c71191ca486)

openmp: Don't try to destruct DECL_OMP_PRIVATIZED_MEMBER vars [PR108180]

DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable,
we should only destruct the privatized copies of them created in omp
lowering.

Fixed thusly.

2022-12-21 Jakub Jelinek <jakub@redhat.com>

PR c++/108180
* pt.cc (tsubst_expr): Don't call cp_finish_decl on
DECL_OMP_PRIVATIZED_MEMBER vars.

* testsuite/libgomp.c++/pr108180.C: New test.

(cherry picked from commit 1119902b6c7c1c50123ed85ec1def8be4772d68c)

OpenMP: Duplicate checking for map clauses in Fortran (PR107214)

This patch adds duplicate checking for OpenMP "map" clauses, taking some
cues from the implementation for C in c-typeck.cc:c_finish_omp_clauses
(and similar for C++).

In addition to the existing use of the "mark" and "comp_mark" bitfields
in the gfc_symbol structure, the patch adds several new bits handling
duplicate checking within various categories of clause types. If "mark"
is being used for map clauses, we need to use different bits for other
clauses for cases where "map" and some other clause can refer to the
same symbol (e.g. "map(n) shared(n)").

2022-12-06 Julian Brown <julian@codesourcery.com>

gcc/fortran/
PR fortran/107214
* gfortran.h (gfc_symbol): Add data_mark, dev_mark, gen_mark and
reduc_mark bitfields.
* openmp.cc (resolve_omp_clauses): Use above bitfields to improve
duplicate clause detection.

gcc/testsuite/
PR fortran/107214
* gfortran.dg/gomp/pr107214.f90: New test.
* gfortran.dg/gomp/pr107214-2.f90: New test.
* gfortran.dg/gomp/pr107214-3.f90: New test.
* gfortran.dg/gomp/pr107214-4.f90: New test.
* gfortran.dg/gomp/pr107214-5.f90: New test.
* gfortran.dg/gomp/pr107214-6.f90: New test.
* gfortran.dg/gomp/pr107214-7.f90: New test.
* gfortran.dg/gomp/pr107214-8.f90: New test.

(cherry picked from commit 330b9a8d87dd73e10539da618496ad4964fee26d)

OpenMP/Fortran: Combined directives with map/firstprivate of same symbol

This patch fixes a case where a combined directive (e.g. "!$omp target
parallel ...") contains both a map and a firstprivate clause for the
same variable.  When the combined directive is split into two nested
directives, the outer "target" gets the "map" clause, and the inner
"parallel" gets the "firstprivate" clause, like so:

  !$omp target parallel map(x) firstprivate(x)

  -->

  !$omp target map(x)
    !$omp parallel firstprivate(x)
      ...

When there is no map of the same variable, the firstprivate is distributed
to both directives, e.g. for 'y' in:

  !$omp target parallel map(x) firstprivate(y)

  -->

  !$omp target map(x) firstprivate(y)
    !$omp parallel firstprivate(y)
      ...

This is not a recent regression, but appear to fix a long-standing ICE.
(The included testcase is based on one by Tobias.)

2022-12-06  Julian Brown  <julian@codesourcery.com>

gcc/fortran/
* trans-openmp.cc (gfc_add_firstprivate_if_unmapped): New function.
(gfc_split_omp_clauses): Call above.

libgomp/
* testsuite/libgomp.fortran/combined-directive-splitting-1.f90: New
test.

(cherry picked from commit 9316ad3b4354cbf2980f86902e54884e918c472a)

libstdc++: Fix unsafe use of dirent::d_name [PR107814]

Copy the fix for PR 104731 to the equivalent experimental::filesystem
test.

libstdc++-v3/ChangeLog:

PR libstdc++/107814
* testsuite/experimental/filesystem/iterators/error_reporting.cc:
Use a static buffer with space after it.

(cherry picked from commit 1cac00d013856fea4cee0f13c4959c8e21afd2d9)

libstdc++: Fixes for std::expected

This fixes some bugs in the swap functions for std::expected.

It also disables the noexcept-specifiers for equality operators, because
those are problematic when querying whether a std::expected is equality
comparable. The operator==(const expected<T,E>&, const U&) function is
not constrained, so is viable for comparing expected<T,E> with
expected<void,G>, but then we get an error from the noexcept-specifier.

libstdc++-v3/ChangeLog:

* include/std/expected (expected::_M_swap_val_unex): Guard the
correct object.
(expected::swap): Move is_swappable
requirement from static_assert to constraint.
(swap): Likewise.
(operator==): Remove noexcept-specifier.
* testsuite/20_util/expected/swap.cc: Check swapping of
types without non-throwing move constructor. Check constraints
on swap.
* testsuite/20_util/expected/unexpected.cc: Check constraints on
swap.
* testsuite/20_util/expected/equality.cc: New test.

(cherry picked from commit 59822c39207c9e8be576e9d6c3370bd85ddaf886)

libgfortran's ISO_Fortran_binding.c: Use GCC11 version for backward-only code [PR108056]

Since GCC 12, the conversion between the array descriptors formats - the
internal (GFC) and the C binding one (CFI) - moved to the compiler itself
such that the cfi_desc_to_gfc_desc/gfc_desc_to_cfi_desc functions are only
used with older code (GCC 9 to 11). The newly added checks caused asserts
as older code did not pass the proper values (e.g. real(4) as effective
argument arrived as BT_ASSUME type as the effective type got lost inbetween).

As proposed in the PR, revert to the GCC 11 version - known bugs is better
than some fixes and new issues. Still, GCC 12 is much better in terms of
TS29113 support and should really be used.

This patch uses the current libgomp version of the GCC 11 branch, except
it fixes the GFC version number (which is 0), uses calloc instead of malloc,
and sets the lower bound to 1 instead of keeping it as is for
CFI_attribute_other.

(cherry picked from commit e205ec03f0794aeac3e8a89e947c12624d5a274e)

(This cherry pick excludes an accidentally committed file, which was
removed in follow-up commit 18af26fc375398f0a7cd7bae5aabebd447f8c899.)

libgfortran/ChangeLog:

PR libfortran/108056
* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
gfc_desc_to_cfi_desc): Mostly revert to GCC 11 version for
those backward-compatiblity-only functions.

Daily bump.

d/104749 - document host GDC version requirement

This documents that GDC 9.4 or later is required to build the D
language rather than GDC 9.1 which suffers from PR94240.

PR d/104749
* doc/install.texi (GDC): Document GDC 9.4 or later is required
to build the D language frontend.

(cherry picked from commit 05b7cf52e1b640271900894a894da2d27ef90092)

i386: Avoid fma_chain for -march=alderlake and sapphirerapids.

For Alderlake there is similar issue like PR81616, enable
avoid_fma256_chain will also benefit on Intel latest platforms
Alderlake and Sapphire Rapids.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Add
m_SAPPHIRERAPIDS, m_ALDERLAKE.

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8998-gcdc1a14be1182874ccf1ceb27ee5b67c5ce8c62d (19th Dec 2022)

c++: extract_local_specs and unevaluated contexts [PR100295]

Here during partial instantiation of the constexpr if, extra_local_specs
walks the statement looking for local specializations within to capture.
However, we're thwarted by the fact that 'ts' first appears inside an
unevaluated context, and so the calls to process_outer_var_ref for its
local specializations are a no-op. And since we walk each tree exactly
once, we end up not capturing the local specializations despite 'ts'
later occurring in an evaluated context.

This patch fixes this by making extract_local_specs walk evaluated
contexts first before walking unevaluated contexts. We could probably
get away with not walking unevaluated contexts at all, but this approach
seems more clearly safe.

PR c++/100295
PR c++/107579

gcc/cp/ChangeLog:

* pt.cc (el_data::skip_unevaluated_operands): New data member.
(extract_locals_r): If skip_unevaluated_operands is true,
don't walk into unevaluated contexts.
(extract_local_specs): Walk the pattern twice, first with
skip_unevaluated_operands true followed by it set to false.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if-lambda5.C: New test.

(cherry picked from commit 18499b9f848707aee42d810e99ac0a4c9788433c)

c++: partial ordering with memfn ptr cst [PR108104]

Here we're triggering an overzealous assert in unify during partial
ordering since the member function pointer constants are represented as
ordinary CONSTRUCTORs (with TYPE_PTRMEMFUNC_P TREE_TYPE) but the assert
expects COMPOUND_LITERAL_P constructors.

PR c++/108104

gcc/cp/ChangeLog:

* pt.cc (unify) <default>: Relax assert to accept any
CONSTRUCTOR parm, not just COMPOUND_LITERAL_P one.

gcc/testsuite/ChangeLog:

* g++.dg/template/ptrmem33.C: New test.

(cherry picked from commit 38304846d18d6bb14b0fd6c627c5c6d43a814d01)

c++: pack in requires-expr parm list [PR107417]

Here find_parameter_packs_r isn't detecting the pack T inside the
requires-expr's parameter list ultimately because cp_walk_trees
deliberately avoids walking the list so as to avoid false positives in
the unexpanded pack checker.

But it should still be fine to walk the TREE_TYPE of each parameter,
which we already need to do from for_each_template_parm_r, and is
sufficient to fix the testcase.

PR c++/107417

gcc/cp/ChangeLog:

* pt.cc (for_each_template_parm_r) <case REQUIRES_EXPR>: Move
walking of the TREE_TYPE of each parameter to ...
* tree.cc (cp_walk_subtrees) <case REQUIRES_EXPR>: ... here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires33.C: New test.

(cherry picked from commit 079add3ad39d6620d34665dd9c26c21951eb657c)

c++: substituting CONST_DECL_USING_P enumerators [PR103081]

We implement class-scope using enum by injecting clones of the enum's
CONST_DECLs as fields of the class, for which CONST_DECL_USING_P is
true, so that qualified lookup naturally finds the enumerators.
Substitution into such a CONST_DECL currently ICEs however, because we
assume the DECL_CONTEXT is always the ENUMERAL_TYPE (which has
TYPE_VALUES) but in this case it's the RECORD_TYPE for the class scope
(which has TYPE_FIELDS).

Since these CONST_DECLs appear to always be non-dependent, this patch
fixes this by shortcutting substitution for CONST_DECLs that have
non-dependent DECL_CONTEXT. This subsumes the existing (and seemingly
dead) DECL_NAMESPACE_SCOPE_P early exit test and also benefits
substitution into ordinary non-dependent CONST_DECLs.

PR c++/103081

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy) <case CONST_DECL>: Generalize
early exit test for namespace-scope decls to check dependence of
the enclosing scope instead. Remove dead early exit test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/using-enum-10.C: New test.
* g++.dg/cpp2a/using-enum-10a.C: New test.

(cherry picked from commit b3912122c9dfaba6c8229e8f095885f69782ceda)

c++: ICE with <=> of incompatible pointers [PR107542]

In a SFINAE context composite_pointer_type returns error_mark_node if
the given pointer types are incompatible. But the SPACESHIP_EXPR case
of cp_build_binary_op wasn't prepared for this error_mark_node result,
which led to an ICE (from spaceship_comp_cat) for the below testcase.
(In a non-SFINAE context composite_pointer_type issues a permerror and
returns cv void* in this case, so this ICE seems specific to SFINAE.)

PR c++/107542

gcc/cp/ChangeLog:

* typeck.cc (cp_build_binary_op): In the SPACESHIP_EXPR case,
handle an error_mark_node result type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-sfinae2.C: New test.

(cherry picked from commit 000e9863120cbc75a0f8d497264519974c97669f)

c-family: Fix ICE with -Wsuggest-attribute [PR98487]

Here we crash because check_function_format was using TREE_PURPOSE
directly rather than using get_attribute_name.

PR c/98487

gcc/c-family/ChangeLog:

* c-format.cc (check_function_format): Use get_attribute_name.

gcc/testsuite/ChangeLog:

* c-c++-common/Wsuggest-attribute-1.c: New test.

(cherry picked from commit 68e51bd0a85794cd437d3e740357dfef84dc560d)

Daily bump.

Fortran: ICE on recursive derived types with allocatable components [PR107872]

gcc/fortran/ChangeLog:

PR fortran/107872
* resolve.cc (derived_inaccessible): Skip over allocatable components
to prevent an infinite loop.

gcc/testsuite/ChangeLog:

PR fortran/107872
* gfortran.dg/pr107872.f90: New test.

(cherry picked from commit 01254aa2eb766c7584fd047568d7277d4d65d067)

Daily bump.

libgomp: Fix USM bugs

Fix up some USM corner cases.

libgomp/ChangeLog:

* libgomp.h (OFFSET_USM): New macro.
* target.c (gomp_map_pointer): Handle USM mappings.
(gomp_map_val): Handle OFFSET_USM.
(gomp_map_vars_internal): Move USM check earlier, and use OFFSET_USM.
Add OFFSET_USM check to the second mapping pass.
* testsuite/libgomp.fortran/usm-1.f90: New test.
* testsuite/libgomp.fortran/usm-2.f90: New test.

Daily bump.

AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]

Currently patchable area is at the wrong place on AArch64. It is placed
immediately after function label, before .cfi_startproc. This patch
adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
modifies aarch64_print_patchable_function_entry to avoid placing
patchable area before .cfi_startproc.

gcc/
PR target/98776
* config/aarch64/aarch64-protos.h (aarch64_output_patchable_area):
Declared.
* config/aarch64/aarch64.cc (aarch64_print_patchable_function_entry):
Emit an UNSPECV_PATCHABLE_AREA pseudo instruction.
(aarch64_output_patchable_area): New.
* config/aarch64/aarch64.md (UNSPECV_PATCHABLE_AREA): New.
(patchable_area): Define.

gcc/testsuite/
PR target/98776
* gcc.target/aarch64/pr98776.c: New.
* gcc.target/aarch64/pr92424-2.c: Adjust pattern.
* gcc.target/aarch64/pr92424-3.c: Adjust pattern.

Daily bump.

libstdc++: Fix size passed to operator delete [PR108097]

The number of elements gets stored in _M_capacity so use a separate
variable for the number of bytes to allocate.

libstdc++-v3/ChangeLog:

PR libstdc++/108097
* include/std/stacktrace (basic_stracktrace::_Impl): Do not
multiply N by sizeof(value_type) when allocating.

(cherry picked from commit 881c6cabce5d0b27285ed41bd6dabdf48848cce7)

Daily bump.

libphobos: Backport library and bindings fixes from mainline

D Runtime changes:

- Fix MIPS64 bindings for CRuntime_UClibc.

Phobos changes:

- Fix std.path.expandTilde erroneously raising onOutOfMemory
after failed call to getpwnam_r().
- Fix std.random unittest failures on ILP32 targets.
- Use GENERIC_IO on CRuntime_UClibc port of std.stdio.

libphobos/ChangeLog:

* libdruntime/core/stdc/fenv.d: Compile in MIPS uClibc bindings on
MIPS_Any targets.
* libdruntime/core/stdc/math.d: Likewise.
* libdruntime/core/sys/posix/dlfcn.d: Likewise.
* libdruntime/core/sys/posix/setjmp.d: Add MIPS64 definitions for
CRuntime_UClibc.
* libdruntime/core/sys/posix/sys/types.d: Likewise.
* src/std/path.d (expandTilde): Handle more errno codes that could be
left set by getpwnam_r.
* src/std/random.d: Use D_LP64 in unittests.
* src/std/stdio.d: Set CRuntime_UClibc as GENERIC_IO target.

varasm: Fix type confusion bug

This patch fixes a type confusion bug in varasm.cc:assemble_variable.
The problem is that the current code calls:

sect = get_variable_section (decl, false);

and then accesses sect->named.name without checking whether the section
is in fact a named section. In the surrounding else clause, we only know
that SECTION_STYLE (sect) != SECTION_NOSWITCH, so it is possible that
the section is an unnamed section.

In practice, this means that we end up doing a wild string compare
between a function pointer and the string literal ".vtable_map_vars".
This is because sect->named.name aliases sect->unnamed.callback in the
section union.

This can be seen in GDB with a simple testcase such as "int x;".

This patch fixes the issue by checking the SECTION_STYLE of the section
is in fact SECTION_NAMED before trying to do the string comparison.

We drop the existing check of whether sect->named.name is non-NULL
because this should presumably always be the case for a named section.

gcc/ChangeLog:

* varasm.cc (assemble_variable): Fix type confusion bug when
checking for ".vtable_map_vars" section.

(cherry picked from commit de144fdab17dbbb64ccb540056ab78b4ffb3fbbc)

OpenMP, libgomp: Handle unified shared memory in omp_target_is_accessible

This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine
omp_target_is_accessible.

libgomp/ChangeLog:

* target.c (omp_target_is_accessible): Handle unified shared memory.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updated.
* testsuite/libgomp.fortran/target-is-accessible-1.f90: Updated.
* testsuite/libgomp.c-c++-common/target-is-accessible-2.c: New test.
* testsuite/libgomp.fortran/target-is-accessible-2.f90: New test.

Daily bump.

d: Fix undefined reference to nested lambda in template (PR108055)

Sometimes, nested lambdas of templated functions get no code generation
due to them being marked as instantianted outside of all modules being
compiled in the current compilation unit. This despite enclosing
template instances being marked as instantiated inside the current
compilation unit. To fix, all enclosing templates are now checked in
`function_defined_in_root_p'.

Because of this change, `function_needs_inline_definition_p' has also
been fixed up to only check whether the regular function definition
itself is to be emitted in the current compilation unit.

PR d/108055

gcc/d/ChangeLog:

* decl.cc (function_defined_in_root_p): Check all enclosing template
instances for definition in a root module.
(function_needs_inline_definition_p): Replace call to
function_defined_in_root_p with test for outer module `isRoot'.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/imports/pr108055conv.d: New.
* gdc.dg/torture/imports/pr108055spec.d: New.
* gdc.dg/torture/imports/pr108055write.d: New.
* gdc.dg/torture/pr108055.d: New test.

(cherry picked from commit 9fe7d3debbf60ed9fef8053123ad542a99d62100)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8979-g1a8af012222a8386fcda16a61dc17f11ba9cfbfd (12th Dec 2022)

tree-optimization/107898 - ICE with -Walloca-larger-than

The following avoids ICEing with a mismatched prototype for alloca
and -Walloca-larger-than using irange for checks which doesn't
like mismatched types.

PR tree-optimization/107898
* gimple-ssa-warn-alloca.cc (alloca_call_type): Check
the type of the alloca argument is compatible with size_t
before querying ranges.

(cherry picked from commit 9948daa4fd0f0ea0a9d56c2fefe1bca478554d27)

tree-optimization/107865 - ICE with outlining of loops

The following makes sure to clear loops number of iterations when
outlining them as part of a SESE region as can happen with
auto-parallelization. The referenced SSA names become stale otherwise.

PR tree-optimization/107865
* tree-cfg.cc (move_sese_region_to_fn): Free the number of
iterations of moved loops.

* gfortran.dg/graphite/pr107865.f90: New testcase.

(cherry picked from commit bcc2449384f2092cbdf5d6ac2357aeabe3212b2e)

tree-optimization/107833 - invariant motion of uninit uses

The following fixes a wrong-code bug caused by loop invariant motion
hoisting an expression using an uninitialized value outside of its
controlling condition causing IVOPTs to use that to rewrite a defined
value. PR107839 is a similar case involving a bogus uninit diagnostic.

PR tree-optimization/107833
PR tree-optimization/107839
* cfghooks.cc: Include tree.h.
* tree-ssa-loop-im.cc (movement_possibility): Wrap and
make stmts using any ssa_name_maybe_undef_p operand
to preserve execution.
(loop_invariant_motion_in_fun): Call mark_ssa_maybe_undefs
to init maybe-undefined status.
* tree-ssa-loop-ivopts.cc (ssa_name_maybe_undef_p,
ssa_name_set_maybe_undef, ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Move ...
* tree-ssa.cc: ... here.
* tree-ssa.h (ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Declare.
(ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): Define.

* gcc.dg/torture/pr107833.c: New testcase.
* gcc.dg/uninit-pr107839.c: Likewise.

(cherry picked from commit 44c8402d35160515b3c09fd2bc239587e0c32a2b)

tree-optimization/107686 - fix bitfield ref through vec_unpack optimization

The following propely restricts the bitfield access to integral types
when we look through VEC_UNPACK with the intent to emit a widening
conversion.

PR tree-optimization/107686
* tree-ssa-forwprop.cc (optimize_vector_load): Restrict
VEC_UNPACK support to integral typed bitfield refs.

* gcc.dg/pr107686.c: New testcase.

(cherry picked from commit 246bbdaa5f536b7a199dda9860c473137f40d622)

tree-optimization/107766 - ICE with recent -ffp-contract=off fix

The following uses *node to check for FP types rather than the
child nodes which could be constant leafs and thus without a
vector type.

PR tree-optimization/107766
* tree-vect-slp-patterns.cc (complex_mul_pattern::matches):
Use *node to check for FP vector types.

* g++.dg/vect/pr107766.cc: New testcase.

(cherry picked from commit 1a06ae6f2f4f292fd05a900bcf433cb4282da1e3)

tree-optimization/107647 - avoid FMA from SLP with -ffp-contract=off

Only with -ffp-contract=fast we can synthesize FMA operations like
vfmaddsub231ps, so properly guard the transform in SLP pattern
detection.

PR tree-optimization/107647
* tree-vect-slp-patterns.cc (addsub_pattern::recognize): Only
allow FMA generation with -ffp-contract=fast for FP types.
(complex_mul_pattern::matches): Likewise.

* gcc.target/i386/pr107647.c: New testcase.

(cherry picked from commit c5df8392c5848c0462558f41cdf6eab31db301cf)

tree-optimization/107407 - wrong code with DSE

So what happens is that we elide processing of this check with

          /* In addition to kills we can remove defs whose only use
             is another def in defs.  That can only ever be PHIs of which
             we track two for simplicity reasons, the first and last in
             {first,last}_phi_def (we fail for multiple PHIs anyways).
             We can also ignore defs that feed only into
             already visited PHIs.  */
          else if (single_imm_use (vdef, &use_p, &use_stmt)
                   && (use_stmt == first_phi_def
                       || use_stmt == last_phi_def
                       || (gimple_code (use_stmt) == GIMPLE_PHI
                           && bitmap_bit_p (visited,
                                            SSA_NAME_VERSION
                                              (PHI_RESULT (use_stmt))))))

where we have the last PHI being the outer loop virtual PHI and the first
PHI being the loop exit PHI of the outer loop and we've already processed
the single immediate use of the outer loop PHI, the inner loop PHI.  But
we still have to perform the above check!

It's easiest to perform the check when we visit the PHI node instead of
delaying it to the actual processing loop.

PR tree-optimization/107407
* tree-ssa-dse.cc (dse_classify_store): Perform backedge
varying index check when collecting PHI uses rather than
after optimizing processing of the candidate defs.

* gcc.dg/torture/pr107407.c: New testcase.

(cherry picked from commit 031a400e49d8db156c43f9ec0b21ab0c2aee8c6d)

tree-optimization/106868 - bogus -Wdangling-pointer diagnostic

The testcase shows we mishandle the case where there's a pass-through
of a pointer through a function like memcpy. The following adjusts
handling of this copy case to require a taken address and adjust
the PHI case similarly.

PR tree-optimization/106868
* gimple-ssa-warn-access.cc (pass_waccess::gimple_call_return_arg_ref):
Inline into single user ...
(pass_waccess::check_dangling_uses): ... here and adjust the
call and the PHI case to require that ref.aref is the address
of the decl.

* gcc.dg/Wdangling-pointer-pr106868.c: New testcase.

(cherry picked from commit d492d50f644811327c5976e2c918ab6d906ed40c)

libgomp: Handle OpenMP's reverse offloads

This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.

libgomp/ChangeLog:

* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.

(cherry picked from commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8)

Fortran/OpenMP: align/allocator modifiers to the allocate clause

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
output.
* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
(gfc_free_omp_namelist): Add bool arg.
* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Match 'align/allocate modifers in
'allocate' clause.
(resolve_omp_clauses): Resolve align.
* st.cc (gfc_free_statement): Update call
* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.

libgomp/ChangeLog:

* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
item about 'align'; mark clause as 'Y' and directive as 'N'.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
* testsuite/libgomp.fortran/allocate-3.f90: New test.

(cherry picked from commit b2e1c49b4a4592f9e96ae9ece8af7d0e6527b194)

Daily bump.

d: Remove "final" and "override" from visitor method.

This was added by the backport of an ICE in r12-8969. While harmless,
it was not until r13-758 that "final" and "override" were introduced to
all visitor methods in the D front-end. Removing it from the release
branch just for consistency with the rest of the file.

gcc/d/ChangeLog:

* imports.cc (ImportVisitor::visit (OverloadSet *)): Remove "final"
and "override" from visitor method.

d: Fix internal compiler error: in visit, at d/imports.cc:72 (PR108050)

The visitor for lowering IMPORTED_DECLs did not have an override for
dealing with importing OverloadSet symbols. This has now been
implemented in the code generator.

PR d/108050

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (Import *)): Handle build_import_decl
returning a TREE_LIST.
* imports.cc (ImportVisitor::visit (OverloadSet *)): New override.

gcc/testsuite/ChangeLog:

* gdc.dg/imports/pr108050/mod1.d: New.
* gdc.dg/imports/pr108050/mod2.d: New.
* gdc.dg/imports/pr108050/package.d: New.
* gdc.dg/pr108050.d: New test.

(cherry picked from commit d9d8c9674ad3ad3aa38419d24b1aaaffe31f5d3f)

Daily bump.

i386: fix assert (__builtin_cpu_supports ("x86-64") >= 0)

Similar story as PR103661, we again return a negative number
for __builtin_cpu_supports:

Documentation says:

int __builtin_cpu_supports(const char *feature)
This function returns a positive integer if the run-time CPU supports feature and returns 0 otherwise.
while we return -2147483648.

Moreover, I noticed "x86-64" is not a valid option for __builtin_cpu_is,
but for __builtin_cpu_supports.

PR target/107551

gcc/ChangeLog:

* config/i386/i386-builtins.cc (fold_builtin_cpu): Use same path
as for PR103661.
* doc/extend.texi: Fix "x86-64" use.

gcc/testsuite/ChangeLog:

* gcc.target/i386/builtin_target.c: Add more checks.

(cherry picked from commit d71b20fc30965ba8326ad9363d0aca9d61eb4ed3)

i386: simplify cpu_feature handling

The patch removes unneeded loops for cpu_features2 and CONVERT_EXPR
that can be simplified with NOP_EXPR.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (has_cpu_feature): Directly
compute index in cpu_features2.
(set_cpu_feature): Likewise.
* config/i386/i386-builtins.cc (fold_builtin_cpu): Also remove
loop for cpu_features2 and use NOP_EXPRs.

(cherry picked from commit ef14bba0a6f3836d41d75863e6516d21aef0e936)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8964-g58791f4db575ad9f952025c5eac4cc46e5c27019 (9th Dec 2022)

Daily bump.

OpenMP: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.

Additionally, a limitation of the number of teams on gcn offload devices is
implemented.  The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit).  This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory.  Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).

gcc/ChangeLog:

* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

(cherry picked from commit 81476bc4f4a20bcf3af7ac2548c2322d48499402)

amdgcn: Support AMD-specific 'isa' and 'arch' traits in OpenMP context selectors

Add libgomp support for 'amdgcn' as arch, and for each processor type (as passed
to '-march') as isa traits.
Add test case for all supported 'isa' values used as context selectors in a
metadirective construct.

libgomp/ChangeLog:

* config/gcn/selector.c (GOMP_evaluate_current_device): Recognise 'amdgcn'
as arch, and '-march' values (as well as 'gfx803') as isa traits.
* testsuite/libgomp.c-c++-common/metadirective-6.c: New test.

amdgcn: Add preprocessor builtins for every processor type

Provide a specific builtin for each possible value of '-march'.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_FIJI): -march=fiji.
(TARGET_VEGA10): -march=gfx900.
(TARGET_VEGA20): -march=gfx906.
(TARGET_GFX908): -march=gfx908.
(TARGET_GFX90a): -march=gfx90a.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Define a builtin that
uniquely maps to '-march'.

(cherry picked from commit e41b243302e9964e642924329826448afb21d28e)

aarch64: Specify that FEAT_MOPS sequences clobber CC

According to the architecture pseudocode the FEAT_MOPS sequences overwrite the NZCV flags
as par of their operation, so GCC needs to model that in the relevant RTL patterns.
For the testcase:
void g();
void foo (int a, size_t N, char *__restrict__ in,
         char *__restrict__ out)
{
  if (a != 3)
    __builtin_memcpy (out, in, N);
  if (a > 3)
    g ();
}

we will currently generate:
foo:
        cmp     w0, 3
        bne     .L6
.L1:
        ret
.L6:
        cpyfp   [x3]!, [x2]!, x1!
        cpyfm   [x3]!, [x2]!, x1!
        cpyfe   [x3]!, [x2]!, x1!
        ble     .L1 // Flags reused after CPYF* sequence
        b       g

This is wrong as the result of cmp needs to be recalculated after the MOPS sequence.
With this patch we'll insert a "cmp w0, 3" before the ble, similar to what clang does.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and to the GCC 12 branch after some baking time.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cpymemdi): Specify clobber of CC reg.
(*aarch64_cpymemdi): Likewise.
(aarch64_movmemdi): Likewise.
(aarch64_setmemdi): Likewise.
(*aarch64_setmemdi): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mops_5.c: New test.
* gcc.target/aarch64/mops_6.c: Likewise.
* gcc.target/aarch64/mops_7.c: Likewise.

(cherry picked from commit cbdffae5745327b0e5eb887afc512daf34b049b1)

libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item

libgomp/
* libgomp.texi (OpenMP 5.2): Add missing 'the'.
(TR11): Add missing '@tab N @tab'.

(cherry picked from commit 9f80367e539839fff1df2c85fc2640638199fc9e)

Daily bump.

tree-optimization/107956 - ICE with NULL call LHS

The following adds a missing check for a NULL call LHS in the
vector pattern recognizer.

PR tree-optimization/107956
* tree-vect-patterns.cc (vect_recog_mask_conversion_pattern):
Check for NULL LHS on masked loads.

(cherry picked from commit 5c11d748564c7ce3b096e87ad350fcddd493e45e)

Daily bump.