git.ipfire.org Git - thirdparty/gcc.git/log

libgomp: Handle OpenMP's reverse offloads

This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.

The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.

For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.

libgomp/ChangeLog:

* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.

(cherry picked from commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8)

Fortran/OpenMP: align/allocator modifiers to the allocate clause

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
output.
* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
(gfc_free_omp_namelist): Add bool arg.
* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Match 'align/allocate modifers in
'allocate' clause.
(resolve_omp_clauses): Resolve align.
* st.cc (gfc_free_statement): Update call
* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.

libgomp/ChangeLog:

* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
item about 'align'; mark clause as 'Y' and directive as 'N'.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
* testsuite/libgomp.fortran/allocate-3.f90: New test.

(cherry picked from commit b2e1c49b4a4592f9e96ae9ece8af7d0e6527b194)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8964-g58791f4db575ad9f952025c5eac4cc46e5c27019 (9th Dec 2022)

Daily bump.

OpenMP: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.

Additionally, a limitation of the number of teams on gcn offload devices is
implemented.  The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit).  This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory.  Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).

gcc/ChangeLog:

* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.

libgomp/ChangeLog:

* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.

(cherry picked from commit 81476bc4f4a20bcf3af7ac2548c2322d48499402)

amdgcn: Support AMD-specific 'isa' and 'arch' traits in OpenMP context selectors

Add libgomp support for 'amdgcn' as arch, and for each processor type (as passed
to '-march') as isa traits.
Add test case for all supported 'isa' values used as context selectors in a
metadirective construct.

libgomp/ChangeLog:

* config/gcn/selector.c (GOMP_evaluate_current_device): Recognise 'amdgcn'
as arch, and '-march' values (as well as 'gfx803') as isa traits.
* testsuite/libgomp.c-c++-common/metadirective-6.c: New test.

amdgcn: Add preprocessor builtins for every processor type

Provide a specific builtin for each possible value of '-march'.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_FIJI): -march=fiji.
(TARGET_VEGA10): -march=gfx900.
(TARGET_VEGA20): -march=gfx906.
(TARGET_GFX908): -march=gfx908.
(TARGET_GFX90a): -march=gfx90a.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Define a builtin that
uniquely maps to '-march'.

(cherry picked from commit e41b243302e9964e642924329826448afb21d28e)

aarch64: Specify that FEAT_MOPS sequences clobber CC

According to the architecture pseudocode the FEAT_MOPS sequences overwrite the NZCV flags
as par of their operation, so GCC needs to model that in the relevant RTL patterns.
For the testcase:
void g();
void foo (int a, size_t N, char *__restrict__ in,
         char *__restrict__ out)
{
  if (a != 3)
    __builtin_memcpy (out, in, N);
  if (a > 3)
    g ();
}

we will currently generate:
foo:
        cmp     w0, 3
        bne     .L6
.L1:
        ret
.L6:
        cpyfp   [x3]!, [x2]!, x1!
        cpyfm   [x3]!, [x2]!, x1!
        cpyfe   [x3]!, [x2]!, x1!
        ble     .L1 // Flags reused after CPYF* sequence
        b       g

This is wrong as the result of cmp needs to be recalculated after the MOPS sequence.
With this patch we'll insert a "cmp w0, 3" before the ble, similar to what clang does.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and to the GCC 12 branch after some baking time.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_cpymemdi): Specify clobber of CC reg.
(*aarch64_cpymemdi): Likewise.
(aarch64_movmemdi): Likewise.
(aarch64_setmemdi): Likewise.
(*aarch64_setmemdi): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mops_5.c: New test.
* gcc.target/aarch64/mops_6.c: Likewise.
* gcc.target/aarch64/mops_7.c: Likewise.

(cherry picked from commit cbdffae5745327b0e5eb887afc512daf34b049b1)

libgomp.texi: Fix a OpenMP 5.2 and a TR11 impl-status item

libgomp/
* libgomp.texi (OpenMP 5.2): Add missing 'the'.
(TR11): Add missing '@tab N @tab'.

(cherry picked from commit 9f80367e539839fff1df2c85fc2640638199fc9e)

Daily bump.

tree-optimization/107956 - ICE with NULL call LHS

The following adds a missing check for a NULL call LHS in the
vector pattern recognizer.

PR tree-optimization/107956
* tree-vect-patterns.cc (vect_recog_mask_conversion_pattern):
Check for NULL LHS on masked loads.

(cherry picked from commit 5c11d748564c7ce3b096e87ad350fcddd493e45e)

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8954-gb7306f02da33695bec90f153f6725a51d7c0ac71 (1st Dec 2022)

Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode.

For __builtin_ia32_vec_set_v16qi (a, -1, 2) with
!flag_signed_char. it's transformed to
__builtin_ia32_vec_set_v16qi (_4, 255, 2) in the gimple,
and expanded to (const_int 255) in the rtl. But for immediate_operand,
it expects (const_int 255) to be signed extended to
(const_int -1). The mismatch caused an unrecognizable insn error.

The patch converts (const_int 255) to (const_int -1) in the backend
expander.

gcc/ChangeLog:

PR target/107863
* config/i386/i386-expand.cc (ix86_expand_vec_set_builtin):
Convert op1 to target mode whenever mode mismatch.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr107863.c: New test.

Daily bump.

d: Include tm.h in all D target platform sources, remove memmodel.h

The tm.h header would pull in config/elfos.h, which defines
TARGET_D_MINFO_SECTION needed for the D module support in the front-end
to emit data to the correct section for the run-time library to pick up.

The removal of it in r13-2385 caused a stage2 bootstrap failure on all
Solaris targets.

The memmodel header has also been removed as it is no longer required
now tm_p.h is no longer used by these sources.

gcc/ChangeLog:

* config/darwin-d.cc: Include tm.h.
* config/dragonfly-d.cc: Likewise.
* config/freebsd-d.cc: Remove memmodel.h.
* config/glibc-d.cc: Likewise.
* config/netbsd-d.cc: Include tm.h.
* config/openbsd-d.cc: Likewise.
* config/sol2-d.cc: Likewise.

(cherry picked from commit a7852bd30a19d29ff7986869453786d460d17877)

d: Fix ICE on named continue label in an unrolled loop [PR107592]

Continue labels in an unrolled loop require a unique label per
iteration. Previously this used the Statement body node for each
unrolled iteration to generate a new entry in the label hash table.
This does not work when the continue label has an identifier, as said
named label is pointing to the outer UnrolledLoopStatement node.

What would happen is that during the lowering of `continue label', an
automatic label associated with the unrolled loop would be generated,
and a jump to that label inserted, but because it was never pushed by
the visitor for the loop itself, it subsequently never gets emitted.

To fix, correctly use the UnrolledLoopStatement as the key to look up
and store the break/continue label pair, but remove the continue label
from the value entry after every loop to force a new label to be
generated by the next call to `push_continue_label'

PR d/107592

gcc/d/ChangeLog:

* toir.cc (IRVisitor::push_unrolled_continue_label): New method.
(IRVisitor::pop_unrolled_continue_label): New method.
(IRVisitor::visit (UnrolledLoopStatement *)): Use them instead of
push_continue_label and pop_continue_label.

gcc/testsuite/ChangeLog:

* gdc.dg/pr107592.d: New test.

(cherry picked from commit 031d3f095520f0e1ee03e29b7ad5067c2a3f96e0)

d: Fix #error You must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported

This moves all D front-end specific target definitions out of the main
target headers, and into its own header that is included by tm_d.h
instead of pulling in the same headers as tm_p.h.

This fixes the build on target configurations that pull in the default D
language target hooks, and subsequently trigger an error because the
definition of PREFERRED_DEBUGGING_TYPE is behind tm.h, the one header
that is avoided from being included in default-d.cc.

PR d/105659

gcc/ChangeLog:

* config.gcc: Set tm_d_file to ${cpu_type}/${cpu_type}-d.h.
* config/aarch64/aarch64-d.cc: Include tm_d.h.
* config/aarch64/aarch64-protos.h (aarch64_d_target_versions): Move to
config/aarch64/aarch64-d.h.
(aarch64_d_register_target_info): Likewise.
* config/aarch64/aarch64.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/arm/arm-d.cc: Include tm_d.h and arm-protos.h instead of
tm_p.h.
* config/arm/arm-protos.h (arm_d_target_versions): Move to
config/arm/arm-d.h.
(arm_d_register_target_info): Likewise.
* config/arm/arm.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/default-d.cc: Remove memmodel.h include.
* config/freebsd-d.cc: Include tm_d.h instead of tm_p.h.
* config/glibc-d.cc: Likewise.
* config/i386/i386-d.cc: Include tm_d.h.
* config/i386/i386-protos.h (ix86_d_target_versions): Move to
config/i386/i386-d.h.
(ix86_d_register_target_info): Likewise.
(ix86_d_has_stdcall_convention): Likewise.
* config/i386/i386.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
(TARGET_D_HAS_STDCALL_CONVENTION): Likewise.
* config/i386/winnt-d.cc: Include tm_d.h instead of tm_p.h.
* config/mips/mips-d.cc: Include tm_d.h.
* config/mips/mips-protos.h (mips_d_target_versions): Move to
config/mips/mips-d.h.
(mips_d_register_target_info): Likewise.
* config/mips/mips.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/netbsd-d.cc: Include tm_d.h instead of tm.h and memmodel.h.
* config/openbsd-d.cc: Likewise.
* config/pa/pa-d.cc: Include tm_d.h.
* config/pa/pa-protos.h (pa_d_target_versions): Move to
config/pa/pa-d.h.
(pa_d_register_target_info): Likewise.
* config/pa/pa.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/riscv/riscv-d.cc: Include tm_d.h.
* config/riscv/riscv-protos.h (riscv_d_target_versions): Move to
config/riscv/riscv-d.h.
(riscv_d_register_target_info): Likewise.
* config/riscv/riscv.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/rs6000/rs6000-d.cc: Include tm_d.h.
* config/rs6000/rs6000-protos.h (rs6000_d_target_versions): Move to
config/rs6000/rs6000-d.h.
(rs6000_d_register_target_info): Likewise.
* config/rs6000/rs6000.h (TARGET_D_CPU_VERSIONS) Likewise.:
(TARGET_D_REGISTER_CPU_TARGET_INFO) Likewise.:
* config/s390/s390-d.cc: Include tm_d.h.
* config/s390/s390-protos.h (s390_d_target_versions): Move to
config/s390/s390-d.h.
(s390_d_register_target_info): Likewise.
* config/s390/s390.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/sol2-d.cc: Include tm_d.h instead of tm.h and memmodel.h.
* config/sparc/sparc-d.cc: Include tm_d.h.
* config/sparc/sparc-protos.h (sparc_d_target_versions): Move to
config/sparc/sparc-d.h.
(sparc_d_register_target_info): Likewise.
* config/sparc/sparc.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* configure: Regenerate.
* configure.ac (tm_d_file): Remove defaults.h.
(tm_d_include_list): Remove options.h and insn-constants.h.
* config/aarch64/aarch64-d.h: New file.
* config/arm/arm-d.h: New file.
* config/i386/i386-d.h: New file.
* config/mips/mips-d.h: New file.
* config/pa/pa-d.h: New file.
* config/riscv/riscv-d.h: New file.
* config/rs6000/rs6000-d.h: New file.
* config/s390/s390-d.h: New file.
* config/sparc/sparc-d.h: New file.

(cherry picked from commit d5ad6f8415171798adaff5787400505ce9882144)

Fix addvdi3 and subvdi3 patterns

While most PA 2.0 instructions support both 32 and 64-bit traps
and conditions, the addi and subi instructions only support 32-bit
traps and conditions. Thus, we need to force immediate operands
to register operands on the 64-bit target and use the add/sub
instructions which can trap on 64-bit signed overflow.

2022-11-30 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.md (addvdi3): Force operand 2 to a register.
Remove "addi,tsv,*" instruction from unamed pattern.
(subvdi3): Force operand 1 to a register.
Remove "subi,tsv" instruction from from unamed pattern.

libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors

libgomp/ChangeLog:

* libgomp.texi (OpenMP Context Selectors): Add 'gfx803' to gcn's isa.

(cherry picked from commit e0b95c2e8b771b53876321a6a0a9497619af73cd)

amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors

Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
* config/gcn/t-omp-device: Add gfx803.

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
* testsuite/libgomp.c/declare-variant-4.h: New header file.

(cherry picked from commit 1fd508744eccda9ad9c6d6fcce5b2ea9c568818d)

Daily bump.

gcc: fix PR rtl-optimization/107482

gcc/
PR rtl-optimization/107482
* ira-color.cc (assign_hard_reg): Only call
update_costs_from_copies when retry_p is false.

(cherry picked from commit e581490f0cfa80c58d2b648d71a44a597fbe3008)

Daily bump.

gcn: Fix __builtin_gcn_first_call_this_thread_p

Contrary naive expectation, unspec_volatile (via prologue_use) did not
prevent the cprop pass (at -O2) to remove the access to the s[0:1]
(PRIVATE_SEGMENT_BUFFER_ARG) register as the volatile got just put on
the preceeding pseudoregister. Solution: Use gen_rtx_USE instead.
Additionally, this patch removes (gen_)prologue_use_di as it is then no
longer used.

Finally, as we already do bit manipulation, instead of using the full
64bit side - and then just keeping the value of 's0', just move directly
to use only s1 of s[0:1] and do the bit manipulations there, generating
more readable assembly code and better matching the '#else' branch.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_builtin_1): Work on s1 instead
of s[0:1] and use USE to prevent removal of setting that register.
* config/gcn/gcn.md (prologue_use_di): Remove.

(cherry picked from commit 9fa67f1c1228a852e23943a41e68b664172c654c)

OpenMP/Fortran: Permit end-clause on directive

gcc/fortran/ChangeLog:

* openmp.cc (OMP_DO_CLAUSES, OMP_SCOPE_CLAUSES,
OMP_SECTIONS_CLAUSES): Add 'nowait'.
(OMP_SINGLE_CLAUSES): Add 'nowait' and 'copyprivate'.
(gfc_match_omp_distribute_parallel_do,
gfc_match_omp_distribute_parallel_do_simd,
gfc_match_omp_parallel_do,
gfc_match_omp_parallel_do_simd,
gfc_match_omp_parallel_sections,
gfc_match_omp_teams_distribute_parallel_do,
gfc_match_omp_teams_distribute_parallel_do_simd): Disallow 'nowait'.
(gfc_match_omp_workshare): Match 'nowait' clause.
(gfc_match_omp_end_single): Use clause matcher for 'nowait'.
(resolve_omp_clauses): Reject 'nowait' + 'copyprivate'.
* parse.cc (decode_omp_directive): Break too long line.
(parse_omp_do, parse_omp_structured_block): Diagnose duplicated
'nowait' clause.

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.2): Mark end-directive as Y.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/copyprivate-1.f90: New test.
* gfortran.dg/gomp/copyprivate-2.f90: New test.
* gfortran.dg/gomp/nowait-2.f90: Move dg-error tests ...
* gfortran.dg/gomp/nowait-4.f90: ... to this new file.
* gfortran.dg/gomp/nowait-5.f90: New test.
* gfortran.dg/gomp/nowait-6.f90: New test.
* gfortran.dg/gomp/nowait-7.f90: New test.
* gfortran.dg/gomp/nowait-8.f90: New test.

(cherry picked from commit 091b6dbc48177fa3ef15d62ea280ef6cb61c05b2)

libgomp: Add no-target-region rev offload test + fix plugin-nvptx

OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case. This commit adds a testcase for this.

In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that
it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.

libgomp/ChangeLog:

* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
as valid and the code having no reverse-offload code.
* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.

(cherry picked from commit 9f9d128f459e0c5ace8f7b85504d277b5a838daf)

libgomp.texi: OpenMP Impl Status 5.1 additions + TR11

libgomp/ChangeLog:

* libgomp.texi (OpenMP Implementation Status): Add three 5.1 items
and status for Technical Report (TR) 11.

(cherry picked from commit c16e85d726a7793c05209af031dac0bebf035ab9)

Daily bump.

OpenMP: Generate SIMD clones for functions with "declare target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution. The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.

Backport of mainline commit 309e2d95e3b930c6f15c8a5346b913158404c76d.

gcc/ChangeLog:

* common.opt (fopenmp-target-simd-clone): New option.
(target_simd_clone_device): New enum to go with it.
* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
* flag-types.h (enum omp_target_simd_clone_device_kind): New.
* omp-simd-clone.cc (auto_simd_fail): New function.
(auto_simd_check_stmt): New function.
(plausible_type_for_simd_clone): New function.
(ok_for_auto_simd_clone): New function.
(simd_clone_create): Add force_local argument, make the symbol
have internal linkage if it is true.
(expand_simd_clones): Also check for cloneable functions with
"omp declare target". Pass explicit_p argument to
simd_clone.compute_vecsize_and_simdlen target hook.
* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
Add bool explicit_p argument.
* doc/tm.texi: Regenerated.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
* config/gcn/gcn.cc
(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
* config/i386/i386.cc
(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/target-simd-clone-1.C: New.
* g++.dg/gomp/target-simd-clone-2.C: New.
* gcc.dg/gomp/target-simd-clone-1.c: New.
* gcc.dg/gomp/target-simd-clone-2.c: New.
* gcc.dg/gomp/target-simd-clone-3.c: New.
* gcc.dg/gomp/target-simd-clone-4.c: New.
* gcc.dg/gomp/target-simd-clone-5.c: New.
* gcc.dg/gomp/target-simd-clone-6.c: New.
* gcc.dg/gomp/target-simd-clone-7.c: New.
* gcc.dg/gomp/target-simd-clone-8.c: New.
* lib/scanoffloadipa.exp: New.

libgomp/ChangeLog:

* testsuite/lib/libgomp.exp: Load scanoffloadipa.exp library.
* testsuite/libgomp.c/target-simd-clone-1.c: New.
* testsuite/libgomp.c/target-simd-clone-2.c: New.
* testsuite/libgomp.c/target-simd-clone-3.c: New.

Revert "OpenMP: Generate SIMD clones for functions with "declare target""

This reverts commit f01e3b9dfd81973498c0a71a266e530aeb6f0c97.

Daily bump.

Fortran: reject NULL actual argument without explicit interface [PR107576]

gcc/fortran/ChangeLog:

PR fortran/107576
* interface.cc (gfc_procedure_use): Reject NULL as actual argument
when there is no explicit procedure interface.

gcc/testsuite/ChangeLog:

PR fortran/107576
* gfortran.dg/null_actual_3.f90: New test.

(cherry picked from commit 820c25c83561085f54268bd536f9d216d03c3e18)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8941-ged8d7ecac11d587687986a0895050955c09d2f43 (25th Nov 2022)

Daily bump.

Fix thinko in operator_bitwise_xor::op1_range

There is a thinko in the op1_range method of ranger's operator_bitwise_xor
class in a boolean context: if the result is known to be true, it may infer
that a specific operand is false without any basis.

gcc/
* range-op.cc (operator_bitwise_xor::op1_range): Fix thinko.

gcc/testsuite/
* gnat.dg/opt100.adb: New test.
* gnat.dg/opt100_pkg.adb, gnat.dg/opt100_pkg.ads: New helper.

Fix wrong array type conversion with different storage orde

When two arrays of scalars have a different storage order in Ada, the
front-end makes sure that the conversion is performed component-wise
so that each component can be reversed. So it's a little bit counter
productive that the ldist pass performs the opposite transformation
and synthesizes a memcpy/memmove in this case.

gcc/
* tree-loop-distribution.cc (loop_distribution::classify_builtin_ldst):
Bail out if source and destination do not have the same storage order.

gcc/testsuite/
* gnat.dg/sso18.adb: New test.

libstdc++: Remove unnecessary header from <memory>

Previously <memory> included <bits/stl_algobase.h> so that std::copy,
std::fill etc. could be used by <bits/stl_uninitialized.h>. But that
includes it explicitly now, so that it can be compiled as a header unit.
There's no need to include it in <memory>, where its purpose isn't
obvious.

libstdc++-v3/ChangeLog:

* include/std/memory: Do not include <bits/stl_algobase.h>.

(cherry picked from commit d6ccad7641da10d9c5f1f6cfc676d5f5b9d2d126)

libstdc++: Fix tests with non-const operator==

These tests fail in strict -std=c++20 mode but their equality ops don't
need to be non-const, it looks like an accident.

This fixes two FAILs with -std=c++20:
FAIL: 20_util/tuple/swap.cc (test for excess errors)
FAIL: 26_numerics/valarray/87641.cc (test for excess errors)

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/swap.cc (MoveOnly::operator==): Add
const qualifier.
* testsuite/26_numerics/valarray/87641.cc (X::operator==):
Likewise.

(cherry picked from commit fbad7a74aaaddea3d7b39045a09dd3860603658e)

libstdc++: Remove unnecessary variant member in std::expected

Hui Xie pointed out that we don't need a dummy member in the union,
because all constructors always initialize either _M_val or _M_unex.

We still need the _M_void member of the expected<void, E>
specialization, because the constructor has to initialize something when
not using the _M_unex member.

libstdc++-v3/ChangeLog:

* include/std/expected (expected::_M_invalid): Remove.

(cherry picked from commit f4874691812bc20e3d8e3302db439c27f30c472c)

libstdc++: Check static assertions earlier in chrono::duration

This ensures that we fail a static assertion before giving any other
errors. Instantiating chrono::duration<int, chrono::seconds> will now
print this before the other errors caused by it:

error: static assertion failed: period must be a specialization of ratio

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (duration): Check preconditions on
template arguments before using them.

(cherry picked from commit ed77dcb9be76e592b62449c75a5e751485514afd)

libstdc++: Fix dangling reference in filesystem::path::filename()

The new -Wdangling-reference warning noticed this.

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::filename()): Fix dangling
reference.

(cherry picked from commit 49237fe6ef677a81eae701f937546210c90b5914)

libstdc++: Document LWG 1203 API change in manual

libstdc++-v3/ChangeLog:

* doc/xml/manual/intro.xml: Document LWG 1203.
* doc/html/*: Regenerate.

(cherry picked from commit 8b1bc3051bd68ce193a8612fa3b1a65c0353b5b0)

libstdc++: Add missing runtime exception to licence notice

This file is missing the GCC Runtime Library Exception text in the
licence header. That is unintentional, and it should have been present.

libstdc++-v3/ChangeLog:

* include/std/barrier: Add missing runtime exception.

(cherry picked from commit d7f282c4243e24f567b11a5cb6048a27a3df733d)

libstdc++: Add comparisons to std::default_sentinel_t (LWG 3719)

This library defect was recently approved for C++23.

libstdc++-v3/ChangeLog:

* include/bits/fs_dir.h (directory_iterator): Add comparison
with std::default_sentinel_t. Remove redundant operator!= for
C++20.
* (recursive_directory_iterator): Likewise.
* include/bits/iterator_concepts.h [!__cpp_lib_concepts]
(default_sentinel_t, default_sentinel): Define even if concepts
are not supported.
* include/bits/regex.h (regex_iterator): Add comparison with
std::default_sentinel_t. Remove redundant operator!= for C++20.
(regex_token_iterator): Likewise.
(regex_token_iterator::_M_end_of_seq()): Add noexcept.
* testsuite/27_io/filesystem/iterators/lwg3719.cc: New test.
* testsuite/28_regex/iterators/regex_iterator/lwg3719.cc:
New test.
* testsuite/28_regex/iterators/regex_token_iterator/lwg3719.cc:
New test.

(cherry picked from commit db33daa4677997399485176303406794dc900987)

libstdc++: Fix std::is_nothrow_invocable_r for uncopyable prvalues [PR91456]

This is the last missing piece of PR 91456.

This also removes the only use of the C++11 version of
std::is_nothrow_invocable.

libstdc++-v3/ChangeLog:

PR libstdc++/91456
* include/std/type_traits (__is_nothrow_invocable): Remove.
(__is_invocable_impl::__nothrow_type): New member type which
checks if the conversion can throw.
(__is_nt_invocable_impl): Replace class template with alias
template to __is_nt_invocable_impl::__nothrow_type.
* testsuite/20_util/is_nothrow_invocable/91456.cc: New test.
* testsuite/20_util/is_nothrow_convertible/value.cc: Remove
macro used by value_ext.cc test.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Remove
test for non-standard __is_nothrow_invocable trait.

(cherry picked from commit 71c828f84572d933979468baf2cf744180258ee4)

Daily bump.

gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce dependency on
compiler-internal implementation choices of GCC in newlibs' getreent.c.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P,
GET_STACK_LIMIT): Add new builtins.
* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
* config/gcn/gcn.md (prologue_use): Add "register_operand" as
arg to match_operand.
(prologue_use_di): New; DI insn_and_split variant of the former.

Co-Authored-By: Andrew Stubbs <ams@codesourcery.com>
(cherry picked from commit d6bbca7b78745915d98bb1324d79de6a1e6dc801)

Daily bump.

libstdc++: Add workaround for fs::path constraint recursion [PR106201]

This works around a compiler bug where overload resolution attempts
implicit conversion to path in order to call a function with a path&
parameter. Such conversion would produce a prvalue, which would not be
able to bind to the lvalue reference anyway. Attempting to check the
conversion causes a constraint recursion because the arguments to the
path constructor are checked to see if they're iterators, which checks
if they're swappable, which tries to use the swap function that
triggered the conversion in the first place.

This replaces the swap function with an abbreviated function template
that is constrained with same_as<path> auto& so that the invalid
conversion is never considered.

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* include/bits/fs_path.h (filesystem::swap(path&, path&)):
Replace with abbreviated function template.
* include/experimental/bits/fs_path.h (filesystem::swap):
Likewise.
* testsuite/27_io/filesystem/iterators/106201.cc: New test.
* testsuite/experimental/filesystem/iterators/106201.cc: New test.

libstdc++: Fix pool resource build errors for H8 [PR107801]

The array of pool sizes was previously adjusted to work for msp430-elf
which has 16-bit int and either 16-bit size_t or 20-bit size_t. The
largest pool sizes were disabled unless size_t has more than 20 bits.

The H8 family has 16-bit int but 32-bit size_t, which means that the
largest sizes are enabled, but 1<<15 produces a negative number that
then cannot be narrowed to size_t.

Replace the test for 32-bit size_t with a test for 32-bit int, which
means we won't use the 4kiB to 4MiB pools for targets with 16-bit int
even if they have a wider size_t.

libstdc++-v3/ChangeLog:

PR libstdc++/107801
* src/c++17/memory_resource.cc (pool_sizes): Disable large pools
for targets with 16-bit int.

(cherry picked from commit 0f9659e770304d3c44cfa0e793833a461bc487aa)

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8924-ga6b1f6126de5e45777610699b6d634605c17711c (21st Nov 2022)

libgomp/gcn: fix/improve struct output

output.printf_data.(value union) contains text[128], which has the size
of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2]
could be extended to value_u64[6] - sufficient for all required arguments
to gomp_target_rev. Additionally, next_output.printf_data.(msg union)
contained msg_u64 which then is no longer needed and also caused 32bit
vs 64bit alignment issues.

libgomp/
* config/gcn/libgomp-gcn.h (struct output):
Remove 'msg_u64' from the union, change
value_u64[2] to value_u64[6].
* config/gcn/target.c (GOMP_target_ext): Update accordingly.
* plugin/plugin-gcn.c (process_reverse_offload, console_output):
Likewise.

(cherry picked from commit 6edcb5dc42625cb0cf84b19c6fe4f944f6322ea0)

i386: Uglify some local identifiers in *intrin.h [PR107748]

While reporting PR107748 (where is a problem with non-uglified names,
but I've left it out because it needs fixing anyway), I've noticed
various spots where identifiers in *intrin.h headers weren't uglified.
The following patch fixed those that are related to unions (I've grepped
for [a-zA-Z]\.[a-zA-Z] spots).
The reason we need those to be uglified is the same as why the arguments
of the inlines are __ prefixed and most of automatic vars in the inlines
- say a, v or u aren't part of implementation namespace and so users could
#define u whatever->something
#include <x86intrin.h>
and it should still work, as long as u is not e.g. one of the names
of the functions/macros the header provides (_mm* etc.).

2022-11-21 Jakub Jelinek <jakub@redhat.com>

PR target/107748
* config/i386/avx512fp16intrin.h (_mm512_castph512_ph128,
_mm512_castph512_ph256, _mm512_castph128_ph512,
_mm512_castph256_ph512, _mm512_set1_pch): Uglify names of local
variables and union members.
* config/i386/avx512fp16vlintrin.h (_mm256_castph256_ph128,
_mm256_castph128_ph256, _mm256_set1_pch, _mm_set1_pch): Likewise.
* config/i386/smmintrin.h (_mm_extract_ps): Likewise.
* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Likewise.

(cherry picked from commit ec8ec09f9414be871e322fecf4ebf53e3687bd22)

Daily bump.

reg-stack: Fix a -fcompare-debug bug in reg-stack [PR107183]

As the following testcase shows, the swap_rtx_condition function
in reg-stack can result in different code generation between -g and -g0.
The function is doing the changes as it goes, so does analysis and
changes together, which makes it harder to deal with DEBUG_INSNs,
where normally analysis phase ignores them and the later phase
doesn't.
swap_rtx_condition walks instructions two different ways, one is
using next_flags_user function which stops on non-call instructions
that mention the flags register, and the other is a loop on fnstsw
where it stops on instructions mentioning it and tries to find
sahf instruction that uses it (in both cases calls stop it and so
does end of basic block).
Now both of these currently stop on DEBUG_INSNs that mention
the flags register resp. the fnstsw result register.
On success the function recurses on next flags user instruction
if still live and if the recursion failed, reverts the changes
it did too and fails.
If it were just for the next_flags_user case, the fix could be
just not doing
      INSN_CODE (insn) = -1;
      if (recog_memoized (insn) == -1)
        fail = 1;
on DEBUG_INSNs (assuming all changes to those are fine),
swap_rtx_condition_1 just changes one comparison to a different
one.  But due to the possibility of fnstsw result being used
in theory before sahf in some DEBUG_INSNs, this patch takes
a different approach.  swap_rtx_condition has now a new argument
and two modes.  The first mode is when debug_seen is >= 0, in this
case both next_flags_user and the loop for fnstsw -> sahf will
ignore but note DEBUG_INSNs (that mention flags register or fnstsw
result).  If no such DEBUG_INSN is found during the whole call
including recursive invocations (so e.g. for -g0 but probably most
often for -g as well), it behaves as before, if it returns true
all the changes are done and nothing further needs to be done later.
If any DEBUG_INSNs are seen along the way, even when returning success
all the changes are reverted, so it just reports that the function
would be successful if DEBUG_INSNs were ignored.
In this case, compare_for_stack_reg needs to call it again in
debug_seen = -1 mode, which tells the function to update everything
including DEBUG_INSNs.  For the fnstsw -> sahf case which I hope
will be very rare I just reset the DEBUG_INSNs, I don't really
know how to express it easily otherwise.  For the rest
swap_rtx_condition_1 is done even on the DEBUG_INSNs.

2022-11-20  Jakub Jelinek  <jakub@redhat.com>

PR target/107183
* reg-stack.cc (next_flags_user): Add DEBUG_SEEN argument.
If >= 0 and a DEBUG_INSN would be otherwise returned, set
DEBUG_SEEN to 1 and ignore it.
(swap_rtx_condition): Add DEBUG_SEEN argument.  In >= 0
mode only set DEBUG_SEEN to 1 if problematic DEBUG_ISNSs
were seen and revert all changes on success in that case.
Don't try to recog_memoized DEBUG_INSNs.
(compare_for_stack_reg): Adjust swap_rtx_condition caller.
If it returns true and debug_seen is 1, call swap_rtx_condition
again with debug_seen -1.

* gcc.dg/ubsan/pr107183.c: New test.

(cherry picked from commit 6b5c98c1c0003bd470a4428bede6c862637a94b8)

c++: Fix a typo in function name

I've noticed I've made a typo in the name of the function.
Fixed thusly.

2022-11-15 Jakub Jelinek <jakub@redhat.com>

* cp-tree.h (next_common_initial_seqence): Rename to ...
(next_common_initial_sequence): ... this.
* typeck.cc (next_common_initial_seqence): Rename to ...
(next_common_initial_sequence): ... this.
(layout_compatible_type_p): Call next_common_initial_sequence
rather than next_common_initial_seqence.
* semantics.cc (is_corresponding_member_aggr): Likewise.

(cherry picked from commit 87c4057b3fc7fe2c2f8914d2755024ca890a3bc1)

libatomic: Handle AVX+CX16 AMD like Intel for 16b atomics [PR104688]

We got a response from AMD in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688#c10
so the following patch starts treating AMD with AVX and CMPXCHG16B
ISAs like Intel by using vmovdqa for atomic load/store in libatomic.
We still don't have confirmation from Zhaoxin and VIA (anything else
with CPUs featuring AVX and CX16?).

2022-11-15 Jakub Jelinek <jakub@redhat.com>

PR target/104688
* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on AMD CPUs.

(cherry picked from commit 4a7a846687e076eae58ad3ea959245b2bf7fdc07)

libgomp/gcn: Prepare for reverse-offload callback handling

libgomp/ChangeLog:

* config/gcn/libgomp-gcn.h: New file; contains
struct output, declared previously in plugin-gcn.c.
* config/gcn/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare as extern var.
(GOMP_target_ext): Handle reverse offload.
* plugin/plugin-gcn.c: Include libgomp-gcn.h.
(struct kernargs): Replace struct def by the one
from libgomp-gcn.h for output_data.
(process_reverse_offload): New.
(console_output): Call it.

(cherry picked from commit 8c05d8cd4300f74bf2698f0a6b96464b5be571be)

nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'

A recent nvptx-tools change: commit 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820
"ld: Don't search for input files in '-L'directories" (of
<https://github.com/MentorEmbedded/nvptx-tools/pull/38>
"Match standard 'ld' "search" behavior") in GCC/nvptx target testing
generally causes linking to fail with:

    error opening crt0.o
    collect2: error: ld returned 1 exit status
    compiler exited with status 1

Indeed per GCC '-v' output, there is an undecorated 'crt0.o' on the linker
('collect2') command line:

     [...]/build-gcc/./gcc/collect2 -o [...] crt0.o [...]

This is due to:

    gcc/config/nvptx/nvptx.h:#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"

..., and the fix, as used by numerous other GCC targets, is to instead use
'crt0.o%s'; for '%s' means, per 'gcc/gcc.cc', "The Specs Language":

     %s     current argument is the name of a library or startup file of some sort.
            Search for that file in a standard list of directories
            and substitute the full name found.

With that, we get the expected path to 'crt0.o'.

gcc/
* config/nvptx/nvptx.h (STARTFILE_SPEC): Fix 'crt0.o' for
'-mmainkernel'.

(cherry picked from commit dda43e1ef0c9f6c32ad022d3a08ce7651e42a129)

LoongArch: Fix atomic_exchange expanding [PR107713]

We used to expand atomic_exchange_n(ptr, new, mem_order) for subword types
into something like:

    {
      __typeof__(*ptr) t = atomic_load_n(ptr, mem_order);
      atomic_compare_exchange_n(ptr, &t, new, true, mem_order, mem_order);
      return t;
    }

It's incorrect because another thread may store a different value into *ptr
after atomic_load_n.  Then atomic_compare_exchange_n will not store into
*ptr, but atomic_exchange_n should always perform the store.

gcc/ChangeLog:

PR target/107713
* config/loongarch/sync.md
(atomic_cas_value_exchange_7_<mode>): New define_insn.
(atomic_exchange): Use atomic_cas_value_exchange_7_si instead of
atomic_cas_value_cmp_and_7_si.

gcc/testsuite/ChangeLog:

PR target/107713
* gcc.target/loongarch/pr107713-1.c: New test.
* gcc.target/loongarch/pr107713-2.c: New test.

(cherry picked from commit f0024bfb228f94e60e06dc32a4983e40a9b90be5)

Daily bump.

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8916-g14faa5f585f6025df1e04c8c8b34340ff5e4d494 (18th Nov 2022)

gcn: Add __builtin_gcn_kernarg_ptr

Add __builtin_gcn_kernarg_ptr to avoid using hard-coded register values
and permit future ABI changes while keeping the API.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (KERNARG_PTR): Add.
* config/gcn/gcn.cc (gcn_init_builtin_types): Change siptr_type_node,
sfptr_type_node and voidptr_type_node from FLAT to ADDR_SPACE_DEFAULT.
(gcn_expand_builtin_1): Handle GCN_BUILTIN_KERNARG_PTR.
(gcn_oacc_dim_size): Return in ADDR_SPACE_FLAT.

libgomp/ChangeLog:

* config/gcn/team.c (gomp_gcn_enter_kernel): Use
__builtin_gcn_kernarg_ptr instead of asm ("s8").

Co-Authored-By: Andrew Stubbs <ams@codesourcery.com>
(cherry picked from commit 6f83861cc1c4d09425aa6539877bfa50ef90f183)

c++: constinit on pointer to function [PR104066]

[dcl.constinit]: "The constinit specifier shall be applied only to
a declaration of a variable with static or thread storage duration."

Thus, this ought to be OK:

constinit void (*p)() = nullptr;

but the error message I introduced when implementing constinit was
not looking at funcdecl_p, so the code above was rejected.

Fixed thus. I'm checking constinit_p first because I think that's
far more likely to be false than funcdecl_p.

PR c++/104066

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Check funcdecl_p before complaining
about constinit.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit18.C: New test.

(cherry picked from commit 7b3b2f50953c5143d4b14b59d322d8a793f411dd)

Daily bump.

aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

This patch adds support for Ampere-1A CPU:
- recognize the name of the core and provide detection for -mcpu=native,
- updated extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).

Ampere-1A and Ampere-1 have more timing difference than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.
* doc/invoke.texi: Add ampere1a.

(cherry picked from commit 590a06afbf0e96813b5879742f38f3665512c854)

SRA: Limit replacement creation for accesses propagated from LHSs

PR 107206 is fallout from the fix to PR 92706 where we started
propagating accesses across assignments also from LHS to RHS of
assignments so that we would not do harmful total scalarization of the
aggregates on the RHS.

But this can lead to new scalarization of these aggregates and in the
testcase of PR 107206 these can appear in superfluous uses of
un-initialized values and spurious warnings.

Fixed by making sure the the accesses created by propagation in this
direction are only used as a basis for replacements when the structure
would be totally scalarized anyway.

gcc/ChangeLog:

2022-10-18 Martin Jambor <mjambor@suse.cz>

PR tree-optimization/107206
* tree-sra.cc (struct access): New field grp_result_of_prop_from_lhs.
(analyze_access_subtree): Do not create replacements for accesses with
this flag when not toally scalarizing.
(propagate_subaccesses_from_lhs): Set the new flag.

gcc/testsuite/ChangeLog:

2022-10-18 Martin Jambor <mjambor@suse.cz>

PR tree-optimization/107206
* g++.dg/tree-ssa/pr107206.C: New test.

(cherry picked from commit f6c168f8c06047bfaa3005e570126831b8855dcc)

nvptx/mkoffload.cc: Fix "$nohost" check

If lhd_set_decl_assembler_name is invoked - in particular if
!TREE_PUBLIC (decl) && !DECL_FILE_SCOPE_P (decl) - the '.nohost' suffix
might change to '.nohost.2'. This happens for the existing reverse offload
testcases via cgraph_node::analyze and is a side effect of
r13-3455-g178ac530fe67e4f2fc439cc4ce89bc19d571ca31 for some reason.

The solution is to not only check for a tailing '$nohost' but also for
'$nohost$' in nvptx/mkoffload.cc.

gcc/ChangeLog:

* config/nvptx/mkoffload.cc (process): Recognize '$nohost$...'
besides tailing '$nohost' as being for reverse offload.

(cherry picked from commit d59858f6ee7f356f27ccc2d29129826781f9483f)

Daily bump.

libstdc++: Fix std::move_only_function for incomplete parameter types

The std::move_only_function::__param_t alias template attempts to
optimize argument passing for the invoker, by passing by rvalue
reference for types that are non-trivial or large. However, the
precondition for is_trivally_copyable makes it unsuitable for using
here, and can cause ODR violations. Just use is_scalar instead, and pass
all class types (even small, trivial ones) by value.

libstdc++-v3/ChangeLog:

* include/bits/mofunc_impl.h (move_only_function::__param_t):
Use __is_scalar instead of is_trivially_copyable.
* testsuite/20_util/move_only_function/call.cc: Check parameters
involving incomplete types.

libstdc++: Fix wstring conversions in filesystem::path [PR95048]

In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
std::codecvt<CharT, char, mbstate_t> for conversions from all wide
strings to UTF-8, instead of using std::codecvt_utf8<CharT>. This was
done because for 16-bit wchar_t, std::codecvt_utf8<wchar_t> only
supports UCS-2 and not UTF-16. The rationale for the change was sound,
but the actual fix was not. It's OK to use std::codecvt for char16_t or
char32_t, because the specializations for those types always use UTF-8 ,
but std::codecvt<wchar_t, char, mbstate_t> uses the current locale's
encodings, and the narrow encoding is probably ASCII and can't support
non-ASCII characters.

The correct fix is to use std::codecvt only for char16_t and char32_t.
For 32-bit wchar_t we could have continued using std::codecvt_utf8
because that uses UTF-32 which is fine, switching to std::codecvt broke
non-Windows targets with 32-bit wchar_t. For 16-bit wchar_t we did need
to change, but should have changed to std::codecvt_utf8_utf16<wchar_t>
instead, as that always uses UTF-16 not UCS-2. I actually noted that in
the commit message for r9-7381-g91756c4abc1757 but didn't use that
option. Oops.

This replaces the unconditional std::codecvt<CharT, char, mbstate_t>
with a type defined via template specialization, so it can vary
depending on the wide character type. The code is also simplified to
remove some of the mess of #ifdef and if-constexpr conditions.

libstdc++-v3/ChangeLog:

PR libstdc++/95048
* include/bits/fs_path.h (path::_Codecvt): New class template
that selects the kind of code conversion done.
(path::_Codecvt<wchar_t>): Select based on sizeof(wchar_t).
(_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
be used for Windows and POSIX.
(path::_S_convert(const EcharT*, const EcharT*)): Simplify by
using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
(path::_S_str_convert(basic_string_view<value_type>, const A&)):
Simplify nested conditions.
* include/experimental/bits/fs_path.h (path::_Cvt): Define
nested typedef controlling type of code conversion done.
(path::_Cvt::_S_wconvert): Use new typedef.
(path::string(const A&)): Likewise.
* testsuite/27_io/filesystem/path/construct/95048.cc: New test.
* testsuite/experimental/filesystem/path/construct/95048.cc: New
test.

(cherry picked from commit b331bf303bdc1edead41e2b3d11d1a7804b433cf)

libstdc++: Set active union member in constexpr std::string [PR103295]

Clang still complains about using std::string in constexpr contexts due
to the changes made in commit 98a0d72a. This patch ensures that we set
the active member of the union as according to [class.union.general] p6.

libstdc++-v3/ChangeLog:

PR libstdc++/103295
* include/bits/basic_string.h (_M_use_local_data): Set active
member to _M_local_buf.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
(cherry picked from commit 52672be7d328df50f9a05ce3ab44ebcae50fee1b)

Merge branch 'releases/gcc-12' into devel/omp/gcc-12

Merge up to r12-8907-g58da1386d2233b8e01aaac8f7c4a61a2ccf52743 (14th Nov 2022)

libgomp: Fix up build on mingw [PR107641]

Pointers should be first casted to intptr_t/uintptr_t before casting
them to another integral type to avoid warnings.
Furthermore, the function has code like
  else if (upper <= UINT_MAX)
    something;
  else
    something_else;
so it seems using unsigned type for upper where upper <= UINT_MAX is always
true is not intended.

2022-11-12  Jakub Jelinek  <jakub@redhat.com>

PR libgomp/107641
* env.c (parse_unsigned_long): Cast params[2] to uintptr_t rather than
unsigned long.  Change type of upper from unsigned to unsigned long.

(cherry picked from commit 2a193e9df82917eaf440a20f99a3febe91dcb5fe)

Daily bump.

Add guality testcase for RTL alias analysis fix

gcc/testsuite/
* gcc.dg/guality/param-6.c: New test.

Restore RTL alias analysis for hard frame pointer

The change:

2021-07-28 Bin Cheng <bin.cheng@linux.alibaba.com>

alias.c (init_alias_analysis): Don't skip prologue/epilogue.

broke the alias analysis for the hard frame pointer (when it is used as a
frame pointer, i.e. when the frame pointer is not eliminated) described in
the large comment at the top of the file, because static_reg_base_value is
set for it and, consequently, new_reg_base_value too.

When the instruction saving the stack pointer into the hard frame pointer in
the prologue is processed, it is viewed as a second set of the hard frame
pointer and to a different value by record_set, which then proceeds to reset
new_reg_base_value to 0 and the game is over.

gcc/
* alias.cc (init_alias_analysis): Do not record sets to the hard
frame pointer if the frame pointer has not been eliminated.

Daily bump.

Always use TYPE_MODE instead of DECL_MODE for vector field

e034c5c8957 re PR target/78643 (ICE in convert_move, at expr.c:230)

fixed the case where DECL_MODE of a vector field is BLKmode and its
TYPE_MODE is a vector mode because of target attribute. Remove the
BLKmode check for the case where DECL_MODE of a vector field is a vector
mode and its TYPE_MODE isn't a vector mode because of target attribute.

gcc/

PR target/107304
* expr.cc (get_inner_reference): Always use TYPE_MODE for vector
field with vector raw mode.

gcc/testsuite/

PR target/107304
* gcc.target/i386/pr107304.c: New test.

(cherry picked from commit 1c64aba8cdf6509533f554ad86640f274cdbe37f)

libstdc++: Remove empty <author> elements in manual

This fixes a spurious comma before the list of authors in the PDF
version of the libstdc++ manual.

Also fix the commented-out examples which should show <personblurb> not
<authorblurb>.

libstdc++-v3/ChangeLog:

* doc/xml/authors.xml: Remove empty author element.
* doc/xml/manual/spine.xml: Likewise.
* doc/html/manual/index.html: Regenerate.

(cherry picked from commit 4596339d9fabdcbd66b5a7430fa56544f75ecef1)

Daily bump.

amdgcn: Fix expansion of GCN_BUILTIN_LDEXPV builtin

2022-11-07 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand first argument
of GCN_BUILTIN_LDEXPV to V64DFmode.

amdgcn: Various fixes for SIMD math library

Fix an error if VECTOR_RETURN is used on a constant.
Simplify the implementation of ilogb by removing VECTOR_IFs with a trivial
body.

2022-11-07 Kwok Cheung Yeung <kcy@codesourcery.com>

libgcc/
* config/gcn/simd-math/amdgcnmach.h (VECTOR_RETURN): Store value of
return value in a local variable first.
* config/gcn/simd-math/v64df_ilogb.c (ilogb): Simplify.

amdgcn: Fixed intermittent failure in vectorized version of rint

The lane mask was not being updated properly in nested conditionals.
Also fixed an issue causing inaccuracy in double-precision rint.

2022-11-07  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgcc/
* config/gcn/simd-math/v64df_rint.c (rint): Simplified.  Fixed bug in
nested VECTOR_IF.  Fixed issue with signed right-shift.
* config/gcn/simd-math/v64sf_rint.c (rintf): Simplified.  Fixed bug in
nested VECTOR_IF.

Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS

gcc/ChangeLog:

* config/i386/driver-i386.cc (host_detect_local_cpu):
Move sapphirerapids out of AVX512_VP2INTERSECT.
* config/i386/i386.h: Remove AVX512_VP2INTERSECT from PTA_SAPPHIRERAPIDS
* doc/invoke.texi: Remove AVX512_VP2INTERSECT from SAPPHIRERAPIDS

Daily bump.