Chung-Lin Tang [Fri, 19 May 2023 19:14:04 +0000 (12:14 -0700)]
Use OpenACC code to process OpenMP target regions
This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html
This patch implements '-fopenmp-target=acc', which enables internally handling
a subset of OpenMP target regions as OpenACC parallel regions. This basically
includes target, teams, parallel, distribute, for/do constructs, and atomics.
Essentially, we adjust the internal kinds to OpenACC type, and let OpenACC code
paths handle them, with various needed adjustments throughout middle-end and
nvptx backend. When using this "OMPACC" mode, if there are cases the patch
doesn't handle, it issues a warning, and reverts to normal processing for that
target region.
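For illustration, a region of the kind this mode is meant to cover might look
like the sketch below. The function name saxpy_offload is hypothetical, and
whether a given region is actually handled in OMPACC mode depends on the
clauses it uses; unsupported cases fall back to normal processing as described
above.

```c
#include <stddef.h>

/* Hypothetical example: a combined construct built only from target,
   teams, distribute, parallel and for, i.e. the subset named above.
   Intended to be compiled with -fopenmp -fopenmp-target=acc; without
   -fopenmp the pragma is ignored and the loop simply runs serially.  */
void
saxpy_offload (size_t n, float a, const float *x, float *y)
{
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```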
gcc/ChangeLog:
* builtins.cc (expand_builtin_omp_builtins): New function.
(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
* cgraphunit.cc (analyze_functions): Add call to
omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
* common.opt (fopenmp-target=): Add new option and enums.
* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
prototype.
(nvptx_mem_shared_p): Likewise.
* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
symbol for number of threads in team.
(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
(need_omp_num_threads): New bool recording whether any function
references omp_num_threads_sym.
(nvptx_option_override): Initialize omp_num_threads_sym/align.
(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
(nvptx_declare_function_name): Disable shim function under OMPACC mode.
Disable soft-stack under OMPACC mode. Add generation of neutering init
code under OMPACC mode.
(nvptx_output_set_softstack): Return "" under OMPACC mode.
(nvptx_expand_call): Set parallelism to vector for function calls with
"ompacc for" attached.
(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
(nvptx_expand_oacc_join): Likewise.
(nvptx_expand_omp_get_num_threads): New function.
(nvptx_mem_shared_p): New function.
(nvptx_mach_max_workers): Return 1 under OMPACC mode.
(nvptx_mach_vector_length): Return 32 under OMPACC mode.
(nvptx_single): Add adjustments for OMPACC mode, which has
parallel-construct fork/joins, and regions of code where neutering is
dynamically determined.
(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
attribute is attached to function. Disable uniform-simt when under
OMPACC mode.
(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
(nvptx_goacc_fork_join): Return true under OMPACC mode.
* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
(nvptx_shared_mem_operand): New predicate.
(gomp_barrier): New expand pattern.
(omp_get_num_threads): New expand pattern.
(omp_get_num_teams): New insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_team_num): Likewise.
(get_ntid): Likewise.
(nvptx_omp_parallel_fork): Likewise.
(nvptx_omp_parallel_join): Likewise.
* flag-types.h (omp_target_mode_kind): New flag value enum.
* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_ctx_ompacc_p): New function.
(gimplify_omp_for): Handle combined loops under OMPACC.
* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
* omp-expand.cc (remove_exit_barrier): Disable addressable-var
processing for parallel construct child functions under OMPACC mode.
(expand_oacc_for): Add OMPACC mode handling.
(get_target_arguments): Force thread_limit clause value to 1 under
OMPACC mode.
(expand_omp): Under OMPACC mode, avoid child function expanding of
GIMPLE_OMP_PARALLEL.
* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
(ompacc_ctx_p): New function.
(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
(lower_oacc_head_mark): Handle OMPACC mode cases.
(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
vector/gang clauses as needed. Add other OMPACC handling.
(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
(lower_omp_teams): Forward OpenACC privatization variables to outer
target region under OMPACC mode.
(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
GIMPLE_BIND.
* omp-offload.cc (ompacc_supported_clauses_p): New function.
(struct target_region_data): New struct type for tree walk.
(scan_fndecl_for_ompacc): New function.
(scan_omp_target_region_r): New function.
(scan_omp_target_construct_r): New function.
(omp_ompacc_attribute_tagging): New function.
(oacc_dim_call): Add OMPACC case handling.
(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
and no -fopenacc.
* target-insns.def (gomp_barrier): New defined insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_num_threads): Likewise.
(omp_get_team_num): Likewise.
(omp_get_num_teams): Likewise.
* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
for internal clause.
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
OMP_CLAUSE__OMPACC_.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.
* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
mode cases.
libgomp/ChangeLog:
* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
shared memory.
Tobias Burnus [Fri, 12 May 2023 14:27:40 +0000 (16:27 +0200)]
LTO: Fix writing of toplevel asm with offloading [PR109816]
When offloading was enabled, top-level 'asm' were added to the offloading
section, confusing assemblers which did not support the syntax. Additionally,
with offloading and -flto, the top-level assembler code did not end up
in the host files.
As r14-321-g9a41d2cdbcd added top-level 'asm' to one libstdc++ header file,
the issue became more apparent, causing failures with nvptx for some
C++ testcases.
PR libstdc++/109816
gcc/ChangeLog:
* lto-cgraph.cc (output_symtab): Guard lto_output_toplevel_asms by
'!lto_stream_offload_p'.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-class-1.C: New test.
* testsuite/libgomp.c++/target-map-class-2.C: New test.
This is one part of the fix for PR109128, along with a corresponding
binutils linker change. Without this patch, what happens in the
linker, when an unused object in a .a file has offload data, is that
elf_link_is_defined_archive_symbol calls bfd_link_plugin_object_p,
which ends up calling the plugin's claim_file_handler, which then
records the object as one with offload data. That is, the linker never
decides to use the object in the first place, but use of this _p
interface (called as part of trying to decide whether to use the
object) results in the plugin deciding to use its offload data (and a
consequent mismatch in the offload data present at runtime).
The new hook allows the linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid recording the object as one
with offload data.
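The distinction described above can be modelled with a toy sketch. Every name
here (toy_object, toy_claim_file, the known_used flag) is hypothetical and
chosen for illustration only; this is not the linker plugin interface itself.

```c
#include <stddef.h>

/* Toy model of an object file as seen by a plugin (hypothetical type).  */
struct toy_object
{
  const char *name;
  int has_offload_data;
};

#define TOY_MAX_RECORDED 16
static const char *toy_recorded[TOY_MAX_RECORDED];
static size_t toy_num_recorded;

/* Record OBJ as carrying offload data only when the caller knows that the
   linker is really going to use it.  Returns 1 if the object was recorded,
   0 otherwise.  */
int
toy_claim_file (const struct toy_object *obj, int known_used)
{
  if (!obj->has_offload_data || !known_used)
    return 0;
  if (toy_num_recorded >= TOY_MAX_RECORDED)
    return 0;
  toy_recorded[toy_num_recorded++] = obj->name;
  return 1;
}

/* Number of objects recorded so far.  */
size_t
toy_recorded_count (void)
{
  return toy_num_recorded;
}
```

In this model a probing call (known_used == 0) leaves the recorded set
untouched, which is the behaviour the new hook is meant to make possible.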
The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.
Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.
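A loop of the affected kind might look like the following sketch (the function
name sum8 is hypothetical). If the index variable were left uninitialized in
the generated code, the array accesses would be wrong, and -Wall would be
expected to flag the uninitialized use.

```c
/* Hypothetical example: sum a fixed-size array with a fully unrolled loop.
   The trip count must be a compile-time constant for "full".  Without
   -fopenmp the pragma is ignored and the loop runs as written.  */
int
sum8 (const int *a)
{
  int sum = 0;
#pragma omp unroll full
  for (int i = 0; i < 8; i++)
    sum += a[i];
  return sum;
}
```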
gcc/ChangeLog:
* omp-transform-loops.cc (full_unroll): Add initialization of index variable.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty pragmas.
Use -O2.
* testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
Copy of testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
but using -O0 which works only for C++.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use -Wall
and use -Wno-unknown-pragmas to disable warnings about empty pragmas.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
Likewise and fix broken function calls found by -Wall.
Andrew Stubbs [Wed, 26 Apr 2023 14:23:48 +0000 (15:23 +0100)]
amdgcn: Fix addsub bug
The vec_fmsubadd instruction actually had add twice, by mistake.
Also improve code-gen for all the complex patterns by using properly
undefined values. Mostly this just prevents the compiler reserving space
in the stack frame.
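As a scalar reference for the intended lane behaviour, assuming the convention
indicated by the ChangeLog entry below ("use sub for the odd lanes"), the
operation can be sketched as follows. The function name fmsubadd_ref is
hypothetical and this is a model, not the GCN pattern.

```c
#include <stddef.h>

/* Scalar reference model: even lanes compute a*b + c, odd lanes compute
   a*b - c.  The bug described above corresponds to using "+" in both
   branches.  */
void
fmsubadd_ref (size_t n, const double *a, const double *b, const double *c,
              double *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (i % 2 == 0) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}
```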
gcc/ChangeLog:
* config/gcn/gcn-valu.md (cmul<conj_op><mode>3): Use gcn_gen_undef.
(cml<addsub_as><mode>4): Likewise.
(vec_addsub<mode>3): Likewise.
(cadd<rot><mode>3): Likewise.
(vec_fmaddsub<mode>4): Likewise.
(vec_fmsubadd<mode>4): Likewise, and use sub for the odd lanes.
Andrew Stubbs [Wed, 19 Apr 2023 16:33:41 +0000 (17:33 +0100)]
amdgcn, openmp: Fix concurrency in low-latency allocator
The previous code works fine on Fiji and Vega 10 devices, but bogs down in the
spin locks on Vega 20 or newer. Adding the sleep instructions fixes the
problem.
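The general idea of yielding inside a spin loop can be sketched on the host as
follows. The definition of BASIC_ALLOC_YIELD is not shown in this log, so the
SKETCH_YIELD macro and the sketch_* functions are hypothetical stand-ins, with
POSIX sched_yield used in place of a device sleep instruction.

```c
#include <stdatomic.h>
#include <sched.h>

/* Hypothetical stand-in for a per-target yield hook.  */
#define SKETCH_YIELD() sched_yield ()

static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* Spin until the lock is acquired, yielding on every failed attempt so
   that the current holder gets a chance to make progress.  */
void
sketch_acquire (void)
{
  while (atomic_flag_test_and_set_explicit (&sketch_lock,
                                            memory_order_acquire))
    SKETCH_YIELD ();
}

/* Single attempt: returns 1 if the lock was acquired, 0 if it was busy.  */
int
sketch_try_acquire (void)
{
  return !atomic_flag_test_and_set_explicit (&sketch_lock,
                                             memory_order_acquire);
}

void
sketch_release (void)
{
  atomic_flag_clear_explicit (&sketch_lock, memory_order_release);
}
```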
libgomp/ChangeLog:
* basic-allocator.c (basic_alloc_free): Use BASIC_ALLOC_YIELD.
(basic_alloc_realloc): Use BASIC_ALLOC_YIELD.
Andrew Stubbs [Fri, 14 Apr 2023 16:05:15 +0000 (17:05 +0100)]
amdgcn: HardFP divide
Implement FP division using hardware instructions. This replaces both the
softfp library calls and the inaccurate -ffast-math division we had previously.
The GCN architecture does not have a single divide instruction, but it does
have a number of support instructions designed to make multiply-by-reciprocal
sufficiently accurate for non-fast-math usage.
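The multiply-by-reciprocal idea can be illustrated with a generic
Newton-Raphson sketch. This is not the GCN instruction sequence; the function
name recip_divide, the initial estimate, and the iteration count are all
assumptions made for the example, and it only handles finite b > 0.

```c
/* Generic sketch: compute a / b as a * (1/b), where 1/b is refined from a
   crude estimate.  Precondition: b is finite and b > 0.  */
double
recip_divide (double a, double b)
{
  /* Scale b into m in [0.5, 1), tracking the power of two in scale so
     that 1/b == (1/m) * scale.  */
  double m = b, scale = 1.0;
  while (m >= 1.0)
    {
      m *= 0.5;
      scale *= 0.5;
    }
  while (m < 0.5)
    {
      m *= 2.0;
      scale *= 2.0;
    }

  /* Linear initial estimate of 1/m; relative error at most about 1/17
     on [0.5, 1).  */
  double r = 48.0 / 17.0 - (32.0 / 17.0) * m;

  /* Each Newton step roughly squares the relative error.  */
  for (int i = 0; i < 5; i++)
    r = r * (2.0 - m * r);

  r *= scale;               /* r now approximates 1/b.  */
  double q = a * r;         /* First quotient estimate.  */
  q += (a - b * q) * r;     /* One correction step using the residual.  */
  return q;
}
```

A real implementation also has to deal with zero, infinities, NaNs, denormals
and correct rounding, which this sketch does not attempt.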
gcc/ChangeLog:
* config/gcn/gcn-valu.md (SV_SFDF): New iterator.
(SV_FP): New iterator.
(scalar_mode, SCALAR_MODE): Add identity mappings for scalar modes.
(recip<mode>2): Unify the two patterns using SV_FP.
(div_scale<mode><exec_vcc>): New insn.
(div_fmas<mode><exec>): New insn.
(div_fixup<mode><exec>): New insn.
(div<mode>3): Unify the two expanders and rewrite using hardfp.
* config/gcn/gcn.cc (gcn_md_reorg): Support "vccwait" attribute.
* config/gcn/gcn.md (unspec): Add UNSPEC_DIV_SCALE, UNSPEC_DIV_FMAS,
and UNSPEC_DIV_FIXUP.
(vccwait): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/fpdiv.c: Remove the -ffast-math requirement.
Andre Vieira [Tue, 11 Apr 2023 09:07:43 +0000 (10:07 +0100)]
if-conv: Restore MASK_CALL conversion [PR108888]
The original patch to fix this PR broke the if-conversion of calls into
IFN_MASK_CALL. This patch restores that original behaviour and makes sure the
tests added earlier specifically test inbranch SIMD clones.
There were two problems: first, OMP_CLAUSE_CHAIN was erroneously
used as the chain pointer instead of TREE_CHAIN for a non-OMP clause
list. Secondly, "copy_node" by itself is not sufficient to clone the
initialization statement for use in the on-target constructor/destructor
function. Instead we now use walk_tree with "copy_tree_body_r" and
appropriate configuration parameters.
2023-04-05 Julian Brown <julian@codesourcery.com>
gcc/cp/
* decl2.cc (tree-inline.h): Include.
(do_static_initialization_or_destruction): Change OMP_TARGET parameter
to pass the host version of the SSDF function decl. Use
copy_tree_body_r to clone init stmt. Update forward declaration.
(c_parse_final_cleanups): Update calls to
do_static_initialization_or_destruction. Use TREE_CHAIN instead of
OMP_CLAUSE_CHAIN.
Julian Brown [Wed, 22 Mar 2023 22:57:33 +0000 (22:57 +0000)]
[og12] OpenMP: Constructors and destructors for "declare target" static aggregates
This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.
At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.
Tested with offloading to AMD GCN. I will apply to the og12 branch
shortly.
ChangeLog
2023-03-27 Julian Brown <julian@codesourcery.com>
gcc/cp/
* decl2.cc (priority_info): Add omp_tgt_initializations_p and
omp_tgt_destructions_p.
(start_objects, start_static_storage_duration_function,
do_static_initialization_or_destruction,
one_static_initialization_or_destruction,
generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
"declare target" decls. Update forward declarations.
(OMP_SSDF_IDENTIFIER): New macro.
(omp_tgt_ssdf_decls): New vec.
(get_priority_info): Initialize omp_tgt_initializations_p and
omp_tgt_destructions_p fields.
(handle_tls_init): Update call to
omp_static_initialization_or_destruction.
(c_parse_final_cleanups): Support constructors/destructors on OpenMP
offload targets.
gcc/
* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): New builtin.
* tree.cc (get_file_function_name): Support names for on-target
constructor/destructor functions.
libgomp/
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New
test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New
test.
Frederik Harwath [Fri, 24 Mar 2023 17:20:08 +0000 (18:20 +0100)]
openmp: Add C/C++ support for loop transformations on inner loops
Add the parsing of loop transformations on inner loops of a loop-nest.
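A possible use is sketched below: an "omp unroll" directive placed on the
inner loop of a collapsed nest. The function name scale_rows is hypothetical,
and whether a particular combination of directives is accepted depends on the
OpenMP rules and on this implementation.

```c
/* Hypothetical example: a loop transformation on the inner loop of a
   loop nest associated with "omp for collapse(2)".  Without -fopenmp the
   pragmas are ignored and the nest runs serially.  */
void
scale_rows (int rows, int cols, double *m, double f)
{
#pragma omp for collapse(2)
  for (int i = 0; i < rows; i++)
#pragma omp unroll partial(2)
    for (int j = 0; j < cols; j++)
      m[i * cols + j] *= f;
}
```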
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(c_parser_omp_tile): ... adjust use here,
(c_parser_omp_unroll): ... and here,
(c_parser_omp_for_loop): ... and here. Stop treating loop
transformations like intervening code, parse them, and adjust
the loop-nest depth if necessary for tiling.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_is_pragma): New function.
(cp_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(cp_parser_omp_tile): ... adjust use here,
(cp_parser_omp_unroll): ... and here,
(cp_parser_omp_for_loop): ... and here. Stop treating loop
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/unroll-inner-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/tile-1.C: Deleted, replaced by
matrix-* tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h:
New header file for new tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
New test.
Frederik Harwath [Fri, 24 Mar 2023 17:29:51 +0000 (18:29 +0100)]
openmp: Add Fortran support for loop transformations on inner loops
So far the implementation of the "omp tile" and "omp unroll"
directives restricted their use to the outermost loop of a loop-nest.
This commit changes the Fortran front end to parse and verify the
directives on inner loops. The transformation clauses are extended to
carry the information about the level of the loop-nest at which a
transformation should be applied. The middle end transformation pass
is adjusted to apply the transformations at the right level of a loop
nest and to take their effect on the loop nest depth into account.
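The effect on loop-nest depth can be seen in a hand-written equivalent,
given here in C for brevity. Tiling the inner loop of a two-deep nest turns
that one loop into two, so the nest becomes three deep. The function name and
the tile size of 4 are assumptions for the example; this is not the code the
pass generates.

```c
/* Hand-written equivalent of tiling only the inner loop with tile size 4:
   the inner loop is split into a tile loop and an intra-tile loop.  */
long
sum_tiled_inner (int rows, int cols, const int *a)
{
  long sum = 0;
  for (int i = 0; i < rows; i++)
    for (int jt = 0; jt < cols; jt += 4)              /* tile loop */
      for (int j = jt; j < jt + 4 && j < cols; j++)   /* intra-tile loop */
        sum += a[i * cols + j];
  return sum;
}
```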
gcc/fortran/ChangeLog:
* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
(resolve_loop_transform_generic): Remove, and ...
(resolve_omp_unroll): ... inline and adapt here. Move function.
(find_nested_loop_in_block): New function.
(find_nested_loop_in_chain): New function, used ...
(is_outer_iteration_variable): ... here, and ...
(expr_is_invariant): ... here.
(resolve_omp_do): Adjust code for resolving loop transformations.
(resolve_omp_tile): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_TRANSFORM_LEVEL
on new clause.
(compute_transformed_depth): New function to compute the depth
("collapse") of a transformed loop nest, used
(gfc_trans_omp_do): ... here.
gcc/ChangeLog:
* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix typo
in comment.
(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
(partial_unroll): Add argument for the loop nest level to be transformed.
(tile): Likewise.
(transform_gomp_for): Pass level to transformation functions.
(optimize_transformation_clauses): Handle transformation clauses for all
levels recursively.
* tree-pretty-print.cc (dump_omp_clause): Print
OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
clause operand 0.
(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
(OMP_CLAUSE_TILE_SIZES): Likewise.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(cp_parser_omp_clause_unroll_partial): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
(cp_parser_omp_loop_transform_clause): Likewise.
(cp_parser_omp_nested_loop_transform_clauses): Likewise.
(cp_parser_omp_unroll): Likewise.
* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
and OMP_CLAUSE_TILE handling to changed number of operands.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(c_parser_omp_clause_unroll_partial): Likewise.
(c_parser_omp_tile_sizes): Likewise.
(c_parser_omp_loop_transform_clause): Likewise.
(c_parser_omp_nested_loop_transform_clauses): Likewise.
(c_parser_omp_unroll): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
changed diagnostic messages.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
Frederik Harwath [Fri, 24 Mar 2023 17:18:03 +0000 (18:18 +0100)]
openmp: Add C/C++ support for "omp tile"
This commit adds the C and C++ front end support for the "omp tile"
directive. The middle end support for the transformation is
implemented in a previous commit.
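A minimal use of the directive might look like this sketch. The function name
transpose_tiled and the tile sizes are assumptions for the example.

```c
/* Hypothetical example: tile a two-deep loop nest into 4x4 blocks.
   The "sizes" clause takes one entry per loop being tiled.  Without
   -fopenmp the pragma is ignored and the nest runs as written.  */
void
transpose_tiled (int n, const double *in, double *out)
{
#pragma omp tile sizes(4, 4)
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      out[j * n + i] = in[i * n + j];
}
```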
* c-parser.cc (c_parser_nested_omp_unroll_clauses): Rename and
generalize ...
(c_parser_omp_nested_loop_transform_clauses): ... to this.
(c_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(c_parser_omp_tile_sizes): Parse single "sizes" clause.
(c_parser_omp_loop_transform_clause): New function.
(c_parser_omp_tile): New function for parsing "omp tile".
(c_parser_omp_unroll): Adjust to renaming.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_unroll_partial): Adjust.
(cp_parser_nested_omp_unroll_clauses): Rename ...
(cp_parser_omp_nested_loop_transform_clauses): ... to this.
(cp_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(cp_parser_omp_tile_sizes): New function, parses single "sizes" clause.
(cp_parser_omp_tile): New function for parsing "omp tile".
(cp_parser_omp_loop_transform_clause): New function.
(cp_parser_omp_unroll): Adjust to renaming.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE.
(cp_parser_pragma): Likewise.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_TILE.
* semantics.cc (finish_omp_clauses): Likewise.
gcc/ChangeLog:
* gimplify.cc (omp_for_drop_tile_clauses): New function, ...
(gimplify_omp_for): ... used here.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/tile-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-2.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-3.C: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/tile-1.c: New test.
* c-c++-common/gomp/loop-transforms/tile-2.c: New test.
* c-c++-common/gomp/loop-transforms/tile-3.c: New test.
* c-c++-common/gomp/loop-transforms/tile-4.c: New test.
* c-c++-common/gomp/loop-transforms/tile-5.c: New test.
* c-c++-common/gomp/loop-transforms/tile-6.c: New test.
* c-c++-common/gomp/loop-transforms/tile-7.c: New test.
* c-c++-common/gomp/loop-transforms/tile-8.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: Adapt
to changed diagnostic messages.
* g++.dg/gomp/loop-transforms/tile-1.h: New test.
* g++.dg/gomp/loop-transforms/tile-1a.C: New test.
* g++.dg/gomp/loop-transforms/tile-1b.C: New test.
* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_loop): Likewise.
* omp-transform-loops.cc (walk_omp_for_loops): New declaration.
(subst_var_in_op): New function.
(subst_var): New function.
(gomp_for_number_of_iterations): Adjust.
(gomp_for_iter_count_type): New function.
(gimple_assign_rhs_to_tree): New function.
(subst_defs): New function.
(gomp_for_uncollapse): Adjust.
(transformation_clause_p): Add OMP_CLAUSE_TILE.
(tile): New function.
(transform_gomp_for): Handle OMP_CLAUSE_TILE.
(optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
* omp-general.cc (omp_loop_transform_clause_p): Add
OMP_CLAUSE_TILE.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
* tree.cc: Add OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_SIZES): New macro.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
Frederik Harwath [Fri, 24 Mar 2023 17:14:23 +0000 (18:14 +0100)]
openmp: Add C/C++ support for "omp unroll" directive
This commit implements the C and the C++ front end changes to support
the "omp unroll" directive. The execution of the loop transformation
relies on the pass that has been added as a part of the earlier
Fortran patch.
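The two usage forms that the ChangeLog below distinguishes, a stand-alone
"omp unroll" and one that follows another directive, might look like this
sketch. The function names and unroll factors are assumptions for the example.

```c
/* Hypothetical example 1: "omp unroll" not associated with another
   directive.  */
void
add_one (int n, int *a)
{
#pragma omp unroll partial(4)
  for (int i = 0; i < n; i++)
    a[i] += 1;
}

/* Hypothetical example 2: "omp unroll" following another directive; the
   outer directive then applies to the loop produced by the unrolling.
   Without -fopenmp both pragmas are ignored.  */
void
double_all (int n, int *a)
{
#pragma omp parallel for
#pragma omp unroll partial(2)
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}
```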
gcc/c-family/ChangeLog:
* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_UNROLL.
* c-omp.cc: Add "unroll" to omp_directives[].
* c-pragma.cc: Add "unroll" to omp_pragmas_simd[].
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_UNROLL to
pragma_kind and adjust PRAGMA_OMP__LAST_.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Handle "full" and
"partial" clauses.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(c_parser_omp_clause_unroll_full): New function for parsing
the "full" clause.
(c_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
and PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(c_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(c_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise.
* parser.cc (cp_parser_omp_clause_name): Handle "full" clause.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(cp_parser_omp_clause_unroll_full): New function for parsing
the "full" clause.
(cp_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_UNROLL and
OMP_CLAUSE_FULL.
(cp_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(cp_parser_omp_for_loop): Handle "omp unroll" directives
between directive and loop.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(cp_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
* testsuite/libgomp.c++/loop-transforms/unroll-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/unroll-2.C: New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/unroll-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-3.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-4.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-5.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-6.c: New test.
* g++.dg/gomp/loop-transforms/unroll-1.C: New test.
* g++.dg/gomp/loop-transforms/unroll-2.C: New test.
* g++.dg/gomp/loop-transforms/unroll-3.C: New test.
Frederik Harwath [Fri, 24 Mar 2023 17:11:57 +0000 (18:11 +0100)]
openmp: Add Fortran support for "omp unroll" directive
This commit implements the OpenMP 5.1 "omp unroll" directive for
Fortran. The Fortran front end changes encompass the parsing and the
verification of nesting restrictions etc. The actual loop
transformation is implemented in a new language-independent
"omp_transform_loops" pass which runs before omp lowering. No attempt
is made to re-use existing unrolling optimizations because a separate
implementation allows for better control of the unrolling. The new
pass will also serve as a foundation for the implementation of further
OpenMP loop transformations. This commit only implements the support
for "omp unroll" on the outermost loop of a loop nest. The support
for inner loops will be added later.
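What a partial unroll amounts to can be sketched by hand, here in C for
brevity. This is an illustration of the transformation with an assumed factor
of 2 and a hypothetical function name, not the code the new pass generates.

```c
/* Hand-written equivalent of unrolling a summation loop by a factor
   of 2, with a remainder loop for an odd trip count.  */
long
sum_unrolled_by_2 (int n, const int *a)
{
  long sum = 0;
  int i = 0;

  /* Main loop: two original iterations per trip.  */
  for (; i + 1 < n; i += 2)
    {
      sum += a[i];
      sum += a[i + 1];
    }

  /* Remainder: at most one leftover iteration.  */
  for (; i < n; i++)
    sum += a[i];

  return sum;
}
```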
gcc/ChangeLog:
* Makefile.in: Add omp_transform_loops.o.
* gimple-pretty-print.cc (dump_gimple_omp_for): Handle "full"
and "partial" clauses.
* gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_TRANSFORM_LOOP.
* gimplify.cc (is_gimple_stmt): Handle OMP_UNROLL.
(gimplify_scan_omp_clauses): Handle OMP_UNROLL_FULL,
OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
(gimplify_adjust_omp_clauses): Handle OMP_UNROLL_FULL,
OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
(gimplify_omp_for): Handle OMP_UNROLL.
(gimplify_expr): Likewise.
* params.opt: Add omp-unroll-full-max-iteration and
omp-unroll-default-factor.
* passes.def: Add pass_omp_transform_loop before
pass_lower_omp.
* tree-core.h (enum omp_clause_code): Add
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_PARTIAL.
* tree-pass.h (make_pass_omp_transform_loops): Declare.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_PARTIAL.
(dump_generic_node): Handle OMP_UNROLL.
* tree.cc (omp_clause_num_ops): Add number of operators
for OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
OMP_CLAUSE_UNROLL_PARTIAL.
(omp_clause_code_names): Add name strings for
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
OMP_CLAUSE_UNROLL_PARTIAL.
* tree.def (OMP_UNROLL): Define.
* tree.h (OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Define.
* omp-transform-loops.cc: New file.
* omp-general.cc (omp_loop_transform_clause_p): New function.
* omp-general.h (omp_loop_transform_clause_p): New declaration.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-5.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-8.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-6.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-7.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90: New test.
Thomas Schwinge [Fri, 17 Feb 2023 13:13:15 +0000 (14:13 +0100)]
Miscellaneous clean-up re OpenMP 'ompx_host_mem_space'
Like done for nvptx in og12 commit 23f52e49368d7b26a1b1a72d6bb903d31666e961
"Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 'ompx_host_mem_space'".
Tobias Burnus [Thu, 23 Mar 2023 17:04:17 +0000 (18:04 +0100)]
Fortran/OpenMP: Fix 'alloc' and 'from' mapping for allocatable components
Even with 'alloc' and map-entering 'from' mapping, the following should hold.
For explicit mapping, that's already the case; this handles the automatic
deep mapping of allocatable components. Namely:
* On the device, the array bounds (of allocated allocatables) must match the
host, implying 'to' (or 'tofrom') mapping.
* On map exiting, the copying out shall not destroy the unallocated allocation
status (nor the pointer address of allocated allocatables).
The latter was not a problem for allocated allocatables as for those a pointer
was GOMP_MAP_ATTACHed; however, for unallocated allocatables, it previously
copied back device-allocated memory, which might not be nullified.
While 'alloc' was not deep-mapped at all, for map-entering 'from', the array
bounds were not set, making allocated derived-type components inaccessible on
the device (and wrong on the host on copy back).
The solution is, first, to deep-map 'alloc' as well and to copy to the device
even with 'alloc' and (map-entering) 'from'. This copying is only done if there
is a scalar (for the unallocated case) or array allocatable directly in the
derived type and then it is shallowly copied; the data pointed to is then again
only alloc'ed, unless it contains in turn allocatables.
gcc/fortran/
* trans-openmp.cc (gfc_has_alloc_comps): Add 'bool
shallow_alloc_only=false' arg.
(gfc_omp_replace_alloc_by_to_mapping): New, call it.
(gfc_omp_deep_map_kind_p): Return 'true' also for '(present,)alloc'.
(gfc_omp_deep_mapping_item, gfc_omp_deep_mapping_do): On map entering,
replace shallowly 'alloc'/'from' by '(from)to' mapping if there are
allocatable components.
libgomp/
* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
Tobias Burnus [Thu, 23 Mar 2023 07:57:45 +0000 (08:57 +0100)]
OpenMP/Fortran: Fix unmapping of GOMP_MAP_POINTER for scalar allocatables/pointers
target exit data: Do unmap GOMP_MAP_POINTER for scalar allocatables/pointers
to prevent stale mappings.
While for allocatable/pointer arrays, there is a PSET followed by POINTER,
for allocatable/pointer scalars there is only a POINTER. Before the below
mentioned OG12 patch: For exit data, PSET was converted to RELEASE/DELETE
in gimplify.cc while all POINTER were removed; correct for arrays but leaving
POINTER behind for scalars. Since that commit, all of this is done in
trans-openmp.cc, but the scalar case was still mishandled until this
follow-up commit.
This is a follow up to OG12's 55a18d4744258e3909568e425f9f473c49f9d13f
While the problem is independent, it will be merged into v4 of the
mainline patch
'Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings'
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Fix unmapping of
GOMP_MAP_POINTER for scalar allocatables/pointers.
Andrew Stubbs [Fri, 17 Mar 2023 11:04:12 +0000 (11:04 +0000)]
amdgcn: Fix register size bug
Fix an issue in which "vectors" of duplicate entries placed in scalar
registers caused the following 63 registers to be marked live, for the
purpose of prologue generation, which resulted in stack corruption.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_class_max_nregs): Handle vectors in SGPRs.
(move_callee_saved_registers): Detect the bug condition early.
While writing a testcase for PR106794, I noticed that we failed
to vectorise the testcase in the patch for SVE. The code that
recognises gather loads tries to optimise the point at which
the offset is calculated, to avoid unnecessary extensions or
truncations:
/* Don't include the conversion if the target is happy with
the current offset type. */
But breaking only makes sense if we're at an SSA_NAME (which could
then be vectorised). We shouldn't break on a conversion embedded
in a generic expression.
gcc/
* tree-vect-data-refs.cc (vect_check_gather_scatter): Restrict
early-out optimisation to SSA_NAMEs.
gcc/testsuite/
* gcc.dg/vect/vect-gather-5.c: New test.
Andrew Stubbs [Mon, 6 Mar 2023 12:42:44 +0000 (12:42 +0000)]
amdgcn: gather/scatter with DImode offsets
The GPU architecture requires SImode offsets on gather/scatter instructions,
but they can also take a vector of absolute addresses, so this allows
gather/scatter in more situations.
Andrew Stubbs [Wed, 1 Mar 2023 15:32:50 +0000 (15:32 +0000)]
amdgcn: vec_extract no-op insns
Just using move insn for no-op conversions triggers special move handling in
IRA which declares that subreg of vectors aren't valid and routes everything
through memory. These patterns make the vec_select explicit and all is well.
Thomas Schwinge [Fri, 10 Mar 2023 17:14:44 +0000 (18:14 +0100)]
Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.
This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.
Thomas Schwinge [Mon, 27 Feb 2023 15:41:17 +0000 (16:41 +0100)]
Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data
This does *allow*, but under no circumstances is this currently going to be
used: all potentially applicable data is non-'ephemeral', and thus not
considered for 'gomp_coalesce_buf_add' for OpenACC 'async'. (But a use will
emerge later.)
TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question:
is it more performant to use libgomp CBUF buffering or individual device
asynchronous copying?)
Ephemeral data is small, and therefore individual device asynchronous copying
does seem dubious -- in particular given that for all those, we'd individually
have to allocate and queue for deallocation a temporary buffer to capture the
ephemeral data. Instead, just let the 'cbuf' *be* the temporary buffer.
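The idea of letting the 'cbuf' be the temporary buffer can be sketched in C++ as follows. This is a simplified model, not libgomp's actual data structures: `FakeDevice`, `CoalesceBuf`, and their members are hypothetical names, and device memory is simulated with a host vector.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for device memory; counts host-to-device transfers.
struct FakeDevice {
    std::vector<unsigned char> mem;
    int transfers = 0;
    explicit FakeDevice(std::size_t size) : mem(size) {}
    void copy_from_host(std::size_t dev_off, const void* src, std::size_t len) {
        std::memcpy(mem.data() + dev_off, src, len);
        ++transfers;
    }
};

// Hypothetical coalescing buffer covering device range [start, start + size).
struct CoalesceBuf {
    std::size_t start;
    std::vector<unsigned char> staging;
    CoalesceBuf(std::size_t start_, std::size_t size)
        : start(start_), staging(size) {}

    // Stage a small piece of host data. The bytes are copied immediately,
    // so the source may go out of scope right after this call; no separate
    // temporary buffer per item is needed. Returns false if the range does
    // not fit into the buffer.
    bool add(std::size_t dev_off, const void* src, std::size_t len) {
        if (dev_off < start || dev_off - start + len > staging.size())
            return false;
        std::memcpy(staging.data() + (dev_off - start), src, len);
        return true;
    }

    // Send everything staged so far to the device in a single transfer.
    void flush(FakeDevice& dev) {
        dev.copy_from_host(start, staging.data(), staging.size());
    }
};
```

In this model, any number of small `add` calls results in exactly one device transfer at `flush` time, which is the trade-off the TODO above weighs against individual asynchronous copies.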
libgomp/
* target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow
libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral'
data.
Thomas Schwinge [Fri, 24 Feb 2023 15:17:57 +0000 (16:17 +0100)]
OpenACC: Remove 'acc_async_test' -> skip shortcut in 'libgomp/oacc-async.c:goacc_wait'
We're not taking such a shortcut anywhere else, and (with future changes) it
has potential to confuse things if synchronization in a libgomp plugin happens
to have side effects even if an async queue currently is empty.
Thomas Schwinge [Thu, 2 Mar 2023 17:36:47 +0000 (18:36 +0100)]
libgomp: Merge 'gomp_map_vars_openacc' into 'goacc_map_vars' [PR76739]
Upstream has 'goacc_map_vars'; merge the new 'gomp_map_vars_openacc' into it.
(Maybe the latter didn't exist yet when the former was originally added?)
No functional change.
amdgcn: Add instruction patterns for conditional min/max operations
gcc/ChangeLog:
* config/gcn/gcn-valu.md (<expander><mode>3_exec): Add patterns for
{s|u}{max|min} in QI, HI and DI modes.
(<expander><mode>3): Add pattern for {s|u}{max|min} in DI mode.
(cond_<fexpander><mode>): Add pattern for cond_f{max|min}.
(cond_<expander><mode>): Add pattern for cond_{s|u}{max|min}.
* config/gcn/gcn.cc (gcn_spill_class): Allow the exec register to be
saved in SGPRs.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/cond_fmaxnm_1.c: New test.
* gcc.target/gcn/cond_fmaxnm_1_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_2.c: New test.
* gcc.target/gcn/cond_fmaxnm_2_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_3.c: New test.
* gcc.target/gcn/cond_fmaxnm_3_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_4.c: New test.
* gcc.target/gcn/cond_fmaxnm_4_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_5.c: New test.
* gcc.target/gcn/cond_fmaxnm_5_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_6.c: New test.
* gcc.target/gcn/cond_fmaxnm_6_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_7.c: New test.
* gcc.target/gcn/cond_fmaxnm_7_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_8.c: New test.
* gcc.target/gcn/cond_fmaxnm_8_run.c: New test.
* gcc.target/gcn/cond_fminnm_1.c: New test.
* gcc.target/gcn/cond_fminnm_1_run.c: New test.
* gcc.target/gcn/cond_fminnm_2.c: New test.
* gcc.target/gcn/cond_fminnm_2_run.c: New test.
* gcc.target/gcn/cond_fminnm_3.c: New test.
* gcc.target/gcn/cond_fminnm_3_run.c: New test.
* gcc.target/gcn/cond_fminnm_4.c: New test.
* gcc.target/gcn/cond_fminnm_4_run.c: New test.
* gcc.target/gcn/cond_fminnm_5.c: New test.
* gcc.target/gcn/cond_fminnm_5_run.c: New test.
* gcc.target/gcn/cond_fminnm_6.c: New test.
* gcc.target/gcn/cond_fminnm_6_run.c: New test.
* gcc.target/gcn/cond_fminnm_7.c: New test.
* gcc.target/gcn/cond_fminnm_7_run.c: New test.
* gcc.target/gcn/cond_fminnm_8.c: New test.
* gcc.target/gcn/cond_fminnm_8_run.c: New test.
* gcc.target/gcn/cond_smax_1.c: New test.
* gcc.target/gcn/cond_smax_1_run.c: New test.
* gcc.target/gcn/cond_smin_1.c: New test.
* gcc.target/gcn/cond_smin_1_run.c: New test.
* gcc.target/gcn/cond_umax_1.c: New test.
* gcc.target/gcn/cond_umax_1_run.c: New test.
* gcc.target/gcn/cond_umin_1.c: New test.
* gcc.target/gcn/cond_umin_1_run.c: New test.
* gcc.target/gcn/smax_1.c: New test.
* gcc.target/gcn/smax_1_run.c: New test.
* gcc.target/gcn/smin_1.c: New test.
* gcc.target/gcn/smin_1_run.c: New test.
* gcc.target/gcn/umax_1.c: New test.
* gcc.target/gcn/umax_1_run.c: New test.
* gcc.target/gcn/umin_1.c: New test.
* gcc.target/gcn/umin_1_run.c: New test.
Tobias Burnus [Wed, 1 Mar 2023 14:18:40 +0000 (15:18 +0100)]
OpenMP/Fortran: Fix handling of optional is_device_ptr + bind(C) [PR108546]
For is_device_ptr, optional checks should only be done before calling
libgomp; afterwards, the pointers are NULL either because the argument is
absent or, by chance, because it is unallocated or unassociated (for
pointers/allocatables).
Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.
PR middle-end/108546
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.
Tobias Burnus [Mon, 27 Feb 2023 11:47:54 +0000 (12:47 +0100)]
Update dg-dump-scan for "Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
Follow-up to commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
updating the dumps.
* For the goacc testcase, 'to' changed to 'release' and due to 'finally' then
to 'delete', which can be regarded as bugfix.
* For pr78260-2.f90, the calculation moved inside the 'if(...->data == NULL)'
block to handle deferred-string length vars better, esp. when 'optional'.
Martin Liska [Fri, 17 Feb 2023 14:11:02 +0000 (15:11 +0100)]
asan: adjust module name for global variables
As mentioned in the PR, when we use LTO, we wrongly use ltrans output
file name as a module name of a global variable. That leads to a
non-reproducible output.
After the suggested change, we emit context name of normal global
variables. And for artificial variables (like .Lubsan_data3), we use
aux_base_name (e.g. "./a.ltrans0.ltrans").
PR sanitizer/108834
gcc/ChangeLog:
* asan.cc (asan_add_global): Use proper TU name for normal
global variables (and aux_base_name for the artificial one).
gcc/testsuite/ChangeLog:
* c-c++-common/asan/global-overflow-1.c: Test line and column
info for a global variable.
Kewen Lin [Tue, 14 Feb 2023 02:03:26 +0000 (20:03 -0600)]
rs6000/test: Adjust some test cases on partial vector [PR96373]
As Richard pointed out in [1] and the testing on Power10, the
proposed fix for PR96373 requires some updates on a few rs6000
test cases which adopt partial vector. This patch is to fix
all of them with one extra option "-fno-trapping-math" as
Richard suggested.
Besides, the original test case also failed on Power10 without
Richard's proposed fix, so this patch adds it as well for slightly
better testing coverage.
Richard Biener [Thu, 23 Feb 2023 10:03:03 +0000 (11:03 +0100)]
tree-optimization/108888 - call if-conversion
The following makes sure to only predicate calls when necessary.
PR tree-optimization/108888
* tree-if-conv.cc (if_convertible_stmt_p): Set PLF_2 on
calls to predicate.
(predicate_statements): Only predicate calls with PLF_2.
Andrew Stubbs [Thu, 28 Jul 2022 15:07:22 +0000 (16:07 +0100)]
vect: inbranch SIMD clones
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch add supports for a sub-set of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be no regressions.
The sub-set of support should cover all cases needed by amdgcn, at present.
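For context, the kind of source code that can make use of an "inbranch" clone looks roughly like the C++ sketch below: a function declared with `declare simd inbranch` is called under a condition inside a vectorizable loop. The function and variable names are hypothetical, and whether the loop is actually vectorized depends on the target and on options such as -fopenmp-simd; without OpenMP enabled the pragmas are ignored and the code runs as plain scalar code with the same result.

```cpp
#include <cstddef>

// Request a masked ("inbranch") vector variant of this function.
// A masked variant takes an extra mask argument so that inactive
// lanes do not execute the body.
#pragma omp declare simd inbranch
int mul_add_one(int x, int y) {
    return x * y + 1;
}

// The call is conditional, so a vectorized version of this loop has
// to pass a per-lane mask to the SIMD clone of mul_add_one.
void apply_if_positive(int* a, const int* b, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        if (b[i] > 0)
            a[i] = mul_add_one(a[i], b[i]);
}
```

Elements whose condition is false are left untouched, which is the behaviour the mask has to preserve in the vectorized loop.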
gcc/ChangeLog:
* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.
Tobias Burnus [Wed, 22 Feb 2023 20:18:33 +0000 (21:18 +0100)]
Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings
Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-sections mappings went wrong.
The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings which have several issues beyond OpenMP support.
This is the OG12 variant of the submitted but unreviewed GCC 13/mainline
patch at https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612387.html
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.
Tobias Burnus [Wed, 22 Feb 2023 11:35:29 +0000 (12:35 +0100)]
Fix: Fortran/OpenMP: align/allocator modifiers to the allocate clause
When merging r13-4584-gb2e1c49b4a4 to OG12 as commit 58e0579ed87,
the 'align' handling seemingly ended up in the wrong clause.
(Result: libgomp.fortran/allocate-2a.f90 FAILED; now fixed.)
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Move align modifier
handling from OMP_LIST_ALLOCATOR to OMP_LIST_ALLOCATE.
Jonathan Wakely [Wed, 31 Aug 2022 12:57:34 +0000 (13:57 +0100)]
libstdc++: Add noexcept-specifier to std::reference_wrapper::operator()
This isn't required by the standard, but there's an LWG issue suggesting
to add it.
Also use __invoke_result instead of result_of, to match the spec in
recent standards.
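A small C++ sketch of how user code could observe the change: it queries whether a call through std::reference_wrapper is itself declared non-throwing when the wrapped callable is. The type and function names are hypothetical. Because the standard does not require this, the query may yield false with other library versions, so the sketch reports the result instead of asserting it.

```cpp
#include <functional>

// A callable whose call operator is declared noexcept.
struct NoThrowCallable {
    int operator()(int x) const noexcept { return x + 1; }
};

// Returns true when invoking the wrapper is itself noexcept.
// noexcept(...) is an unevaluated operand, so nothing is called here.
inline bool wrapped_call_is_noexcept() {
    NoThrowCallable f;
    auto r = std::ref(f);
    return noexcept(r(1));
}

// Calling through the wrapper forwards to the wrapped object as usual.
inline int call_through_wrapper(int x) {
    NoThrowCallable f;
    return std::ref(f)(x);
}
```

The call behaviour is unchanged by the patch; only the exception specification visible to `noexcept(...)` differs.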
libstdc++-v3/ChangeLog:
* include/bits/refwrap.h (reference_wrapper::operator()): Add
noexcept-specifier and use __invoke_result instead of result_of.
* testsuite/20_util/reference_wrapper/invoke-noexcept.cc: New test.
Jonathan Wakely [Thu, 2 Feb 2023 14:06:40 +0000 (14:06 +0000)]
libstdc++: Fix std::filesystem errors with -fkeep-inline-functions [PR108636]
With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internals. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.
Marek Polacek [Thu, 16 Feb 2023 22:41:24 +0000 (17:41 -0500)]
c++: ICE with redundant capture [PR108829]
Here we crash in is_capture_proxy:
/* Location wrappers should be stripped or otherwise handled by the
caller before using this predicate. */
gcc_checking_assert (!location_wrapper_p (decl));
We only crash with the redundant capture:
int abyPage = [=, abyPage] { ... }
because prune_lambda_captures is only called when there was a default
capture, and with [=] only abyPage won't be in LAMBDA_EXPR_CAPTURE_LIST.
The problem is that LAMBDA_CAPTURE_EXPLICIT_P wasn't propagated
correctly and so var_to_maybe_prune proceeded where it shouldn't.
Co-Authored by: Patrick Palka <ppalka@redhat.com>
PR c++/108829
gcc/cp/ChangeLog:
* pt.cc (prepend_one_capture): Set LAMBDA_CAPTURE_EXPLICIT_P.
(tsubst_lambda_expr): Pass LAMBDA_CAPTURE_EXPLICIT_P to
prepend_one_capture.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-108829-2.C: New test.
* g++.dg/cpp0x/lambda/lambda-108829.C: New test.
Alex Coplan [Mon, 6 Feb 2023 14:32:21 +0000 (14:32 +0000)]
aarch64: Fix up bfmlal lane pattern [PR104921]
As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.
This patch fixes the constraint accordingly.
The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually assembles
when such support is available.
gcc/ChangeLog:
PR target/104921
* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
Use correct constraint for operand 3.
gcc/testsuite/ChangeLog:
PR target/104921
* gcc.target/aarch64/pr104921-1.c: New test.
* gcc.target/aarch64/pr104921-2.c: New test.
* gcc.target/aarch64/pr104921.x: Include file for new tests.
Xi Ruoyao [Mon, 13 Feb 2023 10:38:53 +0000 (18:38 +0800)]
LoongArch: Fix multiarch tuple canonization
Multiarch tuple will be coded in file or directory names in
multiarch-aware distros, so one ABI should have only one multiarch
tuple. For example, "--target=loongarch64-linux-gnu --with-abi=lp64s"
and "--target=loongarch64-linux-gnusf" should both set multiarch tuple
to "loongarch64-linux-gnusf". Before this commit,
"--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib"
produced the wrong result (loongarch64-linux-gnu).
A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be
used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some
non-technical reason [1]. Note that we cannot make
"loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because
to implement such an alias, we must create thousands of symlinks in the
distro and doing so would be completely impractical. This commit also
aligns GCC with the revision.
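The intended mapping can be summarised with a small C++ sketch in which the tuple is chosen from the ABI alone, independent of how the target triplet was spelled. The function name is hypothetical and this is not the config.gcc logic itself. The lp64s and lp64d entries come from the text above; the lp64f entry ("gnuf32") is an assumption based on the naming pattern.

```cpp
#include <string>

// Map a 64-bit LoongArch ABI name to its single multiarch tuple.
// Returns an empty string for ABI names this sketch does not know.
std::string multiarch_tuple_for_abi(const std::string& abi) {
    if (abi == "lp64d")
        return "loongarch64-linux-gnu";    // no "f64" suffix, per the psABI revision
    if (abi == "lp64f")
        return "loongarch64-linux-gnuf32"; // assumption, not stated in the text
    if (abi == "lp64s")
        return "loongarch64-linux-gnusf";
    return "";
}
```

With this mapping, both "--target=loongarch64-linux-gnu --with-abi=lp64s" and "--target=loongarch64-linux-gnusf" resolve to the lp64s entry, so one ABI has exactly one tuple.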
Tested by building cross compilers with --enable-multiarch and multiple
combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d},
and --{enable,disable}-multilib; then running "xgcc --print-multiarch" and
manually verifying the result by eye.
* config.gcc (triplet_abi): Set its value based on $with_abi,
instead of $target.
(la_canonical_triplet): Set it after $triplet_abi is set
correctly.
* config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the
multiarch tuple for lp64d "loongarch64-linux-gnu" (without
"f64" suffix).
Thomas Schwinge [Thu, 16 Feb 2023 14:57:37 +0000 (15:57 +0100)]
Attempt to register OpenMP pinned memory using a device instead of 'mlock'
Implemented for nvptx offloading via 'cuMemHostRegister'. This means: (a) not
running into 'mlock' limitations, and (b) the device is aware of this and may
optimize host <-> device memory transfers.
Thomas Schwinge [Thu, 16 Feb 2023 20:59:55 +0000 (21:59 +0100)]
Un-break nvptx libgomp build
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: invalid preprocessing directive #deine; did you mean #define?
52 | #deine BASIC_ALLOC_YIELD
| ^~~~~
| define
* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_not_equal_to, _SimdImplX86::_S_less)
(_SimdImplX86::_S_less_equal): Do not call
__builtin_is_constant_evaluated in constexpr-if.