libgcc/
* config/gcn/simd-math/amdgcnmach.h (VECTOR_RETURN): Store value of
return value in a local variable first.
* config/gcn/simd-math/v64df_ilogb.c (ilogb): Simplify.
Please note that(insn 1110) uses register BX, but its
initialization was eliminated.
Avoid conversion if eliminated move intialized a register, used
in the moved instruction.
2022-11-03 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/107404
* config/i386/i386.md (eliminate reg-reg move by inverting the
condition of a cmove #2 peephole2): Check if eliminated move
initialized a register, used in the moved instruction.
gcc/testsuite/ChangeLog:
PR target/107404
* g++.target/i386/pr107404.C: New test.
Jakub Jelinek [Mon, 24 Oct 2022 15:53:16 +0000 (17:53 +0200)]
c, c++: Fix up excess precision handling of scalar_to_vector conversion [PR107358]
As mentioned earlier in the C++ excess precision support mail, the following
testcase is broken with excess precision both in C and C++ (though just in C++
it was triggered in real-world code).
scalar_to_vector is called in both FEs after the excess precision promotions
(or stripping of EXCESS_PRECISION_EXPR), so we can then get invalid
diagnostics that say float vector + float involves truncation (on ia32
from long double to float).
The following patch fixes that by calling scalar_to_vector on the operands
before the excess precision promotions, let scalar_to_vector just do the
diagnostics (it does e.g. fold_for_warn so it will fold
EXCESS_PRECISION_EXPR around REAL_CST to constants etc.) but will then
do the actual conversions using the excess precision promoted operands
(so say if we have vector double + (float + float) we don't actually do
vector double + (float) ((long double) float + (long double) float)
but
vector double + (double) ((long double) float + (long double) float)
2022-10-24 Jakub Jelinek <jakub@redhat.com>
PR c++/107358
gcc/c/
* c-typeck.cc (build_binary_op): Pass operands before excess precision
promotions to scalar_to_vector call.
gcc/testsuite/
* c-c++-common/pr107358.c: New test.
Jakub Jelinek [Mon, 24 Oct 2022 14:25:29 +0000 (16:25 +0200)]
c++: Fix up constexpr handling of char/signed char/short pre/post inc/decrement [PR105774]
signed char, char or short int pre/post inc/decrement are represented by
normal {PRE,POST}_{INC,DEC}REMENT_EXPRs in the FE and only gimplification
ensures that the {PLUS,MINUS}_EXPR is done in unsigned version of those
types:
case PREINCREMENT_EXPR:
case PREDECREMENT_EXPR:
case POSTINCREMENT_EXPR:
case POSTDECREMENT_EXPR:
{
tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
if (INTEGRAL_TYPE_P (type) && c_promoting_integer_type_p (type))
{
if (!TYPE_OVERFLOW_WRAPS (type))
type = unsigned_type_for (type);
return gimplify_self_mod_expr (expr_p, pre_p, post_p, 1, type);
}
break;
}
This means during constant evaluation we need to do it similarly (either
using unsigned_type_for or using widening to integer_type_node).
The following patch does the latter.
2022-10-24 Jakub Jelinek <jakub@redhat.com>
PR c++/105774
* constexpr.cc (cxx_eval_increment_expression): For signed types
that promote to int, evaluate PLUS_EXPR or MINUS_EXPR in int type.
Jakub Jelinek [Wed, 12 Oct 2022 15:54:08 +0000 (17:54 +0200)]
libgomp: Fix up creation of artificial teams
When not in explicit parallel/target/teams construct, we in some cases create
an artificial parallel with a single thread (either to handle target nowait
or for task reduction purposes). In those cases, it handled again artificially
created implicit task (created by gomp_new_icv for cases where we needed to write
to some ICVs), but as the testcases show, didn't take into account possibility
of this being done from explicit task(s). The code would destroy/free the previous
task and replace it with the new implicit task. If task is an explicit task
(when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to
a local stack variable, so freeing it doesn't work, and additionally we shouldn't
lose the explicit tasks - the new implicit task should instead replace the
ancestor task which is the first implicit one.
2022-10-12 Jakub Jelinek <jakub@redhat.com>
* task.c (gomp_create_artificial_team): Fix up handling of invocations
from within explicit task.
* target.c (GOMP_target_ext): Likewise.
* testsuite/libgomp.c/task-7.c: New test.
* testsuite/libgomp.c/task-8.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
* testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
Jakub Jelinek [Sat, 24 Sep 2022 07:24:26 +0000 (09:24 +0200)]
openmp: Fix ICE with taskgroup at -O0 -fexceptions [PR107001]
The following testcase ICEs because with -O0 -fexceptions GOMP_taskgroup_end
call isn't directly followed by GOMP_RETURN statement, but there are some
conditionals to handle exceptions and we fail to find the correct GOMP_RETURN.
The fix is to treat taskgroup similarly to target data, both of these constructs
emit a try { body } finally { end_call } around the construct's body during
gimplification and we need to see proper construct nesting during gimplification
and omp lowering (including nesting of regions checks), but during omp expansion
we don't really need their nesting anymore, all we need is emit something at
the start of the region and the end of the region is the end API call we've
already emitted during gimplification. For target data, we weren't adding
GOMP_RETURN statement during omp lowering, so after that pass it is treated
merely like stand-alone omp directives. This patch does the same for
taskgroup too.
2022-09-24 Jakub Jelinek <jakub@redhat.com>
PR c/107001
* omp-low.cc (lower_omp_taskgroup): Don't add GOMP_RETURN statement
at the end.
* omp-expand.cc (build_omp_regions_1): Clarify GF_OMP_TARGET_KIND_DATA
is not stand-alone directive. For GIMPLE_OMP_TASKGROUP, also don't
update parent.
(omp_make_gimple_edges) <case GIMPLE_OMP_TASKGROUP>: Reset
cur_region back after new_omp_region.
Jakub Jelinek [Sat, 24 Sep 2022 07:19:26 +0000 (09:19 +0200)]
openmp, c: Tighten up c_tree_equal [PR106981]
This patch changes c_tree_equal to work more like cp_tree_equal, be
more strict in what it accepts. The ICE on the first testcase was
due to INTEGER_CST wi::wide (t1) == wi::wide (t2) comparison which
ICEs if the two constants have different precision, but as the second
testcase shows, being too lenient in it can also lead to miscompilation
of valid OpenMP programs where we think certain expression is the same
even when it isn't and can be guaranteed at runtime to represent different
memory location. So, the patch looks through only NON_LVALUE_EXPRs
and for constants as well as casts requires that the types match before
actually comparing the constant values or recursing on the cast operands.
2022-09-24 Jakub Jelinek <jakub@redhat.com>
PR c/106981
gcc/c/
* c-typeck.cc (c_tree_equal): Only strip NON_LVALUE_EXPRs at the
start. For CONSTANT_CLASS_P or CASE_CONVERT: return false if t1 and
t2 have different types.
gcc/testsuite/
* c-c++-common/gomp/pr106981.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/pr106981.c: New test.
Jakub Jelinek [Wed, 7 Sep 2022 06:54:13 +0000 (08:54 +0200)]
openmp: Fix handling of target constructs in static member functions [PR106829]
Just calling current_nonlambda_class_type in static member functions returns
non-NULL, but something that isn't *this and if unlucky can match part of the
IL and can be added to target clauses.
if (DECL_NONSTATIC_MEMBER_P (decl)
&& current_class_ptr)
is a guard used elsewhere (in check_accessibility_of_qualified_id).
2022-09-07 Jakub Jelinek <jakub@redhat.com>
PR c++/106829
* semantics.cc (finish_omp_target_clauses): If current_function_decl
isn't a nonstatic member function, don't set data.current_object to
non-NULL.
... which is 'libgomp.oacc-fortran/declare-allocatable-1.f90' adjusted
for missing support for OpenACC "Changes from Version 2.0 to 2.5":
"The 'declare create' directive with a Fortran 'allocatable' has new behavior".
Thus, after 'allocate'/before 'deallocate', call 'acc_create'/'acc_delete'
manually.
Tobias Burnus [Wed, 2 Nov 2022 16:04:53 +0000 (17:04 +0100)]
Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr
When using 'map(alloc: var, dt%comp)' needs to have a 'to' mapping of
the array descriptor as otherwise the bounds are not available in the
target region. - Likewise for character strings.
This patch implements this; however, some additional issues are exposed
by the testcase; those are '#if 0'ed and will be handled later.
Submitted to mainline (but pending review):
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604887.html
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Ensure DT struct-comp with
array descriptor and 'alloc:' have the descriptor mapped with 'to:'.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
Tobias Burnus [Wed, 2 Nov 2022 08:06:28 +0000 (09:06 +0100)]
OpenMP/Fortran: 'target update' with strides + DT components
OpenMP 5.0 permits to use arrays with strides and derived
type components for the list items to the 'from'/'to' clauses
of the 'target update' directive.
Submitted to mainline (but pending review):
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604687.html
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_clauses): Permit derived types.
(resolve_omp_clauses):Accept noncontiguous
arrays.
* trans-openmp.cc (gfc_trans_omp_clauses): Fixes for
derived-type changes; fix size for scalars.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-11.f90: New test.
* testsuite/libgomp.fortran/target-13.f90: New test.
openmp: Bugfix in omp_expand_metadirective for same blocks/edges to be deleted.
This patch handles an ICE that is thrown in omp_expand_metadirective when a
basic_block for a metadirective label is tried to be deleted multiple times.
To avoid this situation, processed labels are added to the already existing
list of labels that are not intended to be deleted.
The issue occured in the attached test case.
gcc/ChangeLog:
* omp-expand-metadirective.cc (omp_expand_metadirective): Add already
processed labels to "labels" (the list of labels not to be deleted).
Andrew Stubbs [Fri, 28 Oct 2022 12:09:20 +0000 (13:09 +0100)]
amdgcn: add fmin/fmax patterns
Add fmin/fmax for scalar, vector, and reductions. The smin/smax patterns are
already using the IEEE compliant hardware instructions anyway, so we can just
expand to use those insns.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (fminmaxop): New iterator.
(<fexpander><mode>3): New define_expand.
(<fexpander><mode>3<exec>): Likewise.
(reduc_<fexpander>_scal_<mode>): Likewise.
* config/gcn/gcn.md (fexpander): New attribute.
Andrew Stubbs [Fri, 28 Oct 2022 11:38:43 +0000 (12:38 +0100)]
amdgcn: multi-size vector reductions
Add support for vector reductions for any vector width by switching iterators
and generalising the code slightly. There's no one-instruction way to move an
item from lane 31 to lane 0 (63, 15, 7, 3, and 1 are all fine though), and
vec_extract is probably fewer cycles anyway, so now we always reduce to an
SGPR.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (V64_SI): Delete iterator.
(V64_DI): Likewise.
(V64_1REG): Likewise.
(V64_INT_1REG): Likewise.
(V64_2REG): Likewise.
(V64_ALL): Likewise.
(V64_FP): Likewise.
(reduc_<reduc_op>_scal_<mode>): Use V_ALL. Use gen_vec_extract.
(fold_left_plus_<mode>): Use V_FP.
(*<reduc_op>_dpp_shr_<mode>): Use V_1REG.
(*<reduc_op>_dpp_shr_<mode>): Use V_DI.
(*plus_carry_dpp_shr_<mode>): Use V_INT_1REG.
(*plus_carry_in_dpp_shr_<mode>): Use V_SI.
(*plus_carry_dpp_shr_<mode>): Use V_DI.
(mov_from_lane63_<mode>): Delete.
(mov_from_lane63_<mode>): Delete.
* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Support partial vectors.
* config/gcn/gcn.md (unspec): Remove UNSPEC_MOV_FROM_LANE63.
Fortran: Add missing TKR initialization to class variables [PR100097, PR100098]
gcc/fortran/ChangeLog:
PR fortran/100097
PR fortran/100098
* trans-array.cc (gfc_trans_class_array): New function to
initialize class descriptor's TKR information.
* trans-array.h (gfc_trans_class_array): Add function prototype.
* trans-decl.cc (gfc_trans_deferred_vars): Add calls to the new
function for both pointers and allocatables.
gcc/testsuite/ChangeLog:
PR fortran/100097
PR fortran/100098
* gfortran.dg/PR100097.f90: New test.
* gfortran.dg/PR100098.f90: New test.
Jakub Jelinek [Fri, 28 Oct 2022 09:03:56 +0000 (11:03 +0200)]
openmp: Allow optional comma after directive-specifier in C/C++
Previously we've been allowing that comma only in C++ when in attribute
form (which was the reason why it has been allowed), but 5.1 allows that
even in pragma form in C/C++ (with clarifications in 5.2) and 5.2
also in Fortran (which this patch doesn't implement).
Note, for directives which take an argument (== unnamed clause),
comma is not allowed in between the directive name and the argument,
like the directive-1.c testcase shows.
2022-10-28 Jakub Jelinek <jakub@redhat.com>
gcc/c/
* c-parser.cc (c_parser_omp_all_clauses): Allow optional
comma before the first clause.
(c_parser_omp_allocate, c_parser_omp_atomic, c_parser_omp_depobj,
c_parser_omp_flush, c_parser_omp_scan_loop_body,
c_parser_omp_ordered, c_finish_omp_declare_variant,
c_parser_omp_declare_target, c_parser_omp_declare_reduction,
c_parser_omp_requires, c_parser_omp_error,
c_parser_omp_assumption_clauses): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_all_clauses): Allow optional comma
before the first clause even in pragma syntax.
(cp_parser_omp_allocate, cp_parser_omp_atomic, cp_parser_omp_depobj,
cp_parser_omp_flush, cp_parser_omp_scan_loop_body,
cp_parser_omp_ordered, cp_parser_omp_assumption_clauses,
cp_finish_omp_declare_variant, cp_parser_omp_declare_target,
cp_parser_omp_declare_reduction_exprs, cp_parser_omp_requires,
cp_parser_omp_error): Likewise.
gcc/testsuite/
* c-c++-common/gomp/directive-1.c: New test.
* c-c++-common/gomp/clauses-6.c: New test.
* c-c++-common/gomp/declare-variant-2.c (f75a): Declare.
(f75): Use f75a as variant instead of f1 and don't expect error.
* g++.dg/gomp/clause-4.C (foo): Don't expect error on comma
before first clause.
* gcc.dg/gomp/clause-2.c (foo): Likewise.
This patch prevents compiler-generated artificial variables from being
treated as privatization candidates for OpenACC.
The rationale is that e.g. "gang-private" variables actually must be
shared by each worker and vector spawned within a particular gang, but
that sharing is not necessary for any compiler-generated variable (at
least at present, but no such need is anticipated either). Variables on
the stack (and machine registers) are already private per-"thread"
(gang, worker and/or vector), and that's fine for artificial variables.
We're restricting this to blocks, as we still need to understand what it
means for a 'DECL_ARTIFICIAL' to appear in a 'private' clause.
Several tests need their scan output patterns adjusted to compensate.
2022-10-14 Julian Brown <julian@codesourcery.com>
PR middle-end/90115
gcc/
* omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not
privatization candidates.
Thomas Schwinge [Tue, 18 Oct 2022 14:59:54 +0000 (16:59 +0200)]
[og12] OpenACC: Don't gang-privatize artificial variables: restrict to blocks
Follow-up to og12 commit d4504346d2a1d6ffecb8b2d8e3e04ab8ea259785
"[og12] OpenACC: Don't gang-privatize artificial variables", to restore
the previous behavior, until we understand what it means for a
'DECL_ARTIFICIAL' to appear in a 'private' clause.
In file included from [...]/source-gcc/gcc/coretypes.h:482,
from [...]/source-gcc/gcc/omp-low.cc:27:
[...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly<Ca, bool>::type maybe_lt(const Ca&, const poly_int_pod<N, Cb>&) [with unsigned int N = 1; Ca = int; Cb = long unsigned int; typename if_nonpoly<Ca, bool>::type = bool]’:
[...]/source-gcc/gcc/poly-int.h:1510:7: required from ‘poly_int<N, typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type> ordered_max(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type = long unsigned int; typename if_nonpoly<Cb>::type = int]’
[...]/source-gcc/gcc/omp-low.cc:5180:33: required from here
[...]/source-gcc/gcc/poly-int.h:1384:12: error: comparison of integer expressions of different signedness: ‘const int’ and ‘const long unsigned int’ [-Werror=sign-compare]
1384 | return a < b.coeffs[0];
| ~~^~~~~~~~~~~
[...]/source-gcc/gcc/poly-int.h: In instantiation of ‘typename if_nonpoly<Cb, bool>::type maybe_lt(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename if_nonpoly<Cb, bool>::type = bool]’:
[...]/source-gcc/gcc/poly-int.h:1515:2: required from ‘poly_int<N, typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type> ordered_max(const poly_int_pod<N, C>&, const Cb&) [with unsigned int N = 1; Ca = long unsigned int; Cb = int; typename poly_result<Ca, typename if_nonpoly<Cb>::type>::type = long unsigned int; typename if_nonpoly<Cb>::type = int]’
[...]/source-gcc/gcc/omp-low.cc:5180:33: required from here
[...]/source-gcc/gcc/poly-int.h:1373:22: error: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const int’ [-Werror=sign-compare]
1373 | return a.coeffs[0] < b;
| ~~~~~~~~~~~~^~~
gcc/
* omp-low.cc (lower_rec_simd_input_clauses): For 'ordered_max',
cast 'omp_max_simt_vf ()', 'omp_max_simd_vf ()' to 'unsigned'.
PASS: gcc.dg/vect/bb-slp-cond-1.c (test for excess errors)
PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects scan-tree-dump vect "(no need for alias check [^\\n]* when VF is 1|no alias between [^\\n]* when [^\\n]* is outside \\(-16, 16\\))"
[-PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "loop vectorized" 1-]
PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects (test for excess errors)
PASS: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects execution test
PASS: gcc.dg/vect/bb-slp-cond-1.c execution test
PASS: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump vect "(no need for alias check [^\\n]* when VF is 1|no alias between [^\\n]* when [^\\n]* is outside \\(-16, 16\\))"
[-PASS: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times vect "loop vectorized" 1-]
openacc: Revert erroneous gang reduction change in OpenAcc test
This patch reverts an erroneous modification of an OpenAcc test case in d27d6c9e1e3bc18ba0113757b743b306ea69f825 "Various OpenACC reduction
enhancements - test cases".
In commit 081c96621da, the call to resize_reg_info() was moved before
the call to remove_scratches() and the latter one can increase the
number of regs and that would cause an out of bounds usage on the
reg_renumber global array.
Without this patch, the following testcase randomly fails with:
during RTL pass: ira
In file included from /src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:13:
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c: In function 'checkgSf13':
/src/gcc/testsuite/gcc.dg/compat/fp-struct-test-by-value-y.h:28:1: internal compiler error: Segmentation fault
/src/gcc/testsuite/gcc.dg/compat/struct-by-value-5b_y.c:22:1: note: in expansion of macro 'TEST'
gcc/ChangeLog:
* ira.cc: Resize array after reg number increased.
Philipp Tomsich [Sun, 7 Aug 2022 22:30:52 +0000 (00:30 +0200)]
aarch64: update Ampere-1 core definition
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.
Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').
Philipp Tomsich [Mon, 3 Oct 2022 19:59:50 +0000 (21:59 +0200)]
aarch64: fix off-by-one in reading cpuinfo
Fixes: 341573406b39
Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string. This issue would
cause individual characters (where the 128 byte buffers are stitched
together) to be lost.
IBM zSystems: Fix function_ok_for_sibcall [PR106355]
For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers. Streamlined this with
the implementation of s390_function_arg.
Fix some indentation whitespace, too.
gcc/ChangeLog:
PR target/106355
* config/s390/s390.cc (s390_call_saved_register_used): For a
parameter with BLKmode fix determining number of consecutive
registers.
gcc/testsuite/ChangeLog:
* gcc.target/s390/pr106355.h: Common code for new tests.
* gcc.target/s390/pr106355-1.c: New test.
* gcc.target/s390/pr106355-2.c: New test.
* gcc.target/s390/pr106355-3.c: New test.
Martin Liska [Mon, 24 Oct 2022 13:34:39 +0000 (15:34 +0200)]
x86: fix VENDOR_MAX enum value
PR target/107364
gcc/ChangeLog:
* common/config/i386/i386-cpuinfo.h (enum processor_vendor):
Reorder enum values as BUILTIN_VENDOR_MAX should not point
in the middle of the valid enum values.
Thomas Schwinge [Mon, 24 Oct 2022 19:59:37 +0000 (21:59 +0200)]
libgomp/nvptx: Prepare for reverse-offload callback handling, resolve spurious SIGSEGVs
Per commit r13-3460-g131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
I'm seeing a lot of libgomp execution test regressions. Random
example, 'libgomp.c-c++-common/error-1.c':
Andrew Stubbs [Fri, 21 Oct 2022 13:19:31 +0000 (14:19 +0100)]
vect: WORKAROUND vectorizer bug
This patch disables vectorization of memory accesses to non-default address
spaces where the pointer size is different to the usual pointer size. This
condition typically occurs in OpenACC programs on amdgcn, where LDS memory is
used for broadcasting gang-private variables between threads. In particular,
see libgomp.oacc-c-c++-common/private-variables.c
The problem is that the address space information is dropped from the various
types in the middle-end and eventually it triggers an ICE trying to do an
address conversion. That ICE can be avoided by defining
POINTERS_EXTEND_UNSIGNED, but that just produces wrong RTL code later on.
A correct solution would ensure that all the vectypes have the correct address
spaces, but I don't have time for that right now.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_analyze_data_refs): Workaround an
address-space bug.
Andrew Stubbs [Sat, 15 Oct 2022 22:38:50 +0000 (23:38 +0100)]
amdgcn, libgomp: USM allocation update
Allocate Unified Shared Memory via malloc and hsa_amd_svm_attributes_set,
instead of hsa_allocate_memory. This scheme should be more efficient for
for memory that is first accessed by the CPU.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED): New.
(HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): New.
(HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG): New.
(HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED): New.
(hsa_amd_svm_attribute_pair_t): New.
(struct hsa_runtime_fn_info): Add hsa_amd_svm_attributes_set_fn.
(dump_hsa_system_info): Dump HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED and
HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT.
(DLSYM_OPT_FN): New.
(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_set.
(GOMP_OFFLOAD_usm_alloc): Use malloc and hsa_amd_svm_attributes_set.
(GOMP_OFFLOAD_usm_free): Use regular free.
* testsuite/libgomp.c/usm-1.c: Add -mxnack=on for amdgcn.
* testsuite/libgomp.c/usm-2.c: Likewise.
* testsuite/libgomp.c/usm-3.c: Likewise.
* testsuite/libgomp.c/usm-4.c: Likewise.
Tobias Burnus [Mon, 24 Oct 2022 15:16:29 +0000 (17:16 +0200)]
libgomp/nvptx: Prepare for reverse-offload callback handling
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.
Tobias Burnus [Mon, 24 Oct 2022 14:50:25 +0000 (16:50 +0200)]
gcc/testsuite: Change 'cunrolli' to 'cunrolli1' in dump scan + options
The OG12 commit 3e8b51d143e openacc: Move pass_oacc_device_lower after pass_graphite
adds a new pass, which also re-invokes some previous passes. This
seems to have the effect that the pass names and, hence, the dump
names have now a tailing number.
In that commit, 'cunrolli' was changed to 'cunrolli1 for some testcases.
This commit does likewise for some more testcases. In particular,
. in scan-tree-dump the tailing '1' is crucial to change UNRESOLVED
to PASS.
. for "-fdisable-tree-cunrolli1" option, it changes FAIL (excess errors)
to PASS
. even without the change, "-fdump-tree-cunrolli{-details,-optimized}"
has PASS, but I believe the tailing 1 ensures that only the first
'cunrolli' dumps.
For 'target parallel' and similarly nested directives, cgraph_node's
calls_declare_variant_alt was not set in the parent region node but in
cfun->decl. Hence, pass_omp_device_lower did not process handle the
internal function GOMP_TARGET_REV. - Solution is to set it to the
DECL_CONTEXT, which is set in adjust_context_and_scope.
The cgraph_node::create_clone issue is exposed with -O2 for the existing
libgomp.fortran/reverse-offload-1.f90.
PR middle-end/107236
gcc/ChangeLog:
* omp-expand.cc (expand_omp_target): Set calls_declare_variant_alt
in DECL_CONTEXT and not to cfun->decl.
* cgraphclones.cc (cgraph_node::create_clone): Copy also the
node's calls_declare_variant_alt value.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/target-device-ancestor-6.f90: New test.
Tobias Burnus [Fri, 21 Oct 2022 13:31:25 +0000 (15:31 +0200)]
omp-oacc-kernels-decompose.cc: fix -fcompare-debug with GIMPLE_DEBUG
GIMPLE_DEBUG were put in a parallel region of its own, which is not
only pointless but also breaks -fcompare-debug. With this commit,
they are handled like simple assignments: those placed are places
into the same body as the loop such that only one parallel region
remains as without debugging. This fixes the existing testcase
libgomp.oacc-c-c++-common/kernels-loop-g.c.
Note: GIMPLE_DEBUG are only accepted with -fcompare-debug; if they
appear otherwise, decompose_kernels_region_body rejects them with
a sorry (unchanged).
gcc/
* omp-oacc-kernels-decompose.cc (top_level_omp_for_in_stmt,
decompose_kernels_region_body): Handle GIMPLE_DEBUG like
simple assignment.
After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]",
"big" private data now works for GCN offloading, too.
Julian Brown [Fri, 14 Oct 2022 11:06:07 +0000 (11:06 +0000)]
amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]
The GCN backend uses a heuristic to determine whether to use FLAT or
GLOBAL addressing in a particular (offload) function: namely, if a
function takes a pointer-to-scalar parameter, it is assumed that the
pointer may refer to "flat scratch" space, and thus FLAT addressing must
be used instead of GLOBAL.
I came up with this heuristic initially whilst working on support for
moving OpenACC gang-private variables into local-data share (scratch)
memory. The assumption that only scalar variables would be transformed in
that way turned out to be wrong. For example, prior to the next patch in
the series, Fortran compiler-generated temporary structures were treated
as gang private and moved to LDS space, typically overflowing the region
allocated for such variables. That will no longer happen after that
patch is applied, but there may be other cases of structs moving to LDS
space now or in the future that this patch may be needed for.
2022-10-14 Julian Brown <julian@codesourcery.com>
PR target/105421
gcc/
* config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer
argument forces FLAT addressing mode, not just
pointer-to-non-aggregate.
Richard Biener [Fri, 21 Oct 2022 07:45:44 +0000 (09:45 +0200)]
tree-optimization/107323 - loop distribution partition ordering issue
The following reverts part of the PR94125 fix which causes us to
use a bogus partition ordering after applying versioning for
alias to the testcase in PR107323. Instead PR94125 is fixed by
appropriately considering to be merged SCCs when skipping edges
we want to ignore because of the alias versioning.
PR tree-optimization/107323
* tree-loop-distribution.cc (pg_unmark_merged_alias_ddrs):
New function.
(loop_distribution::break_alias_scc_partitions): Revert
postorder save/restore from the PR94125 fix. Instead
make sure to not ignore edges from SCCs we are going to
merge.
Thomas Schwinge [Mon, 2 May 2022 13:15:26 +0000 (15:15 +0200)]
Make 'c-c++-common/goacc/kernels-decompose-pr100400-1-*.c' behave consistently, regardless of checking level
Fix-up for commit c14ea6a72fb1ae66e3d32ac8329558497c6e4403
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]".
For C++ compilation of 'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c',
we first emit a 'sorry' diagnostic, and then a 'gcc_unreachable' (or
'internal_error', see below) diagnostic, but for example, for
'--enable-checking=release' (thus, '!CHECKING_P'), the second one may actually
be turned into a 'confused by earlier errors, bailing out' diagnostic. (See
'gcc/diagnostic.cc:diagnostic_report_diagnostic': "When not checking, ICEs are
converted to fatal errors when an error has already occurred.") Thus, make
'c-c++-common/goacc/kernels-decompose-pr100400-1-2.c' behave consistently via
'-Wfatal-errors', and thus only matching the 'sorry' diagnostic.
For example, for '--enable-checking=no' (thus, '!ENABLE_ASSERT_CHECKING'), a
call to 'gcc_unreachable' cannot be assumed emit an 'internal_error'-like
diagnostic, so explicitly call 'internal_error' in
'gcc/omp-oacc-kernels-decompose.cc:visit_loops_in_gang_single_region', in the
'GIMPLE_OMP_FOR' case, to avoid regressing
'c-c++-common/goacc/kernels-decompose-pr100400-1-3.c', and
'c-c++-common/goacc/kernels-decompose-pr100400-1-4.c'.
Bit of a brown-paper-bag bug, but: GCC was generating
non-existent merging forms of BRKAS and BRKBS. Those
instructions only support zero predication (although
BRKA and BRKB support both).
https://github.com/ARM-software/acle/pull/199 adds a new feature
macro for RCPC, for use in things like inline assembly. This patch
adds the associated support to GCC.
Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a
entry didn't include it. This was probably harmless in practice
since GCC simply ignored the extension until now. (The GAS
definition is OK.)
gcc/
* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
AARCH64_FL_RCPC.
(AARCH64_ISA_RCPC): New macro.
* config/aarch64/aarch64-cores.def (thunderx3t110, zeus, neoverse-v1)
(neoverse-512tvb, saphira): Remove RCPC from these Armv8.3-A+ cores.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RCPC when appropriate.
Tobias Burnus [Thu, 20 Oct 2022 11:25:25 +0000 (13:25 +0200)]
libgomp.c-c++-common/requires-4.c: dg-xfail-run-if for USM with -foffload-memory=
The USM implementation uses -foffload-memory=... which allocates variables
in a special memory. This does not support static variables. Hence, XFAIL
this test on nvptx/gcn. The requires-4a.c testcase tests the same but uses
hash memory instead.
libgomp/
* testsuite/libgomp.c-c++-common/requires-4.c: dg-xfail-run-if on
nvptx and gcn.
Tobias Burnus [Thu, 20 Oct 2022 11:07:37 +0000 (13:07 +0200)]
libgomp: Add offload_device_gcn check, add requires-4a.c test
Duplicate libgomp.c-c++-common/requires-4.c (as ...-4a.c) but
with using a heap-allocated instead of static memory for a variable.
This change and the added offload_device_gcn check prepare for
pseudo-USM, where the device hardware cannot access all host
memory but only managed and pinned memory; for those, requires-4.c
will fail and the new check permits to add
target { ! { offload_device_nvptx || offload_device_gcn } }
to requires-4.c; however, it has not been added yet as pseuo-USM
support is not yet on mainline. (Review is pending for the USM
patches.)
include/ChangeLog:
* gomp-constants.h (GOMP_DEVICE_HSA): Comment out unused define.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (check_effective_target_offload_device_gcn):
New.
* testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_gcn,
on_device_arch_gcn): New.
* testsuite/libgomp.c-c++-common/requires-4a.c: New test; copied from
requires-4.c but using heap-allocated memory.
Alexandre Oliva [Wed, 22 Jun 2022 02:11:02 +0000 (23:11 -0300)]
libstdc++: eh_globals: gthreads: reset _S_init before deleting key
Clear __eh_globals_init's _S_init in the dtor before deleting the
gthread key.
This ensures that, in case any code involved in deleting the key
interacts with eh_globals, the key that is being deleted won't be
used, and the non-thread-specific eh_globals fallback will.
Tobias Burnus [Wed, 19 Oct 2022 15:31:14 +0000 (17:31 +0200)]
Fix omp-expand.cc's expand_omp_target for OpenACC
In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c,
"Fortran/OpenMP: Support mapping of DT with allocatable components"
the size of the addr/sizes/kind arrays was passed as 4th argument.
However, OpenACC uses >3 arguments for its own purpose, e.g. to
handle noncontiguous arrays by passing an array descriptor there.
This patch restores the previous behaviour for OpenACC, fixing
testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c.
gcc/
* omp-expand.cc (expand_omp_target): Fix OpenACC in case there
are more than 3 arguments to the builtin function.
Tobias Burnus [Wed, 19 Oct 2022 13:53:25 +0000 (15:53 +0200)]
Fortran: Fix delinearization regression
The delinearization patch "Fortran: delinearize multi-dimensional array
accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3 uses
gfc_build_array_ref for the non-delinearization path. The generated
code depends on whether there can be negative strides or not, an
addition to that function in r12-8230-g7964ab6c364 - adding a Boolean
argument.
The follow-up OG12 commit "Fix Fortran array-access regressions", 9fb0076b11eb2774b620bcf2171d55c7d1fb899f also added this argument
to the call in gfc_conv_array_ref, but always evaluating as false.
This commit changes it to a call to non_negative_strides_array_p
(Note: for 'se->expr' not 'base'; the former could be 'arraydesc'
while the later is then 'arraydesc.data' whose TREE_TYPE does not
contain information about the array type.)
However, doing so revealed a bug in non_negative_strides_array_p,
fixed in this commit but also submitted as "Fortran: Fix
non_negative_strides_array_p" to mainline,
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html
As a side effect of this commit, several testcases now pass and the
OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90
could be undone, except that the latter now uses the delinearized
array syntax in one case, which is an improvement (as honored in
the scan-dump-tree). Hence, this commit (partially) reverts the
commits:
21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump 014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90} 2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update
The main testcase for non_negative_strides_array_p is
gfortran.dg/array_reference_3.f90, which now also passes as well.
Additionally, this changes prevents some unintended implicit
mapping such that libgomp.fortran/map-alloc-comp-{4,6}.f90 failed
before - and now passes again.
Kewen Lin [Mon, 26 Sep 2022 05:33:18 +0000 (00:33 -0500)]
rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]
As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note
makes one assumption that we have emitted one insn which
restores the frame pointer previously. That part of code
was guarded with flag frame_pointer_needed before, it was
consistent, but it was replaced with flag
frame_pointer_needed_indeed since commit r10-7981. It
caused ICE due to unexpected NULL insn.
PR target/96072
gcc/ChangeLog:
* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.
Kewen Lin [Mon, 26 Sep 2022 03:01:50 +0000 (22:01 -0500)]
rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]
PR100645 exposes one latent bug in define_expand vec_shr_<mode>
that the current condition TARGET_ALTIVEC is too loose. The
mode iterator VEC_L contains a few modes, they are not always
supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
be used like some other VEC_L usages.
PR target/100645
gcc/ChangeLog:
* config/rs6000/vector.md (vec_shr_<mode>): Replace condition
TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P.