Patrick Palka [Thu, 15 Dec 2022 21:02:05 +0000 (16:02 -0500)]
c++: extract_local_specs and unevaluated contexts [PR100295]
Here during partial instantiation of the constexpr if, extra_local_specs
walks the statement looking for local specializations within to capture.
However, we're thwarted by the fact that 'ts' first appears inside an
unevaluated context, and so the calls to process_outer_var_ref for its
local specializations are a no-op. And since we walk each tree exactly
once, we end up not capturing the local specializations despite 'ts'
later occurring in an evaluated context.
This patch fixes this by making extract_local_specs walk evaluated
contexts first before walking unevaluated contexts. We could probably
get away with not walking unevaluated contexts at all, but this approach
seems more clearly safe.
PR c++/100295
PR c++/107579
gcc/cp/ChangeLog:
* pt.cc (el_data::skip_unevaluated_operands): New data member.
(extract_locals_r): If skip_unevaluated_operands is true,
don't walk into unevaluated contexts.
(extract_local_specs): Walk the pattern twice, first with
skip_unevaluated_operands true followed by it set to false.
Patrick Palka [Thu, 15 Dec 2022 20:38:47 +0000 (15:38 -0500)]
c++: partial ordering with memfn ptr cst [PR108104]
Here we're triggering an overzealous assert in unify during partial
ordering since the member function pointer constants are represented as
ordinary CONSTRUCTORs (with TYPE_PTRMEMFUNC_P TREE_TYPE) but the assert
expects COMPOUND_LITERAL_P constructors.
PR c++/108104
gcc/cp/ChangeLog:
* pt.cc (unify) <default>: Relax assert to accept any
CONSTRUCTOR parm, not just COMPOUND_LITERAL_P one.
Patrick Palka [Sun, 4 Dec 2022 15:47:24 +0000 (10:47 -0500)]
c++: pack in requires-expr parm list [PR107417]
Here find_parameter_packs_r isn't detecting the pack T inside the
requires-expr's parameter list ultimately because cp_walk_trees
deliberately avoids walking the list so as to avoid false positives in
the unexpanded pack checker.
But it should still be fine to walk the TREE_TYPE of each parameter,
which we already need to do from for_each_template_parm_r, and is
sufficient to fix the testcase.
PR c++/107417
gcc/cp/ChangeLog:
* pt.cc (for_each_template_parm_r) <case REQUIRES_EXPR>: Move
walking of the TREE_TYPE of each parameter to ...
* tree.cc (cp_walk_subtrees) <case REQUIRES_EXPR>: ... here.
We implement class-scope using enum by injecting clones of the enum's
CONST_DECLs as fields of the class, for which CONST_DECL_USING_P is
true, so that qualified lookup naturally finds the enumerators.
Substitution into such a CONST_DECL currently ICEs however, because we
assume the DECL_CONTEXT is always the ENUMERAL_TYPE (which has
TYPE_VALUES) but in this case it's the RECORD_TYPE for the class scope
(which has TYPE_FIELDS).
Since these CONST_DECLs appear to always be non-dependent, this patch
fixes this by shortcutting substitution for CONST_DECLs that have
non-dependent DECL_CONTEXT. This subsumes the existing (and seemingly
dead) DECL_NAMESPACE_SCOPE_P early exit test and also benefits
substitution into ordinary non-dependent CONST_DECLs.
PR c++/103081
gcc/cp/ChangeLog:
* pt.cc (tsubst_copy) <case CONST_DECL>: Generalize
early exit test for namespace-scope decls to check dependence of
the enclosing scope instead. Remove dead early exit test.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/using-enum-10.C: New test.
* g++.dg/cpp2a/using-enum-10a.C: New test.
Patrick Palka [Wed, 30 Nov 2022 00:25:37 +0000 (19:25 -0500)]
c++: ICE with <=> of incompatible pointers [PR107542]
In a SFINAE context composite_pointer_type returns error_mark_node if
the given pointer types are incompatible. But the SPACESHIP_EXPR case
of cp_build_binary_op wasn't prepared for this error_mark_node result,
which led to an ICE (from spaceship_comp_cat) for the below testcase.
(In a non-SFINAE context composite_pointer_type issues a permerror and
returns cv void* in this case, so this ICE seems specific to SFINAE.)
PR c++/107542
gcc/cp/ChangeLog:
* typeck.cc (cp_build_binary_op): In the SPACESHIP_EXPR case,
handle an error_mark_node result type.
Andrew Stubbs [Fri, 2 Dec 2022 16:37:58 +0000 (16:37 +0000)]
libgomp: Fix USM bugs
Fix up some USM corner cases.
libgomp/ChangeLog:
* libgomp.h (OFFSET_USM): New macro.
* target.c (gomp_map_pointer): Handle USM mappings.
(gomp_map_val): Handle OFFSET_USM.
(gomp_map_vars_internal): Move USM check earlier, and use OFFSET_USM.
Add OFFSET_USM check to the second mapping pass.
* testsuite/libgomp.fortran/usm-1.f90: New test.
* testsuite/libgomp.fortran/usm-2.f90: New test.
Sebastian Pop [Wed, 30 Nov 2022 19:45:24 +0000 (19:45 +0000)]
AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]
Currently patchable area is at the wrong place on AArch64. It is placed
immediately after function label, before .cfi_startproc. This patch
adds UNSPECV_PATCHABLE_AREA for pseudo patchable area instruction and
modifies aarch64_print_patchable_function_entry to avoid placing
patchable area before .cfi_startproc.
Iain Buclaw [Tue, 13 Dec 2022 21:34:57 +0000 (22:34 +0100)]
libphobos: Backport library and bindings fixes from mainline
D Runtime changes:
- Fix MIPS64 bindings for CRuntime_UClibc.
Phobos changes:
- Fix std.path.expandTilde erroneously raising onOutOfMemory
after failed call to getpwnam_r().
- Fix std.random unittest failures on ILP32 targets.
- Use GENERIC_IO on CRuntime_UClibc port of std.stdio.
libphobos/ChangeLog:
* libdruntime/core/stdc/fenv.d: Compile in MIPS uClibc bindings on
MIPS_Any targets.
* libdruntime/core/stdc/math.d: Likewise.
* libdruntime/core/sys/posix/dlfcn.d: Likewise.
* libdruntime/core/sys/posix/setjmp.d: Add MIPS64 definitions for
CRuntime_UClibc.
* libdruntime/core/sys/posix/sys/types.d: Likewise.
* src/std/path.d (expandTilde): Handle more errno codes that could be
left set by getpwnam_r.
* src/std/random.d: Use D_LP64 in unittests.
* src/std/stdio.d: Set CRuntime_UClibc as GENERIC_IO target.
Alex Coplan [Thu, 1 Dec 2022 17:36:02 +0000 (17:36 +0000)]
varasm: Fix type confusion bug
This patch fixes a type confusion bug in varasm.cc:assemble_variable.
The problem is that the current code calls:
sect = get_variable_section (decl, false);
and then accesses sect->named.name without checking whether the section
is in fact a named section. In the surrounding else clause, we only know
that SECTION_STYLE (sect) != SECTION_NOSWITCH, so it is possible that
the section is an unnamed section.
In practice, this means that we end up doing a wild string compare
between a function pointer and the string literal ".vtable_map_vars".
This is because sect->named.name aliases sect->unnamed.callback in the
section union.
This can be seen in GDB with a simple testcase such as "int x;".
This patch fixes the issue by checking the SECTION_STYLE of the section
is in fact SECTION_NAMED before trying to do the string comparison.
We drop the existing check of whether sect->named.name is non-NULL
because this should presumably always be the case for a named section.
gcc/ChangeLog:
* varasm.cc (assemble_variable): Fix type confusion bug when
checking for ".vtable_map_vars" section.
Iain Buclaw [Sat, 10 Dec 2022 21:11:41 +0000 (22:11 +0100)]
d: Fix undefined reference to nested lambda in template (PR108055)
Sometimes, nested lambdas of templated functions get no code generation
due to them being marked as instantianted outside of all modules being
compiled in the current compilation unit. This despite enclosing
template instances being marked as instantiated inside the current
compilation unit. To fix, all enclosing templates are now checked in
`function_defined_in_root_p'.
Because of this change, `function_needs_inline_definition_p' has also
been fixed up to only check whether the regular function definition
itself is to be emitted in the current compilation unit.
PR d/108055
gcc/d/ChangeLog:
* decl.cc (function_defined_in_root_p): Check all enclosing template
instances for definition in a root module.
(function_needs_inline_definition_p): Replace call to
function_defined_in_root_p with test for outer module `isRoot'.
Richard Biener [Tue, 29 Nov 2022 08:03:46 +0000 (09:03 +0100)]
tree-optimization/107898 - ICE with -Walloca-larger-than
The following avoids ICEing with a mismatched prototype for alloca
and -Walloca-larger-than using irange for checks which doesn't
like mismatched types.
PR tree-optimization/107898
* gimple-ssa-warn-alloca.cc (alloca_call_type): Check
the type of the alloca argument is compatible with size_t
before querying ranges.
Richard Biener [Fri, 25 Nov 2022 07:27:42 +0000 (08:27 +0100)]
tree-optimization/107865 - ICE with outlining of loops
The following makes sure to clear loops number of iterations when
outlining them as part of a SESE region as can happen with
auto-parallelization. The referenced SSA names become stale otherwise.
PR tree-optimization/107865
* tree-cfg.cc (move_sese_region_to_fn): Free the number of
iterations of moved loops.
* gfortran.dg/graphite/pr107865.f90: New testcase.
Richard Biener [Fri, 2 Dec 2022 13:52:20 +0000 (14:52 +0100)]
tree-optimization/107833 - invariant motion of uninit uses
The following fixes a wrong-code bug caused by loop invariant motion
hoisting an expression using an uninitialized value outside of its
controlling condition causing IVOPTs to use that to rewrite a defined
value. PR107839 is a similar case involving a bogus uninit diagnostic.
PR tree-optimization/107833
PR tree-optimization/107839
* cfghooks.cc: Include tree.h.
* tree-ssa-loop-im.cc (movement_possibility): Wrap and
make stmts using any ssa_name_maybe_undef_p operand
to preserve execution.
(loop_invariant_motion_in_fun): Call mark_ssa_maybe_undefs
to init maybe-undefined status.
* tree-ssa-loop-ivopts.cc (ssa_name_maybe_undef_p,
ssa_name_set_maybe_undef, ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Move ...
* tree-ssa.cc: ... here.
* tree-ssa.h (ssa_name_any_use_dominates_bb_p,
mark_ssa_maybe_undefs): Declare.
(ssa_name_maybe_undef_p, ssa_name_set_maybe_undef): Define.
* gcc.dg/torture/pr107833.c: New testcase.
* gcc.dg/uninit-pr107839.c: Likewise.
Richard Biener [Fri, 28 Oct 2022 13:03:49 +0000 (15:03 +0200)]
tree-optimization/107407 - wrong code with DSE
So what happens is that we elide processing of this check with
/* In addition to kills we can remove defs whose only use
is another def in defs. That can only ever be PHIs of which
we track two for simplicity reasons, the first and last in
{first,last}_phi_def (we fail for multiple PHIs anyways).
We can also ignore defs that feed only into
already visited PHIs. */
else if (single_imm_use (vdef, &use_p, &use_stmt)
&& (use_stmt == first_phi_def
|| use_stmt == last_phi_def
|| (gimple_code (use_stmt) == GIMPLE_PHI
&& bitmap_bit_p (visited,
SSA_NAME_VERSION
(PHI_RESULT (use_stmt))))))
where we have the last PHI being the outer loop virtual PHI and the first
PHI being the loop exit PHI of the outer loop and we've already processed
the single immediate use of the outer loop PHI, the inner loop PHI. But
we still have to perform the above check!
It's easiest to perform the check when we visit the PHI node instead of
delaying it to the actual processing loop.
PR tree-optimization/107407
* tree-ssa-dse.cc (dse_classify_store): Perform backedge
varying index check when collecting PHI uses rather than
after optimizing processing of the candidate defs.
The testcase shows we mishandle the case where there's a pass-through
of a pointer through a function like memcpy. The following adjusts
handling of this copy case to require a taken address and adjust
the PHI case similarly.
PR tree-optimization/106868
* gimple-ssa-warn-access.cc (pass_waccess::gimple_call_return_arg_ref):
Inline into single user ...
(pass_waccess::check_dangling_uses): ... here and adjust the
call and the PHI case to require that ref.aref is the address
of the decl.
* gcc.dg/Wdangling-pointer-pr106868.c: New testcase.
Tobias Burnus [Mon, 12 Dec 2022 08:38:00 +0000 (09:38 +0100)]
libgomp: Handle OpenMP's reverse offloads
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.
The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.
For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
libgomp/ChangeLog:
* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.
* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
item about 'align'; mark clause as 'Y' and directive as 'N'.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
* testsuite/libgomp.fortran/allocate-3.f90: New test.
Iain Buclaw [Sun, 11 Dec 2022 18:08:01 +0000 (19:08 +0100)]
d: Remove "final" and "override" from visitor method.
This was added by the backport of an ICE in r12-8969. While harmless,
it was not until r13-758 that "final" and "override" were introduced to
all visitor methods in the D front-end. Removing it from the release
branch just for consistency with the rest of the file.
gcc/d/ChangeLog:
* imports.cc (ImportVisitor::visit (OverloadSet *)): Remove "final"
and "override" from visitor method.
Iain Buclaw [Sat, 10 Dec 2022 18:12:43 +0000 (19:12 +0100)]
d: Fix internal compiler error: in visit, at d/imports.cc:72 (PR108050)
The visitor for lowering IMPORTED_DECLs did not have an override for
dealing with importing OverloadSet symbols. This has now been
implemented in the code generator.
PR d/108050
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (Import *)): Handle build_import_decl
returning a TREE_LIST.
* imports.cc (ImportVisitor::visit (OverloadSet *)): New override.
Similar story as PR103661, we again return a negative number
for __builtin_cpu_supports:
Documentation says:
int __builtin_cpu_supports(const char *feature)
This function returns a positive integer if the run-time CPU supports feature and returns 0 otherwise.
while we return -2147483648.
Moreover, I noticed "x86-64" is not a valid option for __builtin_cpu_is,
but for __builtin_cpu_supports.
PR target/107551
gcc/ChangeLog:
* config/i386/i386-builtins.cc (fold_builtin_cpu): Use same path
as for PR103661.
* doc/extend.texi: Fix "x86-64" use.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin_target.c: Add more checks.
Martin Liska [Wed, 15 Dec 2021 09:54:23 +0000 (10:54 +0100)]
i386: simplify cpu_feature handling
The patch removes unneeded loops for cpu_features2 and CONVERT_EXPR
that can be simplified with NOP_EXPR.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (has_cpu_feature): Directly
compute index in cpu_features2.
(set_cpu_feature): Likewise.
* config/i386/i386-builtins.cc (fold_builtin_cpu): Also remove
loop for cpu_features2 and use NOP_EXPRs.
OpenMP: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.
Additionally, a limitation of the number of teams on gcn offload devices is
implemented. The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit). This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory. Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).
gcc/ChangeLog:
* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
amdgcn: Support AMD-specific 'isa' and 'arch' traits in OpenMP context selectors
Add libgomp support for 'amdgcn' as arch, and for each processor type (as passed
to '-march') as isa traits.
Add test case for all supported 'isa' values used as context selectors in a
metadirective construct.
libgomp/ChangeLog:
* config/gcn/selector.c (GOMP_evaluate_current_device): Recognise 'amdgcn'
as arch, and '-march' values (as well as 'gfx803') as isa traits.
* testsuite/libgomp.c-c++-common/metadirective-6.c: New test.
Kyrylo Tkachov [Wed, 30 Nov 2022 17:38:16 +0000 (17:38 +0000)]
aarch64: Specify that FEAT_MOPS sequences clobber CC
According to the architecture pseudocode the FEAT_MOPS sequences overwrite the NZCV flags
as par of their operation, so GCC needs to model that in the relevant RTL patterns.
For the testcase:
void g();
void foo (int a, size_t N, char *__restrict__ in,
char *__restrict__ out)
{
if (a != 3)
__builtin_memcpy (out, in, N);
if (a > 3)
g ();
}
we will currently generate:
foo:
cmp w0, 3
bne .L6
.L1:
ret
.L6:
cpyfp [x3]!, [x2]!, x1!
cpyfm [x3]!, [x2]!, x1!
cpyfe [x3]!, [x2]!, x1!
ble .L1 // Flags reused after CPYF* sequence
b g
This is wrong as the result of cmp needs to be recalculated after the MOPS sequence.
With this patch we'll insert a "cmp w0, 3" before the ble, similar to what clang does.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and to the GCC 12 branch after some baking time.
gcc/ChangeLog:
* config/aarch64/aarch64.md (aarch64_cpymemdi): Specify clobber of CC reg.
(*aarch64_cpymemdi): Likewise.
(aarch64_movmemdi): Likewise.
(aarch64_setmemdi): Likewise.
(*aarch64_setmemdi): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mops_5.c: New test.
* gcc.target/aarch64/mops_6.c: Likewise.
* gcc.target/aarch64/mops_7.c: Likewise.
liuhongt [Mon, 28 Nov 2022 01:59:47 +0000 (09:59 +0800)]
Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode.
For __builtin_ia32_vec_set_v16qi (a, -1, 2) with
!flag_signed_char. it's transformed to
__builtin_ia32_vec_set_v16qi (_4, 255, 2) in the gimple,
and expanded to (const_int 255) in the rtl. But for immediate_operand,
it expects (const_int 255) to be signed extended to
(const_int -1). The mismatch caused an unrecognizable insn error.
The patch converts (const_int 255) to (const_int -1) in the backend
expander.
d: Include tm.h in all D target platform sources, remove memmodel.h
The tm.h header would pull in config/elfos.h, which defines
TARGET_D_MINFO_SECTION needed for the D module support in the front-end
to emit data to the correct section for the run-time library to pick up.
The removal of it in r13-2385 caused a stage2 bootstrap failure on all
Solaris targets.
The memmodel header has also been removed as it is no longer required
now tm_p.h is no longer used by these sources.
Iain Buclaw [Fri, 11 Nov 2022 23:54:47 +0000 (00:54 +0100)]
d: Fix ICE on named continue label in an unrolled loop [PR107592]
Continue labels in an unrolled loop require a unique label per
iteration. Previously this used the Statement body node for each
unrolled iteration to generate a new entry in the label hash table.
This does not work when the continue label has an identifier, as said
named label is pointing to the outer UnrolledLoopStatement node.
What would happen is that during the lowering of `continue label', an
automatic label associated with the unrolled loop would be generated,
and a jump to that label inserted, but because it was never pushed by
the visitor for the loop itself, it subsequently never gets emitted.
To fix, correctly use the UnrolledLoopStatement as the key to look up
and store the break/continue label pair, but remove the continue label
from the value entry after every loop to force a new label to be
generated by the next call to `push_continue_label'
PR d/107592
gcc/d/ChangeLog:
* toir.cc (IRVisitor::push_unrolled_continue_label): New method.
(IRVisitor::pop_unrolled_continue_label): New method.
(IRVisitor::visit (UnrolledLoopStatement *)): Use them instead of
push_continue_label and pop_continue_label.
Iain Buclaw [Tue, 16 Aug 2022 14:18:02 +0000 (16:18 +0200)]
d: Fix #error You must define PREFERRED_DEBUGGING_TYPE if DWARF is not supported
This moves all D front-end specific target definitions out of the main
target headers, and into its own header that is included by tm_d.h
instead of pulling in the same headers as tm_p.h.
This fixes the build on target configurations that pull in the default D
language target hooks, and subsequently trigger an error because the
definition of PREFERRED_DEBUGGING_TYPE is behind tm.h, the one header
that is avoided from being included in default-d.cc.
PR d/105659
gcc/ChangeLog:
* config.gcc: Set tm_d_file to ${cpu_type}/${cpu_type}-d.h.
* config/aarch64/aarch64-d.cc: Include tm_d.h.
* config/aarch64/aarch64-protos.h (aarch64_d_target_versions): Move to
config/aarch64/aarch64-d.h.
(aarch64_d_register_target_info): Likewise.
* config/aarch64/aarch64.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/arm/arm-d.cc: Include tm_d.h and arm-protos.h instead of
tm_p.h.
* config/arm/arm-protos.h (arm_d_target_versions): Move to
config/arm/arm-d.h.
(arm_d_register_target_info): Likewise.
* config/arm/arm.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/default-d.cc: Remove memmodel.h include.
* config/freebsd-d.cc: Include tm_d.h instead of tm_p.h.
* config/glibc-d.cc: Likewise.
* config/i386/i386-d.cc: Include tm_d.h.
* config/i386/i386-protos.h (ix86_d_target_versions): Move to
config/i386/i386-d.h.
(ix86_d_register_target_info): Likewise.
(ix86_d_has_stdcall_convention): Likewise.
* config/i386/i386.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
(TARGET_D_HAS_STDCALL_CONVENTION): Likewise.
* config/i386/winnt-d.cc: Include tm_d.h instead of tm_p.h.
* config/mips/mips-d.cc: Include tm_d.h.
* config/mips/mips-protos.h (mips_d_target_versions): Move to
config/mips/mips-d.h.
(mips_d_register_target_info): Likewise.
* config/mips/mips.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/netbsd-d.cc: Include tm_d.h instead of tm.h and memmodel.h.
* config/openbsd-d.cc: Likewise.
* config/pa/pa-d.cc: Include tm_d.h.
* config/pa/pa-protos.h (pa_d_target_versions): Move to
config/pa/pa-d.h.
(pa_d_register_target_info): Likewise.
* config/pa/pa.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/riscv/riscv-d.cc: Include tm_d.h.
* config/riscv/riscv-protos.h (riscv_d_target_versions): Move to
config/riscv/riscv-d.h.
(riscv_d_register_target_info): Likewise.
* config/riscv/riscv.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/rs6000/rs6000-d.cc: Include tm_d.h.
* config/rs6000/rs6000-protos.h (rs6000_d_target_versions): Move to
config/rs6000/rs6000-d.h.
(rs6000_d_register_target_info): Likewise.
* config/rs6000/rs6000.h (TARGET_D_CPU_VERSIONS) Likewise.:
(TARGET_D_REGISTER_CPU_TARGET_INFO) Likewise.:
* config/s390/s390-d.cc: Include tm_d.h.
* config/s390/s390-protos.h (s390_d_target_versions): Move to
config/s390/s390-d.h.
(s390_d_register_target_info): Likewise.
* config/s390/s390.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* config/sol2-d.cc: Include tm_d.h instead of tm.h and memmodel.h.
* config/sparc/sparc-d.cc: Include tm_d.h.
* config/sparc/sparc-protos.h (sparc_d_target_versions): Move to
config/sparc/sparc-d.h.
(sparc_d_register_target_info): Likewise.
* config/sparc/sparc.h (TARGET_D_CPU_VERSIONS): Likewise.
(TARGET_D_REGISTER_CPU_TARGET_INFO): Likewise.
* configure: Regenerate.
* configure.ac (tm_d_file): Remove defaults.h.
(tm_d_include_list): Remove options.h and insn-constants.h.
* config/aarch64/aarch64-d.h: New file.
* config/arm/arm-d.h: New file.
* config/i386/i386-d.h: New file.
* config/mips/mips-d.h: New file.
* config/pa/pa-d.h: New file.
* config/riscv/riscv-d.h: New file.
* config/rs6000/rs6000-d.h: New file.
* config/s390/s390-d.h: New file.
* config/sparc/sparc-d.h: New file.
While most PA 2.0 instructions support both 32 and 64-bit traps
and conditions, the addi and subi instructions only support 32-bit
traps and conditions. Thus, we need to force immediate operands
to register operands on the 64-bit target and use the add/sub
instructions which can trap on 64-bit signed overflow.
2022-11-30 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (addvdi3): Force operand 2 to a register.
Remove "addi,tsv,*" instruction from unamed pattern.
(subvdi3): Force operand 1 to a register.
Remove "subi,tsv" instruction from from unamed pattern.
* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
* testsuite/libgomp.c/declare-variant-4.h: New header file.
Tobias Burnus [Mon, 28 Nov 2022 14:21:30 +0000 (15:21 +0100)]
gcn: Fix __builtin_gcn_first_call_this_thread_p
Contrary naive expectation, unspec_volatile (via prologue_use) did not
prevent the cprop pass (at -O2) to remove the access to the s[0:1]
(PRIVATE_SEGMENT_BUFFER_ARG) register as the volatile got just put on
the preceeding pseudoregister. Solution: Use gen_rtx_USE instead.
Additionally, this patch removes (gen_)prologue_use_di as it is then no
longer used.
Finally, as we already do bit manipulation, instead of using the full
64bit side - and then just keeping the value of 's0', just move directly
to use only s1 of s[0:1] and do the bit manipulations there, generating
more readable assembly code and better matching the '#else' branch.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_builtin_1): Work on s1 instead
of s[0:1] and use USE to prevent removal of setting that register.
* config/gcn/gcn.md (prologue_use_di): Remove.
Tobias Burnus [Mon, 28 Nov 2022 14:20:36 +0000 (15:20 +0100)]
OpenMP/Fortran: Permit end-clause on directive
gcc/fortran/ChangeLog:
* openmp.cc (OMP_DO_CLAUSES, OMP_SCOPE_CLAUSES,
OMP_SECTIONS_CLAUSES): Add 'nowait'.
(OMP_SINGLE_CLAUSES): Add 'nowait' and 'copyprivate'.
(gfc_match_omp_distribute_parallel_do,
gfc_match_omp_distribute_parallel_do_simd,
gfc_match_omp_parallel_do,
gfc_match_omp_parallel_do_simd,
gfc_match_omp_parallel_sections,
gfc_match_omp_teams_distribute_parallel_do,
gfc_match_omp_teams_distribute_parallel_do_simd): Disallow 'nowait'.
(gfc_match_omp_workshare): Match 'nowait' clause.
(gfc_match_omp_end_single): Use clause matcher for 'nowait'.
(resolve_omp_clauses): Reject 'nowait' + 'copyprivate'.
* parse.cc (decode_omp_directive): Break too long line.
(parse_omp_do, parse_omp_structured_block): Diagnose duplicated
'nowait' clause.
libgomp/ChangeLog:
* libgomp.texi (OpenMP 5.2): Mark end-directive as Y.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/copyprivate-1.f90: New test.
* gfortran.dg/gomp/copyprivate-2.f90: New test.
* gfortran.dg/gomp/nowait-2.f90: Move dg-error tests ...
* gfortran.dg/gomp/nowait-4.f90: ... to this new file.
* gfortran.dg/gomp/nowait-5.f90: New test.
* gfortran.dg/gomp/nowait-6.f90: New test.
* gfortran.dg/gomp/nowait-7.f90: New test.
* gfortran.dg/gomp/nowait-8.f90: New test.
Tobias Burnus [Mon, 28 Nov 2022 14:16:47 +0000 (15:16 +0100)]
libgomp: Add no-target-region rev offload test + fix plugin-nvptx
OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case. This commit adds a testcase for this.
In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that
it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.
libgomp/ChangeLog:
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
as valid and the code having no reverse-offload code.
* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.
Sandra Loosemore [Sat, 26 Nov 2022 01:35:32 +0000 (01:35 +0000)]
OpenMP: Generate SIMD clones for functions with "declare target"
This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution. The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.
* common.opt (fopenmp-target-simd-clone): New option.
(target_simd_clone_device): New enum to go with it.
* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
* flag-types.h (enum omp_target_simd_clone_device_kind): New.
* omp-simd-clone.cc (auto_simd_fail): New function.
(auto_simd_check_stmt): New function.
(plausible_type_for_simd_clone): New function.
(ok_for_auto_simd_clone): New function.
(simd_clone_create): Add force_local argument, make the symbol
have internal linkage if it is true.
(expand_simd_clones): Also check for cloneable functions with
"omp declare target". Pass explicit_p argument to
simd_clone.compute_vecsize_and_simdlen target hook.
* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
Add bool explicit_p argument.
* doc/tm.texi: Regenerated.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
* config/gcn/gcn.cc
(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
* config/i386/i386.cc
(ix86_simd_clone_compute_vecsize_and_simdlen): Update.
Eric Botcazou [Fri, 25 Nov 2022 09:49:20 +0000 (10:49 +0100)]
Fix thinko in operator_bitwise_xor::op1_range
There is a thinko in the op1_range method of ranger's operator_bitwise_xor
class in a boolean context: if the result is known to be true, it may infer
that a specific operand is false without any basis.
Eric Botcazou [Tue, 22 Nov 2022 18:03:49 +0000 (19:03 +0100)]
Fix wrong array type conversion with different storage orde
When two arrays of scalars have a different storage order in Ada, the
front-end makes sure that the conversion is performed component-wise
so that each component can be reversed. So it's a little bit counter
productive that the ldist pass performs the opposite transformation
and synthesizes a memcpy/memmove in this case.
gcc/
* tree-loop-distribution.cc (loop_distribution::classify_builtin_ldst):
Bail out if source and destination do not have the same storage order.
Jonathan Wakely [Thu, 15 Sep 2022 17:21:32 +0000 (18:21 +0100)]
libstdc++: Remove unnecessary header from <memory>
Previously <memory> included <bits/stl_algobase.h> so that std::copy,
std::fill etc. could be used by <bits/stl_uninitialized.h>. But that
includes it explicitly now, so that it can be compiled as a header unit.
There's no need to include it in <memory>, where its purpose isn't
obvious.
libstdc++-v3/ChangeLog:
* include/std/memory: Do not include <bits/stl_algobase.h>.
Jonathan Wakely [Thu, 10 Nov 2022 14:11:27 +0000 (14:11 +0000)]
libstdc++: Fix tests with non-const operator==
These tests fail in strict -std=c++20 mode but their equality ops don't
need to be non-const, it looks like an accident.
This fixes two FAILs with -std=c++20:
FAIL: 20_util/tuple/swap.cc (test for excess errors)
FAIL: 26_numerics/valarray/87641.cc (test for excess errors)
Jonathan Wakely [Tue, 1 Nov 2022 13:47:24 +0000 (13:47 +0000)]
libstdc++: Remove unnecessary variant member in std::expected
Hui Xie pointed out that we don't need a dummy member in the union,
because all constructors always initialize either _M_val or _M_unex.
We still need the _M_void member of the expected<void, E>
specialization, because the constructor has to initialize something when
not using the _M_unex member.
Jonathan Wakely [Mon, 21 Nov 2022 11:52:34 +0000 (11:52 +0000)]
libstdc++: Check static assertions earlier in chrono::duration
This ensures that we fail a static assertion before giving any other
errors. Instantiating chrono::duration<int, chrono::seconds> will now
print this before the other errors caused by it:
error: static assertion failed: period must be a specialization of ratio
libstdc++-v3/ChangeLog:
* include/bits/chrono.h (duration): Check preconditions on
template arguments before using them.
Jonathan Wakely [Fri, 23 Sep 2022 12:28:37 +0000 (13:28 +0100)]
libstdc++: Fix std::is_nothrow_invocable_r for uncopyable prvalues [PR91456]
This is the last missing piece of PR 91456.
This also removes the only use of the C++11 version of
std::is_nothrow_invocable.
libstdc++-v3/ChangeLog:
PR libstdc++/91456
* include/std/type_traits (__is_nothrow_invocable): Remove.
(__is_invocable_impl::__nothrow_type): New member type which
checks if the conversion can throw.
(__is_nt_invocable_impl): Replace class template with alias
template to __is_nt_invocable_impl::__nothrow_type.
* testsuite/20_util/is_nothrow_invocable/91456.cc: New test.
* testsuite/20_util/is_nothrow_convertible/value.cc: Remove
macro used by value_ext.cc test.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Remove
test for non-standard __is_nothrow_invocable trait.
The new builtins have been added for newlib to reduce dependency on
compiler-internal implementation choices of GCC in newlibs' getreent.c.
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P,
GET_STACK_LIMIT): Add new builtins.
* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
* config/gcn/gcn.md (prologue_use): Add "register_operand" as
arg to match_operand.
(prologue_use_di): New; DI insn_and_split variant of the former.
Jonathan Wakely [Tue, 22 Nov 2022 18:15:56 +0000 (18:15 +0000)]
libstdc++: Add workaround for fs::path constraint recursion [PR106201]
This works around a compiler bug where overload resolution attempts
implicit conversion to path in order to call a function with a path&
parameter. Such conversion would produce a prvalue, which would not be
able to bind to the lvalue reference anyway. Attempting to check the
conversion causes a constraint recursion because the arguments to the
path constructor are checked to see if they're iterators, which checks
if they're swappable, which tries to use the swap function that
triggered the conversion in the first place.
This replaces the swap function with an abbreviated function template
that is constrained with same_as<path> auto& so that the invalid
conversion is never considered.
libstdc++-v3/ChangeLog:
PR libstdc++/106201
* include/bits/fs_path.h (filesystem::swap(path&, path&)):
Replace with abbreviated function template.
* include/experimental/bits/fs_path.h (filesystem::swap):
Likewise.
* testsuite/27_io/filesystem/iterators/106201.cc: New test.
* testsuite/experimental/filesystem/iterators/106201.cc: New test.
Jonathan Wakely [Tue, 22 Nov 2022 09:53:36 +0000 (09:53 +0000)]
libstdc++: Fix pool resource build errors for H8 [PR107801]
The array of pool sizes was previously adjusted to work for msp430-elf
which has 16-bit int and either 16-bit size_t or 20-bit size_t. The
largest pool sizes were disabled unless size_t has more than 20 bits.
The H8 family has 16-bit int but 32-bit size_t, which means that the
largest sizes are enabled, but 1<<15 produces a negative number that
then cannot be narrowed to size_t.
Replace the test for 32-bit size_t with a test for 32-bit int, which
means we won't use the 4kiB to 4MiB pools for targets with 16-bit int
even if they have a wider size_t.
libstdc++-v3/ChangeLog:
PR libstdc++/107801
* src/c++17/memory_resource.cc (pool_sizes): Disable large pools
for targets with 16-bit int.
Tobias Burnus [Mon, 21 Nov 2022 14:25:48 +0000 (15:25 +0100)]
libgomp/gcn: fix/improve struct output
output.printf_data.(value union) contains text[128], which has the size
of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2]
could be extended to value_u64[6] - sufficient for all required arguments
to gomp_target_rev. Additionally, next_output.printf_data.(msg union)
contained msg_u64 which then is no longer needed and also caused 32bit
vs 64bit alignment issues.
libgomp/
* config/gcn/libgomp-gcn.h (struct output):
Remove 'msg_u64' from the union, change
value_u64[2] to value_u64[6].
* config/gcn/target.c (GOMP_target_ext): Update accordingly.
* plugin/plugin-gcn.c (process_reverse_offload, console_output):
Likewise.
Jakub Jelinek [Mon, 21 Nov 2022 09:28:27 +0000 (10:28 +0100)]
i386: Uglify some local identifiers in *intrin.h [PR107748]
While reporting PR107748 (where is a problem with non-uglified names,
but I've left it out because it needs fixing anyway), I've noticed
various spots where identifiers in *intrin.h headers weren't uglified.
The following patch fixed those that are related to unions (I've grepped
for [a-zA-Z]\.[a-zA-Z] spots).
The reason we need those to be uglified is the same as why the arguments
of the inlines are __ prefixed and most of automatic vars in the inlines
- say a, v or u aren't part of implementation namespace and so users could
#define u whatever->something
#include <x86intrin.h>
and it should still work, as long as u is not e.g. one of the names
of the functions/macros the header provides (_mm* etc.).
2022-11-21 Jakub Jelinek <jakub@redhat.com>
PR target/107748
* config/i386/avx512fp16intrin.h (_mm512_castph512_ph128,
_mm512_castph512_ph256, _mm512_castph128_ph512,
_mm512_castph256_ph512, _mm512_set1_pch): Uglify names of local
variables and union members.
* config/i386/avx512fp16vlintrin.h (_mm256_castph256_ph128,
_mm256_castph128_ph256, _mm256_set1_pch, _mm_set1_pch): Likewise.
* config/i386/smmintrin.h (_mm_extract_ps): Likewise.
* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Likewise.