Andrew Stubbs [Fri, 15 Jan 2021 11:26:46 +0000 (11:26 +0000)]
DWARF address space for variables
Add DWARF address class attributes for variables that exist outside the
generic address space. In particular, this is the case for gang-private
variables in OpenACC offload kernels.
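A small OpenACC illustration (not part of the patch) of the kind of variable this
affects: tmp is private to each gang, so on AMD GCN it can live in LDS, outside
the generic address space, and the new DW_AT_address_class attribute records that
so the debugger can locate it.

  void kernel (int n, int *out)
  {
    int tmp;
  #pragma acc parallel loop gang private(tmp) copyout(out[0:n])
    for (int i = 0; i < n; i++)
      {
        tmp = i * 2;
        out[i] = tmp;
      }
  }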
gcc/ChangeLog:
* dwarf2out.c (add_location_or_const_value_attribute): Set
DW_AT_address_class, if appropriate.
Andrew Stubbs [Fri, 8 Jan 2021 13:56:52 +0000 (13:56 +0000)]
amdgcn: Fix DWARF variables with alloca
Require a frame pointer for entry functions that use alloca because it isn't
possible to encode the DWARF frame otherwise. Adjust the CFA definition
expressions accordingly.
gcc/ChangeLog:
* config/gcn/gcn.c (gcn_expand_prologue): Use the frame pointer for
the DWARF CFA, if it exists.
(gcn_frame_pointer_rqd): Require a frame pointer for entry functions
that use alloca.
Julian Brown [Wed, 25 Nov 2020 17:08:01 +0000 (09:08 -0800)]
[og10] vect: Add target hook to prefer gather/scatter instructions
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.
This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates the hook for GCN. This
allows much better code to be generated for affected loops.
Julian Brown [Fri, 6 Nov 2020 23:17:29 +0000 (15:17 -0800)]
[og10] openacc: Adjust loop lowering for AMD GCN
This patch adjusts OpenACC loop lowering in the AMD GCN target compiler
in such a way that the autovectorizer can vectorize the "vector" dimension
of those loops in more cases.
Rather than generating "SIMT" code that executes a scalar instruction
stream for each lane of a vector in lockstep, for GCN we model the GPU
like a typical CPU, with separate instructions to operate on scalar and
vector data. That means that unlike other offload targets, we rely on
the autovectorizer to handle the innermost OpenACC parallelism level,
which is "vector".
Because of this, the OpenACC builtin functions to return the current
vector lane and the vector width return 0 and 1 respectively, despite
the native vector width being 64 elements wide.
This allows generated code to work with our chosen compilation model,
but the way loops are lowered in omp-offload.c:oacc_xform_loop does not
understand the discrepancy between logical (OpenACC) and physical vector
sizes correctly. That means that if a loop is partitioned over e.g. the
worker AND vector dimensions, we actually lower with unit vector size --
meaning that if we then autovectorize, we end up trying to vectorize
over the "worker" dimension rather than the vector one! Then, because
the number of workers is not fixed at compile time, that means the
autovectorizer has a hard time analysing the loop and thus vectorization
often fails entirely.
We can fix this by deducing the true vector width in oacc_xform_loop,
and using that when we are on a "non-SIMT" offload target. We can then
rearrange how loops are lowered in that function so that the loop form
fed to the autovectorizer is more amenable to vectorization -- namely,
the innermost step is set to process each loop iteration sequentially.
For some benchmarks, allowing vectorization to succeed leads to quite
impressive performance improvements -- I've observed between 2.5x and
40x on one machine/GPU combination.
The low-level builtins available to user code (__builtin_goacc_parlevel_id
and __builtin_goacc_parlevel_size) continue to return 0/1 respectively
for the vector dimension for AMD GCN, even if their containing loop is
vectorized -- that's a quirk that we might possibly want to address at
some later date.
Only non-"chunking" loops are handled at present. "Chunking" loops are
still lowered as before.
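A plain-C sketch of the iteration scheme this aims for (names such as vf,
num_workers and worker_id are illustrative; the real lowering happens on GIMPLE
in oacc_xform_loop):

  /* Each worker takes a contiguous block of vf iterations per step and the
     innermost loop walks that block sequentially with unit stride -- the
     form the autovectorizer can turn into real GCN vector instructions.  */
  void
  lowered_loop (int n, int num_workers, int worker_id, int vf,
                void (*body) (int))
  {
    for (int start = worker_id * vf; start < n; start += num_workers * vf)
      for (int i = start; i < start + vf && i < n; i++)
        body (i);
  }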
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* omp-offload.c (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
Add overloaded wrapper for previous arguments & behaviour.
(oacc_xform_loop): Lower vector loops to iterate a multiple of
omp_max_vf times over contiguous steps on non-SIMT targets.
Julian Brown [Fri, 6 Nov 2020 22:53:29 +0000 (14:53 -0800)]
amdgcn: Fix exec register live-on-entry to BB in md-reorg
This patch fixes a corner case in the AMD GCN md-reorg pass when the
EXEC register is live on entry to a BB, and could be clobbered by code
inserted by the pass before a use in (e.g.) a different BB.
Backport from mainline:
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_md_reorg): Fix case where EXEC reg is live
on entry to a BB.
Julian Brown [Mon, 30 Nov 2020 19:10:04 +0000 (11:10 -0800)]
amdgcn: Improve FP division accuracy
GCN has a reciprocal-approximation instruction but no
hardware divide. This patch adjusts the open-coded reciprocal
approximation/Newton-Raphson refinement steps to use fused multiply-add
instructions as is necessary to obtain a properly-rounded result, and
adds further refinement steps to correctly round the full division result.
The patterns in question are still guarded by a flag_reciprocal_math
condition, and do not yet support denormals.
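In scalar C the refinement scheme looks roughly like this (a sketch under
round-to-nearest, with recip_approx standing in for the hardware reciprocal
estimate; the actual machine-description patterns differ in detail):

  #include <math.h>

  static double
  recip_approx (double b)
  {
    return 1.0 / b;   /* stand-in for the hardware reciprocal estimate */
  }

  double
  div_sketch (double a, double b)
  {
    double x = recip_approx (b);
    /* Newton-Raphson steps: x += x * (1 - b*x), done with FMAs so the small
       residual 1 - b*x is not lost to rounding.  */
    double e = fma (-b, x, 1.0);
    x = fma (e, x, x);
    e = fma (-b, x, 1.0);
    x = fma (e, x, x);
    /* Form the quotient, then apply one final correction using the residual
       a - b*q so the full division result is correctly rounded.  */
    double q = a * x;
    double r = fma (-b, q, a);
    return fma (r, x, q);
  }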
Backport from mainline:
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn-valu.md (recip<mode>2<exec>, recip<mode>2): Use unspec
for reciprocal-approximation instructions.
(div<mode>3): Use fused multiply-accumulate operations for reciprocal
refinement and division result.
* config/gcn/gcn.md (UNSPEC_RCP): New unspec constant.
gcc/testsuite/
* gcc.target/gcn/fpdiv.c: New test.
Julian Brown [Mon, 30 Nov 2020 20:01:37 +0000 (12:01 -0800)]
amdgcn: Fix subdf3 pattern
This patch fixes a typo in the subdf3 pattern that meant it had a
non-standard name and thus the compiler would emit a libcall rather than
the proper hardware instruction for DFmode subtraction.
Chung-Lin Tang [Tue, 10 Nov 2020 11:36:58 +0000 (03:36 -0800)]
openmp: Implement OpenMP 5.0 base-pointer attachment and clause ordering
This patch implements some parts of the target variable mapping changes
specified in OpenMP 5.0, including base-pointer attachment/detachment
behavior for array section list-items in map clauses, and ordering of
map clauses according to map kind.
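A small C illustration of the OpenMP 5.0 behaviour this enables (not a testcase
from the patch): mapping an array section through a struct member's pointer
attaches the device copy of the section to the device copy of that pointer, and
the clause reordering ensures the enclosing object is mapped before the
attachment is performed.

  struct vec { int *data; int n; };

  void
  scale (struct vec v)
  {
    /* The device copy of v.data[0:v.n] is attached to the device copy of
       v.data inside the mapped struct, and detached when the region ends.  */
  #pragma omp target map(to: v) map(tofrom: v.data[0:v.n])
    for (int i = 0; i < v.n; i++)
      v.data[i] *= 2;
  }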
gcc/c-family/ChangeLog:
* c-common.h (c_omp_adjust_map_clauses): New declaration.
* c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses.
(c_omp_adjust_map_clauses): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.
* c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
(c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_target): Likewise.
* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
interaction between reference case and attach/detach.
(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/ChangeLog:
* gimplify.c (is_or_contains_p): New static helper function.
(omp_target_reorder_clauses): New function.
(gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to
reorder clause list according to OpenMP 5.0 rules. Add handling of
GOMP_MAP_ATTACH_DETACH for OpenMP cases.
* omp-low.c (is_omp_target): New static helper function.
(scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH
for OpenMP cases.
(lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for
OpenMP cases.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid.
* gfortran.dg/gomp/map-2.f90: Likewise.
* c-c++-common/gomp/map-5.c: New testcase.
libgomp/ChangeLog:
* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_internal):
Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use
of gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead. The max-active-levels-var ICV
is now per data environment rather than per device.
Jakub Jelinek [Thu, 22 Oct 2020 07:31:01 +0000 (09:31 +0200)]
openmp: Change omp_get_initial_device () to match OpenMP 5.1 requirements
> Therefore, I think until omp_get_initial_device () value is changed, we
The following so far untested patch implements that change.
OpenMP 4.5 said for omp_get_initial_device:
The value of the device number is implementation defined. If it is between 0 and one less than
omp_get_num_devices() then it is valid for use with all device constructs and routines; if it is
outside that range, then it is only valid for use with the device memory routines and not in the
device clause.
and OpenMP 5.0 similarly, but OpenMP 5.1 says:
The value of the device number is the value returned by the omp_get_num_devices routine.
As the new value is compatible with what has been required earlier, I think
we can change it already now.
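For instance, after this change the following (illustrative) program prints the
same number twice, and that number remains usable with the device memory
routines:

  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
    int host = omp_get_initial_device ();
    printf ("%d %d\n", omp_get_num_devices (), host);
    void *p = omp_target_alloc (64, host);   /* still a valid device number */
    omp_target_free (p, host);
    return 0;
  }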
* icv.c (omp_get_initial_device): Remove, including the corresponding
ialias.
* icv-device.c (omp_get_initial_device): New function. Return
gomp_get_num_devices (). Add ialias.
* target.c (resolve_device): Don't fail with
OMP_TARGET_OFFLOAD=mandatory if device_id is equal to
gomp_get_num_devices ().
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy, omp_target_memcpy_rect, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_pause_resource): Use
gomp_get_num_devices () instead of GOMP_DEVICE_HOST_FALLBACK on the
first use in the functions, in uses dominated by the
gomp_get_num_devices call use num_devices_openmp instead.
* libgomp.texi (omp_get_initial_device): Document.
* config/gcn/icv-device.c (omp_get_initial_device): New function.
Add ialias.
* config/nvptx/icv-device.c (omp_get_initial_device): Likewise.
* testsuite/libgomp.c/target-40.c: New test.
openmp: Implement support for OMP_TARGET_OFFLOAD environment variable
This implements support for the OMP_TARGET_OFFLOAD environment variable
introduced in the OpenMP 5.0 standard, which controls how offloading
is handled. It may be set to MANDATORY (abort if offloading cannot be
performed), DISABLED (no offloading to devices) or DEFAULT (offload to
device if possible, fall back to host if not).
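A minimal sketch of the parsing side (the function, type and variable names
follow the ChangeLog below; the enum constants and the body are illustrative,
not the actual libgomp code):

  #include <stddef.h>
  #include <strings.h>

  enum gomp_target_offload_t
  {
    GOMP_TARGET_OFFLOAD_DEFAULT,
    GOMP_TARGET_OFFLOAD_MANDATORY,
    GOMP_TARGET_OFFLOAD_DISABLED
  };

  static enum gomp_target_offload_t gomp_target_offload_var
    = GOMP_TARGET_OFFLOAD_DEFAULT;

  static void
  parse_target_offload (const char *env)
  {
    if (env == NULL)
      return;
    if (strcasecmp (env, "mandatory") == 0)
      gomp_target_offload_var = GOMP_TARGET_OFFLOAD_MANDATORY;
    else if (strcasecmp (env, "disabled") == 0)
      gomp_target_offload_var = GOMP_TARGET_OFFLOAD_DISABLED;
    /* Anything else keeps the default behaviour; the real implementation
       may additionally diagnose invalid values.  */
  }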
2020-10-20 Kwok Cheung Yeung <kcy@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
libgomp/
* env.c (gomp_target_offload_var): New.
(parse_target_offload): New.
(handle_omp_display_env): Print value of OMP_TARGET_OFFLOAD.
(initialize_env): Parse OMP_TARGET_OFFLOAD.
* libgomp.h (gomp_target_offload_t): New.
(gomp_target_offload_var): New.
* libgomp.texi (OMP_TARGET_OFFLOAD): New section.
* target.c (resolve_device): Generate error if device not found and
offloading is mandatory.
(gomp_target_fallback): Generate error if offloading is mandatory.
(GOMP_target): Add argument in call to gomp_target_fallback.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Generate error if offloading is mandatory.
(GOMP_target_data): Add argument in call to gomp_target_data_fallback.
(GOMP_target_data_ext): Likewise.
(gomp_target_task_fn): Add argument in call to gomp_target_fallback.
(gomp_target_init): Return early if offloading is disabled.
Inside the function, use automatic
variables corresponding to num_devices, num_devices_openmp and devices
global variables and update the globals only at the end of the
function.
openmp: Add support for the omp_get_supported_active_levels runtime library routine
This patch implements the omp_get_supported_active_levels runtime routine
from the OpenMP 5.0 specification, which returns the maximum number of
active nested parallel regions supported by this implementation. The
current maximum (set using the omp_set_max_active_levels routine or the
OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number.
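A small usage example (illustrative): the value returned is the upper bound on
what omp_set_max_active_levels can make effective.

  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
    int supported = omp_get_supported_active_levels ();
    /* The largest request that can actually take effect.  */
    omp_set_max_active_levels (supported);
    printf ("supported %d, max %d\n",
            supported, omp_get_max_active_levels ());
    return 0;
  }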
vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with no linkage
This is because r11-3402 now always declares float64x2 variables for
aarch64, leading to a duplicate declaration in these testcases.
The fix is simply to remove these now useless declarations.
These tests are skipped on arm*, so there is no impact on that target.
This patch implements the missing reinterprets to and from poly128_t and
float64x2_t.
I've plugged in the appropriate testing in the advsimd-intrinsics.exp
too.
Bootstrapped and tested on aarch64-none-linux-gnu.
Tested advsimd-intrinsics.exp on arm-none-eabi too to make sure arm
testing isn't affected.
This patch implements the missing vrndns_f32 intrinsic. This operates on a scalar float32_t value.
It can be mapped down to a __builtin_aarch64_frintnsf builtin.
This patch does that.
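Usage is straightforward (an illustrative snippet, not the new test):

  #include <arm_neon.h>

  float
  round_ties_to_even (float x)
  {
    /* Rounds to the nearest integral value, ties to even (FRINTN).  */
    return vrndns_f32 (x);
  }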
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/71233
* config/aarch64/aarch64-simd-builtins.def (frintn): Use BUILTIN_VHSDF_HSDF
for modes. Remove explicit hf instantiation.
* config/aarch64/arm_neon.h (vrndns_f32): Define.
gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vrndns_f32_1.c: New test.
AArch64: Implement missing _p64 intrinsics for vector permutes
This patch implements some missing vector permute intrinsics operating on poly64x2_t types.
They are implemented identically to their uint64x2_t brethren.
Bootstrapped and tested on aarch64-none-linux-gnu.
This patch implements some missing vceq* intrinsics on poly types.
The behaviour is to produce the appropriate CMEQ instruction as for the unsigned types.
Bootstrapped and tested on aarch64-none-linux-gnu.
Eric Botcazou [Mon, 28 Sep 2020 07:00:46 +0000 (09:00 +0200)]
Fix bogus alignment warning on address clause
The compiler gives a bogus alignment warning on an address clause and
a discriminated record type with variable size.
gcc/ada/ChangeLog:
* gcc-interface/decl.c (maybe_saturate_size): Add ALIGN parameter
and round down the result to ALIGN.
(gnat_to_gnu_entity): Adjust calls to maybe_saturate_size.
gcc/testsuite/ChangeLog:
* gnat.dg/addr16.adb: New test.
* gnat.dg/addr16_pkg.ads: New helper.
Jakub Jelinek [Sun, 27 Sep 2020 21:18:26 +0000 (23:18 +0200)]
optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]
The following testcase is miscompiled on i686-linux, because
we try to expand a double-word bitwise logic operation with op0
being a (mem:DI u) and target (mem:DI u+4), i.e. partial overlap, and
thus end up with:
movl 4(%esp), %eax
andl u, %eax
movl %eax, u+4
! movl u+4, %eax optimized out
andl 8(%esp), %eax
movl %eax, u+8
rather than with the desired:
movl 4(%esp), %edx
movl 8(%esp), %eax
andl u, %edx
andl u+4, %eax
movl %eax, u+8
movl %edx, u+4
because the store of the first word to target overwrites the second word of
the operand.
expand_binop for this (and several similar places) already checks for target
== op0 or target == op1; this patch just adds reg_overlap_mentioned_p calls
next to those checks.
Pedantically, at least for some of these it might be sufficient to force
a different target if there is overlap but target is not rtx_equal_p to
the operand (e.g. in this bitwise logical case, but e.g. not in the shift
cases where there is reordering), though that would go against the
preexisting target == op? checks and the rationale that REG_EQUAL notes in
that case isn't correct.
2020-09-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97073
* optabs.c (expand_binop, expand_absneg_bit, expand_unop,
expand_copysign_bit): Check reg_overlap_mentioned_p between target
and operand(s) and if it returns true, force a pseudo as target.
Mark Eggleston [Thu, 11 Jun 2020 13:33:51 +0000 (14:33 +0100)]
Fortran : ICE in build_field PR95614
Local identifiers cannot be the same as a module name. The original
patch by Steve Kargl resulted in name clashes between common block
names and local identifiers. A local identifier can be the same as
a global identifier if that identifier represents a common block. The
patch was modified to allow global identifiers that represent a common
block.
2020-09-27 Steven G. Kargl <kargl@gcc.gnu.org>
Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/fortran/
PR fortran/95614
* decl.c (gfc_get_common): Use gfc_match_common_name instead
of match_common_name.
* decl.c (gfc_bind_idents): Use gfc_match_common_name instead
of match_common_name.
* match.c : Rename match_common_name to gfc_match_common_name.
* match.c (gfc_match_common): Use gfc_match_common_name instead
of match_common_name.
* match.h : Rename match_common_name to gfc_match_common_name.
* resolve.c (resolve_common_vars): Check that each symbol in a
common block has a global symbol. If there is a global symbol,
issue an error if the symbol type is known and is not a common
block name.
2020-09-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95614
* gfortran.dg/pr95614_1.f90: New test.
* gfortran.dg/pr95614_2.f90: New test.
Add processing STRICT_LOW_PART for matched reloads.
2020-06-04 Vladimir Makarov <vmakarov@redhat.com>
PR middle-end/95464
* lra.c (lra_emit_move): Add processing STRICT_LOW_PART.
* lra-constraints.c (match_reload): Use STRICT_LOW_PART in output
reload if the original insn has it too.
Joe Ramsay [Wed, 19 Aug 2020 12:34:06 +0000 (12:34 +0000)]
arm: Require MVE memory operand for destination of vst1q intrinsic
Previously, the machine description patterns for vst1q accepted a generic memory
operand for the destination, which could lead to an unrecognised builtin when
expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE
memory operands.
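An illustrative use of one of the affected intrinsics (the new tests below
exercise the full set):

  #include <arm_mve.h>

  void
  store_vec (float16_t *addr, float16x8_t value)
  {
    /* The destination must now be addressable as an MVE memory operand.  */
    vst1q_f16 (addr, value);
  }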
PR target/96683
* gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
H.J. Lu [Mon, 14 Sep 2020 15:52:27 +0000 (08:52 -0700)]
rtl_data: Add sp_is_clobbered_by_asm
Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by an asm statement.
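The kind of statement the new flag tracks looks like the following (a sketch of
the scenario; the actual pr97032.c testcase may differ):

  void
  f (void)
  {
    char buf[64];
    /* Listing the stack pointer as clobbered forces the backend to recover
       the frame some other way, e.g. via a DRAP register on i386.  */
    __asm__ volatile ("" : : "r" (buf) : "esp", "memory");
  }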
gcc/
PR target/97032
* cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
to true if the stack pointer is clobbered by asm statement.
* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
if the stack pointer is clobbered by asm statement.
gcc/testsuite/
PR target/97032
* gcc.target/i386/pr97032.c: New test.
Alan Modra [Fri, 18 Sep 2020 13:51:05 +0000 (23:21 +0930)]
[RS6000] Power10 libffi fixes
Power10 pc-relative code doesn't use or preserve r2 as a TOC pointer.
That means calling between pc-relative and TOC-using code can't be
done without intervening linker stubs, and a call from TOC code to
pc-relative code must have a nop after the bl in order to restore r2.
Now the PowerPC libffi assembly code doesn't use r2 except for the
implicit use when making calls back to C, ffi_closure_helper_LINUX64
and ffi_prep_args64. So changing the assembly to interoperate with
pc-relative code without stubs is easily done.
PR target/97166
* src/powerpc/linux64.S (ffi_call_LINUX64): Don't emit global
entry when __PCREL__. Call using @notoc. Add nops.
* src/powerpc/linux64_closure.S (ffi_closure_LINUX64): Likewise.
(ffi_go_closure_linux64): Likewise.
Jonathan Wakely [Tue, 22 Sep 2020 19:02:58 +0000 (20:02 +0100)]
libstdc++: Fix out-of-bounds string_view access in filesystem::path [PR 97167]
libstdc++-v3/ChangeLog:
PR libstdc++/97167
* src/c++17/fs_path.cc (path::_Parser::root_path()): Check
for empty string before inspecting the first character.
* testsuite/27_io/filesystem/path/append/source.cc: Append
empty string_view to path.
David Faust [Tue, 22 Sep 2020 18:31:35 +0000 (20:31 +0200)]
bpf: use xBPF signed div, mod insns when available
The 'mod' and 'div' operators in eBPF are unsigned, with no signed
counterpart. xBPF adds two new ALU operations, sdiv and smod, for
signed division and modulus, respectively. Update bpf.md with
'define_insn' blocks for signed div and mod to use them when targeting
xBPF, and add new tests to ensure they are used appropriately.
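For instance, compiling the following with -mxbpf should now emit the new
instructions directly (an illustrative snippet, not one of the new tests):

  int
  signed_div_mod (int a, int b)
  {
    /* With xBPF these map to the sdiv and smod ALU instructions; plain eBPF
       only provides their unsigned counterparts.  */
    return a / b + a % b;
  }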
2020-09-17 David Faust <david.faust@oracle.com>
gcc/
* config/bpf/bpf.md: Add defines for signed div and mod operators.
gcc/testsuite/
* gcc.target/bpf/diag-sdiv.c: New test.
* gcc.target/bpf/diag-smod.c: New test.
* gcc.target/bpf/xbpf-sdiv-1.c: New test.
* gcc.target/bpf/xbpf-smod-1.c: New test.
Jonathan Wakely [Tue, 22 Sep 2020 07:42:18 +0000 (08:42 +0100)]
libstdc++: Use correct argument type for __use_alloc, again [PR 96803]
While backporting 5494edae83ad33c769bd1ebc98f0c492453a6417 I noticed
that it's still not correct. I made the allocator-extended constructor
use the right type for the uses-allocator construction detection, but I
used an rvalue when it should be a const lvalue.
This should fix it properly this time.
libstdc++-v3/ChangeLog:
PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Use correct value category in __use_alloc call.
* testsuite/20_util/tuple/cons/96803.cc: Check with constructors
that require correct value category to be used.
Jonathan Wakely [Wed, 26 Aug 2020 18:32:30 +0000 (19:32 +0100)]
libstdc++: Use correct argument type for __use_alloc [PR 96803]
The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.
libstdc++-v3/ChangeLog:
PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.
Jonathan Wakely [Sun, 20 Sep 2020 23:17:02 +0000 (00:17 +0100)]
libstdc++: Fix noexcept-specifier for std::bind_front [PR 97101]
libstdc++-v3/ChangeLog:
PR libstdc++/97101
* include/std/functional (bind_front): Fix order of parameters
in is_nothrow_constructible_v specialization.
* testsuite/20_util/function_objects/bind_front/97101.cc: New test.
Jonathan Wakely [Thu, 10 Sep 2020 14:39:15 +0000 (15:39 +0100)]
libstdc++: handle small max_blocks_per_chunk in pool resources [PR 94160]
When a pool resource is constructed with max_blocks_per_chunk=1 it ends
up creating a pool with blocks_per_chunk=0 which means it never
allocates anything. Instead it returns null pointers, which should be
impossible.
To avoid this problem, round the max_blocks_per_chunk value to a
multiple of four, so it's never smaller than four.
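The rounding itself is the usual bit trick (a generic sketch, not the libstdc++
source):

  #include <stddef.h>

  /* Round n up to a multiple of four, so a request of 1-3 becomes 4.  */
  static size_t
  round_up_to_four (size_t n)
  {
    return (n + 3) & ~(size_t) 3;
  }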
libstdc++-v3/ChangeLog:
PR libstdc++/94160
* src/c++17/memory_resource.cc (munge_options): Round
max_blocks_per_chunk to a multiple of four.
(__pool_resource::_M_alloc_pools()): Simplify slightly.
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
Check that valid pointers are returned when small values are
used for max_blocks_per_chunk.
Andrew Stubbs [Mon, 27 Jul 2020 09:55:22 +0000 (10:55 +0100)]
dwarf: Multi-register CFI address support
Add support for architectures such as AMD GCN, in which the pointer size is
larger than the register size. This allows the CFI information to include
multi-register locations for the stack pointer, frame pointer, and return
address.
Note that this uses a newly proposed DWARF operator DW_OP_LLVM_piece_end,
which is currently only recognized by the ROCGDB debugger from AMD. The exact
name and encoding for this operator is subject to change if and when the DWARF
standard accepts it.
gcc/ChangeLog:
* dwarf2cfi.c (dw_stack_pointer_regnum): Change type to struct cfa_reg.
(dw_frame_pointer_regnum): Likewise.
(new_cfi_row): Use set_by_dwreg.
(get_cfa_from_loc_descr): Use set_by_dwreg. Support register spans
with DW_OP_piece and DW_OP_LLVM_piece_end. Support DW_OP_lit*,
DW_OP_const*, DW_OP_minus, and DW_OP_plus.
(lookup_cfa_1): Use set_by_dwreg.
(def_cfa_0): Update for cfa_reg and support register spans.
(reg_save): Change sreg parameter to struct cfa_reg. Support register
spans.
(dwf_cfa_reg): New function.
(dwarf2out_flush_queued_reg_saves): Use dwf_cfa_reg instead of
dwf_regno.
(dwarf2out_frame_debug_def_cfa): Likewise.
(dwarf2out_frame_debug_adjust_cfa): Likewise.
(dwarf2out_frame_debug_cfa_offset): Likewise. Update reg_save usage.
(dwarf2out_frame_debug_cfa_register): Likewise.
(dwarf2out_frame_debug_expr): Likewise.
(create_pseudo_cfg): Use set_by_dwreg.
(initial_return_save): Use set_by_dwreg and dwf_cfa_reg,
(create_cie_data): Use dwf_cfa_reg.
(execute_dwarf2_frame): Use dwf_cfa_reg.
(dump_cfi_row): Use set_by_dwreg.
* dwarf2out.c (build_span_loc): New function.
(build_cfa_loc): Support register spans.
(build_cfa_aligned_loc): Update cfa_reg usage.
(convert_cfa_to_fb_loc_list): Use set_by_dwreg.
* dwarf2out.h (struct cfa_reg): New type.
(struct dw_cfa_location): Use struct cfa_reg.
(build_span_loc): New prototype.
* gengtype.c (main): Accept poly_uint16_pod type.
include/ChangeLog:
* dwarf2.def (DW_OP_LLVM_piece_end): New extension operator.
2020-09-20  John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog
* config/pa/pa-hpux11.h (LINK_GCC_C_SEQUENCE_SPEC): Delete.
* config/pa/pa64-hpux.h (LINK_GCC_C_SEQUENCE_SPEC): Likewise.
(ENDFILE_SPEC): Link with libgcc_stub.a and mill.a.
* config/pa/pa32-linux.h (ENDFILE_SPEC): Link with libgcc.a.
Andrew Stubbs [Thu, 17 Sep 2020 11:48:21 +0000 (12:48 +0100)]
amdgcn: Remove omp_gcn pass
This pass only had an optimization for obtaining team/thread numbers in it,
and that turns out to be invalid in the presence of nested parallel regions,
so we can simply delete the whole thing.
Of course, it would be nice to apply the optimization where it is valid, but
that will take more effort than I have to spend right now.
Andrew Stubbs [Thu, 17 Sep 2020 11:53:39 +0000 (12:53 +0100)]
libgomp: disable barriers in nested teams
Both GCN and NVPTX allow nested parallel regions, but the barrier
implementation did not allow the nested teams to run independently of each
other (due to hardware limitations). This patch fixes that, under the
assumption that each thread will create a new subteam of one thread, by
simply not using barriers when there is no other thread to synchronise with.
libgomp/ChangeLog:
* config/gcn/bar.c (gomp_barrier_wait_end): Skip the barrier if the
total number of threads is one.
(gomp_team_barrier_wake): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/nvptx/bar.c (gomp_barrier_wait_end): Likewise.
(gomp_team_barrier_wake): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
* testsuite/libgomp.c-c++-common/nested-parallel-unbalanced.c: New test.
Jakub Jelinek [Thu, 17 Sep 2020 16:43:33 +0000 (18:43 +0200)]
openmp: Also implicitly mark as declare target to functions mentioned in target regions
OpenMP 5.0 also specifies that functions referenced from target regions
(except for target regions with device(ancestor:)) are also implicitly marked as declare target to.
This patch implements that.
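For example (an illustrative snippet, not one of the new tests): bar carries no
declare target directive, but because it is referenced from a target region it
is now implicitly marked declare target to and compiled for the device as well.

  int
  bar (int x)
  {
    return x * 2;
  }

  int
  foo (int x)
  {
    int r;
  #pragma omp target map(from: r)
    r = bar (x);
    return r;
  }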
2020-05-14 Jakub Jelinek <jakub@redhat.com>
* function.h (struct function): Add has_omp_target bit.
* omp-offload.c (omp_discover_declare_target_fn_r): New function,
old renamed to ...
(omp_discover_declare_target_tgt_fn_r): ... this.
(omp_discover_declare_target_var_r): Call
omp_discover_declare_target_tgt_fn_r instead of
omp_discover_declare_target_fn_r.
(omp_discover_implicit_declare_target): Also queue functions with
has_omp_target bit set, for those walk with
omp_discover_declare_target_fn_r, for declare target to functions
walk with omp_discover_declare_target_tgt_fn_r.
gcc/c/
* c-parser.c (c_parser_omp_target): Set cfun->has_omp_target.
gcc/cp/
* cp-gimplify.c (cp_genericize_r): Set cfun->has_omp_target.
gcc/fortran/
* trans-openmp.c: Include function.h.
(gfc_trans_omp_target): Set cfun->has_omp_target.
libgomp/
* testsuite/libgomp.c-c++-common/target-40.c: New test.
Jakub Jelinek [Thu, 17 Sep 2020 16:36:28 +0000 (18:36 +0200)]
openmp: Implement discovery of implicit declare target to clauses
This attempts to implement what the OpenMP 5.0 spec says in the declare target
section, as amended by the 5.1 changes so far (related to device_type(host)), except
that it doesn't have the device(ancestor: ...) handling yet because we do not
support it yet, and I've so far left out the lambda exception note, because I need
that clarified.
2020-05-12 Jakub Jelinek <jakub@redhat.com>
* omp-offload.h (omp_discover_implicit_declare_target): Declare.
* omp-offload.c: Include context.h.
(omp_declare_target_fn_p, omp_declare_target_var_p,
omp_discover_declare_target_fn_r, omp_discover_declare_target_var_r,
omp_discover_implicit_declare_target): New functions.
* cgraphunit.c (analyze_functions): Call
omp_discover_implicit_declare_target.
Marek Polacek [Wed, 16 Sep 2020 13:27:29 +0000 (09:27 -0400)]
preprocessor: Fix ICE with too long line in fmtwarn [PR96935]
Here we ICE in char_span::subspan because the offset it gets is -1.
It's -1 because get_substring_ranges_for_loc gets a location whose
column was 0. That only happens in testcases like the attached where
we're dealing with extremely long lines (at least 4065 chars it seems).
This does happen in practice, though, so it's not just a theoretical
problem (e.g. when building the SU2 suite).
Fixed by checking that the column get_substring_ranges_for_loc gets is
sane, akin to other checks in that function.
gcc/ChangeLog:
PR preprocessor/96935
* input.c (get_substring_ranges_for_loc): Return if start.column
is less than 1.
gcc/testsuite/ChangeLog:
PR preprocessor/96935
* gcc.dg/format/pr96935.c: New test.
Jakub Jelinek [Wed, 16 Sep 2020 07:42:33 +0000 (09:42 +0200)]
store-merging: Consider also overlapping stores earlier in the by bitpos sorting [PR97053]
As the testcases show, if we have something like:
MEM <char[12]> [&b + 8B] = {};
MEM[(short *) &b] = 5;
_5 = *x_4(D);
MEM <long long unsigned int> [&b + 2B] = _5;
MEM[(char *)&b + 16B] = 88;
MEM[(int *)&b + 20B] = 1;
then in sort_by_bitpos order the stores are almost in the given order,
except the first store is after the = _5; store.
We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
while the former INTEGER_CST, and we can't coalesce the = _5 store with
the = {} store because the former is MEM_REF, the latter INTEGER_CST.
But we happily coalesce the remaining 3 stores, which is wrong, because the
= _5; store overlaps those and is in between them in the program order.
We already have code to deal with similar cases in check_no_overlap, but we
deal only with the following stores in sort_by_bitpos order, not the earlier
ones.
The following patch checks also the earlier ones. In coalesce_immediate_stores
it computes the first one that needs to be checked (all the ones whose
bitpos + bitsize is smaller or equal to merged_store->start don't need to be
checked and don't need to be checked even for any following attempts because
of the sort_by_bitpos sorting) and the end of that (that is the first store
in the merged_store).
2020-09-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97053
* gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if
any stores between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
be merged store.
(imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
Adjust check_no_overlap caller.
(imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
and last_earlier variables, adjust them during iterations. Adjust
check_no_overlap callers, call check_no_overlap even when extending
overlapping stores by extra INTEGER_CST stores.
* gcc.dg/store_merging_31.c: New test.
* gcc.dg/store_merging_32.c: New test.
Will Schmidt [Wed, 9 Sep 2020 15:59:38 +0000 (10:59 -0500)]
[PATCH,rs6000] Testsuite fixup pr96139 tests
Hi,
As reported, the recently added pr96139 tests will fail on older targets
because the tests are missing the appropriate -mvsx or -maltivec options.
This adds the options and clarifies the dg-require statements.
The pr96139-c.c test needs -maltivec to work, but does not actually use
vectors, so does not require -mvsx like the others.
Sniff-regtested OK when specifying older targets on a power7 host.
--target_board=unix/'{-mcpu=power4,-mcpu=power5,-mcpu=power6,-mcpu=power7,
-mcpu=power8,-mcpu=power9}''{-m64,-m32}'
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96139-a.c: Specify -mvsx option and update the
dg-require stanza to match.
* gcc.target/powerpc/pr96139-b.c: Same.
* gcc.target/powerpc/pr96139-c.c: Specify -maltivec option and update
the dg-require stanza to match.
Will Schmidt [Mon, 20 Jul 2020 15:51:37 +0000 (10:51 -0500)]
[PATCH, rs6000] Fix vector long long subtype (PR96139)
Hi,
This corrects an issue with the powerpc vector long long subtypes.
As reported by SjMunroe, when building some code with -Wall, and
attempting to print an element of a "long long vector" with a
long long printf format string, we will report an error because
the vector sub-type was improperly defined as int.
When defining a V2DI_type_node we use a TARGET_POWERPC64 ternary to
define the V2DI_type_node with "vector long" or "vector long long".
We also need to specify the proper sub-type when we define the type.
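The reported scenario looks roughly like this (illustrative; built with -Wall on
a VSX-enabled target):

  #include <altivec.h>
  #include <stdio.h>

  void
  show (vector long long v)
  {
    /* With the corrected element sub-type, v[0] really is a long long and
       the %lld format no longer trips the format warning.  */
    printf ("%lld\n", v[0]);
  }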
PR target/96139
2020-09-03 Will Schmidt <will_schmidt@vnet.ibm.com>
gcc/ChangeLog:
* config/rs6000/rs6000-call.c (rs6000_init_builtin): Update V2DI_type_node
and unsigned_V2DI_type_node definitions.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96139-a.c: New test.
* gcc.target/powerpc/pr96139-b.c: New test.
* gcc.target/powerpc/pr96139-c.c: New test.
PR fortran/96668
* trans.h (gfc_omp_finish_clause): Add bool openacc arg.
* trans-openmp.c (gfc_omp_finish_clause): Ditto. Use
GOMP_MAP_ALWAYS_POINTER with PSET for pointers.
(gfc_trans_omp_clauses): Like the latter and also if the always
modifier is used.
PR fortran/96668
* libgomp.h (struct target_var_desc): Add has_null_ptr_assoc member.
* target.c (gomp_map_vars_existing): Add always_to_flag flag.
(gomp_map_vars_existing): Update call to it.
(gomp_map_fields_existing): Likewise
(gomp_map_vars_internal): Update PSET handling such that if a nullptr is
now allocated or if GOMP_MAP_POINTER is used PSET is updated and pointer
remapped.
(GOMP_target_enter_exit_data): Handle GOMP_MAP_ALWAYS_POINTER like
GOMP_MAP_POINTER.
* testsuite/libgomp.fortran/map-alloc-ptr-1.f90: New test.
* testsuite/libgomp.fortran/map-alloc-ptr-2.f90: New test.
Richard Biener [Mon, 24 Aug 2020 12:12:01 +0000 (14:12 +0200)]
debug/96690 - mangle symbols eventually used by late dwarf output
The following makes sure to, at early debug generation time, mangle
symbols we eventually end up outputting during late finish.
2020-08-24 Richard Biener <rguenther@suse.de>
PR debug/96690
* dwarf2out.c (reference_to_unused): Make FUNCTION_DECL
processing more consistent with respect to
symtab->global_info_ready.
(tree_add_const_value_attribute): Unconditionally call
rtl_for_decl_init to do all mangling early but throw
away the result if early_dwarf.