One of the problems in this PR was that if we had:
vector_type1 array[] = { vector_value1 };
process_init_element would only treat vector_value1 as initialising
a vector_type1 if they had the same TYPE_MAIN_VARIANT. This has
several problems:
(1) It gives confusing error messages if the vector types are
incompatible. (Tested by gcc.dg/pr96377-1.c.)
(2) It means that we reject code that should be valid with
-flax-vector-conversions. (Tested by gcc.dg/pr96377-2.c.)
(3) On arm and aarch64 targets, it means that we reject some
initializers that mix Advanced SIMD and standard GNU vectors.
These vectors have traditionally had different TYPE_MAIN_VARIANTs
because they have different mangling schemes. (Tested by
gcc.dg/pr96377-[3-6].c.)
(4) It means that we reject SVE initializers that should be valid.
(Tested by gcc.target/aarch64/sve/gnu_vectors_[34].c.)
because applying the binary operator to arm_neon_value1 strips
the "Advanced SIMD type" attributes that were added in that patch.
Stripping the attributes is problematic for other reasons though,
so that still needs to be fixed separately.
gcc/c/
PR c/96377
* c-typeck.c (process_init_element): Split test for whether to
recurse into a record, union or array into...
(initialize_elementwise_p): ...this new function. Don't recurse
into a vector type if the initialization value is also a vector.
Both intrinsics did not handle the case where the va_list object comes
from a ref parameter.
gcc/d/ChangeLog:
PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().
preprocessor: Teach traditional about has_include [PR95889]
Traditional cpp (used by fortran) didn;t know about the new
__has_include__ implementation. Hey, since when did traditional cpp
grow __has_include__? That wasn't in knr!
Julian Brown [Thu, 18 Jun 2020 12:11:08 +0000 (05:11 -0700)]
openacc: Deep copy attach/detach should not affect reference counts
Attach and detach operations are not supposed to affect structural or
dynamic reference counts for OpenACC. Previously they did so, which led to
subtle problems in some circumstances. We can avoid reference-counting
attach/detach operations by extending and slightly repurposing the
do_detach field in target_var_desc. It is now called is_attach to better
reflect its new role.
2020-07-27 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* libgomp.h (struct target_var_desc): Rename do_detach field to
is_attach.
* oacc-mem.c (goacc_exit_datum_1): Add assert. Don't set finalize for
GOMP_MAP_FORCE_DETACH. Update checking to use is_attach field.
(goacc_enter_data_internal): Don't affect reference counts
for attach mappings.
(goacc_exit_data_internal): Don't affect reference counts for detach
mappings.
* target.c (gomp_map_vars_existing): Don't affect reference counts for
attach mappings.
(gomp_map_vars_internal): Set renamed is_attach flag unconditionally to
mark attach mappings.
(gomp_unmap_vars_internal): Use is_attach flag to prevent affecting
reference count for attach mappings.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
test as shouldfail.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
gracefully in no-finalize mode.
Julian Brown [Thu, 2 Jul 2020 21:18:20 +0000 (14:18 -0700)]
openacc: Remove unnecessary detach finalization
The call to gomp_detach_pointer in gomp_unmap_vars_internal does not
need to force finalization, and doing so may mask mismatched pointer
attachments/detachments. This patch removes the forcing.
2020-07-16 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* target.c (gomp_unmap_vars_internal): Remove unnecessary forcing of
finalization for detach operation.
* testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c:
New test.
coroutines: Correct frame capture of compiler temps [PR95591+4].
When a full expression contains a co_await (or co_yield), this means
that it might be suspended; which would destroy temporary variables
(on a stack for example). However the language guarantees that such
temporaries are live until the end of the expression.
In order to preserve this, we 'promote' temporaries where necessary
so that they are saved in the coroutine frame (which allows them to
remain live potentially until the frame is destroyed). In addition,
sub-expressions that produce control flow (such as TRUTH_AND/OR_IF
or COND_EXPR) must be handled specifically to ensure that await
expressions are properly expanded.
This patch corrects two mistakes in which we were (a) failing to
promote some temporaries and (b) we we failing to sequence DTORs for
the captures properly. This manifests in a number of related (but not
exact duplicate) PRs.
The revised code collects the actions into one place and maps all the
control flow into one form - a benefit of this is that co_returns are
now expanded earlier (which provides an opportunity to address PR95517
in some future patch).
We replace a statement that contains await expression(s) with a bind
scope that has a variable list describing the temporaries that have
been 'promoted' and a statement list that contains a series of cleanup
expressions for each of those. Where we encounter nested conditional
expressions, these are wrapped in a try-finally block with a guard var
for each sub-expression variable that needs a DTOR. The guards are all
declared and initialized to false before the first conditional sub-
expression. The 'finally' block contains a series of if blocks (one
per guard variable) enclosing the relevant DTOR.
Variables listed in a bind scope in this manner are automatically moved
to a coroutine frame version by existing code (so we re-use that rather
than having a separate mechanism).
gcc/cp/ChangeLog:
PR c++/95591
PR c++/95599
PR c++/95823
PR c++/95824
PR c++/95895
* coroutines.cc (struct coro_ret_data): Delete.
(coro_maybe_expand_co_return): Delete.
(co_return_expander): Delete.
(expand_co_returns): Delete.
(co_await_find_in_subtree): Remove unused name.
(build_actor_fn): Remove unused parm, remove handling
for co_return expansion.
(register_await_info): Demote duplicate info message to a
warning.
(coro_make_frame_entry): Move closer to use site.
(struct susp_frame_data): Add fields for final suspend label
and a flag to indicate await expressions with initializers.
(captures_temporary): Delete.
(register_awaits): Remove unused code, update comments.
(find_any_await): New.
(tmp_target_expr_p): New.
(struct interesting): New.
(find_interesting_subtree): New.
(struct var_nest_node): New.
(flatten_await_stmt): New.
(handle_nested_conditionals): New.
(process_conditional): New.
(replace_statement_captures): Rename to...
(maybe_promote_temps): ... this.
(maybe_promote_captured_temps): Delete.
(analyze_expression_awaits): Check for await expressions with
initializers. Simplify handling for truth-and/or-if.
(expand_one_truth_if): Simplify (map cases that need expansion
to COND_EXPR).
(await_statement_walker): Handle CO_RETURN_EXPR. Simplify the
handling for truth-and/or-if expressions.
(register_local_var_uses): Ensure that we create names in the
implementation namespace.
(morph_fn_to_coro): Add final suspend label to suspend frame
callback data and remove it from the build_actor_fn call.
gcc/testsuite/ChangeLog:
PR c++/95591
PR c++/95599
PR c++/95823
PR c++/95824
PR c++/95895
* g++.dg/coroutines/pr95591.C: New test.
* g++.dg/coroutines/pr95599.C: New test.
* g++.dg/coroutines/pr95823.C: New test.
* g++.dg/coroutines/pr95824.C: New test.
Martin Liska [Mon, 27 Jul 2020 10:30:24 +0000 (12:30 +0200)]
expr: build string_constant only for a char type
gcc/ChangeLog:
PR tree-optimization/96058
* expr.c (string_constant): Build string_constant only
for a type that has same precision as char_type_node
and is an integral type.
Jakub Jelinek [Tue, 28 Jul 2020 09:08:29 +0000 (11:08 +0200)]
expander: Fix ICE in maybe_warn_rdwr_sizes [PR96335]
The following testcase ICEs in maybe_warn_rdwr_sizes. The problem is that
the caller uses its fndecl and fntype variables to fill up rdwr_map, and
the fntype in that case is a prototype with the access attribute and all
the checks needed for that performed. But the maybe_warn_rdwr_sizes
function tries to rediscover fndecl/fntype itself and does it differently
from how the caller did (for fndecl get_callee_fndecl and fntype from that
FUNCTION_DECL, otherwise sets fntype to CALL_EXPR_FN's type).
On the testcase, get_callee_fndecl does find a FUNCTION_DECL because
it does STRIP_NOPS in between.
Instead of trying to rediscover those, this patch just passes them down,
like is done in several other functions.
2020-07-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96335
* calls.c (maybe_warn_rdwr_sizes): Add FNDECL and FNTYPE arguments,
instead of trying to rediscover them in the body.
(initialize_argument_information): Adjust caller.
Martin Liska [Fri, 24 Jul 2020 12:33:27 +0000 (14:33 +0200)]
Use vec::reserve before vec_safe_grow_cleared is called
gcc/ChangeLog:
PR lto/45375
* symbol-summary.h: Call vec_safe_reserve before grow is called
in order to grow to a reasonable size.
* vec.h (vec_safe_reserve): Add missing function for vl_ptr
type.
Mark Eggleston [Thu, 11 Jun 2020 10:05:40 +0000 (11:05 +0100)]
Fortran : ICE in gfc_check_pointer_assign PR95612
Output an error if the right hand value is a zero sized array or
does not have a symbol tree otherwise continue checking.
2020-07-27 Steven G. Kargl <kargl@gcc.gnu.org>
gcc/fortran/
PR fortran/95612
* expr.c (gfc_check_pointer_assigb): Output an error if
rvalue is a zero sized array or output an error if rvalue
doesn't have a symbol tree.
2020-07-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95612
* gfortran.dg/pr95612.f90: New test.
PR 93567, G edit descriptor uses E instead of F editing in rounding mode UP.
The switch between FMT_E and FMT_F is based on the absolute value.
Set r=0 for rounding toward zero and r = 1 otherwise.
If (exp_d - m) == 1 there is no rounding needed.
libgfortran/ChangeLog:
PR fortran/93567
* io/write_float.def (determine_en_precision): Fix switch between
FMT_E and FMT_F.
gcc/testsuite/ChangeLog:
PR fortran/93567
* gfortran.dg/round_3.f08: Add test cases.
sparc/sparc64: use crtendS.o for default-pie executables [PR96190]
In --enable-default-pie mode compiler should switch from
using crtend.o to crtendS.o. On sparc it is especially important
because crtend.o contains PIC-unfriendly code.
We use GNU_USER_TARGET_ENDFILE_SPEC as a baseline spec to get
crtendS.o instead of crtend.o in !no-pie mode.
gcc:
2020-07-14 Sergei Trofimovich <siarheit@google.com>
PR target/96190
* config/sparc/linux.h (ENDFILE_SPEC): Use GNU_USER_TARGET_ENDFILE_SPEC
to get crtendS.o for !no-pie mode.
* config/sparc/linux64.h (ENDFILE_SPEC): Ditto.
Harald Anlauf [Mon, 6 Jul 2020 16:58:23 +0000 (18:58 +0200)]
PR fortran/95980 - ICE on using sync images with -fcheck=bounds
In SELECT TYPE, the argument may be an incorrectly specified unlimited
polymorphic variable. Avoid a NULL pointer dereference for clean error
recovery.
gcc/fortran/
PR fortran/95980
* match.c (copy_ts_from_selector_to_associate, build_class_sym):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
* resolve.c (resolve_select_type):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
Peter Bergner [Wed, 22 Jul 2020 16:44:35 +0000 (11:44 -0500)]
rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode
PR96236 shows a problem where we don't correctly store our 512-bit accumulators
correctly in little-endian mode. The patch below detects when we're doing a
little-endian memory access and stores to the correct memory locations.
Thomas Koenig [Sun, 19 Jul 2020 15:27:45 +0000 (17:27 +0200)]
Always use name from c_interop_kinds_table for -fc-prototypes.
When a user specified a KIND that was a parameter taking the value
of an iso_c_binding KIND, the code used the name of that parameter
to look up the type name. Corrected by always looking it up in
the table of C interop kinds (which was previously done for
non-C-interop types, anyway).
gcc/fortran/ChangeLog:
PR fortran/96220
* dump-parse-tree.c (get_c_type_name): Always use the entries from
c_interop_kinds_table to find the correct C type.
Thomas Koenig [Thu, 23 Jul 2020 15:56:09 +0000 (17:56 +0200)]
Fix handling of implicit_pure by checking if non-pure procedures are called.
Procedures are marked as implicit_pure if they fulfill the criteria of
pure procedures. In this case, a procedure was not marked as not being
implicit_pure which called another procedure, which had not yet been
marked as not being implicit_impure.
Fixed by iterating over all procedures, setting callers of procedures
which are non-pure and non-implicit_pure as non-implicit_pure and
doing this until no more procedure has been changed.
Kito Cheng [Wed, 22 Jul 2020 06:50:40 +0000 (14:50 +0800)]
PR target/96260 - KASAN should work even back-end not porting anything.
- Most KASAN function don't need any porting anything in back-end
except asan stack protection.
- However kernel will given shadow offset when enable asan stack
protection, so eveything in KASAN can work if shadow offset is given.
- Verified with x86 and risc-v.
- Verified with RISC-V linux kernel.
gcc/ChangeLog:
PR target/96260
* asan.c (asan_shadow_offset_set_p): New.
* asan.h (asan_shadow_offset_set_p): Ditto.
* toplev.c (process_options): Allow -fsanitize=kernel-address
even TARGET_ASAN_SHADOW_OFFSET not implemented, only check when
asan stack protection is enabled.
The trunk versions of the patch made GNU and Advanced SIMD vectors
distinct (but inter-convertible) in all cases. However, the
traditional behaviour is that the types are distinct in template
arguments but not otherwise.
Following a suggestion from Jason, this patch puts the check
for different vector types under comparing_specializations.
In order to keep the backport as simple as possible, the patch
hard-codes the name of the attribute in the frontend rather than
adding a new branch-only target hook.
I didn't find a test that tripped the assert on the branch,
even with the --param in the PR, so instead I tested this by
forcing the hash function to only hash the tree code. That made
the static assertion in the test fail without the patch but pass
with it.
This means that the tests pass for unmodified sources even
without the patch (unless you're very unlucky).
gcc/
PR target/95726
* config/aarch64/aarch64.c (aarch64_attribute_table): Add
"Advanced SIMD type".
* config/aarch64/aarch64-builtins.c: Include stringpool.h and
attribs.h.
(aarch64_init_simd_builtin_types): Add an "Advanced SIMD type"
attribute to each Advanced SIMD type.
* config/arm/arm.c (arm_attribute_table): Add "Advanced SIMD type".
* config/arm/arm-builtins.c: Include stringpool.h and attribs.h.
(arm_init_simd_builtin_types): Add an "Advanced SIMD type"
attribute to each Advanced SIMD type.
gcc/cp/
PR target/95726
* typeck.c (structural_comptypes): When comparing template
specializations, differentiate between vectors that have and
do not have an "Advanced SIMD type" attribute.
gcc/testsuite/
PR target/95726
* g++.target/aarch64/pr95726.C: New test.
* g++.target/arm/pr95726.C: Likewise.
Jakub Jelinek [Wed, 15 Jul 2020 09:34:44 +0000 (11:34 +0200)]
fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]
The _mm512_{,mask_}cmp_p[ds]_mask and also _mm_{,mask_}cmp_s[ds]_mask
intrinsics have an argument which must have a constant passed to it
and so use an inline version only for ifdef __OPTIMIZE__ and have
a #define for -O0. But the _mm512_{,mask_}cmp*_p[ds]_mask intrinsics
don't need a constant argument, they are essentially the first
set with the constant added to them implicitly based on the comparison
name, and so there is no #define version for them (correctly).
But their inline versions are defined in between the first and s[ds]
set and so inside of ifdef __OPTIMIZE__, which means that with -O0
they aren't defined at all.
This patch fixes that by moving those after the #ifdef __OPTIMIZE #else
use #define #endif block.
Marek Polacek [Tue, 23 Jun 2020 01:26:49 +0000 (21:26 -0400)]
c++: Make convert_like complain about bad ck_ref_bind again [PR95789]
convert_like issues errors about bad_p conversions at the beginning
of the function, but in the ck_ref_bind case, it only issues them
after we've called convert_like on the next conversion.
This doesn't work as expected since r10-7096 because when we see
a conversion from/to class type in a template, we return early, thereby
missing the error, and a bad_p conversion goes by undetected. That
made the attached test to compile even though it should not.
I had thought that I could just move the ck_ref_bind/bad_p errors
above to the rest of them, but that regressed diagnostics because
expr then wasn't converted yet by the nested convert_like_real call.
So, for bad_p conversions, do the normal processing, but still return
the IMPLICIT_CONV_EXPR to avoid introducing trees that the template
processing can't handle well. This I achieved by adding a wrapper
function.
gcc/cp/ChangeLog:
PR c++/95789
PR c++/96104
PR c++/96179
* call.c (convert_like_real_1): Renamed from convert_like_real.
(convert_like_real): New wrapper for convert_like_real_1.
gcc/testsuite/ChangeLog:
PR c++/95789
PR c++/96104
PR c++/96179
* g++.dg/conversion/ref4.C: New test.
* g++.dg/conversion/ref5.C: New test.
* g++.dg/conversion/ref6.C: New test.
libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof
The version of nvprof in CUDA 9.0 causes a hang when used to profile an
OpenACC program. This is because it calls acc_get_device_type from
a callback called during device initialization, which then attempts
to acquire acc_device_lock while it is already taken, resulting in
deadlock. This works around the issue by returning acc_device_none
from acc_get_device_type without attempting to acquire the lock when
initialization has not completed yet.
2020-07-14 Tom de Vries <tom@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
New variable.
(acc_init_1): Set acc_init_thread to pthread_self (). Set
acc_init_state to initializing at the start, and to initialized at the
end.
(self_initializing_p): New function.
(acc_get_device_type): Return acc_device_none if called by thread that
is currently executing acc_init_1.
* libgomp.texi (acc_get_device_type): Update documentation.
(Implementation Status and Implementation-Defined Behavior): Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c: New.
ipa-devirt: Fix crash in obj_type_ref_class [PR95114]
The testcase has failed since r9-5035, because obj_type_ref_class
tries to look up an ODR type when no ODR type information is
available. (The information was available earlier in the
compilation, but was freed during pass_ipa_free_lang_data.)
We then crash dereferencing the null get_odr_type result.
The test passes with -O2. However, it fails again if -fdump-tree-all
is used, since obj_type_ref_class is called indirectly from the
dump routines.
Other code creates ODR type entries on the fly by passing “true”
as the insert parameter. But obj_type_ref_class can't do that
unconditionally, since it should have no side-effects when used
from the dumping code.
Following a suggestion from Honza, this patch adds parameters
to say whether the routines are being called from dump routines
and uses those to derive the insert parameter.
gcc/
PR middle-end/95114
* tree.h (virtual_method_call_p): Add a default-false parameter
that indicates whether the function is being called from dump
routines.
(obj_type_ref_class): Likewise.
* tree.c (virtual_method_call_p): Likewise.
* ipa-devirt.c (obj_type_ref_class): Likewise. Lazily add ODR
type information for the type when the parameter is false.
* tree-pretty-print.c (dump_generic_node): Update calls to
virtual_method_call_p and obj_type_ref_class accordingly.
gcc/testsuite/
PR middle-end/95114
* g++.target/aarch64/pr95114.C: New test.
value-range: Fix handling of POLY_INT_CST anti-ranges [PR96146]
The range infrastructure has code to decompose POLY_INT_CST ranges
to worst-case integer bounds. However, it had the fundamental flaw
(obvious in hindsight) that it applied to anti-ranges too, meaning
that a range 2+2X would end up with a range of ~[2, +INF], i.e.
[-INF, 1]. This patch decays to varying in that case instead.
I'm still a bit uneasy about this. ISTM that in terms of
generality:
I.e. an SSA_NAME could store a POLY_INT_CST and a POLY_INT_CST
could store an INTEGER_CST (before canonicalisation). POLY_INT_CST
is also “as constant as” ADDR_EXPR (well, OK, only some ADDR_EXPRs
are run-time rather than link-time constants, whereas all POLY_INT_CSTs
are, but still). So it seems like we should at least be able to treat
POLY_INT_CST as symbolic. On the other hand, I don't have any examples
in which that would be useful.
gcc/
PR tree-optimization/96146
* value-range.cc (value_range::set): Only decompose POLY_INT_CST
bounds to integers for VR_RANGE. Decay to VR_VARYING for anti-ranges
involving POLY_INT_CSTs.
gcc/testsuite/
PR tree-optimization/96146
* gcc.target/aarch64/sve/acle/general/pr96146.c: New test.
Jakub Jelinek [Tue, 14 Jul 2020 14:01:11 +0000 (16:01 +0200)]
expr: Unbreak build of mesa [PR96194]
> > The store to the whole of each volatile object was picked apart
> > like there had been an individual assignment to each of the
> > fields. Reads were added as part of that; see PR for details.
> > The reads from volatile memory were a clear bug; individual
> > stores questionable. A separate patch clarifies the docs.
This breaks building of mesa on both the trunk and 10 branch.
The problem is that the middle-end may never create temporaries of non-POD
(TREE_ADDRESSABLE) types, those can be only created when the language says
so and thus only the FE is allowed to create those.
This patch just reverts the behavior to what we used to do before for the
stores to volatile non-PODs. Perhaps we want to do something else, but
definitely we can't create temporaries of the non-POD type. It is up to
discussions on what should happen in those cases.
2020-07-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96194
* expr.c (expand_constructor): Don't create temporary for store to
volatile MEM if exp has an addressable type.
Because the check for power10_hw is not called
check_effective_target_power10_hw, it needs to be looked
for by is-effective-target-keyword. Also reorder things
in is-effective-target to put power10_hw with the other
ppc stuff.
2020-07-13 Aaron Sawdey <acsawdey@linux.ibm.com>
gcc/testsuite/
* lib/target-supports.exp (is-effective-target):
Reorder to put powerpc stuff together.
(is-effective-target-keyword): Add power10_hw.
Add a test for dejagnu to determine if execution of MMA instructions is
supported in the test environment. Add an execution test to make sure
that __builtin_cpu_supports("mma") is true if we can execute MMA
instructions.
Szabolcs Nagy [Thu, 28 May 2020 09:28:30 +0000 (10:28 +0100)]
doc: Clarify __builtin_return_address [PR94891]
The expected semantics and valid usage of __builtin_return_address is
not clear since it exposes implementation internals that are normally
not meaningful to portable c code.
This documentation change tries to clarify the semantics in case the
return address is stored in a mangled form. This affects AArch64 when
pointer authentication is used for the return address signing (i.e.
-mbranch-protection=pac-ret).
2020-07-13 Szabolcs Nagy <szabolcs.nagy@arm.com>
gcc/ChangeLog:
PR target/94891
* doc/extend.texi: Update the text for __builtin_return_address.
Note that a mangled address might not even fit into a void *, e.g.
with AArch64 ilp32 ABI the return address is stored as 64bit, so
the mangled return address cannot be accessed via _Unwind_GetPtr.
This patch changes the unwinder hooks as follows:
MD_POST_EXTRACT_ROOT_ADDR is removed: root address comes from
__builtin_return_address which is not mangled.
MD_POST_EXTRACT_FRAME_ADDR is renamed to MD_DEMANGLE_RETURN_ADDR,
it now operates on _Unwind_Word instead of void *, so the hook
should work when return address signing is enabled on AArch64 ilp32.
(But for that __builtin_aarch64_autia1716 should be fixed to operate
on 64bit input instead of a void *.)
MD_POST_FROB_EH_HANDLER_ADDR is removed: it is the responsibility of
__builtin_eh_return to do the mangling if necessary.
Szabolcs Nagy [Thu, 4 Jun 2020 12:42:16 +0000 (13:42 +0100)]
aarch64: fix __builtin_eh_return with pac-ret [PR94891]
Currently __builtin_eh_return takes a signed return address, which can
cause ABI and API issues: 1) pointer representation problems if the
address is passed around before eh return, 2) the source code needs
pac-ret specific changes and needs to know if pac-ret is used in the
current frame, 3) signed address may not be representible as void *
(with ilp32 abi).
Using address signing to protect eh return is ineffective because the
instruction sequence in the unwinder that starts from the address
signing and ends with a ret can be used as a return to anywhere gadget.
Using indirect branch istead of ret with bti j landing pads at the
target can reduce the potential of such gadget, which also implies
that __builtin_eh_return should not take a signed address.
This is a big hammer fix to the ABI and API issues: it turns pac-ret
off for the caller completely (not just on the eh return path). To
harden the caller against ROP attacks, it should use indirect branch
instead of ret, this is not attempted so the patch remains small and
backportable.
2020-07-13 Szabolcs Nagy <szabolcs.nagy@arm.com>
gcc/ChangeLog:
PR target/94891
* config/aarch64/aarch64.c (aarch64_return_address_signing_enabled):
Disable return address signing if __builtin_eh_return is used.
Szabolcs Nagy [Tue, 2 Jun 2020 15:44:41 +0000 (16:44 +0100)]
aarch64: fix return address access with pac [PR94891][PR94791]
This is a big hammer fix for __builtin_return_address (PR target/94891)
returning signed addresses (sometimes, depending on wether lr happens
to be signed or not at the time of call which depends on optimizations),
and similarly -pg may pass signed return address to _mcount
(PR target/94791).
At the time of return address expansion we don't know if it's signed or
not so it is done unconditionally.
Szabolcs Nagy [Thu, 2 Jul 2020 16:12:05 +0000 (17:12 +0100)]
aarch64: Fix BTI support in libitm
sjlj.S did not have the GNU property note markup and the BTI c
instructions that are necessary when it is built with branch
protection.
The notes are only added when libitm is built with branch
protection, because old linkers mishandle the note (merge
them incorrectly or emit warnings), the BTI instructions
are added unconditionally.
2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>
libitm/ChangeLog:
* config/aarch64/sjlj.S: Add BTI marking and related definitions,
and add BTI c to function entries.
Szabolcs Nagy [Thu, 2 Jul 2020 16:11:56 +0000 (17:11 +0100)]
aarch64: Fix BTI support in libgcc [PR96001]
lse.S did not have the GNU property note markup and the BTI c
instructions that are necessary when it is built with branch
protection.
The notes are only added when libgcc is built with branch
protection, because old linkers mishandle the note (merge
them incorrectly or emit warnings), the BTI instructions
are added unconditionally.
Note: BTI c is only necessary at function entry if the function
may be called indirectly, currently lse functions are not called
indirectly, but BTI is added for ABI reasons e.g. to allow
linkers later to emit stub code with indirect jump.
2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>
libgcc/ChangeLog:
PR target/96001
* config/aarch64/lse.S: Add BTI marking and related definitions,
and add BTI c to function entries.
Julian Brown [Fri, 5 Jun 2020 21:46:41 +0000 (14:46 -0700)]
openacc: Don't strip TO_PSET/POINTER for enter/exit data
OpenACC 2.6 specifies that the array descriptor (when present) must be
copied to the target before attaching pointers in Fortran. This patch
reverses the stripping of GOMP_MAP_TO_PSET and GOMP_MAP_POINTER that
was introduced by the "OpenACC reference count overhaul" patch.
2020-07-10 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
gcc/
* gimplify.c (gimplify_scan_omp_clauses): Do not strip
GOMP_MAP_TO_PSET/GOMP_MAP_POINTER for OpenACC enter/exit data
directives (see also PR92929).
Julian Brown [Fri, 22 May 2020 11:06:10 +0000 (04:06 -0700)]
openacc: Adjust dynamic reference count semantics
This patch adjusts how dynamic reference counts work so that they match
the semantics of the source program more closely, instead of representing
"excess" reference counts beyond those that represent pointers in the
internal libgomp splay-tree data structure. This allows some corner
cases to be handled more gracefully.
2020-07-10 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* libgomp.h (struct splay_tree_key_s): Change virtual_refcount to
dynamic_refcount.
(struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA.
* oacc-mem.c (acc_map_data): Substitute virtual_refcount for
dynamic_refcount.
(acc_unmap_data): Update comment.
(goacc_map_var_existing, goacc_enter_datum): Adjust for
dynamic_refcount semantics.
(goacc_exit_datum_1, goacc_exit_datum): Re-add some error checking.
Adjust for dynamic_refcount semantics.
(goacc_enter_data_internal): Implement "present" case of dynamic
memory-map handling here. Update "non-present" case for
dynamic_refcount semantics.
(goacc_exit_data_internal): Use goacc_exit_datum_1.
* target.c (gomp_map_vars_internal): Remove
GOMP_MAP_VARS_OPENACC_ENTER_DATA handling. Update for dynamic_refcount
handling.
(gomp_unmap_vars_internal): Remove virtual_refcount handling.
(gomp_load_image_to_device): Substitute dynamic_refcount for
virtual_refcount.
* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Remove XFAILs.
* testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Remove XFAILs and
trace output.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Remove
trace output.
* testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90: New
test.
* testsuite/libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c:
Remove stale comment.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-1.f90: Remove XFAILs.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-2.F90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-3-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust XFAIL.
Julian Brown [Tue, 30 Jun 2020 09:15:56 +0000 (02:15 -0700)]
openacc: Helper functions for enter/exit data using single mapping
This patch factors out the parts of goacc_enter_datum and
goacc_exit_datum that can be shared with goacc_enter_data_internal
and goacc_exit_data_internal respectively (in the next patch),
without overloading function return values or complicating code paths
unnecessarily.
2020-07-10 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* oacc-mem.c (goacc_map_var_existing): New function.
(goacc_enter_datum): Use above function.
(goacc_exit_datum_1): New function.
(goacc_exit_datum): Use above function.
Julian Brown [Thu, 11 Jun 2020 13:43:59 +0000 (06:43 -0700)]
openacc: GOMP_MAP_ATTACH handling in find_group_last
Arrange for GOMP_MAP_ATTACH to be grouped together with a preceding
GOMP_MAP_TO_PSET or other "to" data movement clause, except in cases
where an explicit "attach" clause is used.
2020-07-09 Julian Brown <julian@codesourcery.com>
include/
* gomp-constants.h (gomp_map_kind): Update comment for GOMP_MAP_TO_PSET.
libgomp/
* oacc-mem.c (find_group_last): Group data-movement clauses
(GOMP_MAP_TO_PSET, GOMP_MAP_TO, etc.) together with a subsequent
GOMP_MAP_ATTACH. Allow standalone GOMP_MAP_ATTACH also.
Jakub Jelinek [Mon, 13 Jul 2020 16:25:53 +0000 (18:25 +0200)]
ipa-fnsummary: Fix ICE with switch predicates [PR96130]
The following testcase ICEs since r10-3199.
There is a switch with default label, where the controlling expression has
range just 0..7 and there are case labels for all those 8 values, but
nothing has yet optimized away the default.
Since r10-3199, set_switch_stmt_execution_predicate sets the switch to
default label's edge's predicate to a false predicate and then
compute_bb_predicates propagates the predicates through the cfg, but false
predicates aren't really added. The caller of compute_bb_predicates
in one place handles NULL bb->aux as false predicate:
if (fbi.info)
{
if (bb->aux)
bb_predicate = *(predicate *) bb->aux;
else
bb_predicate = false;
}
else
bb_predicate = true;
but then in two further spots that the patch below is changing
it assumes bb->aux must be non-NULL. Those two spots are guarded by a
condition that is only true if fbi.info is non-NULL, so I think the right
fix is to treat NULL aux as false predicate in those spots too.
Marek Polacek [Fri, 10 Jul 2020 00:44:05 +0000 (20:44 -0400)]
c++: Fix tentative parsing of enum-specifier [PR96077]
Here's an interesting issue: in this code a ) is missing:
enum { E = (2 } e;
but we compile the code anyway, and E is set to 0 in build_enumerator,
which is sneaky.
The problem is that cp_parser_enum_specifier parses tentatively, because
when we see the enum keyword, we don't know yet if we'll find an
enum-specifier, opaque-enum-declaration, or elaborated-enum-specifier.
In this test when we call cp_parser_enumerator_list we're still parsing
tentatively, and as a consequence, parens.require_close (parser) in
cp_parser_primary_expression doesn't report any errors. But we only go
on to parse the enumerator-list after we've seen a {, at which point we
might as well commit -- we know we're dealing with an enum-specifier.
gcc/cp/ChangeLog:
PR c++/96077
* parser.c (cp_parser_enum_specifier): Commit to tentative parse
after we've seen an opening brace.
Richard Biener [Mon, 13 Jul 2020 09:41:16 +0000 (11:41 +0200)]
fix global variable alignment for testcase gcc.dg/torture/pr96133.c
The testcase was errorneously accessing the global variable via a
type that might require bigger alignment than provided. Fix that
via an appropriate attribute.
2020-07-13 Richard Biener <rguenther@suse.de>
PR testsuite/96180
* gcc.dg/torture/pr96133.c: Align global variable.
PR94600: fix volatile access to the whole of a compound object.
The store to the whole of each volatile object was picked apart
like there had been an individual assignment to each of the
fields. Reads were added as part of that; see PR for details.
The reads from volatile memory were a clear bug; individual
stores questionable. A separate patch clarifies the docs.
gcc:
2020-07-09 Richard Biener <rguenther@suse.de>
PR middle-end/94600
* expr.c (expand_constructor): Make a temporary also if we're
storing to volatile memory.
Jakub Jelinek [Thu, 2 Jul 2020 09:38:20 +0000 (11:38 +0200)]
tree-cfg: Fix ICE with switch stmt to unreachable opt and forced labels [PR95857]
The following testcase ICEs, because during the cfg cleanup, we see:
switch (i$e_11) <default: <L12> [33.33%], case -3: <lab2> [33.33%], case 0: <L10> [33.33%], case 2: <lab2> [33.33%]>
...
lab2:
__builtin_unreachable ();
where lab2 is FORCED_LABEL. The way it works, we go through the case labels
and when we reach the first one that points to gimple_seq_unreachable*
basic block, we remove the edge (if any) from the switch bb to the bb
containing the label and bbs reachable only through that edge we've just
removed. Once we do that, we must throw away all other cases that use
the same label (or some other labels from the same bb we've removed the edge
to and the bb). To avoid quadratic behavior, this is not done by walking
all remaining cases immediately before removing, but only when processing
them later.
For normal labels this works, fine, if the label is in a deleted bb, it will
have NULL label_to_block and we handle that case, or, if the unreachable bb
has some other edge to it, only the edge will be removed and not the bb,
and again, find_edge will not find the edge and we only remove the case.
And if a label would be to some other block, that other block wouldn't have
been removed earlier because there would be still an edge from the switch
block.
Now, FORCED_LABEL (and I think DECL_NONLOCAL too) break this, because
those labels aren't removed, but instead moved to some surrounding basic
block. So, when we later process those, when their gimple_seq_unreachable*
basic block is removed, label_to_block will return some unrelated block
(in the testcase the switch bb), so we decide to keep the case which doesn't
seem to be unreachable, but we don't really have an edge from the switch
block to the block the label got moved to.
I thought first about punting in gimple_seq_unreachable* on
FORCED_LABEL/DECL_NONLOCAL labels, but that might penalize even code that
doesn't care, so this instead just makes sure that for
FORCED_LABEL/DECL_NONLOCAL labels that are being removed (and thus moved
randomly) we remember in a hash_set the fact that those labels should be
treated as removed for the purpose of the optimization, and later on
handle those labels that way.
2020-07-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/95857
* tree-cfg.c (group_case_labels_stmt): When removing an unreachable
base_bb, remember all forced and non-local labels on it and later
treat those as if they have NULL label_to_block. Formatting fix.
Fix a comment typo.
Bill Seurer [Fri, 10 Jul 2020 22:30:26 +0000 (17:30 -0500)]
rs6000: Fix __builtin_altivec_mask_for_load to use correct type
gcc/ChangeLog:
PR target/95581
* config/rs6000/rs6000-call.c: Add new type v16qi_ftype_pcvoid.
(altivec_init_builtins) Change __builtin_altivec_mask_for_load to use
v16qi_ftype_pcvoid with correct number of parameters.