For rtx like (eq:HI (V8SI 90) (V8SI 91)), cse will take it as a
boolean value and try to do some optimization. But it is not true for
vector compare, also other places in rtl passes hold the same
assumption.
Peter Bergner [Sat, 8 Aug 2020 16:54:48 +0000 (11:54 -0500)]
rs6000: MMA built-ins reject typedefs of MMA types
We do not allow conversions between the MMA types and other types.
However, we are being too strict in not matching MMA types with
typdefs of those types. Use TYPE_CANONICAL to see through the
types to their canonical types before comparing them.
2020-08-08 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/96530
* config/rs6000/rs6000.c (rs6000_invalid_conversion): Use canonical
types for type comparisons. Refactor code to simplify it.
gcc/testsuite/
PR target/96530
* gcc.target/powerpc/pr96530.c: New test.
Peter Bergner [Thu, 6 Aug 2020 15:03:03 +0000 (10:03 -0500)]
rs6000: Don't ICE when spilling an MMA accumulator
When we spill an accumulator that has a known zero value, LRA will emit
a new (set (reg:PXI ...) 0) insn, but it does not use the mma_xxsetaccz
pattern to do it, leading to an unrecognized insn ICE. The solution here
is to move the xxsetaccz instruction into the movpxi pattern and have the
xxsetaccz pattern call the move pattern.
2020-08-06 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/96446
* config/rs6000/mma.md (*movpxi): Add xxsetaccz generation.
Disable split for zero constant source operand.
(mma_xxsetaccz): Change to define_expand. Call gen_movpxi.
gcc/testsuite/
PR target/96446
* gcc.target/powerpc/pr96446.c: New test.
Jonathan Wakely [Fri, 7 Aug 2020 19:29:11 +0000 (20:29 +0100)]
libstdc++: Fix ambiguous comparisons in __gnu_debug::bitset [PR 96303]
With -pedantic the debug mode bitset has an ambiguous equality
comparison operator, because it tries to compare the non-debug base to
the debug object. The base object can be converted to another debug
bitset, making the same operator== a candidate again.
The fix is to do the comparison on both base objects, so the operator
for the derived type isn't a candidate.
For the inequality operator the same change should be done, but that
operator can be removed entirely for C++20 because it can be synthesized
by the compiler.
I don't think either equality or inequality operators are really needed,
because the public _GLIBCXX_STD_C::bitset base class cam always be
compared using its own comparison operators. I'm not changing that here
though.
libstdc++-v3/ChangeLog:
PR libstdc++/96303
* include/debug/bitset (bitset::operator==): Call _M_base() on
right operand.
(bitset::operator!=): Likewise, but don't define it at all when
default comparisons are supported by the compiler.
* testsuite/23_containers/bitset/operations/96303.cc: New test.
Tamar Christina [Mon, 3 Aug 2020 11:03:17 +0000 (12:03 +0100)]
AArch64: Fix hwasan failure in readline.
My previous fix added an unchecked call to fgets in the new function readline.
fgets can fail when there's an error reading the file in which case it returns
NULL. It also returns NULL when the next character is EOF.
The EOF case is already covered by the existing code but the error case isn't.
This fixes it by returning the empty string on error.
Also I now use strnlen instead of strlen to make sure we never read outside the
buffer.
This was flagged by Matthew Malcomson during his hwasan work.
gcc/ChangeLog:
* config/aarch64/driver-aarch64.c (readline): Check return value fgets.
Tamar Christina [Fri, 17 Jul 2020 12:13:12 +0000 (13:13 +0100)]
AArch64: Add test for -mcpu=native
This adds some tests to the GCC testsuite for testing the
-mcpu=native code.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/aarch64-cpunative.exp: New test.
* gcc.target/aarch64/cpunative/info_0: New test.
* gcc.target/aarch64/cpunative/info_1: New test.
* gcc.target/aarch64/cpunative/info_10: New test.
* gcc.target/aarch64/cpunative/info_11: New test.
* gcc.target/aarch64/cpunative/info_12: New test.
* gcc.target/aarch64/cpunative/info_13: New test.
* gcc.target/aarch64/cpunative/info_14: New test.
* gcc.target/aarch64/cpunative/info_15: New test.
* gcc.target/aarch64/cpunative/info_2: New test.
* gcc.target/aarch64/cpunative/info_3: New test.
* gcc.target/aarch64/cpunative/info_4: New test.
* gcc.target/aarch64/cpunative/info_5: New test.
* gcc.target/aarch64/cpunative/info_6: New test.
* gcc.target/aarch64/cpunative/info_7: New test.
* gcc.target/aarch64/cpunative/info_8: New test.
* gcc.target/aarch64/cpunative/info_9: New test.
* gcc.target/aarch64/cpunative/native_cpu_0.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_1.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_10.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_11.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_12.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_13.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_14.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_15.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_2.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_3.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_4.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_5.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_6.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_7.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_8.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_9.c: New test.
Tamar Christina [Fri, 17 Jul 2020 12:10:28 +0000 (13:10 +0100)]
AArch64: Fix bugs in -mcpu=native detection.
This patch fixes a couple of issues in AArch64's -mcpu=native processing:
The buffer used to read the lines from /proc/cpuinfo is 128 bytes long. While
this was enough in the past with the increase in architecture extensions it is
no longer enough. It results in two bugs:
1) No option string longer than 127 characters is correctly parsed. Features
that are supported are silently ignored.
2) It incorrectly enables features that are not present on the machine:
a) It checks for substring matching instead of full word matching. This makes
it incorrectly detect sb support when ssbs is provided instead.
b) Due to the truncation at the 127 char border it also incorrectly enables
features due to the full feature being cut off and the part that is left
accidentally enables something else.
This breaks -mcpu=native detection on some of our newer system.
The patch fixes these issues by reading full lines up to the \n in a string.
This gives us the full feature line. Secondly it creates a set from this string
to:
1) Reduce matching complexity from O(n*m) to O(n*logm).
2) Perform whole word matching instead of substring matching.
To make this code somewhat cleaner I also changed from using char* to using
std::string and std::set.
Note that I have intentionally avoided the use of ifstream and stringstream
to make it easier to backport. I have also not change the substring matching
for the initial line classification as I cannot find a documented cpuinfo format
which leads me to believe there may be kernels out there that require this which
may be why the original code does this.
I also do not want this to break if the kernel adds a new line that is long and
indents the file by two tabs to keep everything aligned. In short I think an
imprecise match is the right thing here.
Test for this is added as the last thing in this series as it requires some
changes to be made to be able to test this.
arm: Clear canary value after stack_protect_test [PR96191]
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/arm/arm.md (arm_stack_protect_test_insn): Zero out
operand 2 after use.
* config/arm/thumb1.md (thumb1_stack_protect_test_insn): Likewise.
gcc/testsuite/
* gcc.target/arm/stack-protector-1.c: New test.
* gcc.target/arm/stack-protector-2.c: Likewise.
aarch64: Clear canary value after stack_protect_test [PR96191]
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/aarch64/aarch64.md (stack_protect_test_<mode>): Set the
CC register directly, instead of a GPR. Replace the original GPR
destination with an extra scratch register. Zero out operand 3
after use.
(stack_protect_test): Update accordingly.
gcc/testsuite/
PR target/96191
* gcc.target/aarch64/stack-protector-1.c: New test.
* gcc.target/aarch64/stack-protector-2.c: Likewise.
Marek Polacek [Tue, 4 Aug 2020 13:35:25 +0000 (09:35 -0400)]
c++: Template keyword following :: [PR96082]
In r9-4235 I tried to make sure that the template keyword follows
a nested-name-specifier. :: is a valid nested-name-specifier, so
I also have to check 'globalscope' before giving the error.
gcc/cp/ChangeLog:
PR c++/96082
* parser.c (cp_parser_elaborated_type_specifier): Allow
'template' following ::.
gcc/testsuite/ChangeLog:
PR c++/96082
* g++.dg/template/template-keyword3.C: New test.
This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.
This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value. These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.
When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.
When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.
When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe we mov the value
of the original register into x16 and use x16 for the BR.
As an example when optimising for size:
a
BLR x0
instruction would get transformed to something like
BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
MOV X16, X0
BR X16
<speculation barrier>
The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.
On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart. The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.
Similar can be said for the hot/cold partitioning of single functions,
so function-local stubs have the same restriction.
This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.
Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies). This patch
now uses global symbols with hidden visibility each stored in their own
COMDAT section. This means stubs can be shared between compilation
units while still avoiding the PLT indirection.
This patch also removes the `__call_indirect_x30` stub (and
function-local equivalent) which would simply jump back to the original
location.
The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.
The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files. This means they are not emitted in one chunk, and each one must
include the speculation barrier.
Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.
This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
instruction. This code sequence is part of the binary interface between
compiler and linker. If this BLR instruction needs to be mitigated, it'd
probably be best to do so in the linker. It seems that the code sequence
for thread-local variable access is unlikely to lead to a Spectre Revalation
Gadget.
- PLT stubs are produced by the linker and each contain a BLR instruction.
It seems that at most only after the last PLT stub a Spectre Revalation
Gadget might appear.
Testing:
Bootstrap and regtest on AArch64
(with BOOT_CFLAGS="-mharden-sls=retbr,blr")
Used a temporary hack(1) in gcc-dg.exp to use these options on every
test in the testsuite, a slight modification to emit the speculation
barrier after every function stub, and a script to check that the
output never emitted a BLR, or unmitigated BR or RET instruction.
Similar on an aarch64-none-elf cross-compiler.
1) Temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
a) Every RET or BR is immediately followed by a speculation barrier.
b) No BLR instruction is emitted by compiler.
* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
New declaration.
* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
stub registers class.
(aarch64_class_max_nregs): Likewise.
(aarch64_register_move_cost): Likewise.
(aarch64_sls_shared_thunks): Global array to store stub labels.
(aarch64_sls_emit_function_stub): New.
(aarch64_create_blr_label): New.
(aarch64_sls_emit_blr_function_thunks): New.
(aarch64_sls_emit_shared_blr_thunks): New.
(aarch64_asm_file_end): New.
(aarch64_indirect_call_asm): New.
(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
(TARGET_ASM_FUNCTION_EPILOGUE): Use
aarch64_sls_emit_blr_function_thunks.
* config/aarch64/aarch64.h (STB_REGNUM_P): New.
(enum reg_class): Add STUB_REGS class.
(machine_function): Introduce `call_via` array for
function-local stub labels.
* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
aarch64_indirect_call_asm to emit code when hardening BLR
instructions.
* config/aarch64/constraints.md (Ucr): New constraint
representing registers for indirect calls. Is GENERAL_REGS
usually, and STUB_REGS when hardening BLR instruction against
SLS.
* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
is also a general register.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.
aarch64: Introduce SLS mitigation for RET and BR instructions
Instructions following RET or BR are not necessarily executed. In order
to avoid speculation past RET and BR we can simply append a speculation
barrier.
Since these speculation barriers will not be architecturally executed,
they are not expected to add a high performance penalty.
The speculation barrier is to be SB when targeting architectures which
have this enabled, and DSB SY + ISB otherwise.
We add tests for each of the cases where such an instruction was seen.
This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction. We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.
There is one particular case which is slightly tricky. The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against. The trampoline template is used *once* per
compilation unit, and the TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function specific attributes to turn on
and off hardening on a per-function basis.
The fixed nature of the trampoline described above implies it will be
safer to ensure this speculation barrier is always used.
Testing:
Bootstrap and regtest done on aarch64-none-linux
Used a temporary hack(1) to use these options on every test in the
testsuite and a script to check that the output never emitted an
unmitigated RET or BR.
1) Temporary hack was a change to the testsuite to always use
`-save-temps` and run a script on the assembly output of those
compilations which produced one to ensure every RET or BR is immediately
followed by a speculation barrier.
* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
speculation barrier after BR instruction if needs be.
(aarch64_trampoline_init): Handle ptr_mode value & adjust size
of code copied.
(aarch64_sls_barrier): New.
(aarch64_asm_trampoline_template): Add needed barriers.
* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
(TARGET_SB): New.
(TRAMPOLINE_SIZE): Account for barrier.
* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
simple_return, *do_return, *sibcall_insn, *sibcall_value_insn):
Emit barrier if needs be, also account for possible barrier using
"sls_length" attribute.
(sls_length): New attribute.
(length): Determine default using any non-default sls_length
value.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
New test.
* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
New proc.
aarch64: New Straight Line Speculation (SLS) mitigation flags
Here we introduce the flags that will be used for straight line speculation.
The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma seperated list of one
or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation barrier
inserted after them.
`blr` requests that BLR instructions are replaced by a BL to a function stub
using a BR with a speculation barrier after it.
Setting this on a per-function basis using attributes or the like is not
enabled, but may be in the future.
Jakub Jelinek [Tue, 4 Aug 2020 09:33:18 +0000 (11:33 +0200)]
veclower: Don't ICE on .VEC_CONVERT calls with no lhs [PR96426]
.VEC_CONVERT is a const internal call, so normally if the lhs is not used,
we'd DCE it far before getting to veclower, but with -O0 (or perhaps
-fno-tree-dce and some other -fno-* options) it can happen.
But as the internal fn needs the lhs to know the type to which the
conversion is done (and I think that is a reasonable representation, having
some magic another argument and having to create constants with that type
looks overkill to me), we just should DCE those calls ourselves.
During veclower, we can't really remove insns, as the callers would be
upset, so this just replaces it with a GIMPLE_NOP.
2020-08-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96426
* tree-vect-generic.c (expand_vector_conversion): Replace .VEC_CONVERT
call with GIMPLE_NOP if there is no lhs.
Jakub Jelinek [Tue, 4 Aug 2020 09:31:44 +0000 (11:31 +0200)]
gimple-fold: Fix ICE in maybe_canonicalize_mem_ref_addr on debug stmt [PR96354]
In debug stmts, we are less strict about what is and what is not accepted
there, so this patch just punts on optimization of a debug stmt rather than
ICEing.
2020-08-04 Jakub Jelinek <jakub@redhat.com>
PR debug/96354
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Add IS_DEBUG
argument. Return false instead of gcc_unreachable if it is true and
get_addr_base_and_unit_offset returns NULL.
(fold_stmt_1) <case GIMPLE_DEBUG>: Adjust caller.
Jakub Jelinek [Mon, 3 Aug 2020 20:55:28 +0000 (22:55 +0200)]
aarch64: Fix up __aarch64_cas16_acq_rel fallback
As mentioned in the PR, the fallback path when LSE is unavailable writes
incorrect registers to the memory if the previous content compares equal
to x0, x1 - it writes copy of x0, x1 from the start of function, but it
should write x2, x3.
2020-08-03 Jakub Jelinek <jakub@redhat.com>
PR target/96402
* config/aarch64/lse.S (__aarch64_cas16_acq_rel): Use x2, x3 instead
of x(tmp0), x(tmp1) in STXP arguments.
Qian Jianhua [Mon, 3 Aug 2020 13:03:25 +0000 (14:03 +0100)]
aarch64: Add A64FX machine model
This patch add support for Fujitsu A64FX, as the first step of adding
A64FX machine model.
A64FX is used in FUJITSU Supercomputer PRIMEHPC FX1000,
PRIMEHPC FX700, and supercomputer Fugaku.
The official microarchitecture information of A64FX can be read at
https://github.com/fujitsu/A64FX.
2020-08-03 Qian jianhua <qianjh@cn.fujitsu.com>
gcc/
* config/aarch64/aarch64-cores.def (a64fx): New core.
* config/aarch64/aarch64-tune.md: Regenerated.
* config/aarch64/aarch64.c (a64fx_prefetch_tune, a64fx_tunings): New.
* doc/invoke.texi: Add a64fx to the list.
One of the problems in this PR was that if we had:
vector_type1 array[] = { vector_value1 };
process_init_element would only treat vector_value1 as initialising
a vector_type1 if they had the same TYPE_MAIN_VARIANT. This has
several problems:
(1) It gives confusing error messages if the vector types are
incompatible. (Tested by gcc.dg/pr96377-1.c.)
(2) It means that we reject code that should be valid with
-flax-vector-conversions. (Tested by gcc.dg/pr96377-2.c.)
(3) On arm and aarch64 targets, it means that we reject some
initializers that mix Advanced SIMD and standard GNU vectors.
These vectors have traditionally had different TYPE_MAIN_VARIANTs
because they have different mangling schemes. (Tested by
gcc.dg/pr96377-[3-6].c.)
(4) It means that we reject SVE initializers that should be valid.
(Tested by gcc.target/aarch64/sve/gnu_vectors_[34].c.)
because applying the binary operator to arm_neon_value1 strips
the "Advanced SIMD type" attributes that were added in that patch.
Stripping the attributes is problematic for other reasons though,
so that still needs to be fixed separately.
gcc/c/
PR c/96377
* c-typeck.c (process_init_element): Split test for whether to
recurse into a record, union or array into...
(initialize_elementwise_p): ...this new function. Don't recurse
into a vector type if the initialization value is also a vector.
Both intrinsics did not handle the case where the va_list object comes
from a ref parameter.
gcc/d/ChangeLog:
PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().
preprocessor: Teach traditional about has_include [PR95889]
Traditional cpp (used by fortran) didn;t know about the new
__has_include__ implementation. Hey, since when did traditional cpp
grow __has_include__? That wasn't in knr!
Julian Brown [Thu, 18 Jun 2020 12:11:08 +0000 (05:11 -0700)]
openacc: Deep copy attach/detach should not affect reference counts
Attach and detach operations are not supposed to affect structural or
dynamic reference counts for OpenACC. Previously they did so, which led to
subtle problems in some circumstances. We can avoid reference-counting
attach/detach operations by extending and slightly repurposing the
do_detach field in target_var_desc. It is now called is_attach to better
reflect its new role.
2020-07-27 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* libgomp.h (struct target_var_desc): Rename do_detach field to
is_attach.
* oacc-mem.c (goacc_exit_datum_1): Add assert. Don't set finalize for
GOMP_MAP_FORCE_DETACH. Update checking to use is_attach field.
(goacc_enter_data_internal): Don't affect reference counts
for attach mappings.
(goacc_exit_data_internal): Don't affect reference counts for detach
mappings.
* target.c (gomp_map_vars_existing): Don't affect reference counts for
attach mappings.
(gomp_map_vars_internal): Set renamed is_attach flag unconditionally to
mark attach mappings.
(gomp_unmap_vars_internal): Use is_attach flag to prevent affecting
reference count for attach mappings.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
test as shouldfail.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
gracefully in no-finalize mode.
Julian Brown [Thu, 2 Jul 2020 21:18:20 +0000 (14:18 -0700)]
openacc: Remove unnecessary detach finalization
The call to gomp_detach_pointer in gomp_unmap_vars_internal does not
need to force finalization, and doing so may mask mismatched pointer
attachments/detachments. This patch removes the forcing.
2020-07-16 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* target.c (gomp_unmap_vars_internal): Remove unnecessary forcing of
finalization for detach operation.
* testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c:
New test.
coroutines: Correct frame capture of compiler temps [PR95591+4].
When a full expression contains a co_await (or co_yield), this means
that it might be suspended; which would destroy temporary variables
(on a stack for example). However the language guarantees that such
temporaries are live until the end of the expression.
In order to preserve this, we 'promote' temporaries where necessary
so that they are saved in the coroutine frame (which allows them to
remain live potentially until the frame is destroyed). In addition,
sub-expressions that produce control flow (such as TRUTH_AND/OR_IF
or COND_EXPR) must be handled specifically to ensure that await
expressions are properly expanded.
This patch corrects two mistakes in which we were (a) failing to
promote some temporaries and (b) we we failing to sequence DTORs for
the captures properly. This manifests in a number of related (but not
exact duplicate) PRs.
The revised code collects the actions into one place and maps all the
control flow into one form - a benefit of this is that co_returns are
now expanded earlier (which provides an opportunity to address PR95517
in some future patch).
We replace a statement that contains await expression(s) with a bind
scope that has a variable list describing the temporaries that have
been 'promoted' and a statement list that contains a series of cleanup
expressions for each of those. Where we encounter nested conditional
expressions, these are wrapped in a try-finally block with a guard var
for each sub-expression variable that needs a DTOR. The guards are all
declared and initialized to false before the first conditional sub-
expression. The 'finally' block contains a series of if blocks (one
per guard variable) enclosing the relevant DTOR.
Variables listed in a bind scope in this manner are automatically moved
to a coroutine frame version by existing code (so we re-use that rather
than having a separate mechanism).
gcc/cp/ChangeLog:
PR c++/95591
PR c++/95599
PR c++/95823
PR c++/95824
PR c++/95895
* coroutines.cc (struct coro_ret_data): Delete.
(coro_maybe_expand_co_return): Delete.
(co_return_expander): Delete.
(expand_co_returns): Delete.
(co_await_find_in_subtree): Remove unused name.
(build_actor_fn): Remove unused parm, remove handling
for co_return expansion.
(register_await_info): Demote duplicate info message to a
warning.
(coro_make_frame_entry): Move closer to use site.
(struct susp_frame_data): Add fields for final suspend label
and a flag to indicate await expressions with initializers.
(captures_temporary): Delete.
(register_awaits): Remove unused code, update comments.
(find_any_await): New.
(tmp_target_expr_p): New.
(struct interesting): New.
(find_interesting_subtree): New.
(struct var_nest_node): New.
(flatten_await_stmt): New.
(handle_nested_conditionals): New.
(process_conditional): New.
(replace_statement_captures): Rename to...
(maybe_promote_temps): ... this.
(maybe_promote_captured_temps): Delete.
(analyze_expression_awaits): Check for await expressions with
initializers. Simplify handling for truth-and/or-if.
(expand_one_truth_if): Simplify (map cases that need expansion
to COND_EXPR).
(await_statement_walker): Handle CO_RETURN_EXPR. Simplify the
handling for truth-and/or-if expressions.
(register_local_var_uses): Ensure that we create names in the
implementation namespace.
(morph_fn_to_coro): Add final suspend label to suspend frame
callback data and remove it from the build_actor_fn call.
gcc/testsuite/ChangeLog:
PR c++/95591
PR c++/95599
PR c++/95823
PR c++/95824
PR c++/95895
* g++.dg/coroutines/pr95591.C: New test.
* g++.dg/coroutines/pr95599.C: New test.
* g++.dg/coroutines/pr95823.C: New test.
* g++.dg/coroutines/pr95824.C: New test.
Martin Liska [Mon, 27 Jul 2020 10:30:24 +0000 (12:30 +0200)]
expr: build string_constant only for a char type
gcc/ChangeLog:
PR tree-optimization/96058
* expr.c (string_constant): Build string_constant only
for a type that has same precision as char_type_node
and is an integral type.
Jakub Jelinek [Tue, 28 Jul 2020 09:08:29 +0000 (11:08 +0200)]
expander: Fix ICE in maybe_warn_rdwr_sizes [PR96335]
The following testcase ICEs in maybe_warn_rdwr_sizes. The problem is that
the caller uses its fndecl and fntype variables to fill up rdwr_map, and
the fntype in that case is a prototype with the access attribute and all
the checks needed for that performed. But the maybe_warn_rdwr_sizes
function tries to rediscover fndecl/fntype itself and does it differently
from how the caller did (for fndecl get_callee_fndecl and fntype from that
FUNCTION_DECL, otherwise sets fntype to CALL_EXPR_FN's type).
On the testcase, get_callee_fndecl does find a FUNCTION_DECL because
it does STRIP_NOPS in between.
Instead of trying to rediscover those, this patch just passes them down,
like is done in several other functions.
2020-07-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96335
* calls.c (maybe_warn_rdwr_sizes): Add FNDECL and FNTYPE arguments,
instead of trying to rediscover them in the body.
(initialize_argument_information): Adjust caller.
Martin Liska [Fri, 24 Jul 2020 12:33:27 +0000 (14:33 +0200)]
Use vec::reserve before vec_safe_grow_cleared is called
gcc/ChangeLog:
PR lto/45375
* symbol-summary.h: Call vec_safe_reserve before grow is called
in order to grow to a reasonable size.
* vec.h (vec_safe_reserve): Add missing function for vl_ptr
type.
Mark Eggleston [Thu, 11 Jun 2020 10:05:40 +0000 (11:05 +0100)]
Fortran : ICE in gfc_check_pointer_assign PR95612
Output an error if the right hand value is a zero sized array or
does not have a symbol tree otherwise continue checking.
2020-07-27 Steven G. Kargl <kargl@gcc.gnu.org>
gcc/fortran/
PR fortran/95612
* expr.c (gfc_check_pointer_assigb): Output an error if
rvalue is a zero sized array or output an error if rvalue
doesn't have a symbol tree.
2020-07-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95612
* gfortran.dg/pr95612.f90: New test.
PR 93567, G edit descriptor uses E instead of F editing in rounding mode UP.
The switch between FMT_E and FMT_F is based on the absolute value.
Set r=0 for rounding toward zero and r = 1 otherwise.
If (exp_d - m) == 1 there is no rounding needed.
libgfortran/ChangeLog:
PR fortran/93567
* io/write_float.def (determine_en_precision): Fix switch between
FMT_E and FMT_F.
gcc/testsuite/ChangeLog:
PR fortran/93567
* gfortran.dg/round_3.f08: Add test cases.
sparc/sparc64: use crtendS.o for default-pie executables [PR96190]
In --enable-default-pie mode compiler should switch from
using crtend.o to crtendS.o. On sparc it is especially important
because crtend.o contains PIC-unfriendly code.
We use GNU_USER_TARGET_ENDFILE_SPEC as a baseline spec to get
crtendS.o instead of crtend.o in !no-pie mode.
gcc:
2020-07-14 Sergei Trofimovich <siarheit@google.com>
PR target/96190
* config/sparc/linux.h (ENDFILE_SPEC): Use GNU_USER_TARGET_ENDFILE_SPEC
to get crtendS.o for !no-pie mode.
* config/sparc/linux64.h (ENDFILE_SPEC): Ditto.
Harald Anlauf [Mon, 6 Jul 2020 16:58:23 +0000 (18:58 +0200)]
PR fortran/95980 - ICE on using sync images with -fcheck=bounds
In SELECT TYPE, the argument may be an incorrectly specified unlimited
polymorphic variable. Avoid a NULL pointer dereference for clean error
recovery.
gcc/fortran/
PR fortran/95980
* match.c (copy_ts_from_selector_to_associate, build_class_sym):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
* resolve.c (resolve_select_type):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
Peter Bergner [Wed, 22 Jul 2020 16:44:35 +0000 (11:44 -0500)]
rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode
PR96236 shows a problem where we don't correctly store our 512-bit accumulators
correctly in little-endian mode. The patch below detects when we're doing a
little-endian memory access and stores to the correct memory locations.
Thomas Koenig [Sun, 19 Jul 2020 15:27:45 +0000 (17:27 +0200)]
Always use name from c_interop_kinds_table for -fc-prototypes.
When a user specified a KIND that was a parameter taking the value
of an iso_c_binding KIND, the code used the name of that parameter
to look up the type name. Corrected by always looking it up in
the table of C interop kinds (which was previously done for
non-C-interop types, anyway).
gcc/fortran/ChangeLog:
PR fortran/96220
* dump-parse-tree.c (get_c_type_name): Always use the entries from
c_interop_kinds_table to find the correct C type.
Thomas Koenig [Thu, 23 Jul 2020 15:56:09 +0000 (17:56 +0200)]
Fix handling of implicit_pure by checking if non-pure procedures are called.
Procedures are marked as implicit_pure if they fulfill the criteria of
pure procedures. In this case, a procedure was not marked as not being
implicit_pure which called another procedure, which had not yet been
marked as not being implicit_impure.
Fixed by iterating over all procedures, setting callers of procedures
which are non-pure and non-implicit_pure as non-implicit_pure and
doing this until no more procedure has been changed.
Kito Cheng [Wed, 22 Jul 2020 06:50:40 +0000 (14:50 +0800)]
PR target/96260 - KASAN should work even back-end not porting anything.
- Most KASAN function don't need any porting anything in back-end
except asan stack protection.
- However kernel will given shadow offset when enable asan stack
protection, so eveything in KASAN can work if shadow offset is given.
- Verified with x86 and risc-v.
- Verified with RISC-V linux kernel.
gcc/ChangeLog:
PR target/96260
* asan.c (asan_shadow_offset_set_p): New.
* asan.h (asan_shadow_offset_set_p): Ditto.
* toplev.c (process_options): Allow -fsanitize=kernel-address
even TARGET_ASAN_SHADOW_OFFSET not implemented, only check when
asan stack protection is enabled.