Marek Polacek [Tue, 4 Aug 2020 13:35:25 +0000 (09:35 -0400)]
c++: Template keyword following :: [PR96082]
In r9-4235 I tried to make sure that the template keyword follows
a nested-name-specifier. :: is a valid nested-name-specifier, so
I also have to check 'globalscope' before giving the error.
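As a sketch (example mine, not the new testcase), the now-accepted form
looks like:

    template <typename T> struct S { };

    template <typename T>
    void f ()
    {
      // 'template' directly after the global-scope :: used to be rejected.
      class ::template S<T> s;
      (void) s;
    }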
gcc/cp/ChangeLog:
PR c++/96082
* parser.c (cp_parser_elaborated_type_specifier): Allow
'template' following ::.
gcc/testsuite/ChangeLog:
PR c++/96082
* g++.dg/template/template-keyword3.C: New test.
This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.
This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value. These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.
When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.
When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.
When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe we mov the value
of the original register into x16 and use x16 for the BR.
As an example when optimising for size:
a
    BLR x0
instruction would get transformed to something like
    BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
    MOV X16, X0
    BR  X16
    <speculation barrier>
The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.
On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart. The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.
The same can be said for the hot/cold partitioning of single functions,
so function-local stubs have the same restriction.
This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.
Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies). This patch
now uses global symbols with hidden visibility each stored in their own
COMDAT section. This means stubs can be shared between compilation
units while still avoiding the PLT indirection.
This patch also removes the `__call_indirect_x30` stub (and
function-local equivalent) which would simply jump back to the original
location.
The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.
The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files. This means they are not emitted in one chunk, and each one must
include the speculation barrier.
Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.
This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
instruction. This code sequence is part of the binary interface between
compiler and linker. If this BLR instruction needs to be mitigated, it'd
probably be best to do so in the linker. It seems that the code sequence
for thread-local variable access is unlikely to lead to a Spectre revelation
gadget.
- PLT stubs are produced by the linker and each contains a BLR instruction.
It seems that a Spectre revelation gadget might appear at most after the
last PLT stub.
Testing:
Bootstrap and regtest on AArch64
(with BOOT_CFLAGS="-mharden-sls=retbr,blr")
Used a temporary hack(1) in gcc-dg.exp to use these options on every
test in the testsuite, a slight modification to emit the speculation
barrier after every function stub, and a script to check that the
output never emitted a BLR, or unmitigated BR or RET instruction.
Similar on an aarch64-none-elf cross-compiler.
1) Temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
a) Every RET or BR is immediately followed by a speculation barrier.
b) No BLR instruction is emitted by the compiler.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
New declaration.
* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
stub registers class.
(aarch64_class_max_nregs): Likewise.
(aarch64_register_move_cost): Likewise.
(aarch64_sls_shared_thunks): Global array to store stub labels.
(aarch64_sls_emit_function_stub): New.
(aarch64_create_blr_label): New.
(aarch64_sls_emit_blr_function_thunks): New.
(aarch64_sls_emit_shared_blr_thunks): New.
(aarch64_asm_file_end): New.
(aarch64_indirect_call_asm): New.
(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
(TARGET_ASM_FUNCTION_EPILOGUE): Use
aarch64_sls_emit_blr_function_thunks.
* config/aarch64/aarch64.h (STB_REGNUM_P): New.
(enum reg_class): Add STUB_REGS class.
(machine_function): Introduce `call_via` array for
function-local stub labels.
* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
aarch64_indirect_call_asm to emit code when hardening BLR
instructions.
* config/aarch64/constraints.md (Ucr): New constraint
representing registers for indirect calls. Is GENERAL_REGS
usually, and STUB_REGS when hardening BLR instruction against
SLS.
* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
is also a general register.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.
aarch64: Introduce SLS mitigation for RET and BR instructions
Instructions following RET or BR are not necessarily executed. In order
to avoid speculation past RET and BR we can simply append a speculation
barrier.
Since these speculation barriers will not be architecturally executed,
they are not expected to add a high performance penalty.
The speculation barrier is to be SB when targeting architectures which
have this enabled, and DSB SY + ISB otherwise.
We add tests for each of the cases where such an instruction was seen.
This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction. We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.
There is one particular case which is slightly tricky. The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against. The trampoline template is used *once* per
compilation unit, and the TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function specific attributes to turn on
and off hardening on a per-function basis.
The fixed nature of the trampoline described above implies it will be
safer to ensure this speculation barrier is always used.
Testing:
Bootstrap and regtest done on aarch64-none-linux
Used a temporary hack(1) to use these options on every test in the
testsuite and a script to check that the output never emitted an
unmitigated RET or BR.
1) Temporary hack was a change to the testsuite to always use
`-save-temps` and run a script on the assembly output of those
compilations which produced one to ensure every RET or BR is immediately
followed by a speculation barrier.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
speculation barrier after BR instruction if needs be.
(aarch64_trampoline_init): Handle ptr_mode value & adjust size
of code copied.
(aarch64_sls_barrier): New.
(aarch64_asm_trampoline_template): Add needed barriers.
* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
(TARGET_SB): New.
(TRAMPOLINE_SIZE): Account for barrier.
* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
simple_return, *do_return, *sibcall_insn, *sibcall_value_insn):
Emit barrier if needs be, also account for possible barrier using
"sls_length" attribute.
(sls_length): New attribute.
(length): Determine default using any non-default sls_length
value.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
New test.
* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
New proc.
aarch64: New Straight Line Speculation (SLS) mitigation flags
Here we introduce the flags that will be used for straight line speculation.
The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma-separated list of one
or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation barrier
inserted after them.
`blr` requests that BLR instructions are replaced by a BL to a function stub
using a BR with a speculation barrier after it.
Setting this on a per-function basis using attributes or the like is not
enabled, but may be in the future.
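As a sketch (example mine) of what the mitigations act on: with
-mharden-sls=retbr,blr, the indirect call below is emitted as a BL to a
stub (e.g. __call_indirect_x1, register number illustrative) rather than
a raw BLR, and the function's RET gains a speculation barrier.

    void dispatch (void (*handler) (int), int arg)
    {
      handler (arg);   /* indirect call: a BLR without the mitigation */
    }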
Jakub Jelinek [Tue, 4 Aug 2020 09:33:18 +0000 (11:33 +0200)]
veclower: Don't ICE on .VEC_CONVERT calls with no lhs [PR96426]
.VEC_CONVERT is a const internal call, so normally if the lhs is not used,
we'd DCE it far before getting to veclower, but with -O0 (or perhaps
-fno-tree-dce and some other -fno-* options) it can happen.
But as the internal fn needs the lhs to know the type to which the
conversion is done (and I think that is a reasonable representation;
adding some magic extra argument and having to create constants with that
type looks like overkill to me), we should just DCE those calls ourselves.
During veclower, we can't really remove insns, as the callers would be
upset, so this just replaces it with a GIMPLE_NOP.
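As a sketch (example mine, in the spirit of the PR testcase), -O0 code
of this shape reaches veclower with a .VEC_CONVERT call whose lhs is
unused:

    typedef int v4si __attribute__ ((vector_size (16)));
    typedef float v4sf __attribute__ ((vector_size (16)));

    void f (v4si x)
    {
      (void) __builtin_convertvector (x, v4sf);   /* result discarded */
    }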
2020-08-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96426
* tree-vect-generic.c (expand_vector_conversion): Replace .VEC_CONVERT
call with GIMPLE_NOP if there is no lhs.
Jakub Jelinek [Tue, 4 Aug 2020 09:31:44 +0000 (11:31 +0200)]
gimple-fold: Fix ICE in maybe_canonicalize_mem_ref_addr on debug stmt [PR96354]
In debug stmts, we are less strict about what is and what is not accepted
there, so this patch just punts on optimization of a debug stmt rather than
ICEing.
2020-08-04 Jakub Jelinek <jakub@redhat.com>
PR debug/96354
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Add IS_DEBUG
argument. Return false instead of gcc_unreachable if it is true and
get_addr_base_and_unit_offset returns NULL.
(fold_stmt_1) <case GIMPLE_DEBUG>: Adjust caller.
Jakub Jelinek [Mon, 3 Aug 2020 20:55:28 +0000 (22:55 +0200)]
aarch64: Fix up __aarch64_cas16_acq_rel fallback
As mentioned in the PR, the fallback path when LSE is unavailable writes
incorrect registers to the memory if the previous content compares equal
to x0, x1 - it writes a copy of x0, x1 from the start of the function,
but it should write x2, x3.
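For reference, a sketch (example mine) of code that ends up in this
helper when compiled with -moutline-atomics on hardware without LSE:

    #include <stdbool.h>

    /* 16-byte compare-and-swap; with -moutline-atomics this goes through
       the __aarch64_cas16_acq_rel outline helper, whose LL/SC fallback
       had the bug.  */
    bool cas16 (__int128 *p, __int128 *expected, __int128 desired)
    {
      return __atomic_compare_exchange_n (p, expected, desired, false,
                                          __ATOMIC_ACQ_REL,
                                          __ATOMIC_ACQUIRE);
    }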
2020-08-03 Jakub Jelinek <jakub@redhat.com>
PR target/96402
* config/aarch64/lse.S (__aarch64_cas16_acq_rel): Use x2, x3 instead
of x(tmp0), x(tmp1) in STXP arguments.
Qian Jianhua [Mon, 3 Aug 2020 13:03:25 +0000 (14:03 +0100)]
aarch64: Add A64FX machine model
This patch adds support for Fujitsu A64FX, as the first step of adding
A64FX machine model.
A64FX is used in the FUJITSU Supercomputer PRIMEHPC FX1000 and
PRIMEHPC FX700, and in the supercomputer Fugaku.
The official microarchitecture information of A64FX can be read at
https://github.com/fujitsu/A64FX.
2020-08-03 Qian Jianhua <qianjh@cn.fujitsu.com>
gcc/
* config/aarch64/aarch64-cores.def (a64fx): New core.
* config/aarch64/aarch64-tune.md: Regenerated.
* config/aarch64/aarch64.c (a64fx_prefetch_tune, a64fx_tunings): New.
* doc/invoke.texi: Add a64fx to the list.
One of the problems in this PR was that if we had:
vector_type1 array[] = { vector_value1 };
process_init_element would only treat vector_value1 as initialising
a vector_type1 if they had the same TYPE_MAIN_VARIANT. This has
several problems:
(1) It gives confusing error messages if the vector types are
incompatible. (Tested by gcc.dg/pr96377-1.c.)
(2) It means that we reject code that should be valid with
-flax-vector-conversions. (Tested by gcc.dg/pr96377-2.c.)
(3) On arm and aarch64 targets, it means that we reject some
initializers that mix Advanced SIMD and standard GNU vectors.
These vectors have traditionally had different TYPE_MAIN_VARIANTs
because they have different mangling schemes. (Tested by
gcc.dg/pr96377-[3-6].c.)
(4) It means that we reject SVE initializers that should be valid.
(Tested by gcc.target/aarch64/sve/gnu_vectors_[34].c.)
because applying the binary operator to arm_neon_value1 strips
the "Advanced SIMD type" attributes that were added in that patch.
Stripping the attributes is problematic for other reasons though,
so that still needs to be fixed separately.
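A sketch (example mine) of case (2) above: compiled as C with
-flax-vector-conversions, this initialization should be accepted even
though the two vector types are distinct:

    typedef int v4si __attribute__ ((vector_size (16)));
    typedef unsigned int v4ui __attribute__ ((vector_size (16)));

    void f (v4ui value)
    {
      v4si array[] = { value };
      (void) array;
    }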
gcc/c/
PR c/96377
* c-typeck.c (process_init_element): Split test for whether to
recurse into a record, union or array into...
(initialize_elementwise_p): ...this new function. Don't recurse
into a vector type if the initialization value is also a vector.
Neither intrinsic handled the case where the va_list object comes
from a ref parameter.
gcc/d/ChangeLog:
PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().
preprocessor: Teach traditional about has_include [PR95889]
Traditional cpp (used by Fortran) didn't know about the new
__has_include__ implementation. Hey, since when did traditional cpp
grow __has_include__? That wasn't in K&R!
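For illustration (example mine), the directive is usually spelled
__has_include in user code, backed by the __has_include__ machinery;
under -traditional-cpp this previously went unrecognized:

    #if defined __has_include
    # if __has_include (<stdint.h>)
    #  include <stdint.h>
    # endif
    #endif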
Julian Brown [Thu, 18 Jun 2020 12:11:08 +0000 (05:11 -0700)]
openacc: Deep copy attach/detach should not affect reference counts
Attach and detach operations are not supposed to affect structural or
dynamic reference counts for OpenACC. Previously they did so, which led to
subtle problems in some circumstances. We can avoid reference-counting
attach/detach operations by extending and slightly repurposing the
do_detach field in target_var_desc. It is now called is_attach to better
reflect its new role.
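A sketch (example mine) of the deep-copy pattern in question: the
attach and detach below must not change the reference counts of s.

    struct sth { int *data; int n; };

    void f (struct sth *s)
    {
    #pragma acc enter data copyin(s[0:1])
    #pragma acc enter data copyin(s->data[0:s->n])   /* attach */
      /* ... compute regions ... */
    #pragma acc exit data delete(s->data[0:s->n])    /* detach */
    #pragma acc exit data delete(s[0:1])
    }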
2020-07-27 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* libgomp.h (struct target_var_desc): Rename do_detach field to
is_attach.
* oacc-mem.c (goacc_exit_datum_1): Add assert. Don't set finalize for
GOMP_MAP_FORCE_DETACH. Update checking to use is_attach field.
(goacc_enter_data_internal): Don't affect reference counts
for attach mappings.
(goacc_exit_data_internal): Don't affect reference counts for detach
mappings.
* target.c (gomp_map_vars_existing): Don't affect reference counts for
attach mappings.
(gomp_map_vars_internal): Set renamed is_attach flag unconditionally to
mark attach mappings.
(gomp_unmap_vars_internal): Use is_attach flag to prevent affecting
reference count for attach mappings.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark
test as shouldfail.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
gracefully in no-finalize mode.
Julian Brown [Thu, 2 Jul 2020 21:18:20 +0000 (14:18 -0700)]
openacc: Remove unnecessary detach finalization
The call to gomp_detach_pointer in gomp_unmap_vars_internal does not
need to force finalization, and doing so may mask mismatched pointer
attachments/detachments. This patch removes the forcing.
2020-07-16 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
libgomp/
* target.c (gomp_unmap_vars_internal): Remove unnecessary forcing of
finalization for detach operation.
* testsuite/libgomp.oacc-c-c++-common/structured-detach-underflow.c:
New test.
coroutines: Correct frame capture of compiler temps [PR95591+4].
When a full expression contains a co_await (or co_yield), this means
that it might be suspended; which would destroy temporary variables
(on a stack for example). However the language guarantees that such
temporaries are live until the end of the expression.
In order to preserve this, we 'promote' temporaries where necessary
so that they are saved in the coroutine frame (which allows them to
remain live potentially until the frame is destroyed). In addition,
sub-expressions that produce control flow (such as TRUTH_AND/OR_IF
or COND_EXPR) must be handled specifically to ensure that await
expressions are properly expanded.
This patch corrects two mistakes: (a) we were failing to promote some
temporaries and (b) we were failing to sequence DTORs for the captures
properly. This manifests in a number of related (but not exact
duplicate) PRs.
The revised code collects the actions into one place and maps all the
control flow into one form - a benefit of this is that co_returns are
now expanded earlier (which provides an opportunity to address PR95517
in some future patch).
We replace a statement that contains await expression(s) with a bind
scope that has a variable list describing the temporaries that have
been 'promoted' and a statement list that contains a series of cleanup
expressions for each of those. Where we encounter nested conditional
expressions, these are wrapped in a try-finally block with a guard var
for each sub-expression variable that needs a DTOR. The guards are all
declared and initialized to false before the first conditional sub-
expression. The 'finally' block contains a series of if blocks (one
per guard variable) enclosing the relevant DTOR.
Variables listed in a bind scope in this manner are automatically moved
to a coroutine frame version by existing code (so we re-use that rather
than having a separate mechanism).
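A minimal sketch (example mine, not one of the PR testcases; build with
-std=c++20 -fcoroutines) of the temporary-lifetime issue:

    #include <coroutine>
    #include <string>

    struct task {
      struct promise_type {
        task get_return_object () { return {}; }
        std::suspend_never initial_suspend () { return {}; }
        std::suspend_never final_suspend () noexcept { return {}; }
        void return_void () {}
        void unhandled_exception () {}
      };
    };

    std::suspend_always use (const std::string &) { return {}; }

    task example ()
    {
      // The std::string temporary must stay live across the suspension
      // point; the fix promotes it into the coroutine frame.
      co_await use (std::string ("hello"));
    }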
gcc/cp/ChangeLog:
PR c++/95591
PR c++/95599
PR c++/95823
PR c++/95824
PR c++/95895
* coroutines.cc (struct coro_ret_data): Delete.
(coro_maybe_expand_co_return): Delete.
(co_return_expander): Delete.
(expand_co_returns): Delete.
(co_await_find_in_subtree): Remove unused name.
(build_actor_fn): Remove unused parm, remove handling
for co_return expansion.
(register_await_info): Demote duplicate info message to a
warning.
(coro_make_frame_entry): Move closer to use site.
(struct susp_frame_data): Add fields for final suspend label
and a flag to indicate await expressions with initializers.
(captures_temporary): Delete.
(register_awaits): Remove unused code, update comments.
(find_any_await): New.
(tmp_target_expr_p): New.
(struct interesting): New.
(find_interesting_subtree): New.
(struct var_nest_node): New.
(flatten_await_stmt): New.
(handle_nested_conditionals): New.
(process_conditional): New.
(replace_statement_captures): Rename to...
(maybe_promote_temps): ... this.
(maybe_promote_captured_temps): Delete.
(analyze_expression_awaits): Check for await expressions with
initializers. Simplify handling for truth-and/or-if.
(expand_one_truth_if): Simplify (map cases that need expansion
to COND_EXPR).
(await_statement_walker): Handle CO_RETURN_EXPR. Simplify the
handling for truth-and/or-if expressions.
(register_local_var_uses): Ensure that we create names in the
implementation namespace.
(morph_fn_to_coro): Add final suspend label to suspend frame
callback data and remove it from the build_actor_fn call.
gcc/testsuite/ChangeLog:
PR c++/95591
PR c++/95599
PR c++/95823
PR c++/95824
PR c++/95895
* g++.dg/coroutines/pr95591.C: New test.
* g++.dg/coroutines/pr95599.C: New test.
* g++.dg/coroutines/pr95823.C: New test.
* g++.dg/coroutines/pr95824.C: New test.
Martin Liska [Mon, 27 Jul 2020 10:30:24 +0000 (12:30 +0200)]
expr: build string_constant only for a char type
gcc/ChangeLog:
PR tree-optimization/96058
* expr.c (string_constant): Build string_constant only
for a type that has the same precision as char_type_node
and is an integral type.
Jakub Jelinek [Tue, 28 Jul 2020 09:08:29 +0000 (11:08 +0200)]
expander: Fix ICE in maybe_warn_rdwr_sizes [PR96335]
The following testcase ICEs in maybe_warn_rdwr_sizes. The problem is that
the caller uses its fndecl and fntype variables to fill up rdwr_map, and
the fntype in that case is a prototype with the access attribute and all
the checks needed for that performed. But the maybe_warn_rdwr_sizes
function tries to rediscover fndecl/fntype itself and does it differently
from how the caller did (for fndecl get_callee_fndecl and fntype from that
FUNCTION_DECL, otherwise sets fntype to CALL_EXPR_FN's type).
On the testcase, get_callee_fndecl does find a FUNCTION_DECL because
it does STRIP_NOPS in between.
Instead of trying to rediscover those, this patch just passes them down,
like is done in several other functions.
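For context, a sketch (example mine) of the attribute that populates
rdwr_map; the warning code must look at the same fndecl/fntype that the
caller already resolved:

    __attribute__ ((access (write_only, 1, 2)))
    void fill (char *buf, int n);

    void g (void)
    {
      char buf[4];
      fill (buf, 8);   /* candidate for an out-of-bounds access warning */
    }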
2020-07-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96335
* calls.c (maybe_warn_rdwr_sizes): Add FNDECL and FNTYPE arguments,
instead of trying to rediscover them in the body.
(initialize_argument_information): Adjust caller.
Martin Liska [Fri, 24 Jul 2020 12:33:27 +0000 (14:33 +0200)]
Use vec::reserve before vec_safe_grow_cleared is called
gcc/ChangeLog:
PR lto/45375
* symbol-summary.h: Call vec_safe_reserve before grow is called
in order to grow to a reasonable size.
* vec.h (vec_safe_reserve): Add missing function for vl_ptr
type.
Mark Eggleston [Thu, 11 Jun 2020 10:05:40 +0000 (11:05 +0100)]
Fortran : ICE in gfc_check_pointer_assign PR95612
Output an error if the right hand value is a zero sized array or
does not have a symbol tree otherwise continue checking.
2020-07-27 Steven G. Kargl <kargl@gcc.gnu.org>
gcc/fortran/
PR fortran/95612
* expr.c (gfc_check_pointer_assign): Output an error if the
rvalue is a zero-sized array or if the rvalue
doesn't have a symbol tree.
2020-07-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95612
* gfortran.dg/pr95612.f90: New test.
PR 93567, G edit descriptor uses E instead of F editing in rounding mode UP.
The switch between FMT_E and FMT_F is based on the absolute value.
Set r = 0 for rounding toward zero and r = 1 otherwise.
If (exp_d - m) == 1 there is no rounding needed.
libgfortran/ChangeLog:
PR fortran/93567
* io/write_float.def (determine_en_precision): Fix switch between
FMT_E and FMT_F.
gcc/testsuite/ChangeLog:
PR fortran/93567
* gfortran.dg/round_3.f08: Add test cases.
sparc/sparc64: use crtendS.o for default-pie executables [PR96190]
In --enable-default-pie mode the compiler should switch from
using crtend.o to crtendS.o. On sparc this is especially important
because crtend.o contains PIC-unfriendly code.
We use GNU_USER_TARGET_ENDFILE_SPEC as a baseline spec to get
crtendS.o instead of crtend.o in !no-pie mode.
gcc:
2020-07-14 Sergei Trofimovich <siarheit@google.com>
PR target/96190
* config/sparc/linux.h (ENDFILE_SPEC): Use GNU_USER_TARGET_ENDFILE_SPEC
to get crtendS.o for !no-pie mode.
* config/sparc/linux64.h (ENDFILE_SPEC): Ditto.
Harald Anlauf [Mon, 6 Jul 2020 16:58:23 +0000 (18:58 +0200)]
PR fortran/95980 - ICE on using sync images with -fcheck=bounds
In SELECT TYPE, the argument may be an incorrectly specified unlimited
polymorphic variable. Avoid a NULL pointer dereference for clean error
recovery.
gcc/fortran/
PR fortran/95980
* match.c (copy_ts_from_selector_to_associate, build_class_sym):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
* resolve.c (resolve_select_type):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
Peter Bergner [Wed, 22 Jul 2020 16:44:35 +0000 (11:44 -0500)]
rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode
PR96236 shows a problem where we don't store our 512-bit accumulators
correctly in little-endian mode. The patch below detects when we're doing a
little-endian memory access and stores to the correct memory locations.
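A sketch (example mine) of the builtin's use (PowerPC, -mcpu=power10):

    #include <altivec.h>

    /* Disassemble a 512-bit accumulator into four 16-byte vectors; in LE
       mode the rows must land in the correct array slots.  */
    void get_rows (__vector_quad *acc, vector unsigned char rows[4])
    {
      __builtin_mma_disassemble_acc (rows, acc);
    }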
Thomas Koenig [Sun, 19 Jul 2020 15:27:45 +0000 (17:27 +0200)]
Always use name from c_interop_kinds_table for -fc-prototypes.
When a user specified a KIND that was a parameter taking the value
of an iso_c_binding KIND, the code used the name of that parameter
to look up the type name. Corrected by always looking it up in
the table of C interop kinds (which was previously done for
non-C-interop types, anyway).
gcc/fortran/ChangeLog:
PR fortran/96220
* dump-parse-tree.c (get_c_type_name): Always use the entries from
c_interop_kinds_table to find the correct C type.
Thomas Koenig [Thu, 23 Jul 2020 15:56:09 +0000 (17:56 +0200)]
Fix handling of implicit_pure by checking if non-pure procedures are called.
Procedures are marked as implicit_pure if they fulfill the criteria of
pure procedures. In this case, a procedure was not marked as not being
implicit_pure even though it called another procedure which had not yet
been marked as not being implicit_pure.
Fixed by iterating over all procedures, marking callers of procedures
which are non-pure and non-implicit_pure as non-implicit_pure, and
repeating until no more procedures change.
Kito Cheng [Wed, 22 Jul 2020 06:50:40 +0000 (14:50 +0800)]
PR target/96260 - KASAN should work even if the back-end ports nothing.
- Most KASAN functions don't need any back-end support, except for ASan
stack protection.
- However, the kernel provides the shadow offset when ASan stack
protection is enabled, so everything in KASAN can work if the shadow
offset is given.
- Verified on x86 and RISC-V.
- Verified with the RISC-V Linux kernel.
gcc/ChangeLog:
PR target/96260
* asan.c (asan_shadow_offset_set_p): New.
* asan.h (asan_shadow_offset_set_p): Ditto.
* toplev.c (process_options): Allow -fsanitize=kernel-address
even if TARGET_ASAN_SHADOW_OFFSET is not implemented; only check it
when asan stack protection is enabled.
The trunk versions of the patch made GNU and Advanced SIMD vectors
distinct (but inter-convertible) in all cases. However, the
traditional behaviour is that the types are distinct in template
arguments but not otherwise.
Following a suggestion from Jason, this patch puts the check
for different vector types under comparing_specializations.
In order to keep the backport as simple as possible, the patch
hard-codes the name of the attribute in the frontend rather than
adding a new branch-only target hook.
I didn't find a test that tripped the assert on the branch,
even with the --param in the PR, so instead I tested this by
forcing the hash function to only hash the tree code. That made
the static assertion in the test fail without the patch but pass
with it.
This means that the tests pass for unmodified sources even
without the patch (unless you're very unlucky).
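A sketch (example mine, aarch64) of the restored behaviour:

    #include <arm_neon.h>
    #include <type_traits>

    typedef int32_t gnu_int32x4 __attribute__ ((vector_size (16)));

    /* Distinct as template arguments...  */
    static_assert (!std::is_same<int32x4_t, gnu_int32x4>::value,
                   "distinct in template contexts");

    /* ...but still inter-convertible in ordinary code.  */
    void convert (int32x4_t a)
    {
      gnu_int32x4 b = a;
      (void) b;
    }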
gcc/
PR target/95726
* config/aarch64/aarch64.c (aarch64_attribute_table): Add
"Advanced SIMD type".
* config/aarch64/aarch64-builtins.c: Include stringpool.h and
attribs.h.
(aarch64_init_simd_builtin_types): Add an "Advanced SIMD type"
attribute to each Advanced SIMD type.
* config/arm/arm.c (arm_attribute_table): Add "Advanced SIMD type".
* config/arm/arm-builtins.c: Include stringpool.h and attribs.h.
(arm_init_simd_builtin_types): Add an "Advanced SIMD type"
attribute to each Advanced SIMD type.
gcc/cp/
PR target/95726
* typeck.c (structural_comptypes): When comparing template
specializations, differentiate between vectors that have and
do not have an "Advanced SIMD type" attribute.
gcc/testsuite/
PR target/95726
* g++.target/aarch64/pr95726.C: New test.
* g++.target/arm/pr95726.C: Likewise.
Jakub Jelinek [Wed, 15 Jul 2020 09:34:44 +0000 (11:34 +0200)]
fix _mm512_{,mask_}cmp*_p[ds]_mask at -O0 [PR96174]
The _mm512_{,mask_}cmp_p[ds]_mask and also _mm_{,mask_}cmp_s[ds]_mask
intrinsics have an argument which must have a constant passed to it
and so use an inline version only for ifdef __OPTIMIZE__ and have
a #define for -O0. But the _mm512_{,mask_}cmp*_p[ds]_mask intrinsics
don't need a constant argument, they are essentially the first
set with the constant added to them implicitly based on the comparison
name, and so there is no #define version for them (correctly).
But their inline versions are defined in between the first and s[ds]
set and so inside of ifdef __OPTIMIZE__, which means that with -O0
they aren't defined at all.
This patch fixes that by moving those definitions after the
#ifdef __OPTIMIZE__ ... #else ... #endif block that provides the
#define versions.
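A sketch (example mine) of a named comparison that is now also usable
at -O0 (x86, -mavx512f):

    #include <immintrin.h>

    __mmask8 less_than (__m512d a, __m512d b)
    {
      /* The predicate is implicit in the name, so there is rightly no
         #define fallback; the inline definition must exist at -O0.  */
      return _mm512_cmplt_pd_mask (a, b);
    }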
Marek Polacek [Tue, 23 Jun 2020 01:26:49 +0000 (21:26 -0400)]
c++: Make convert_like complain about bad ck_ref_bind again [PR95789]
convert_like issues errors about bad_p conversions at the beginning
of the function, but in the ck_ref_bind case, it only issues them
after we've called convert_like on the next conversion.
This doesn't work as expected since r10-7096 because when we see
a conversion from/to class type in a template, we return early, thereby
missing the error, and a bad_p conversion goes by undetected. That
made the attached test to compile even though it should not.
I had thought that I could just move the ck_ref_bind/bad_p errors
above to the rest of them, but that regressed diagnostics because
expr then wasn't converted yet by the nested convert_like_real call.
So, for bad_p conversions, do the normal processing, but still return
the IMPLICIT_CONV_EXPR to avoid introducing trees that the template
processing can't handle well. This I achieved by adding a wrapper
function.
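A sketch (example mine, not one of the new tests) of a bad reference
binding in a template that must be diagnosed again:

    struct A { A (int); };

    template <typename>
    void f ()
    {
      // Binding a non-const lvalue reference to a temporary produced by
      // a user-defined conversion: a bad_p ck_ref_bind.
      A &r = 42;
    }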
gcc/cp/ChangeLog:
PR c++/95789
PR c++/96104
PR c++/96179
* call.c (convert_like_real_1): Renamed from convert_like_real.
(convert_like_real): New wrapper for convert_like_real_1.
gcc/testsuite/ChangeLog:
PR c++/95789
PR c++/96104
PR c++/96179
* g++.dg/conversion/ref4.C: New test.
* g++.dg/conversion/ref5.C: New test.
* g++.dg/conversion/ref6.C: New test.
libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof
The version of nvprof in CUDA 9.0 causes a hang when used to profile an
OpenACC program. This is because it calls acc_get_device_type from
a callback called during device initialization, which then attempts
to acquire acc_device_lock while it is already taken, resulting in
deadlock. This works around the issue by returning acc_device_none
from acc_get_device_type without attempting to acquire the lock when
initialization has not completed yet.
2020-07-14 Tom de Vries <tom@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
New variable.
(acc_init_1): Set acc_init_thread to pthread_self (). Set
acc_init_state to initializing at the start, and to initialized at the
end.
(self_initializing_p): New function.
(acc_get_device_type): Return acc_device_none if called by thread that
is currently executing acc_init_1.
* libgomp.texi (acc_get_device_type): Update documentation.
(Implementation Status and Implementation-Defined Behavior): Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c: New.
ipa-devirt: Fix crash in obj_type_ref_class [PR95114]
The testcase has failed since r9-5035, because obj_type_ref_class
tries to look up an ODR type when no ODR type information is
available. (The information was available earlier in the
compilation, but was freed during pass_ipa_free_lang_data.)
We then crash dereferencing the null get_odr_type result.
The test passes with -O2. However, it fails again if -fdump-tree-all
is used, since obj_type_ref_class is called indirectly from the
dump routines.
Other code creates ODR type entries on the fly by passing “true”
as the insert parameter. But obj_type_ref_class can't do that
unconditionally, since it should have no side-effects when used
from the dumping code.
Following a suggestion from Honza, this patch adds parameters
to say whether the routines are being called from dump routines
and uses those to derive the insert parameter.
gcc/
PR middle-end/95114
* tree.h (virtual_method_call_p): Add a default-false parameter
that indicates whether the function is being called from dump
routines.
(obj_type_ref_class): Likewise.
* tree.c (virtual_method_call_p): Likewise.
* ipa-devirt.c (obj_type_ref_class): Likewise. Lazily add ODR
type information for the type when the parameter is false.
* tree-pretty-print.c (dump_generic_node): Update calls to
virtual_method_call_p and obj_type_ref_class accordingly.
gcc/testsuite/
PR middle-end/95114
* g++.target/aarch64/pr95114.C: New test.
value-range: Fix handling of POLY_INT_CST anti-ranges [PR96146]
The range infrastructure has code to decompose POLY_INT_CST ranges
to worst-case integer bounds. However, it had the fundamental flaw
(obvious in hindsight) that it applied to anti-ranges too, meaning
that an anti-range involving 2+2X would end up as ~[2, +INF], i.e.
[-INF, 1]. This patch decays to varying in that case instead.
I'm still a bit uneasy about this. ISTM that, in terms of
generality, an SSA_NAME could store a POLY_INT_CST and a POLY_INT_CST
could store an INTEGER_CST (before canonicalisation). POLY_INT_CST
is also “as constant as” ADDR_EXPR (well, OK, only some ADDR_EXPRs
are run-time rather than link-time constants, whereas all POLY_INT_CSTs
are, but still). So it seems like we should at least be able to treat
POLY_INT_CST as symbolic. On the other hand, I don't have any examples
in which that would be useful.
gcc/
PR tree-optimization/96146
* value-range.cc (value_range::set): Only decompose POLY_INT_CST
bounds to integers for VR_RANGE. Decay to VR_VARYING for anti-ranges
involving POLY_INT_CSTs.
gcc/testsuite/
PR tree-optimization/96146
* gcc.target/aarch64/sve/acle/general/pr96146.c: New test.
Jakub Jelinek [Tue, 14 Jul 2020 14:01:11 +0000 (16:01 +0200)]
expr: Unbreak build of mesa [PR96194]
> > The store to the whole of each volatile object was picked apart
> > like there had been an individual assignment to each of the
> > fields. Reads were added as part of that; see PR for details.
> > The reads from volatile memory were a clear bug; individual
> > stores questionable. A separate patch clarifies the docs.
This breaks building of mesa on both the trunk and 10 branch.
The problem is that the middle-end may never create temporaries of non-POD
(TREE_ADDRESSABLE) types, those can be only created when the language says
so and thus only the FE is allowed to create those.
This patch just reverts the behavior to what we used to do before for the
stores to volatile non-PODs. Perhaps we want to do something else, but
definitely we can't create temporaries of the non-POD type. What should
happen in those cases is up for discussion.
2020-07-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/96194
* expr.c (expand_constructor): Don't create temporary for store to
volatile MEM if exp has an addressable type.
Because the check for power10_hw is not called
check_effective_target_power10_hw, it needs to be looked
for by is-effective-target-keyword. Also reorder things
in is-effective-target to put power10_hw with the other
ppc stuff.
2020-07-13 Aaron Sawdey <acsawdey@linux.ibm.com>
gcc/testsuite/
* lib/target-supports.exp (is-effective-target):
Reorder to put powerpc stuff together.
(is-effective-target-keyword): Add power10_hw.
Add a test for dejagnu to determine if execution of MMA instructions is
supported in the test environment. Add an execution test to make sure
that __builtin_cpu_supports("mma") is true if we can execute MMA
instructions.
Szabolcs Nagy [Thu, 28 May 2020 09:28:30 +0000 (10:28 +0100)]
doc: Clarify __builtin_return_address [PR94891]
The expected semantics and valid usage of __builtin_return_address is
not clear since it exposes implementation internals that are normally
not meaningful to portable c code.
This documentation change tries to clarify the semantics in case the
return address is stored in a mangled form. This affects AArch64 when
pointer authentication is used for the return address signing (i.e.
-mbranch-protection=pac-ret).
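A sketch (example mine) of the use the clarified text addresses:

    void *whom (void)
    {
      /* With -mbranch-protection=pac-ret on AArch64, this value may be
         a signed (mangled) return address rather than a plain code
         pointer.  */
      return __builtin_return_address (0);
    }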
2020-07-13 Szabolcs Nagy <szabolcs.nagy@arm.com>
gcc/ChangeLog:
PR target/94891
* doc/extend.texi: Update the text for __builtin_return_address.
Note that a mangled address might not even fit into a void *, e.g.
with the AArch64 ILP32 ABI the return address is stored as 64 bits, so
the mangled return address cannot be accessed via _Unwind_GetPtr.
This patch changes the unwinder hooks as follows:
MD_POST_EXTRACT_ROOT_ADDR is removed: root address comes from
__builtin_return_address which is not mangled.
MD_POST_EXTRACT_FRAME_ADDR is renamed to MD_DEMANGLE_RETURN_ADDR,
it now operates on _Unwind_Word instead of void *, so the hook
should work when return address signing is enabled on AArch64 ilp32.
(But for that __builtin_aarch64_autia1716 should be fixed to operate
on 64bit input instead of a void *.)
MD_POST_FROB_EH_HANDLER_ADDR is removed: it is the responsibility of
__builtin_eh_return to do the mangling if necessary.
Szabolcs Nagy [Thu, 4 Jun 2020 12:42:16 +0000 (13:42 +0100)]
aarch64: fix __builtin_eh_return with pac-ret [PR94891]
Currently __builtin_eh_return takes a signed return address, which can
cause ABI and API issues: 1) pointer representation problems if the
address is passed around before eh return, 2) the source code needs
pac-ret specific changes and needs to know if pac-ret is used in the
current frame, 3) signed address may not be representible as void *
(with ilp32 abi).
Using address signing to protect eh return is ineffective because the
instruction sequence in the unwinder that starts from the address
signing and ends with a ret can be used as a return to anywhere gadget.
Using indirect branch istead of ret with bti j landing pads at the
target can reduce the potential of such gadget, which also implies
that __builtin_eh_return should not take a signed address.
This is a big hammer fix to the ABI and API issues: it turns pac-ret
off for the caller completely (not just on the eh return path). To
harden the caller against ROP attacks, it should use indirect branch
instead of ret, this is not attempted so the patch remains small and
backportable.
2020-07-13 Szabolcs Nagy <szabolcs.nagy@arm.com>
gcc/ChangeLog:
PR target/94891
* config/aarch64/aarch64.c (aarch64_return_address_signing_enabled):
Disable return address signing if __builtin_eh_return is used.
Szabolcs Nagy [Tue, 2 Jun 2020 15:44:41 +0000 (16:44 +0100)]
aarch64: fix return address access with pac [PR94891][PR94791]
This is a big hammer fix for __builtin_return_address (PR target/94891)
returning signed addresses (sometimes, depending on whether lr happens
to be signed or not at the time of the call, which depends on
optimizations), and similarly -pg may pass a signed return address to
_mcount (PR target/94791).
At the time of return address expansion we don't know whether it is
signed or not, so the mitigation is applied unconditionally.
Szabolcs Nagy [Thu, 2 Jul 2020 16:12:05 +0000 (17:12 +0100)]
aarch64: Fix BTI support in libitm
sjlj.S did not have the GNU property note markup and the BTI c
instructions that are necessary when it is built with branch
protection.
The notes are only added when libitm is built with branch
protection, because old linkers mishandle the note (merge
them incorrectly or emit warnings), the BTI instructions
are added unconditionally.
2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>
libitm/ChangeLog:
* config/aarch64/sjlj.S: Add BTI marking and related definitions,
and add BTI c to function entries.
Szabolcs Nagy [Thu, 2 Jul 2020 16:11:56 +0000 (17:11 +0100)]
aarch64: Fix BTI support in libgcc [PR96001]
lse.S did not have the GNU property note markup and the BTI c
instructions that are necessary when it is built with branch
protection.
The notes are only added when libgcc is built with branch
protection, because old linkers mishandle the note (merge
them incorrectly or emit warnings); the BTI instructions
are added unconditionally.
Note: BTI c is only necessary at function entry if the function
may be called indirectly. Currently the lse functions are not called
indirectly, but BTI is added for ABI reasons, e.g. to allow
linkers later to emit stub code with an indirect jump.
2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>
libgcc/ChangeLog:
PR target/96001
* config/aarch64/lse.S: Add BTI marking and related definitions,
and add BTI c to function entries.