git.ipfire.org Git - thirdparty/gcc.git/log

PR middle-end/105604 - ICE: in tree_to_shwi with vla in struct and sprintf

gcc/ChangeLog:

PR middle-end/105604
* gimple-ssa-sprintf.cc (set_aggregate_size_and_offset): Add comments.
(get_origin_and_offset_r): Remove null handling.  Handle variable array
sizes.
(get_origin_and_offset): Handle null argument here.  Simplify.
(alias_offset): Update comment.
* pointer-query.cc (field_at_offset): Update comment.  Handle members
of variable-length types.

gcc/testsuite/ChangeLog:

PR middle-end/105604
* gcc.dg/Wrestrict-24.c: New test.
* gcc.dg/Wrestrict-25.c: New test.
* gcc.dg/Wrestrict-26.c: New test.

Co-authored-by: Richard Biener <rguenther@suse.de>

c++: *this folding in constexpr call

The code in cxx_eval_call_expression to fold *this was doing the wrong thing
for array decay; we can use cxx_fold_indirect_ref instead.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_fold_indirect_ref): Add default arg.
(cxx_eval_call_expression): Call it.
(cxx_fold_indirect_ref_1): Handle null empty_base.

gcc.misc-tests/outputs.exp: Use link test to check for -gsplit-dwarf support

We have noticed that, when running the GCC testsuite on AArch64
RTEMS 6, we have about 150 tests failing due to a link failure.
When investigating, we found that all the tests were failing
due to the use of -gsplit-dwarf.

On this platform, using -gsplit-dwarf currently causes an error
during the link:

    | /[...]/ld: a.out section `.unexpected_sections' will not fit
    |    in region `UNEXPECTED_SECTIONS'
    | /[...]/ld: region `UNEXPECTED_SECTIONS' overflowed by 56 bytes

The error is a bit cryptic, but the source of the issue is that
the linker does not currently support the sections generated
by -gsplit-dwarf (.debug_gnu_pubnames, .debug_gnu_pubtypes).
This means that the -gsplit-dwarf feature itself really isn't
supported on this platform, at least for the moment.

This commit enhances the -gsplit-dwarf support check to be
a compile-and-link check, rather than just a compile check.
This allows it to properly detect that this feature isn't
supported on platforms such as AArch64 RTEMS where the compilation
works, but not the link.

Tested on aarch64-rtems, where a little over 150 tests are now
passing, instead of failing, as well as on x86_64-linux, where
the results are identical, and where the .log file was also manually
inspected to make sure that the use of the -gsplit-dwarf option
was preserved.

gcc/testsuite/ChangeLog:

* gcc.misc-tests/outputs.exp: Make the -gsplit-dwarf test
a compile-and-link test rather than a compile-only test.

c++: discarded-value and constexpr

I've been thinking for a while that the 'lval' parameter needed a third
value for discarded-value expressions; most importantly,
cxx_eval_store_expression does extra work for an lvalue result, and we also
don't want to do the l->r conversion.

Mostly this is pretty mechanical. Apart from the _store_ fix, I also use
vc_discard for substatements of a STATEMENT_LIST other than a stmt-expr
result, and avoid building _REFs to be ignored in a few other places.

gcc/cp/ChangeLog:

* constexpr.cc (enum value_cat): New. Change all 'lval' parameters
from int to value_cat. Change most false to vc_prvalue, most true
to vc_glvalue, cases where the return value is ignored to
vc_discard.
(cxx_eval_statement_list): Only vc_prvalue for stmt-expr result.
(cxx_eval_store_expression): Only build _REF for vc_glvalue.
(cxx_eval_array_reference, cxx_eval_component_reference)
(cxx_eval_indirect_ref, cxx_eval_constant_expression): Likewise.

c++: constexpr empty base redux [PR105622]

Here calling the constructor for s.__size_ had ctx->ctor for s itself
because cxx_eval_store_expression doesn't create a ctor for the empty field.
Then cxx_eval_call_expression returned the s initializer, and my empty base
overhaul in r13-160 got confused because the type of init is not an empty
class. But that's OK, we should be checking the type of the original LHS
instead. We also want to use initialized_type in the condition, in case
init is an AGGR_INIT_EXPR.

I spent quite a while working on more complex solutions before coming back
to this simple one.

PR c++/105622

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_store_expression): Adjust assert.
Use initialized_type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/no_unique_address14.C: New test.

Add new parameter to vec_perm_const hook for specifying operand mode.

The rationale of the patch is to support vec_perm_expr of the form:
lhs = vec_perm_expr<rhs, mask>
where lhs and rhs are vector types with different lengths but have
same element type. For example, lhs is SVE vector and rhs
is corresponding AdvSIMD vector.

It would also allow to express extract even/odd and interleave operations
with a VEC_PERM_EXPR. The interleave currently has the issue that we have
to artificially widen the inputs with "dont-care" elements.

gcc/ChangeLog:

* target.def (vec_perm_const): Define new parameter op_mode and
update doc.
* doc/tm.texi: Regenerate.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const): Adjust
vec_perm_const hook to add new parameter op_mode and return false
if result and operand modes do not match.
* config/arm/arm.cc (arm_vectorize_vec_perm_const): Likewise.
* config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Likewise.
* config/ia64/ia64.cc (ia64_vectorize_vec_perm_const): Likewise.
* config/mips/mips.cc (mips_vectorize_vec_perm_const): Likewise.
* config/rs6000/rs6000.cc (rs6000_vectorize_vec_perm_const): Likewise
* config/s390/s390.cc (s390_vectorize_vec_perm_const): Likewise.
* config/sparc/sparc.cc (sparc_vectorize_vec_perm_const): Likewise.
* config/i386/i386-expand.cc (ix86_vectorize_vec_perm_const): Likewise.
* config/i386/i386-expand.h (ix86_vectorize_vec_perm_const): Adjust
prototype.
* config/i386/sse.md (ashrv4di3): Adjust call to vec_perm_const hook.
(ashrv2di3): Likewise.
* optabs.cc (expand_vec_perm_const): Likewise.
* optabs-query.h (can_vec_perm_const_p): Adjust prototype.
* optabs-query.cc (can_vec_perm_const_p): Define new parameter
op_mode and pass it to vec_perm_const hook.
(can_mult_highpart_p): Adjust call to can_vec_perm_const_p.
* match.pd (vec_perm X Y CST): Likewise.
* tree-ssa-forwprop.cc (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.cc (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-generic.cc (lower_vec_perm): Likewise.
* tree-vect-loop-manip.cc (interleave_supported_p): Likewise.
* tree-vect-loop.cc (have_whole_vector_shift): Likewise.
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Likewise.
* tree-vect-slp.cc (can_duplicate_and_interleave_p): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.cc (perm_mask_for_reverse): Likewise.
(vectorizable_bswap): Likewise.
(scan_store_can_perm_p): Likewise.
(vect_gen_perm_mask_checked): Likewise.

x86: Document -mcet-switch

When -fcf-protection=branch is used, the compiler will generate jump
tables for switch statements where the indirect jump is prefixed with
the NOTRACK prefix, so it can jump to non-ENDBR targets. Since the
indirect jump targets are generated by the compiler and stored in
read-only memory, this does not result in a direct loss of hardening.
But if the jump table index is attacker-controlled, the indirect jump
may not be constrained by CET.

Document -mcet-switch to generate jump tables for switch statements with
ENDBR and skip the NOTRACK prefix for indirect jump. This option should
be used when the NOTRACK prefix is disabled.

PR target/104816
* config/i386/i386.opt: Remove Undocumented.
* doc/invoke.texi: Document -mcet-switch.

amdgcn: Add gfx90a support

This adds architecture options and multilibs for the AMD GFX90a GPUs.
It also tidies up some of the ISA selection code, and corrects a few small
mistake in the gfx908 naming.

gcc/ChangeLog:

* config.gcc (amdgcn): Accept --with-arch=gfx908 and gfx90a.
* config/gcn/gcn-opts.h (enum gcn_isa): New.
(TARGET_GCN3): Use enum gcn_isa.
(TARGET_GCN3_PLUS): Likewise.
(TARGET_GCN5): Likewise.
(TARGET_GCN5_PLUS): Likewise.
(TARGET_CDNA1): New.
(TARGET_CDNA1_PLUS): New.
(TARGET_CDNA2): New.
(TARGET_CDNA2_PLUS): New.
(TARGET_M0_LDS_LIMIT): New.
(TARGET_PACKED_WORK_ITEMS): New.
* config/gcn/gcn.cc (gcn_isa): Change to enum gcn_isa.
(gcn_option_override): Recognise CDNA ISA variants.
(gcn_omp_device_kind_arch_isa): Support gfx90a.
(gcn_expand_prologue): Make m0 init optional.
Add support for packed work items.
(output_file_start): Support gfx90a.
(gcn_hsa_declare_function_name): Support gfx90a metadata.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS):Add __CDNA1__ and
__CDNA2__.
* config/gcn/gcn.md (<su>mulsi3_highpart): Use TARGET_GCN5_PLUS.
(<su>mulsi3_highpart_imm): Likewise.
(<su>mulsidi3): Likewise.
(<su>mulsidi3_imm): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx90a.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90a): New.
(main): Support gfx90a.
* config/gcn/t-gcn-hsa: Add gfx90a multilib.
* config/gcn/t-omp-device: Add gfx90a isa.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add
EF_AMDGPU_MACH_AMDGCN_GFX90a.
(gcn_gfx90a_s): New.
(isa_hsa_name): Support gfx90a.
(isa_code): Likewise.

amdgcn: Remove LLVM 9 assembler/linker support

The minimum required LLVM version is now 13.0.1, and is enforced by configure.

gcc/ChangeLog:

* config.in: Regenerate.
* config/gcn/gcn-hsa.h (X_FIJI): Delete.
(X_900): Delete.
(X_906): Delete.
(X_908): Delete.
(S_FIJI): Delete.
(S_900): Delete.
(S_906): Delete.
(S_908): Delete.
(NO_XNACK): New macro.
(NO_SRAM_ECC): New macro.
(SRAMOPT): Keep only v4 variant.
(HSACO3_SELECT_OPT): Delete.
(DRIVER_SELF_SPECS): Delete.
(ASM_SPEC): Remove LLVM 9 support.
* config/gcn/gcn-valu.md
(gather<mode>_insn_2offsets<exec>): Remove assembler bug workaround.
(scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
* config/gcn/gcn.cc (output_file_start): Remove LLVM 9 support.
(print_operand_address): Remove assembler bug workaround.
* config/gcn/mkoffload.cc (EF_AMDGPU_XNACK_V3): Delete.
(EF_AMDGPU_SRAM_ECC_V3): Delete.
(SET_XNACK_ON): Delete v3 variants.
(SET_XNACK_OFF): Delete v3 variants.
(TEST_XNACK): Delete v3 variants.
(SET_SRAM_ECC_ON): Delete v3 variants.
(SET_SRAM_ECC_ANY): Delete v3 variants.
(SET_SRAM_ECC_OFF): Delete v3 variants.
(SET_SRAM_ECC_UNSUPPORTED): Delete v3 variants.
(TEST_SRAM_ECC_ANY): Delete v3 variants.
(TEST_SRAM_ECC_ON): Delete v3 variants.
(copy_early_debug_info): Remove v3 support.
(main): Remove v3 support.
* configure: Regenerate.
* configure.ac: Replace all GCN feature checks with a version check.

libiberty: remove FINAL and OVERRIDE from ansidecl.h

libiberty's ansidecl.h provides macros FINAL and OVERRIDE to allow
virtual functions to be labelled with the C++11 "final" and "override"
specifiers, but with empty implementations on pre-C++11 C++ compilers.

We've used the macros in many places in GCC, but as of as of GCC 11
onwards GCC has required a C++11 compiler, such as GCC 4.8 or later.
On the assumption that any such compiler correctly implements "final"
and "override", I've simplified GCC's codebase by replacing all uses of
the FINAL and OVERRIDE macros in GCC's source tree with the lower-case
specifiers (via commits r13-690-gff171cb13df671 and
r13-716-g8473ef7be60443)

The macros are reportedly not used anywhere in binutils-gdb.

This patch completes this transition for GCC by eliminating the macros
from ansidecl.h.

include/ChangeLog:
* ansidecl.h: Drop macros OVERRIDE and FINAL.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Optimize double word negation of zero extended values on x86.

It's not uncommon for GCC to convert between a (zero or one) Boolean
value and a (zero or all ones) mask value, possibly of a wider type,
using negation.

Currently on x86_64, the following simple test case:
__int128 foo(unsigned long x) { return -(__int128)x; }

compiles with -O2 to:

        movq    %rdi, %rax
        xorl    %edx, %edx
        negq    %rax
        adcq    $0, %rdx
        negq    %rdx
        ret

with this patch, which adds an additional peephole2 to i386.md,
we instead generate the improved:

        movq    %rdi, %rax
        negq    %rax
        sbbq    %rdx, %rdx
        ret

[and likewise for the (DImode) long long version using -m32.]
A peephole2 is appropriate as the double word negation and the
operation providing the xor are typically only split after combine.

In fact, the new peephole2 sequence:
;; Convert:
;;   xorl %edx, %edx
;;   negl %eax
;;   adcl $0, %edx
;;   negl %edx
;; to:
;;   negl %eax
;;   sbbl %edx, %edx    // *x86_mov<mode>cc_0_m1

is nearly identical to (and placed immediately after) the existing:
;; Convert:
;;   mov %esi, %edx
;;   negl %eax
;;   adcl $0, %edx
;;   negl %edx
;; to:
;;   xorl %edx, %edx
;;   negl %eax
;;   sbbl %esi, %edx

One potential objection/concern is that "sbb? %reg,%reg" may possibly be
incorrectly perceived as a false register dependency on older hardware,
much like "xor? %reg,%reg" may be perceived as a false dependency on
really old hardware.  This doesn't currently appear to be a concern
for the i386 backend's *x86_move<mode>cc_0_m1 as shown by the following
test code:

int bar(unsigned int x, unsigned int y) {
  return x > y ? -1 : 0;
}

which currently generates a "naked" sbb:
        cmp     esi, edi
        sbb     eax, eax
        ret

If anyone does potentially encounter a stall, it would easy to add
a splitter or peephole2 controlled by a tuning flag to insert an additional
xor to break the false dependency chain (when not optimizing for size),
but I don't believe this is required on recent microarchitectures.

2022-05-24 Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (peephole2): Convert xor;neg;adc;neg,
i.e. a double word negation of a zero extended operand, to
neg;sbb.

gcc/testsuite/ChangeLog
* gcc.target/i386/neg-zext-1.c: New test case for -m32.
* gcc.target/i386/neg-zext-2.c: New test case for -m64.

PR tree-optimization/105668: Provide vcond_mask_v1tiv1ti pattern.

This patch is an alternate/supplementary fix to PR tree-optimization/105668
that provides a vcond_mask_v1titi optab/define_expand to the i386 backend.
An undocumented feature/bug of GCC's vectorization is that any target that
provides a vec_cmpeq<mode><mode> has to also provide a matching
vcond_mask<mode><mode>.  This backend patch preserves the status quo,
rather than fixes the underlying problem.

One aspect of this clean-up is that ix86_expand_sse_movcc provides
fallback implementations using pand/pandn/por that effectively make
V2DImode and V1TImode vcond_mask available on any TARGET_SSE2, not
just TARGET_SSE4_2.  This allows a simplification as V2DI mode can
be handled by using a VI_128 mode iterator instead of a VI124_128
mode iterator, and instead this define_expand is effectively renamed
to provide a V1TImode vcond_mask expander (as V1TI isn't in VI_128).

2022-05-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR tree-optimization/105668
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Support
V1TImode, just like V2DImode.
* config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
Use VI_128 mode iterator instead of VI124_128 to include V2DI.
(vcond_mask_v2div2di): Delete.
(vcond_mask_v1tiv1ti): New define_expand.

gcc/testsuite/ChangeLog
PR tree-optimization/105668
* gcc.target/i386/pr105668.c: New test case.

Minor improvement to genpreds.cc

This simple patch implements Richard Biener's suggestion in comment #6
of PR tree-optimization/52171 (from February 2013) that the insn-preds
code generated by genpreds can avoid using strncmp when matching constant
strings of length one.

The effect of this patch is best explained by the diff of insn-preds.cc:
<       if (!strncmp (str + 1, "g", 1))
---
>       if (str[1] == 'g')
3104c3104
<       if (!strncmp (str + 1, "m", 1))
---
>       if (str[1] == 'm')
3106c3106
<       if (!strncmp (str + 1, "c", 1))
---
>       if (str[1] == 'c')
...

The equivalent optimization is performed by GCC (but perhaps not by the
host compiler), but generating simpler/smaller code may encourage further
optimizations (such as use of a switch statement).

2022-05-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* genpreds.cc (write_lookup_constraint_1): Avoid generating a call
to strncmp for strings of length one.

c++: set TYPE_CANONICAL for more template types

When forming a class template specialization, lookup_template_class
uses structural equality for the specialized type whenever one of its
template arguments uses structural equality.  This is the sensible thing
to do in a vacuum, but given that we already effectively deduplicate class
specializations via the type_specializations table, we ought to be able
to safely assume that each class specialization is unique and therefore
canonical, regardless of the canonicity of the template arguments.

To that end this patch makes us use the canonical type machinery for all
type specializations, except for the case where a PARM_DECL appears in
the template arguments (this special case was recently added by
r12-3766-g72394d38d929c7).

Additionally, this patch makes us use the canonical type machinery for
TEMPLATE_TEMPLATE_PARMs and BOUND_TEMPLATE_TEMPLATE_PARMs, by extending
canonical_type_parameter appropriately.  A comment in tsubst says it's
unsafe to set TYPE_CANONICAL for a lowered TEMPLATE_TEMPLATE_PARM, but
I'm not sure this is true anymore.  According to Jason, this comment
(from r120341) became obsolete when later that year r129844 started to
substitute the template parms of ttps.  Note that r10-7817-ga6f400239d792d
recently changed process_template_parm to clear TYPE_CANONICAL for
TEMPLATE_TEMPLATE_PARM consistent with the tsubst comment; this patch
changes both functions to set instead of clear TYPE_CANONICAL for ttps.

These changes improve compile time of template-heavy code by around 10%
for me (with a release compiler).  For instance, compile time for the
libstdc++ test std/ranges/adaptors/all.cc drops from 1.45s to 1.25s, and
for the range-v3 test test/view/zip.cpp from 5.38s to 4.88s.  The total
number of calls to structural_comptypes for the latter test drops from
10.5M to 1.8M.  Memory use is unaffected (as expected).

The new testcase verifies we check the r12-3766 PARM_DECL special case
in bind_template_template_parm too.

gcc/cp/ChangeLog:

* cp-tree.h (any_template_arguments_need_structural_equality_p):
Declare.
* pt.cc (struct ctp_hasher): Define.
(ctp_table): Define.
(canonical_type_parameter): Use it.
(process_template_parm): Set TYPE_CANONICAL for
TEMPLATE_TEMPLATE_PARM too.
(lookup_template_class_1): Remove now outdated comment for the
any_template_arguments_need_structural_equality_p test.
(tsubst) <case TEMPLATE_TEMPLATE_PARM, etc>: Don't specifically
clear TYPE_CANONICAL for ttps.  Set TYPE_CANONICAL on the
substituted type later.
(any_template_arguments_need_structural_equality_p): Return
true for any_targ_node.  Don't return true just because a
template argument uses structural equality.  Add comment for
the PARM_DECL special case.
(rewrite_template_parm): Set TYPE_CANONICAL on the rewritten
parm's type later.
* tree.cc (bind_template_template_parm): Set TYPE_CANONICAL
when safe to do so.
* typeck.cc (structural_comptypes) [check_alias]: Increment
processing_template_decl before checking
dependent_alias_template_spec_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-52830a.C: New test.

d: add 'final' and 'override' to gcc/d/*.cc 'visit' impls

gcc/d/ChangeLog:
* decl.cc: Add "final" and "override" to all "visit" vfunc decls
as appropriate.
* expr.cc: Likewise.
* toir.cc: Likewise.
* typeinfo.cc: Likewise.
* types.cc: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Cache Management Operation instructions testcases

This commit adds testcases about CMO instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: New test.
* gcc.target/riscv/cmo-zicbom-2.c: New test.
* gcc.target/riscv/cmo-zicbop-1.c: New test.
* gcc.target/riscv/cmo-zicbop-2.c: New test.
* gcc.target/riscv/cmo-zicboz-1.c: New test.
* gcc.target/riscv/cmo-zicboz-2.c: New test.

RISC-V: Cache Management Operation instructions

This commit adds cbo.clea, cbo.flush, cbo.inval, cbo.zero, prefetch.i,
prefetch.r and prefetch.w instructions.

diff with the previous version:
We use unspec_volatile instead of unspec for those cache operations.
We use UNSPECV instead of UNSPEC and move them to unspecv.

gcc/ChangeLog:

* config/riscv/predicates.md (imm5_operand): Add a new operand type for
prefetch instructions.
* config/riscv/riscv-builtins.cc (AVAIL): Add new AVAILs for CMO ISA
Extensions.
(RISCV_ATYPE_SI): New.
(RISCV_ATYPE_DI): New.
* config/riscv/riscv-ftypes.def (0): New.
(1): New.
* config/riscv/riscv.md (riscv_clean_<mode>): New.
(riscv_flush_<mode>): New.
(riscv_inval_<mode>): New.
(riscv_zero_<mode>): New.
(prefetch): New.
(riscv_prefetchi_<mode>): New.
* config/riscv/riscv-cmo.def: New file.

RISC-V: Add mininal support for Zicbo[mzp]

This commit adds minimal support for 'Zicbom','Zicboz' and 'Zicbop' extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zicbom, zicboz, zicbop extensions.
* config/riscv/riscv-opts.h (MASK_ZICBOZ): New.
(MASK_ZICBOM): New.
(MASK_ZICBOP): New.
(TARGET_ZICBOZ): New.
(TARGET_ZICBOM): New.
(TARGET_ZICBOP): New.
* config/riscv/riscv.opt (riscv_zicmo_subext): New.

tree-vect-slp-patterns.cc: add 'final' and 'override' to vect_pattern::build impls

gcc/ChangeLog:
* tree-vect-slp-patterns.cc: Add "final" and "override" to
vect_pattern::build impls as appropriate.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

ipa: add 'final' and 'override' to call_summary_base vfunc impls

gcc/ChangeLog:
* ipa-cp.cc: Add "final" and "override" to call_summary_base vfunc
implementations, removing redundant "virtual" as appropriate.
* ipa-fnsummary.h: Likewise.
* ipa-modref.cc: Likewise.
* ipa-param-manipulation.cc: Likewise.
* ipa-profile.cc: Likewise.
* ipa-prop.h: Likewise.
* ipa-pure-const.cc: Likewise.
* ipa-reference.cc: Likewise.
* ipa-sra.cc: Likewise.
* symbol-summary.h: Likewise.
* symtab-thunks.cc: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Revert "Mitigate -Wmaybe-uninitialized in expmed.cc."

This reverts commit c5c523723149112d117a6d3b259dfd95b032a545.

Mitigate -Wmaybe-uninitialized in expmed.cc.

It's the warning I see every time I build GCC:

In file included from /home/marxin/Programming/gcc/gcc/coretypes.h:478,
                 from /home/marxin/Programming/gcc/gcc/expmed.cc:26:
In function ‘poly_uint16 mode_to_bytes(machine_mode)’,
    inlined from ‘typename if_nonpoly<typename T::measurement_type>::type GET_MODE_SIZE(const T&) [with T = scalar_int_mode]’ at /home/marxin/Programming/gcc/gcc/machmode.h:647:24,
    inlined from ‘rtx_def* emit_store_flag_1(rtx, rtx_code, rtx, rtx, machine_mode, int, int, machine_mode)’ at /home/marxin/Programming/gcc/gcc/expmed.cc:5728:56:
/home/marxin/Programming/gcc/gcc/machmode.h:550:49: warning: ‘*(unsigned int*)((char*)&int_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))’ may be used uninitialized [-Wmaybe-uninitialized]
  550 |           ? mode_size_inline (mode) : mode_size[mode]);
      |                                                 ^~~~
/home/marxin/Programming/gcc/gcc/expmed.cc: In function ‘rtx_def* emit_store_flag_1(rtx, rtx_code, rtx, rtx, machine_mode, int, int, machine_mode)’:
/home/marxin/Programming/gcc/gcc/expmed.cc:5657:19: note: ‘*(unsigned int*)((char*)&int_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))’ was declared here
5657 |   scalar_int_mode int_mode;
      |                   ^~~~~~~~

Can we please mitigate it?

gcc/ChangeLog:

* expmed.cc (emit_store_flag_1): Mitigate -Wmaybe-uninitialized
warning.

Extend --with-zstd documentation

The patch that was so far added for documenting --with-zstd is pretty
minimal:
  - it refers to undocumented options --with-zstd-include and
    --with-zstd-lib;
  - it suggests that --with-zstd can be used without an argument;
  - it does not clarify how this option applies to cross-compilation.

How about adding the same details as for the --with-isl,
--with-isl-include, --with-isl-lib options, mutatis mutandis? This patch
does that.

PR other/105527

gcc/ChangeLog:

* doc/install.texi (Configuration): Add more details about --with-zstd.
Document --with-zstd-include and --with-zstd-lib

Signed-off-by: Bruno Haible <bruno@clisp.org>

middle-end/105711 - properly handle CONST_INT when expanding bitfields

This is another place where we fail to pass down the mode of a
CONST_INT.

2022-05-24 Richard Biener <rguenther@suse.de>

PR middle-end/105711
* expmed.cc (extract_bit_field_as_subreg): Add op0_mode parameter
and use it.
(extract_bit_field_1): Pass down the mode of op0 to
extract_bit_field_as_subreg.

* gcc.target/i386/pr105711.c: New testcase.

OpenMP: Support nowait with Fortran [PR105378]

Fortran part to C/C++/libgomp
commit r13-724-gb43836914bdc2a37563cf31359b2c4803bfe4374

gcc/fortran/

PR c/105378
* openmp.cc (gfc_match_omp_taskwait): Accept nowait.

gcc/testsuite/

PR c/105378
* gfortran.dg/gomp/taskwait-depend-nowait-1.f90: New.

libgomp/

PR c/105378
* libgomp.texi (OpenMP 5.1): Set 'taskwait nowait' to 'Y'.
* testsuite/libgomp.fortran/taskwait-depend-nowait-1.f90: New.

RISC-V: Inhibit FP <--> int register moves via tune param

Under extreme register pressure, compiler can use FP <--> int
moves as a cheap alternate to spilling to memory.
This was seen with SPEC2017 FP benchmark 507.cactu:
ML_BSSN_Advect.cc:ML_BSSN_Advect_Body()

| fmv.d.x fa5,s9 # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
| .LVL325:
| ld s9,184(sp) # _12469, %sfp
| ...
| .LVL339:
| fmv.x.d s4,fa5 # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
|

The FMV instructions could be costlier (than stack spill) on certain
micro-architectures, thus this needs to be a per-cpu tunable
(default being to inhibit on all existing RV cpus).

Testsuite run with new test reports 10 failures without the fix
corresponding to the build variations of pr105666.c

| === gcc Summary ===
|
| # of expected passes 123318   (+10)
| # of unexpected failures 34       (-10)
| # of unexpected successes 4
| # of expected failures 780
| # of unresolved testcases 4
| # of unsupported tests 2796

gcc/ChangeLog:

* config/riscv/riscv.cc: (struct riscv_tune_param): Add
  fmv_cost.
(rocket_tune_info): Add default fmv_cost 8.
(sifive_7_tune_info): Ditto.
(thead_c906_tune_info): Ditto.
(optimize_size_tune_info): Ditto.
(riscv_register_move_cost): Use fmv_cost for int<->fp moves.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr105666.c: New test.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

openmp: Add taskwait nowait depend support [PR105378]

This patch adds support for (so far C/C++)
  #pragma omp taskwait nowait depend(...)
directive, which is like
  #pragma omp task depend(...)
  ;
but slightly optimized on the library side, so that it creates
the task only for the purpose of dependency tracking and doesn't actually
schedule it and wait for it when the dependencies are satisfied, instead
makes its dependencies satisfied right away.

2022-05-24  Jakub Jelinek  <jakub@redhat.com>

PR c/105378
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TASKWAIT_DEPEND_NOWAIT): New
builtin.
* gimplify.cc (gimplify_omp_task): Diagnose taskwait with nowait
clause but no depend clauses.
* omp-expand.cc (expand_taskwait_call): Use
BUILT_IN_GOMP_TASKWAIT_DEPEND_NOWAIT rather than
BUILT_IN_GOMP_TASKWAIT_DEPEND if nowait clause is present.
gcc/c/
* c-parser.cc (OMP_TASKWAIT_CLAUSE_MASK): Add nowait clause.
gcc/cp/
* parser.cc (OMP_TASKWAIT_CLAUSE_MASK): Add nowait clause.
gcc/testsuite/
* c-c++-common/gomp/taskwait-depend-nowait-1.c: New test.
libgomp/
* libgomp_g.h (GOMP_taskwait_depend_nowait): Declare.
* libgomp.map (GOMP_taskwait_depend_nowait): Export at GOMP_5.1.1.
* task.c (empty_task): New function.
(gomp_task_run_post_handle_depend_hash): Declare earlier.
(gomp_task_run_post_handle_depend): Declare.
(GOMP_task): Optimize fn == empty_task if there is nothing to wait
for.
(gomp_task_run_post_handle_dependers): Optimize task->fn == empty_task.
(GOMP_taskwait_depend_nowait): New function.
* testsuite/libgomp.c-c++-common/taskwait-depend-nowait-1.c: New test.

tree-optimization/100221 - improve DSE a bit

When facing multiple PHI defs and one feeding the other we can
postpone processing uses of one and thus can proceed.

2022-05-20 Richard Biener <rguenther@suse.de>

PR tree-optimization/100221
* tree-ssa-dse.cc (contains_phi_arg): New function.
(dse_classify_store): Postpone PHI defs that feed another PHI in defs.

* gcc.dg/tree-ssa/ssa-dse-44.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-45.c: Likewise.

tree-optimization/105629 - spaceship recognition regression

With the extra GENERIC folding we now do to
(unsigned int) __v._M_value & 1 != (unsigned int) __v._M_value
we end up with a sign-extending conversion to unsigned int
rather than the sign-conversion to unsigned char we expect.
Relaxing that fixes the regression.

2022-05-23 Richard Biener <rguenther@suse.de>

PR tree-optimization/105629
* tree-ssa-phiopt.cc (spaceship_replacement): Allow
a sign-extending conversion.

testsuite/rs6000: Adjust gcc.target/powerpc/pr78604.c [PR105706]

Commit r13-707 adjusts the below gimple:

  iftmp.7_4 = _1 < _2 ? val2_7(D) : val1_8(D);

to

  _3 = _1 >= _2;
  iftmp.7_4 = _3 ? val1_8(D) : val2_7(D);

and result in one more vect_model_simple_cost dumping for each
function.  Need to adjust the match count accordingly.

PR testsuite/105706

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr78604.c: Adjust.

rs6000: Skip debug insns for union [PR105627]

As PR105627 exposes, pass analyze_swaps should skip debug
insn when doing unionfind_union. One debug insn can use
several pseudos, if we take debug insn into account, we can
union those insns defining them and generate some unexpected
unions.

Based on the assumption that it's impossible to have one
pseudo which is defined by one debug insn but is used by one
nondebug insn, we just asserts debug insn never shows up in
function union_defs.

PR target/105627

gcc/ChangeLog:

* config/rs6000/rs6000-p8swap.cc (union_defs): Assert def_insn can't
be a debug insn.
(union_uses): Skip debug use_insn.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr105627.c: New test.

Daily bump.

x86: Avoid uninitialized variable in PR target/104441 test

PR target/104441
* gcc.target/i386/pr104441-1a.c (load8bit_4x4_avx2): Initialize
src23.

RISC-V: Enable TARGET_SUPPORTS_WIDE_INT

This is at par with other major arches such as aarch64, i386, s390 ...

gcc/ChangeLog

* config/riscv/predicates.md (const_0_operand): Remove
const_double.
* config/riscv/riscv.cc (riscv_rtx_costs): Add check for
CONST_DOUBLE.
* config/riscv/riscv.h (TARGET_SUPPORTS_WIDE_INT): New define.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

test plugins: use "final" and "override" directly, rather than via macros

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/analyzer_gil_plugin.c: Replace uses of "FINAL" and
"OVERRIDE" with "final" and "override".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

jit: use 'final' and 'override' where appropriate

gcc/jit/ChangeLog:
* jit-recording.h: Add "final" and "override" to all vfunc
implementations that were missing them, as appropriate.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: use 'final' and 'override' where appropriate

gcc/analyzer/ChangeLog:
* call-info.cc: Add "final" and "override" to all vfunc
implementations that were missing them, as appropriate.
* engine.cc: Likewise.
* region-model.cc: Likewise.
* sm-malloc.cc: Likewise.
* supergraph.h: Likewise.
* svalue.cc: Likewise.
* varargs.cc: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[x86_64]: Zhaoxin lujiazui enablement

This patch fix Zhaoxin CPU vendor ID detection problem and add zhaoxin
"lujiazui" processor support. Currently gcc can't recognize Zhaoxin CPU
(vendor ID "CentaurHauls" and "Shanghai") if user use -march=native option,
which is confusing for users. This patch enables -march=native in zhaoxin
family 7th processor and -march/-mtune=lujiazui, costs and tunning are set
according to the characteristics of the processor.
We add a new md file to describe lujiazui pipeline.

Testing:
Bootstrap is ok, and no regressions for i386/x86-64 testsuite.

Background:
Related Zhaoxin linux kernel patch can be found at:
https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bdffb@zhaoxin.com/

Related Zhaoxin glibc patch can be found at:
https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Detect
the specific type of Zhaoxin CPU, and return Zhaoxin CPU name.
(cpu_indicator_init): Handle Zhaoxin processors.
* common/config/i386/i386-common.cc: Add lujiazui.
* common/config/i386/i386-cpuinfo.h (enum processor_vendor): Add
VENDOR_ZHAOXIN.
(enum processor_types): Add ZHAOXIN_FAM7H.
(enum processor_subtypes): Add ZHAOXIN_FAM7H_LUJIAZUI.
* config.gcc: Add lujiazui.
* config/i386/cpuid.h (signature_SHANGHAI_ebx): Add
Signatures for zhaoxin
(signature_SHANGHAI_ecx): Ditto.
(signature_SHANGHAI_edx): Ditto.
* config/i386/driver-i386.cc (host_detect_local_cpu): Let
-march=native recognize lujiazui processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add lujiazui.
* config/i386/i386-options.cc (m_LUJIAZUI): New_definition.
* config/i386/i386.h (enum processor_type): Ditto.
* config/i386/i386.md: Add lujiazui.
* config/i386/x86-tune-costs.h (struct processor_costs): Add
lujiazui costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add lujiazui.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Add lujiazui Tunnings.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_USE_INCDEC): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
* doc/extend.texi: Add details about lujiazui.
* doc/invoke.texi: Add details about lujiazui.
* config/i386/lujiazui.md: Introduce lujiazui cpu and include new md file.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Test -arch=lujiauzi and -tune=lujiazui.
* g++.target/i386/mv32.C: Ditto.

Signed-off-by: mayshao <mayshao-oc@zhaoxin.com>

testsuite: mallign: Handle word size of 1 byte

This patch fixes a spurious warning for the pru-unknown-elf target:
gcc/testsuite/gcc.dg/mallign.c:12:27: warning: ignoring return value of 'malloc' declared with attribute 'warn_unused_result' [-Wunused-result]

For 8-bit targets the resulting mask ignores all bits in the value
returned by malloc. Fix by first checking the target word size.

gcc/testsuite/ChangeLog:

* gcc.dg/mallign.c: Skip check if sizeof(word)==1.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

demangler: C++ modules support

This adds demangling support for C++ modules. A new 'W' component
along with augmented behaviour of 'S' components.

include/
* demangle.h (enum demangle_component_type): Add module components.
libiberty/
* cp-demangle.c (d_make_comp): Adjust.
(d_name, d_prefix): Adjust subst handling. Add module handling.
(d_maybe_module_name): New.
(d_unqualified_name): Add incoming module parm. Handle it. Adjust all callers.
(d_special_name): Add 'GI' support.
(d_count_template_scopes): Adjust.
(d_print_comp_inner): Print module.
* testsuite/demangle-expected: New test cases

tilepro: fix missing ARRAY_SIZE macro

gcc/ChangeLog:

* config/tilepro/gen-mul-tables.cc (ARRAY_SIZE): Add new macro.

Remove forward_propagate_into_cond

This is a first cleanup opportunity from the COND_EXPR gimplification
which allows us to remove now redundant forward_propagate_into_cond.

2022-05-23 Richard Biener <rguenther@suse.de>

* tree-ssa-forwprop.cc (forward_propagate_into_cond): Remove.
(pass_forwprop::execute): Do not propagate into COND_EXPR conditions.

Remove is_gimple_condexpr

This removes is_gimple_condexpr, note the vectorizer via patterns
still creates COND_EXPRs with embedded GENERIC conditions and has
a reference to the function in comments. Otherwise is_gimple_condexpr
is now equal to is_gimple_val.

2022-05-16 Richard Biener <rguenther@suse.de>

* gimple-expr.cc (is_gimple_condexpr): Remove.
* gimple-expr.h (is_gimple_condexpr): Likewise.
* gimplify.cc (gimplify_expr): Remove is_gimple_condexpr usage.
* tree-if-conv.cc (set_bb_predicate): Likewie.
(add_to_predicate_list): Likewise.
(gen_phi_arg_condition): Likewise.
(predicate_scalar_phi): Likewise.
(predicate_statements): Likewise.

Force the selection operand of a GIMPLE COND_EXPR to be a register

This goes away with the selection operand allowed to be a GENERIC
tcc_comparison tree.  It keeps those for vectorizer pattern recog,
those are short lived and removing this instance is a bigger task.

The patch doesn't yet remove dead code and functionality, that's
left for a followup.  Instead the patch makes sure to produce
valid GIMPLE IL and continue to optimize COND_EXPRs where the
previous IL allowed and the new IL showed regressions in the testsuite.

2022-05-16  Richard Biener  <rguenther@suse.de>

* gimple-expr.cc (is_gimple_condexpr): Equate to is_gimple_val.
* gimplify.cc (gimplify_pure_cond_expr): Gimplify the condition
as is_gimple_val.
* gimple-fold.cc (valid_gimple_rhs_p): Simplify.
* tree-cfg.cc (verify_gimple_assign_ternary): Likewise.
* gimple-loop-interchange.cc (loop_cand::undo_simple_reduction):
Build the condition of the COND_EXPR separately.
* tree-ssa-loop-im.cc (move_computations_worker): Likewise.
* tree-vect-generic.cc (expand_vector_condition): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Likewise.
* vr-values.cc (simplify_using_ranges::simplify): Likewise.
* tree-vect-patterns.cc: Add comment indicating we are
building invalid COND_EXPRs and why.
* omp-expand.cc (expand_omp_simd): Gimplify the condition
to the COND_EXPR separately.
(expand_omp_atomic_cas): Note part that should be unreachable
now.
* tree-ssa-forwprop.cc (forward_propagate_into_cond): Adjust
condition for valid replacements.
* tree-if-conv.cc (predicate_bbs): Simulate previous
re-folding of the condition in folded COND_EXPRs which
is necessary because of unfolded GIMPLE_CONDs in the IL
as in for example gcc.dg/fold-bopcond-1.c.
* gimple-range-gori.cc (gori_compute::condexpr_adjust):
Handle that the comparison is now in the def stmt of
the select operand.  Required by gcc.dg/pr104526.c.

* gcc.dg/gimplefe-27.c: Adjust.
* gcc.dg/gimplefe-45.c: Likewise.
* gcc.dg/pr101145-2.c: Likewise.
* gcc.dg/pr98211.c: Likewise.
* gcc.dg/torture/pr89595.c: Likewise.
* gcc.dg/tree-ssa/divide-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-lim-12.c: Likewise.

OpenMP: Handle descriptors in target's firstprivate [PR104949]

For allocatable/pointer arrays, a firstprivate to a device
not only needs to privatize the descriptor but also the actual
data. This is implemented as:
firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
where the address of x in device memory is saved in hostaddrs[i]
by libgomp and the middle end actually passes hostaddrs[i]' to
attach.

As side effect, has_device_addr(array_desc) had to be changed:
before, it was converted to firstprivate in the front end; now
it is handled in omp-low.cc as has_device_addr requires a shallow
firstprivate (not touching the data pointer) while the normal
firstprivate requires (now) a deep firstprivate.

gcc/fortran/ChangeLog:

PR fortran/104949
* f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine.
* trans-openmp.cc (gfc_omp_array_size): New.
(gfc_trans_omp_variable_list): Never turn has_device_addr
to firstprivate.
* trans.h (gfc_omp_array_size): New.

gcc/ChangeLog:

PR fortran/104949
* langhooks-def.h (lhd_omp_array_size): New.
(LANG_HOOKS_OMP_ARRAY_SIZE): Define.
(LANG_HOOKS_DECLS): Add it.
* langhooks.cc (lhd_omp_array_size): New.
* langhooks.h (struct lang_hooks_for_decls): Add hook.
* omp-low.cc (scan_sharing_clauses, lower_omp_target):
Handle GOMP_MAP_FIRSTPRIVATE for array descriptors.

libgomp/ChangeLog:

PR fortran/104949
* target.c (gomp_map_vars_internal, copy_firstprivate_data):
Support attach for GOMP_MAP_FIRSTPRIVATE.
* testsuite/libgomp.fortran/target-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.

Some additional ix86_rtx_costs clean-ups: NEG, AND, andn and pandn.

Double-word NOT requires two operations, but double-word NEG requires
three operations.  Using SSE, vector NOT requires a pxor with -1, but
AND of NOT is cheap thanks to the existence of pandn.  There's also some
legacy (aka incorrect) logic explicitly testing for DImode [independently
of TARGET_64BIT] in determining the cost of logic operations that's not
required.

2022-05-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) <case AND>: Split from
XOR/IOR case.  Account for two instructions for double-word
operations.  In case of vector pandn, account for single
instruction.  Likewise for integer andn with TARGET_BMI.
<case NOT>: Vector NOT requires more than 1 instruction (pxor).
<case NEG>: Double-word negation requires 3 instructions.

RISC-V: Fix canonical extension order (K and J)

This commit fixes canonical extension order to follow the RISC-V ISA
Manual draft-20210402-1271737 or later.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_supported_std_ext):
Fix "K" extension prefix to be placed before "J".
* config/riscv/arch-canonicalize: Likewise.

Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>

Increase move cost between mask and gpr.

kmovd only uses port5 which is often the bottleneck of
performance. Also from latency perspective, spill and reload mostly
could be STLF or even MRN which only take 1 cycle.

So the patch increase move cost between gpr and mask to be the same as
gpr <-> sse register.

gcc/ChangeLog:

* config/i386/x86-tune-costs.h (skylake_cost): Increase gpr
<-> mask cost from 5 to 6.
(icelake_cost): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/i386/spill_to_mask-1.c: New test.

Daily bump.

testsuite: Skip vectorize tests for PRU

PRU has single-cycle constant cost for any jump, and it cannot
vectorise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/gen-vect-11.c: For PRU target, skip the
vectorizing checks in tree dumps.
* gcc.dg/tree-ssa/gen-vect-11a.c: Ditto.
* gcc.dg/tree-ssa/gen-vect-2.c: Ditto.
* gcc.dg/tree-ssa/gen-vect-25.c: Ditto.
* gcc.dg/tree-ssa/gen-vect-26.c: Ditto.
* gcc.dg/tree-ssa/gen-vect-28.c: Ditto.
* gcc.dg/tree-ssa/gen-vect-32.c: Ditto.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

testsuite: Adjust pr91088.c for default_packed targets

PR ipa/91088

gcc/testsuite/ChangeLog:

* gcc.dg/ipa/pr91088.c: Adjust member offset checks to
accommodate targets which pack structures by default.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

testsuite: Skip gcc.dg/pr46647.c for PRU

Like AVR and Cris, PRU has no alignment requirements. Thus it is
also affected by PR53535.

PR middle-end/53535

gcc/testsuite/ChangeLog:

* gcc.dg/pr46647.c: Skip for pru target.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

testsuite: Skip ifcvt-4.c for PRU

PRU has no condition code and conditional moves.

gcc/testsuite/ChangeLog:

* gcc.dg/ifcvt-4.c: Skip for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

testsuite: Mark extra warnings for default_packed

If the target uses packed structs by default, there are no trailing
padding bytes allocated. Hence extra warnings are emitted.

gcc/testsuite/ChangeLog:

* gcc.dg/Warray-bounds-48-novec.c: Add expected warnings
if target packs the structs by default.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

Daily bump.

testsuite: add missing dg-require-effective-target fpic

Require effective target fpic for newly added test.

gcc/testsuite/
* g++.dg/ext/visibility/visibility-local-extern1.C: Add missing
dg-require-effective-target fpic.

libstdc++: Reduce <random> test iterations for simulators

Some of these tests take several minutes on a simulator like cris-elf,
so we can conditionally run fewer iterations. The testDiscreteDist
helper already supports custom sizes so we just need to make use of that
when { target simulator } matches.

The relevant code is sufficiently tested on other targets, so we're not
losing anything by only running a small number of iterators for sims.

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/random/bernoulli_distribution/operators/values.cc:
Run fewer iterations for simulator targets.
* testsuite/26_numerics/random/binomial_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/discrete_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/geometric_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/negative_binomial_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/poisson_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/uniform_int_distribution/operators/values.cc:
Likewise.

AArch64: Improve rotate patterns

Improve and generalize rotate patterns. Rotates by more than half the
bitwidth of a register are canonicalized to rotate left. Many existing
shift patterns don't handle this case correctly, so add rotate left to
the shift iterator and convert rotate left into ror during assembly
output. Add missing zero_extend patterns for shifted BIC, ORN and EON.

gcc/
* config/aarch64/aarch64.md
(and_<SHIFT:optab><mode>3_compare0): Support rotate left.
(and_<SHIFT:optab>si3_compare0_uxtw): Likewise.
(<LOGICAL:optab>_<SHIFT:optab><mode>3): Likewise.
(<LOGICAL:optab>_<SHIFT:optab>si3_uxtw): Likewise.
(one_cmpl_<optab><mode>2): Likewise.
(<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3): Likewise.
(<LOGICAL:optab>_one_cmpl_<SHIFT:optab>sidi_uxtw): New pattern.
(eor_one_cmpl_<SHIFT:optab><mode>3_alt): Support rotate left.
(eor_one_cmpl_<SHIFT:optab>sidi3_alt_ze): Likewise.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0): Likewise.
(and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw): Likewise.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse): Likewise.
(and_<SHIFT:optab><mode>3nr_compare0): Likewise.
(*<optab>si3_insn_uxtw): Use SHIFT_no_rotate.
(rolsi3_insn_uxtw): New pattern.
* config/aarch64/iterators.md (SHIFT): Add rotate left.
(SHIFT_no_rotate): Add new iterator.
(SHIFT:shift): Print rotate left as ror.
(is_rotl): Add test for left rotate.

gcc/testsuite/
* gcc.target/aarch64/ror_2.c: New test.
* gcc.target/aarch64/ror_3.c: New test.

AArch64: Cleanup CPU option processing code

The --with-cpu/--with-arch configure option processing not only checks valid
arguments but also sets TARGET_CPU_DEFAULT with a CPU and extension bitmask.
This isn't used however since a --with-cpu is translated into a -mcpu option
which is processed as if written on the command-line (so TARGET_CPU_DEFAULT
is never accessed).

So remove all the complex processing and bitmask, and just validate the
option. Fix a bug that always reports valid architecture extensions as invalid.
As a result the CPU processing in aarch64.c can be simplified.

gcc/
* config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch
processing. Add support for architectural extensions.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove
AARCH64_CPU_DEFAULT_FLAGS.
(TARGET_CPU_NBITS): Remove.
(TARGET_CPU_MASK): Remove.
* config/aarch64/aarch64.cc (AARCH64_CPU_DEFAULT_FLAGS): Remove define.
(get_tune_cpu): Assert CPU is always valid.
(get_arch): Assert architecture is always valid.
(aarch64_override_options): Cleanup CPU selection code and simplify logic.
(aarch64_option_restore): Remove unnecessary checks on tune.

Use "final" and "override" directly, rather than via macros

As of GCC 11 onwards we have required a C++11 compiler, such as GCC 4.8
or later. On the assumption that any such compiler correctly implements
"final" and "override", this patch updates the source tree to stop using
the FINAL and OVERRIDE macros from ansidecl.h, in favor of simply using
"final" and "override" directly.

libcpp/ChangeLog:
* lex.cc: Replace uses of "FINAL" and "OVERRIDE" with "final" and
"override".

gcc/analyzer/ChangeLog:
* analyzer-pass.cc: Replace uses of "FINAL" and "OVERRIDE" with
"final" and "override".
* call-info.h: Likewise.
* checker-path.h: Likewise.
* constraint-manager.cc: Likewise.
* diagnostic-manager.cc: Likewise.
* engine.cc: Likewise.
* exploded-graph.h: Likewise.
* feasible-graph.h: Likewise.
* pending-diagnostic.h: Likewise.
* region-model-impl-calls.cc: Likewise.
* region-model.cc: Likewise.
* region-model.h: Likewise.
* region.h: Likewise.
* sm-file.cc: Likewise.
* sm-malloc.cc: Likewise.
* sm-pattern-test.cc: Likewise.
* sm-sensitive.cc: Likewise.
* sm-signal.cc: Likewise.
* sm-taint.cc: Likewise.
* state-purge.h: Likewise.
* store.cc: Likewise.
* store.h: Likewise.
* supergraph.h: Likewise.
* svalue.h: Likewise.
* trimmed-graph.h: Likewise.
* varargs.cc: Likewise.

gcc/c-family/ChangeLog:
* c-format.cc: Replace uses of "FINAL" and "OVERRIDE" with "final"
and "override".
* c-pretty-print.h: Likewise.

gcc/cp/ChangeLog:
* cxx-pretty-print.h: Replace uses of "FINAL" and "OVERRIDE" with
"final" and "override".
* error.cc: Likewise.

gcc/jit/ChangeLog:
* jit-playback.h: Replace uses of "FINAL" and "OVERRIDE" with
"final" and "override".
* jit-recording.cc: Likewise.
* jit-recording.h: Likewise.

gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc: Replace uses of
"FINAL" and "OVERRIDE" with "final" and "override".
* config/aarch64/aarch64-sve-builtins-functions.h: Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc: Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc: Likewise.
* diagnostic-path.h: Likewise.
* digraph.cc: Likewise.
* gcc-rich-location.h: Likewise.
* gimple-array-bounds.cc: Likewise.
* gimple-loop-versioning.cc: Likewise.
* gimple-range-cache.cc: Likewise.
* gimple-range-cache.h: Likewise.
* gimple-range-fold.cc: Likewise.
* gimple-range-fold.h: Likewise.
* gimple-range-tests.cc: Likewise.
* gimple-range.h: Likewise.
* gimple-ssa-evrp.cc: Likewise.
* input.cc: Likewise.
* json.h: Likewise.
* read-rtl-function.cc: Likewise.
* tree-complex.cc: Likewise.
* tree-diagnostic-path.cc: Likewise.
* tree-ssa-ccp.cc: Likewise.
* tree-ssa-copy.cc: Likewise.
* tree-vrp.cc: Likewise.
* value-query.h: Likewise.
* vr-values.h: Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libgomp: Add new runtime routines omp_target_memcpy_async and omp_target_memcpy_rect_async

This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.

In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:

int depobj_count
omp_depend_t *depobj_list

integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)

The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a new created task. The
sequential part (a) takes into account the requirements for the return values:

"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)

"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)

Due to asynchronicity an error is thrown if the asynchronous memcpy is not
successful (in contrast to the synchronous functions which use a return
value unequal to zero).

gcc/ChangeLog:

* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.

libgomp/ChangeLog:

* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.

libgcc: use __builtin_clz and __builtin_ctz in libbid

This patch replaces libbid's implementations of clz and ctz for 32 and
64 bits inputs which used several masks, and switches to the
corresponding builtins. This will provide a better implementation,
especially on targets with clz/ctz instructions.

2022-05-06 Christophe Lyon <christophe.lyon@arm.com>

libgcc/config/libbid/ChangeLog:

* bid_binarydecimal.c (CLZ32_MASK16): Delete.
(CLZ32_MASK8): Delete.
(CLZ32_MASK4): Delete.
(CLZ32_MASK2): Delete.
(CLZ32_MASK1): Delete.
(clz32_nz): Use __builtin_clz.
(ctz32_1bit): Delete.
(ctz32): Use __builtin_ctz.
(CLZ64_MASK32): Delete.
(CLZ64_MASK16): Delete.
(CLZ64_MASK8): Delete.
(CLZ64_MASK4): Delete.
(CLZ64_MASK2): Delete.
(CLZ64_MASK1): Delete.
(clz64_nz): Use __builtin_clzl.
(ctz64_1bit): Delete.
(ctz64): Use __builtin_ctzl.

libgcc: Add support for HF mode (aka _Float16) in libbid

This patch adds support for trunc and extend operations between HF
mode (_Float16) and Decimal Floating Point formats (_Decimal32,
_Decimal64 and _Decimal128).

For simplicity we rely on the implicit conversions inserted by the
compiler between HF and SD/DF/TF modes. The existing bid*_to_binary*
and binary*_to_bid* functions are non-trivial and at this stage it is
not clear if there is a performance-critical use case involving _Float16
and _Decimal* formats.

The patch also adds two executable tests, to make sure the right
functions are called, available (link phase) and functional.

Tested on aarch64 and x86_64. The number of symbol matches in the
testcases includes the .global XXX to avoid having to match different
call instructions for different targets.

2022-05-04 Christophe Lyon <christophe.lyon@arm.com>

libgcc/ChangeLog:

* Makefile.in (D32PBIT_FUNCS): Add _hf_to_sd and _sd_to_hf.
(D64PBIT_FUNCS): Add _hf_to_dd and _dd_to_hf.
(D128PBIT_FUNCS): Add _hf_to_td _td_to_hf.

libgcc/config/libbid/ChangeLog:

* bid_gcc_intrinsics.h (LIBGCC2_HAS_HF_MODE): Define according to
__LIBGCC_HAS_HF_MODE__.
(BID_HAS_HF_MODE): Define.
(HFtype): Define.
(__bid_extendhfsd): New prototype.
(__bid_extendhfdd): Likewise.
(__bid_extendhftd): Likewise.
(__bid_truncsdhf): Likewise.
(__bid_truncddhf): Likewise.
(__bid_trunctdhf): Likewise.
* _dd_to_hf.c: New file.
* _hf_to_dd.c: New file.
* _hf_to_sd.c: New file.
* _hf_to_td.c: New file.
* _sd_to_hf.c: New file.
* _td_to_hf.c: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/convert-dfp-2.c: New test.
* gcc.dg/torture/convert-dfp.c: New test.

testsuite: Add C++ unwinding tests with Decimal Floating-Point

These tests exercise exception handling with Decimal Floating-Point
type.

dfp-1.C and dfp-2.C check that thrown objects of such types are
properly caught, whether when using C++ classes (decimalXX) or via GCC
mode attributes.

dfp-saves-aarch64.C checks that such objects are properly restored,
and has to use the mode attribute trick because objects of decimalXX
class type cannot be assigned to a register variable.

2022-05-03 Christophe Lyon <christophe.lyon@arm.com>

gcc/testsuite/
* g++.dg/eh/dfp-1.C: New test.
* g++.dg/eh/dfp-2.C: New test.
* g++.dg/eh/dfp-saves-aarch64.C: New test.

testsuite: enable more BID DFP tests for AArch64

Some tests for the BID format are currently restricted to i?86 and
x86_64, but they also pass on AArch64, so this patch enables them.

Since all these tests are related to the BID format, it seems useful
to introduce a new effective-target (dfp_bid) instead of adding
aarch64 to the current target list.

2022-04-28 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* doc/sourcebuild.texi (Decimal floating point attributes): Document
dfp_bid effective-target.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_dfp_bid): New.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Use dfp_bid
effective-target.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.

testsuite: Add new tests for DFP under aarch64/aapcs64

This patch copies all existing tests involving float/double/long
double types and replaces them with _Decimal32/_Decimal64/_Decimal128.
I thought it would be clearer/easier to maintain to do it this way
rather than adding tests for DFP types in the existing testcases,
except for func-ret-1.c and func-ret-3.c.

This makes sure all cases tested for traditional floating-point are
equally tested for decimal floating-point.

The patch also adds a test involving loading DFP values from memory.

2022-03-31 Christophe Lyon <christophe.lyon@arm.com>

gcc/testsuite/
* gcc.target/aarch64/aapcs64/aapcs64.exp: Support new dfp*.c tests.
* gcc.target/aarch64/aapcs64/func-ret-1.c: Add DFP tests.
* gcc.target/aarch64/aapcs64/func-ret-3.c: Add DFP tests.
* gcc.target/aarch64/aapcs64/type-def.h: Add DFP types.
* gcc.target/aarch64/aapcs64/dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/ice_dfp_5.c: New test.
* gcc.target/aarch64/aapcs64/test_align_dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/test_align_dfp-4.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_1.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_10.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_11.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_12.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_13.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_14.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_15.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_16.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_17.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_18.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_19.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_2.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_20.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_21.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_22.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_23.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_24.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_25.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_26.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_27.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_3.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_5.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_6.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_7.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_8.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_9.c: New test.
* gcc.target/aarch64/aapcs64/test_quad_double_dfp.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-10.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-11.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-12.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-13.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-14.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-16.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-2.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-3.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-4.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-5.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-6.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-8.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-9.c: New test.

testsuite:: Fix pr39986.c testcase for AArch64

The testcase in c-c++-common/dfp/pr39986.c detects if DFP constants
are correctly emitted in the assembly. However, AArch64 uses .word
instead of the expected .long directive. With this patch, we now
accept both.

2022-03-31 Christophe Lyon <christophe.lyon@arm.com>

gcc/testsuite/
* c-c++-common/dfp/pr39986.c: Accept .word directive.

libgcc: enable DFP for AArch64

DFP support on AArch64 relies on libgcc, so enable its DFP routines
for all AArch64 targets.

2022-03-31 Christophe Lyon <christophe.lyon@arm.com>

libgcc/
* config.host: Add t-dfprules to AArch64 targets.

libgcc: Enable XF mode conversions to/from DFP modes only if supported

Some targets do not support XF mode (eg AArch64), so don't build the
corresponding to/from DFP modes convertion routines if
__LIBGCC_HAS_XF_MODE__ is not defined.

2022-03-31 Christophe Lyon <christophe.lyon@arm.com>

libgcc/config/libbid/
* _dd_to_xf.c: Check __LIBGCC_HAS_XF_MODE__.
* _sd_to_xf.c: Likewise.
* _td_to_xf.c: Likewise.
* _xf_to_dd.c: Likewise.
* _xf_to_sd.c: Likewise.
* _xf_to_td.c: Likewise.

aarch64: Add backend support for DFP

This patch updates the aarch64 backend as needed to support DFP modes
(SD, DD and TD).

Changes v1->v2:

* Drop support for DFP modes in
  aarch64_gen_{load||store}[wb]_pair as these are only used in
  prologue/epilogue where DFP modes are not used.  Drop the
  changes to the corresponding patterns in aarch64.md, and
  useless GPF_PAIR iterator.

* In aarch64_reinterpret_float_as_int, handle DDmode the same way
  as DFmode (needed in case the representation of the
  floating-point value can be loaded using mov/movk.

* In aarch64_float_const_zero_rtx_p, reject constants with DFP
  mode: when X is zero, the callers want to emit either '0' or
  'zr' depending on the context, which is not the way 0.0 is
  represented in DFP mode (in particular fmov d0, #0 is not right
  for DFP).

* In aarch64_legitimate_constant_p, accept DFP

2022-03-31  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* config/aarch64/aarch64.cc
(aarch64_split_128bit_move): Handle DFP modes.
(aarch64_mode_valid_for_sched_fusion_p): Likewise.
(aarch64_classify_address): Likewise.
(aarch64_legitimize_address_displacement): Likewise.
(aarch64_reinterpret_float_as_int): Likewise.
(aarch64_float_const_zero_rtx_p): Likewise.
(aarch64_can_const_movi_rtx_p): Likewise.
(aarch64_anchor_offset): Likewise.
(aarch64_secondary_reload): Likewise.
(aarch64_rtx_costs): Likewise.
(aarch64_legitimate_constant_p): Likewise.
(aarch64_gimplify_va_arg_expr): Likewise.
(aapcs_vfp_sub_candidate): Likewise.
(aarch64_vfp_is_call_or_return_candidate): Likewise.
(aarch64_output_scalar_simd_mov_immediate): Likewise.
(aarch64_gen_adjusted_ldpstp): Likewise.
(aarch64_scalar_mode_supported_p): Accept DFP modes if enabled.
* config/aarch64/aarch64.md
(movsf_aarch64): Use SFD iterator and rename into
mov<mode>_aarch64.
(movdf_aarch64): Use DFD iterator and rename into
mov<mode>_aarch64.
(movtf_aarch64): Use TFD iterator and rename into
mov<mode>_aarch64.
(split pattern for move TF mode): Use TFD iterator.
* config/aarch64/iterators.md
(GPF_TF_F16_MOV): Add DFP modes.
(SFD, DFD, TFD): New iterators.
(GPF_TF): Add DFP modes.
(TX, DX, DX2): Likewise.

aarch64: Enable DFP (Decimal Floating-point) (BID format)

This patch enables DFP support on aarch64, by updating config/dfp.m4
and regenerating the involved configure scripts.
We enable the BID format.

2022-03-31 Christophe Lyon <christophe.lyon@arm.com>

config/
* dfp.m4: Add aarch64 support.
gcc/
* configure: Regenerate.
libdecnumber/
* configure: Regenerate.
libgcc/
* configure: Regenerate.

Disable snapshots from gcc-9

GCC 9 nears its end.

2022-05-20 Richard Biener <rguenther@suse.de>

maintainer-scripts/
* crontab: Disable snapshots from the gcc-9 branch.

Daily bump.

libstdc++: Avoid including <cstdint> for std::char_traits

We should prefer the __UINT_LEAST16_TYPE__ and __UINT_LEAST32_TYPE__
macros, if available, so that we don't need all of <cstdint> in every
header that uses std::char_traits.

libstdc++-v3/ChangeLog:

* include/bits/char_traits.h: Only include <cstdint> when
necessary.
* include/std/stacktrace: Use __UINTPTR_TYPE__ instead of
uintptr_t.
* src/c++11/cow-stdexcept.cc: Include <stdint.h>.
* src/c++17/floating_to_chars.cc: Likewise.
* testsuite/20_util/assume_aligned/1.cc: Include <cstdint>.
* testsuite/20_util/assume_aligned/3.cc: Likewise.
* testsuite/20_util/shared_ptr/creation/array.cc: Likewise.

libstdc++: Only include <ext/atomicity.h> for COW string

Since the COW std::string was moved to its own header, we don't need the
atomic dispatch helpers in the definition of std::__cxx11::string. Move
the inclusion of the <ext/atomicity.h> header to <bits/cow_string.h>
where it's needed.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h: Do not include <ext/atomicity.h>
here.
* include/bits/cow_string.h: Include it here.

libstdc++: Ensure pmr aliases work without <memory_resource>

Currently the alias templates for std::pmr::vector, std::pmr::string
etc. are defined using a forward declaration for polymorphic_allocator.
This means you can't actually use the alias templates unless you also
include <memory_resource>. The rationale for that is that it's a fairly
large header, and most users don't need it. This isn't uncontroversial
though, and LWG 3681 questions whether it's even conforming.

This change adds a new <bits/memory_resource.h> header with the minimum
needed to use polymorphic_allocator and the std::pmr container aliases.
Including <memory_resource> is still necessary to use the program-wide
resource objects, or the pool resources or monotonic buffer resource.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/memory_resource.h: New file.
* include/std/deque: Include <bits/memory_resource.h>.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/map: Likewise.
* include/std/memory_resource (pmr::memory_resource): Move to
new <bits/memory_resource.h> header.
(pmr::polymorphic_allocator): Likewise.
* include/std/regex: Likewise.
* include/std/set: Likewise.
* include/std/stacktrace: Likewise.
* include/std/string: Likewise.
* include/std/unordered_map: Likewise.
* include/std/unordered_set: Likewise.
* include/std/vector: Likewise.
* testsuite/21_strings/basic_string/types/pmr_typedefs.cc:
Remove <memory_resource> header and check construction.
* testsuite/23_containers/deque/types/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/forward_list/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/list/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/map/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multimap/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multiset/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/set/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/unordered_map/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_set/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/vector/pmr_typedefs.cc: Likewise.
* testsuite/28_regex/match_results/pmr_typedefs.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic-tuple.C: Qualify function to avoid ADL
finding std::make_tuple.

PR middle-end/98865: Expand X*Y as X&-Y when Y is [0,1].

The patch is a revised solution for PR middle-end/98865 incorporating
the feedback/suggestions from Richard Biener's review here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593928.html
Most significantly, this patch now performs the transformation/optimization
during RTL expansion, where the target's rtx_costs can be used to determine
whether the original multiplication (that may potentially be implemented by
a shift or lea) is cheaper than a negation and a bit-wise and.

Previously the expression (x>>63)*y would be compiled with -O2 as
        shrq    $63, %rdi
        movq    %rdi, %rax
        imulq   %rsi, %rax

but with this patch now produces:
        sarq    $63, %rdi
        movq    %rdi, %rax
        andq    %rsi, %rax

Likewise the expression (x>>63)*135 [that appears in a hot-spot of the
Botan AES-128 benchmark] was previously:

        shrq    $63, %rdi
        leaq    (%rdi,%rdi,8), %rdx
        movq    %rdx, %rax
        salq    $4, %rax
        subq    %rdx, %rax

now becomes:
        movq    %rdi, %rax
        sarq    $63, %rax
        andl    $135, %eax

2022-05-19  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/98865
* expr.cc (expand_expr_real_2) [MULT_EXPR]:  Expand X*Y as X&Y
when both X and Y are [0, 1], X*Y as X&-Y when Y is [0,1] and
likewise X*Y as -X&Y when X is [0,1] using tree_nonzero_bits.

gcc/testsuite/ChangeLog
PR middle-end/98865
* gcc.target/i386/pr98865.c: New test case.

[PATCH, rs6000] Remove the (no longer used) BTC defines.

These defines are no longer used once the rs6000 built-in
reworks were completed.   Time to remove them.

There was a reference to RS6000_BTC_SPECIAL in a TODO comment
in rs6000-builtins.def.  That comment remains, but I have updated
the comment to refer to "SPECIAL" processing, instead of having it
refer directly to the _BTC_SPECIAL macro.

2022-05-18  Will Schmidt  <will_schmidt@vnet.ibm.com>

gcc/
* config/rs6000/rs6000-builtins.def: Rephrase
to remove RS6000_BTC_SPECIAL from comment.
* config/rs6000/rs6000.h (RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
RS6000_BTM_ALWAYS): Delete.

libstdc++: Implement LWG 3683 for pmr::polymorphic_allocator

This issue has recently been moved to Tentatively Ready, and seems
uncontroversial. This allows equality comparison with types that are
convertible to pmr::polymorphic_allocator, which fail deduction for the
existing equality operator.

libstdc++-v3/ChangeLog:

* include/std/memory_resource (polymorphic_allocator): Add
non-template equality operator, as proposed for LWG 3683.
* testsuite/20_util/polymorphic_allocator/lwg3683.cc: New test.

Fix OMP CAS expansion with separate condition

When forcing the condition to be split out from COND_EXPRs I see
a runtime failure of libgomp.fortran/atomic-19.f90 which can be
reduced to

  !$omp atomic update, compare, capture
  if (x == 69_2 - r) x = 6_8
    v = x

being miscompiled, the difference being

-  _13 = .ATOMIC_COMPARE_EXCHANGE (_9, _10, _11, 4, 0, 0);
-  _14 = IMAGPART_EXPR <_13>;
-  _15 = REALPART_EXPR <_13>;
-  _16 = _14 != 0 ? _11 : _15;
-  _2 = (integer(kind=4)) _16;
-  v_17 = _2;
+  _14 = .ATOMIC_COMPARE_EXCHANGE (_10, _11, _12, 4, 0, 0);
+  _15 = IMAGPART_EXPR <_14>;
+  _16 = REALPART_EXPR <_14>;
+  _2 = (logical(kind=1)) _15;
+  _3 = (integer(kind=4)) _16;
+  v_17 = _3;

where one can see a missing COND_EXPR.  It seems to be a latent
issue to me given the code can be exercised, it just maybe misses
a 'need_new' testcase combined with 'cond_stmt'.  Appearantly
the if (cond_stmt) code is just to avoid creating a temporary
(and possibly to preserve the condition compute if used elsewhere
since the original stmt is going to be deleted).  The following
makes the failure go away for me in my patched tree and it
also survives libgomp and gomp testing in an unpatched tree.

2022-05-13  Richard Biener  <rguenther@suse.de>

* omp-expand.cc (expand_omp_atomic_cas): Do not short-cut
computation of the new value.

Remove get_or_alloc_expression_id

This function is no longer needed.

2022-05-19 Richard Biener <rguenther@suse.de>

* tree-ssa-pre.cc (get_or_alloc_expression_id): Remove.
(add_to_value): Use get_expression_id.
(bitmap_insert_into_set): Likewise.
(bitmap_value_insert_into_set): Likewise.

[Ada] Avoid copy operation for returns involving function calls

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Constant>: Deal with
a constant related to a return in a function specially.
* gcc-interface/trans.cc (Call_to_gnu): Use return slot optimization
if the target is a return object.
(gnat_to_gnu) <N_Object_Declaration>: Deal with a constant related
to a return in a function specially.

[Ada] Do not give warnings for compiler-generated entities by default

The rationale is that these entities are almost always the result of
expansion activities in the front-end, over which the user has very
limited control. These warnings can be restored by means of -gnatD.

gcc/ada/

* gcc-interface/utils.cc (gnat_pushdecl): Also set TREE_NO_WARNING
on the decl if Comes_From_Source is false for the associated node.

[Ada] Small housekeeping work in gnat_gimplify_expr

This alphabetizes the large switch statement, removes a useless nested
switch statement, an artificial fall through and adds a default return.

No functional changes.

gcc/ada/

* gcc-interface/trans.cc (gnat_gimplify_expr): Tidy up.

[Ada] Add support for "simd" function attribute

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Function>: Also call
process_attributes for built-in functions.
(prepend_one_attribute_pragma): Deal with "simd" attribute.
* gcc-interface/utils.cc (handle_simd_attribute): New function.
(gnat_internal_attribute_table): Add entry for "simd" attribute.

[Ada] Fix internal error on unchecked union with component clauses (2)

The issue arises when the unchecked union contains both a fixed part and
a variant part, and is subject to a full representation clause covering
all the components in all the variants, when the component clauses do not
align the variant boundaries with byte boundaries consistently.

gcc/ada/

* gcc-interface/decl.cc (components_to_record): Use NULL recursively
as P_GNU_REP_LIST for the innermost variant level in the unchecked
union case with a fixed part.

[Ada] Do not set Current_Error_Node to a node without location

The message "No source file position information available" is displayed
in the bugbox when Current_Error_Node has no location, which is useless.

gcc/ada/

* gcc-interface/trans.cc (gnat_to_gnu): Do not set Current_Error_Node
to a node without location.

[Ada] Fix internal error on semi-circular record types

The front-end properly computes a linear elaboration order for them, but
there was a loophole in the handling of the delayed case.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Access_Subtype>: And
skip the elaboration of the designated subtype when that of its base
type has been delayed.

[Ada] Fix for internal error on semi-circular record aggregate

This creates a couple of record subtypes pointing to each other through
access subtypes, and we break the circularity at the latter subtypes.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Record_Subtype>: If
it is a special subtype designated by an access subtype, then defer
the completion of incomplete types.

[Ada] Adjust copyright line

gcc/ada/

* gcc-interface/ada-tree.h, gcc-interface/ada.h,
gcc-interface/gadaint.h, gcc-interface/targtyps.cc: Adjust
copyright line.

[Ada] Preserve unchecked conversion of string constant

This makes it possible to pass the result to a C function directly.

gcc/ada/

* gcc-interface/utils.cc (unchecked_convert): Do not fold a string
constant if the target type is pointer to character.

[Ada] Remove redundant marking of illegal pragma with error posted

We flag illegal pragma Elaborate with a call to Error_Msg on the pragma
argument, which in turn calls Set_Error_Posted on the enclosing
statement, i.e. on the pragma itself. The explicit call to
Set_Error_Posted on the pragma itself was redundant.

Cleanup related to handling of illegal code when detecting uninitialized
scalar objects.

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Remove redundant call to
Set_Error_Posted.

[Ada] Fix continuation message without a prior error

When resolution of an expanded name fails, we call routine
Error_Missing_With_Of_Known_Unit which emits an error continuation
message (i.e. an error string starting with \\). However, for error
continuations to work properly there must be some prior error, because
continuation itself doesn't set flags like Serious_Errors_Detected.

Without these flags the problematic statement is not marked with
Error_Posted, which in turn is needed to prevent cascaded errors.

In particular, when unresolved procedure call uses a direct name or an
extended name with an unknown prefix, e.g.:

Unknown (1, 2, 3);
Unknown.Call (1, 2, 3);

then the N_Procedure_Call statements are marked with Error_Posted. But
when a call uses an extended name with a known prefix we failed to flag
the N_Procedure_Call with Error_Posted.

Found while improving the robustness of a feature that detects
uninitialized scalar objects.

gcc/ada/

* sem_ch8.adb (Find_Expanded_Name): Emit a main error message
before adding a continuation with the call to
Error_Missing_With_Of_Known_Unit.

[Ada] Mark Requires_Transient_Scope as Inline

The predicate is now a simple disjunction of two other predicates.

gcc/ada/

* sem_util.ads (Requires_Transient_Scope): Add pragma Inline.

[Ada] Avoid internal compiler error for illegal Predicate_Failure aspect spec

gcc/ada/

* sem_ch13.adb (Build_Predicate_Functions): If a semantic error
has been detected then ignore Predicate_Failure aspect
specifications in the same way as is done for CodePeer and
SPARK. This avoids an internal compiler error if
Ancestor_Predicate_Function_Called is True but Result_Expr is
not an N_And_Then node (and is therefore unsuitable as an
argument in a call to Left_Opnd).

[Ada] Fix spurious violations of No_Secondary_Stack restriction

Now that finalization and return on the secondary stack are decoupled, the
transient scopes created because of the former need not necessarily manage
the secondary stack and trigger a violation of the associated restriction.

gcc/ada/

* exp_ch7.adb (Wrap_Transient_Declaration): Propagate Uses_Sec_Stack
to enclosing function if it does not return on the secondary stack.
* exp_ch6.adb (Expand_Call_Helper): Call Establish_Transient_Scope
with Manage_Sec_Stack set to True only when necessary.
* sem_res.adb (Resolve_Call): Likewise.
(Resolve_Entry_Call): Likewise.

[Ada] Ignore Predicate_Failure in CodePeer mode

gcc/ada/

* sem_ch13.adb (Build_Predicate_Function): Ignore predicate
failure in CodePeer mode.

[Ada] Fix compilation of raise-gcc.c with -DSTANDALONE under windows

This is needed in particular by GNAT LLVM builds.

gcc/ada/

* raise-gcc.c: Fix compilation with -DSTANDALONE under windows.

[Ada] Preserve and reuse original type in Narrow_Large_Operation

Instead of using that of Original_Node (N) after rewriting, which does not
work if N had previously been rewritten.

gcc/ada/

* exp_ch4.adb (Narrow_Large_Operation): Preserve and reuse Etype.