]> git.ipfire.org Git - thirdparty/gcc.git/log
thirdparty/gcc.git
11 months agoipa: Don't disable function parameter analysis for fat LTO
H.J. Lu [Tue, 27 Aug 2024 20:11:39 +0000 (13:11 -0700)] 
ipa: Don't disable function parameter analysis for fat LTO

Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects.  Tested on x86-64, there are no differences in zstd
with "-O2 -flto=auto" -g "vs -O2 -flto=auto -g -ffat-lto-objects".

PR ipa/116410
* ipa-modref.cc (analyze_parms): Always analyze function parameter
for LTO.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
11 months ago[PR target/115921] Improve reassociation for rv64
Jeff Law [Tue, 3 Sep 2024 12:45:30 +0000 (06:45 -0600)] 
[PR target/115921] Improve reassociation for rv64

As Jovan pointed out in pr115921, we're not reassociating expressions like this
on rv64:

(x & 0x3e) << 12

It generates something like this:

        li      a5,258048
        slli    a0,a0,12
        and     a0,a0,a5

We have a pattern that's designed to clean this up.  Essentially reassociating
the operations so that we don't need to load the constant resulting in
something like this:

        andi    a0,a0,63
        slli    a0,a0,12

That pattern wasn't working for certain constants due to its condition. The
condition is trying to avoid cases where this kind of reassociation would
hinder shadd generation on rv64.  That condition was just written poorly.

This patch tightens up that condition in a few ways.  First, there's no need to
worry about shadd cases if ZBA is not enabled.  Second we can't use shadd if
the shift value isn't 1, 2 or 3.  Finally rather than open-coding one of the
tests, we can use an existing operand predicate.

The net is we'll start performing this transformation in more cases on rv64
while still avoiding reassociation if it would spoil shadd generation.

PR target/115921
gcc/
* config/riscv/riscv.md (reassociate bitwise ops): Tighten test for
cases we do not want reassociate.

gcc/testsuite/
* gcc.target/riscv/pr115921.c: New test.

11 months agoZen5 tuning part 1: avoid FMA chains
Jan Hubicka [Tue, 3 Sep 2024 11:38:33 +0000 (13:38 +0200)] 
Zen5 tuning part 1: avoid FMA chains

testing matrix multiplication benchmarks shows that FMA on a critical chain
is a perofrmance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2) the problem is that all values needs to
be ready before computation starts.

While on znver4 AVX512 code fared well with FMA, it was because of the split
registers. Znver5 benefits from avoding FMA on all widths.  This may be different
with the mobile version though.

On naive matrix multiplication benchmark the difference is 8% with -O3
only since with -Ofast loop interchange solves the problem differently.
It is 30% win, for example, on S323 from TSVC:

real_t s323(struct args_t * func_args)
{

//    recurrences
//    coupled recurrence

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations/2; nl++) {
        for (int i = 1; i < LEN_1D; i++) {
            a[i] = b[i-1] + c[i] * d[i];
            b[i] = a[i] + c[i] * e[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
znver5.
(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

11 months agoLTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]
Tobias Burnus [Tue, 3 Sep 2024 10:02:23 +0000 (12:02 +0200)] 
LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partiations, e.g., via -flto-partition=max), output_offload_tables
wrote the output tables once per fork.

PR lto/116535

gcc/ChangeLog:

* lto-cgraph.cc (output_offload_tables): Remove offload_ frees.
* lto-streamer-out.cc (lto_output): Make call to it depend on
lto_get_out_decl_state ()->output_offload_tables_p.
* lto-streamer.h (struct lto_out_decl_state): Add
output_offload_tables_p field.
* tree-pass.h (ipa_write_optimization_summaries): Add bool argument.
* passes.cc (ipa_write_summaries_1): Add bool
output_offload_tables_p arg.
(ipa_write_summaries): Update call.
(ipa_write_optimization_summaries): Accept output_offload_tables_p.

gcc/lto/ChangeLog:

* lto.cc (stream_out): Update call to
ipa_write_optimization_summaries to pass true for first partition.

11 months agoMAINTAINERS: Update my email address
Szabolcs Nagy [Mon, 2 Sep 2024 12:53:52 +0000 (13:53 +0100)] 
MAINTAINERS: Update my email address

* MAINTAINERS: Update my email address and add myself to DCO.

11 months agotree-optimization/116575 - avoid ICE with SLP mask_load_lane
Richard Biener [Tue, 3 Sep 2024 07:23:20 +0000 (09:23 +0200)] 
tree-optimization/116575 - avoid ICE with SLP mask_load_lane

The following avoids performing re-discovery with single lanes in
the attempt to for the use of mask_load_lane as rediscovery will
fail since a single lane of a mask load will appear permuted which
isn't supported.

PR tree-optimization/116575
* tree-vect-slp.cc (vect_analyze_slp): Properly compute
the mask argument for vect_load/store_lanes_supported.
When the load is masked for now avoid rediscovery.

* gcc.dg/vect/pr116575.c: New testcase.

11 months agoi386: Fix vfpclassph non-optimizied intrin
Haochen Jiang [Mon, 2 Sep 2024 07:00:22 +0000 (15:00 +0800)] 
i386: Fix vfpclassph non-optimizied intrin

The intrin for non-optimized got a typo in mask type, which will cause
the high bits of __mmask32 being unexpectedly zeroed.

The test does not fail under O0 with current 1b since the testcase is
wrong. We need to include avx512-mask-type.h after SIZE is defined, or
it will always be __mmask8. That problem also happened in AVX10.2 testcases.
I will write a seperate patch to fix that.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.

11 months agoDo not assert NUM_POLY_INT_COEFFS != 1 early
Richard Biener [Tue, 3 Sep 2024 08:40:41 +0000 (10:40 +0200)] 
Do not assert NUM_POLY_INT_COEFFS != 1 early

The following moves the assert on NUM_POLY_INT_COEFFS != 1 after
INTEGER_CST processing.

* fold-const.cc (poly_int_binop): Move assert on
NUM_POLY_INT_COEFFS after INTEGER_CST processing.

11 months agolower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]
Jakub Jelinek [Tue, 3 Sep 2024 08:20:44 +0000 (10:20 +0200)] 
lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]

The following testcase is miscompiled.  The problem is in the last_ovf step.
The second operand has signed _BitInt(513) type but has the MSB clear,
so range_to_prec returns 512 for it (i.e. it fits into unsigned
_BitInt(512)).  Because of that the last step actually doesn't need to get
the most significant bit from the second operand, but the code was deciding
what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0,
otherwise sign-extend the last processed bit; but that in this case was set.
We don't want to treat the positive operand as if it was negative regardless
of the bit below that precision, and precN >= 0 indicates that the operand
is in the [0, inf) range.

2024-09-03  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/116501
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
In the last_ovf case, use build_zero_cst operand not just when
TYPE_UNSIGNED (typeN), but also when precN >= 0.

* gcc.dg/torture/bitint-73.c: New test.

11 months agoada: Add kludge for quirk of ancient 32-bit ABIs to previous change
Eric Botcazou [Fri, 23 Aug 2024 15:06:00 +0000 (17:06 +0200)] 
ada: Add kludge for quirk of ancient 32-bit ABIs to previous change

Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double
scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly
may give the wrong answer for them.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Add kludge
to cope with ancient 32-bit ABIs.

11 months agoada: Plug loophole exposed by previous change
Eric Botcazou [Fri, 23 Aug 2024 07:44:06 +0000 (09:44 +0200)] 
ada: Plug loophole exposed by previous change

The change causes more temporaries to be created at call sites for unaligned
actual parameters, thus revealing that the machinery does not properly deal
with unconstrained nominal subtypes for them.

gcc/ada/

* gcc-interface/trans.cc (create_temporary): Deal with types whose
size is self-referential by allocating the maximum size.

11 months agoada: Fix internal error with Atomic Volatile_Full_Access object
Eric Botcazou [Thu, 22 Aug 2024 19:18:15 +0000 (21:18 +0200)] 
ada: Fix internal error with Atomic Volatile_Full_Access object

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, unfortunately without adjusting the implementation.

gcc/ada/

* gcc-interface/trans.cc (get_atomic_access): Deal specifically with
nodes that are both Atomic and Volatile_Full_Access in Ada 2012.

11 months agoada: Pass unaligned record components by copy in calls on all platforms
Eric Botcazou [Tue, 20 Aug 2024 20:59:58 +0000 (22:59 +0200)] 
ada: Pass unaligned record components by copy in calls on all platforms

This has historically been done only on platforms requiring the strict
alignment of memory references, but this can arguably be considered as
being mandated by the language on all of them.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Take into
account the alignment of the field on all platforms.

11 months agoada: Fix internal error on pragma pack with discriminated record component
Eric Botcazou [Tue, 20 Aug 2024 08:34:45 +0000 (10:34 +0200)] 
ada: Fix internal error on pragma pack with discriminated record component

When updating the size after making a packable type in gnat_to_gnu_field,
we fail to clear it again when it is not constant.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size
after updating it if it is not constant.

11 months agoada: Simplify Note_Uplevel_Bound procedure
Marc Poulhiès [Fri, 9 Aug 2024 16:08:01 +0000 (18:08 +0200)] 
ada: Simplify Note_Uplevel_Bound procedure

The procedure Note_Uplevel_Bound was implemented as a custom expression
tree walk. This change replaces this custom tree traversal by a more
idiomatic use of Traverse_Proc.

gcc/ada/

* exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor
to use the generic Traverse_Proc.
(Check_Static_Type): Adjust calls to Note_Uplevel_Bound as the
previous second parameter was unused, so removed.

11 months agoada: Transform Length attribute references for non-Strict overflow mode.
Steve Baird [Wed, 21 Aug 2024 00:35:24 +0000 (17:35 -0700)] 
ada: Transform Length attribute references for non-Strict overflow mode.

The non-strict overflow checking code does a better job of eliminating
overflow checks if given an expression consisting only of predefined
operators (including relationals), literals, identifiers, and conditional
expressions. If it is both feasible and useful, rewrite a
Length attribute reference as such an expression. "Feasible" means
"index type is same type as attribute reference type, so we can rewrite without
using type conversions". "Useful" means "Overflow_Mode is something other than
Strict, so there is value in making overflow check elimination easier".

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference): If it makes sense
to do so, then rewrite a Length attribute reference as an
equivalent conditional expression.

11 months agoada: Do not warn for partial access to Atomic Volatile_Full_Access objects
Eric Botcazou [Thu, 22 Aug 2024 20:54:02 +0000 (22:54 +0200)] 
ada: Do not warn for partial access to Atomic Volatile_Full_Access objects

The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.

But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, but the implementation was not entirely adjusted.

In Ada 2012, it does not make sense to warn for the partial access to an
Atomic object if the object is also declared Volatile_Full_Access, since
the object will be accessed as a whole in this case (like in Ada 2022).

gcc/ada/

* sem_res.adb (Is_Atomic_Ref_With_Address): Rename into...
(Is_Atomic_Non_VFA_Ref_With_Address): ...this and adjust the
implementation to exclude Volatile_Full_Access objects.
(Resolve_Indexed_Component): Adjust to above renaming.
(Resolve_Selected_Component): Likewise.

11 months agoada: Reject illegal array aggregates as per AI22-0106.
Steve Baird [Mon, 19 Aug 2024 21:58:38 +0000 (14:58 -0700)] 
ada: Reject illegal array aggregates as per AI22-0106.

Implement the new legality rules of AI22-0106 which (as discussed in the AI)
are needed to disallow constructs whose semantics would otherwise be poorly
defined.

gcc/ada/

* sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new
legality rules of AI11-0106. Add code to avoid cascading error
messages.

11 months agoada: Fix Finalize_Storage_Only bug in b-i-p calls
Bob Duff [Thu, 22 Aug 2024 16:32:00 +0000 (12:32 -0400)] 
ada: Fix Finalize_Storage_Only bug in b-i-p calls

Do not pass null for the Collection parameter when
Finalize_Storage_Only is in effect. If the collection
is null in that case, we will blow up later when we
deallocate the object.

gcc/ada/

* exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call):
Remove Finalize_Storage_Only from the code that checks whether to
pass null to the Collection parameter. Having done that, we don't
need to check for Is_Library_Level_Entity, because
No_Heap_Finalization requires that. And if we ever change
No_Heap_Finalization to allow nested access types, we will still
want to pass null. Note that the comment "Such a type lacks a
collection." is incorrect in the case of Finalize_Storage_Only;
such types have a collection.

11 months agoSVE intrinsics: Fold constant operands for svmul.
Jennifer Schmitz [Fri, 30 Aug 2024 14:16:43 +0000 (07:16 -0700)] 
SVE intrinsics: Fold constant operands for svmul.

This patch implements constant folding for svmul by calling
gimple_folder::fold_const_binary with tree_code MULT_EXPR.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svmul_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Try constant folding.

gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_mul_1.c: New test.

11 months agoSVE intrinsics: Fold constant operands for svdiv.
Jennifer Schmitz [Fri, 30 Aug 2024 14:03:49 +0000 (07:03 -0700)] 
SVE intrinsics: Fold constant operands for svdiv.

This patch implements constant folding for svdiv:
The new function aarch64_const_binop was created, which - in contrast to
int_const_binop - does not treat operations as overflowing. This function is
passed as callback to vector_const_binop from the new gimple_folder
method fold_const_binary, if the predicate is ptrue or predication is _x.
From svdiv_impl::fold, fold_const_binary is called with TRUNC_DIV_EXPR as
tree_code.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_const_binary.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
New function to fold binary SVE intrinsics without overflow.
(gimple_folder::fold_const_binary): New helper function for
constant folding of SVE intrinsics.

gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.

11 months agoSVE intrinsics: Refactor const_binop to allow constant folding of intrinsics.
Jennifer Schmitz [Fri, 30 Aug 2024 13:56:52 +0000 (06:56 -0700)] 
SVE intrinsics: Refactor const_binop to allow constant folding of intrinsics.

This patch sets the stage for constant folding of binary operations for SVE
intrinsics:
In fold-const.cc, the code for folding vector constants was moved from
const_binop to a new function vector_const_binop. This function takes a
function pointer as argument specifying how to fold the vector elements.
The intention is to call vector_const_binop from the backend with an
aarch64-specific callback function.
The code in const_binop for folding operations where the first operand is a
vector constant and the second argument is an integer constant was also moved
into vector_const_binop to to allow folding of binary SVE intrinsics where
the second operand is an integer (_n).
To allow calling poly_int_binop from the backend, the latter was made public.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* fold-const.h: Declare vector_const_binop.
* fold-const.cc (const_binop): Remove cases for vector constants.
(vector_const_binop): New function that folds vector constants
element-wise.
(int_const_binop): Remove call to wide_int_binop.
(poly_int_binop): Add call to wide_int_binop.

11 months agoHandle mixing REALPART/IMAGPART with other components in SLP groups
Richard Biener [Mon, 2 Sep 2024 13:12:58 +0000 (15:12 +0200)] 
Handle mixing REALPART/IMAGPART with other components in SLP groups

The following makes sure we handle a SLP load/store group from
a structure with complex and scalar members.  This for example
happens in gcc.target/i386/pr106010-9a.c.

* tree-vect-slp.cc (vect_build_slp_tree_1): Handle mixing
all of handled components besides ARRAY_RANGE_REF, drop
handling of INDIRECT_REF.

11 months agoCorrectly handle store IFNs in vect_get_vector_types_for_stmt
Richard Biener [Mon, 2 Sep 2024 09:16:12 +0000 (11:16 +0200)] 
Correctly handle store IFNs in vect_get_vector_types_for_stmt

Currently vect_get_vector_types_for_stmt only special-cases
IFN_MASK_STORE but there are now very many variants and simply
passing analysis without setting *VECTYPE will ICE duing SLP
discovery (noticed with IFN_SCATTER_STORE).  The following
properly uses internal_store_fn_p.  I also noticed we're
unnecessarily handing those again to determine the scalar type
but there should always be a data reference for them.

* tree-vect-stmts.cc (vect_get_vector_types_for_stmt):
Handle all internal_store_fn_p the same.  Remove special-casing
for the scalar_type of IFN_MASK_STORE.

11 months agoi386: Support partial vectorized V2BF/V4BF smaxmin
Levy Hsu [Tue, 27 Aug 2024 04:52:20 +0000 (14:22 +0930)] 
i386: Support partial vectorized V2BF/V4BF smaxmin

This patch supports sminmax for partial vectorized V2BF/V4BF.

gcc/ChangeLog:

* config/i386/mmx.md (<code><mode>3): New define_expand for V2BF/V4BFsmaxmin

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: New test.

11 months agoi386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt
Levy Hsu [Mon, 26 Aug 2024 01:16:30 +0000 (10:46 +0930)] 
i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions.

gcc/ChangeLog:

* config/i386/mmx.md (VBF_32_64): New mode iterator for partial vectorized V2BF/V4BF.
(<insn><mode>3): New define_expand for plusminusmultdiv.
(sqrt<mode>2): New define_expand for sqrt.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: New test.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: New test.

11 months agoRISC-V: Support form 1 of integer scalar .SAT_ADD
Pan Li [Thu, 29 Aug 2024 03:25:44 +0000 (11:25 +0800)] 
RISC-V: Support form 1 of integer scalar .SAT_ADD

This patch would like to support the scalar signed ssadd pattern
for the RISC-V backend.  Aka

Form 1:
  #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))                  \
  sat_s_add_##T##_fmt_1 (T x, T y)             \
  {                                            \
    T sum = (UT)x + (UT)y;                     \
    return (x ^ y) < 0                         \
      ? sum                                    \
      : (sum ^ x) >= 0                         \
        ? sum                                  \
        : x < 0 ? MIN : MAX;                   \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)

Before this patch:
  10   â”‚ sat_s_add_int64_t_fmt_1:
  11   â”‚     mv   a5,a0
  12   â”‚     add  a0,a0,a1
  13   â”‚     xor  a1,a5,a1
  14   â”‚     not  a1,a1
  15   â”‚     xor  a4,a5,a0
  16   â”‚     and  a1,a1,a4
  17   â”‚     blt  a1,zero,.L5
  18   â”‚     ret
  19   â”‚ .L5:
  20   â”‚     srai a5,a5,63
  21   â”‚     li   a0,-1
  22   â”‚     srli a0,a0,1
  23   â”‚     xor  a0,a5,a0
  24   â”‚     ret

After this patch:
  10   â”‚ sat_s_add_int64_t_fmt_1:
  11   â”‚     add  a2,a0,a1
  12   â”‚     xor  a1,a0,a1
  13   â”‚     xor  a5,a0,a2
  14   â”‚     srli a5,a5,63
  15   â”‚     srli a1,a1,63
  16   â”‚     xori a1,a1,1
  17   â”‚     and  a5,a5,a1
  18   â”‚     srai a4,a0,63
  19   â”‚     li   a3,-1
  20   â”‚     srli a3,a3,1
  21   â”‚     xor  a3,a3,a4
  22   â”‚     neg  a4,a5
  23   â”‚     and  a3,a3,a4
  24   â”‚     addi a5,a5,-1
  25   â”‚     and  a0,a2,a5
  26   â”‚     or   a0,a0,a3
  27   â”‚     ret

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func
decl for expanding ssadd.
* config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func
impl to gen the max int rtx.
(riscv_expand_ssadd): Add new func impl to expand the ssadd.
* config/riscv/riscv.md (ssadd<mode>3): Add new pattern for
signed integer .SAT_ADD.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data.
* gcc.target/riscv/sat_s_add-1.c: New test.
* gcc.target/riscv/sat_s_add-2.c: New test.
* gcc.target/riscv/sat_s_add-3.c: New test.
* gcc.target/riscv/sat_s_add-4.c: New test.
* gcc.target/riscv/sat_s_add-run-1.c: New test.
* gcc.target/riscv/sat_s_add-run-2.c: New test.
* gcc.target/riscv/sat_s_add-run-3.c: New test.
* gcc.target/riscv/sat_s_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 months agoDaily bump.
GCC Administrator [Tue, 3 Sep 2024 00:21:29 +0000 (00:21 +0000)] 
Daily bump.

11 months agoMIPS: Support vector reduc for MSA
YunQiang Su [Mon, 26 Aug 2024 00:45:36 +0000 (08:45 +0800)] 
MIPS: Support vector reduc for MSA

We have SHF.fmt and HADD_S/U.fmt with MSA, which can be used for
vector reduc.

For min/max for U8/S8, we can
SHF.B W1, W0, 0xb1  # swap byte inner every half
MIN.B W1, W1, W0
SHF.H W2, W1, 0xb1  # swap half inner every word
MIN.B W2, W2, W1
SHF.W W3, W2, 0xb1  # swap word inner every doubleword
MIN.B W4, W3, W2
SHF.W W4, W4, 0x4e  # swap the two doubleword
MIN.B W4, W4, W3

For plus of S8/U8, we can use HADD
HADD.H W0, W0, W0
HADD.W W0, W0, W0
HADD.D W0, W0, W0
SHF.W W1, W0, 0x4e  # swap the two doubleword
ADDV.D W1, W1, W0
COPY_S.B  T0, W1      # COPY_U.B for U8

We can do similar for S16/U16/S32/U32/S64/U64/FLOAT/DOUBLE.

gcc

* config/mips/mips-msa.md: (MSA_NO_HADD): we have HADD for
S8/U8/S16/U16/S32/U32 only.
(reduc_smin_scal_<mode>): New define pattern.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_plus_scal_v4si): Ditto.
(reduc_plus_scal_v8hi): Ditto.
(reduc_plus_scal_v16qi): Ditto.
(reduc_<optab>_scal_<mode>): Ditto.
* config/mips/mips-protos.h: New function mips_expand_msa_reduc.
* config/mips/mips.cc: New function mips_expand_msa_reduc.
* config/mips/mips.md: Define any_bitwise iterator.

gcc/testsuite:

* gcc.target/mips/msa-reduc.c: New tests.

11 months agotestsuite: Fix optimize_one.c FAIL on i686-linux
Jakub Jelinek [Mon, 2 Sep 2024 18:14:49 +0000 (20:14 +0200)] 
testsuite: Fix optimize_one.c FAIL on i686-linux

The test FAILs on i686-linux because -mfpmath=sse is used without
-msse2 being enabled.

2024-09-02  Jakub Jelinek  <jakub@redhat.com>

* gcc.target/i386/optimize_one.c: Add -msse2 to dg-options.

11 months ago[libstdc++-v3] [testsuite] improve future/*/poll.cc calibration
Alexandre Oliva [Mon, 2 Sep 2024 14:32:03 +0000 (11:32 -0300)] 
[libstdc++-v3] [testsuite] improve future/*/poll.cc calibration

30_threads/future/members/poll.cc has calibration code that, on
systems with very low clock resolution, may spuriously fail to run.
Even when it does run, low resolution and reasonable
timeouts limit severely the viability of increasing the loop counts so
as to reduce measurement noise, so we end up with very noisy results.

On various vxworks targets, high iteration count (low-noise)
measurements confirmed that some of the operations that we expected to
be up to 100x slower than the fastest ones can run a little slower
than that and, with significant noise, may seem to be even slower,
comparatively.

Bump the factors up to 200x, so that we have plenty of margin over
measured results.

for  libstdc++-v3/ChangeLog

* testsuite/30_threads/future/members/poll.cc: Factor out
calibration, and run it unconditionally.  Lower its
strictness.  Bump wait_until_*'s slowness factor.

11 months ago[libstdc++] [testsuite] avoid async.cc loss of precision [PR91486]
Alexandre Oliva [Mon, 2 Sep 2024 14:31:59 +0000 (11:31 -0300)] 
[libstdc++] [testsuite] avoid async.cc loss of precision [PR91486]

When we get to test_pr91486_wait_until(), we're about 10s past the
float_steady_clock epoch.  This is enough for the 1s delta for the
timeout to come out slightly lower when the futex-less wait_until
converts the deadline from float_steady_clock to __clock_t.  So we may
wake up a little too early, and end up looping one extra time to sleep
for e.g. another 954ns until we hit the deadline.

Each iteration calls float_steady_clock::now(), bumping the call_count
that we VERIFY() at the end of the subtest.  Since we expect at most 3
calls, and we're going to have at the very least 3 on futex-less
targets (one in the test proper, one before wait_until_impl to compute
the deadline, and one after wait_until_impl to check whether the
deadline was hit), any such imprecision that causes an extra iteration
will reach 5 and cause the test to fail.

Initializing the epoch in the beginning of the test makes such
spurious fails due to loss of precision far less likely.  I don't
suppose allowing for an extra couple of calls would be desirable.

While at that, I'm annotating unused status variables as such.

for  libstdc++-v3/ChangeLog

PR libstdc++/91486
* testsuite/30_threads/async/async.cc
(test_pr91486_wait_for): Mark status as unused.
(test_pr91486_wait_until): Likewise.  Initialize epoch later.

11 months ago[testsuite] add linkonly to dg-additional-sources [PR115295]
Alexandre Oliva [Mon, 2 Sep 2024 14:31:51 +0000 (11:31 -0300)] 
[testsuite] add linkonly to dg-additional-sources [PR115295]

The D testsuite shows it was a mistake to assume that
dg-additional-sources are never to be used for compilation tests.
Even if an output file is specified for compilation, extra module
files can be named and used in the compilation without being flagged
as errors.

Introduce a 'linkonly' flag for dg-additional-sources, and use it in
pr95401.cc and other vector tests that default to run, so that its
additional sources get discarded when vector tests downgrade to
compile-only.  This reverts previous workarounds for this very
circumstance, that relied on being able to run vector tests anyway,
even after failing to detect runtime or hardware vector support.

for  gcc/ChangeLog

PR d/115295
* doc/sourcebuild.texi (dg-additional-sources): Add linkonly.

for  gcc/testsuite/ChangeLog

PR d/115295
* g++.dg/vect/pr95401.cc: Add linkonly to dg-additional-sources.
* g++.dg/vect/pr68762-1.cc: Likewise.
* g++.dg/vect/simd-clone-3.cc: Likewise.
* g++.dg/vect/simd-clone-5.cc: Likewise.
* gcc.dg/vect/vect-simd-clone-10.c: Likewise.  Drop dg-do run.
* gcc.dg/vect/vect-simd-clone-12.c: Likewise.  Likewise.
* lib/gcc-defs.exp (additional_sources_omit_on_compile): New.
(dg-additional-sources): Add to it on linkonly.
(dg-additional-files-options): Omit select sources on compile.

11 months agoamdgcn: Remove TARGET_GCN5_PLUS
Andrew Stubbs [Tue, 6 Aug 2024 16:00:21 +0000 (16:00 +0000)] 
amdgcn: Remove TARGET_GCN5_PLUS

Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so
we can make that code unconditional, and remove all the "else" cases.

The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS,
TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also
redundant and can be made unconditional.

The naming of the "gcc_version" attribute has been confusing since the "rdna"
attribute was added and this makes it worse, so it has been renamed to "cdna".

The add-with-carry assembler mnemonics no longer have two forms, so '%^' can be
removed.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (TARGET_GCN5_PLUS): Delete.
(TARGET_GLOBAL_ADDRSPACE): Delete.
(TARGET_FLAT_OFFSETS): Delete.
(TARGET_EXPLICIT_CARRY): Delete.
(TARGET_MULTIPLY_IMMEDIATE): Delete.
* config/gcn/gcn-valu.md (*mov<mode>): Rename "gcn_version" to "cdna".
(*mov<mode>_4reg): Likewise.
(@mov<mode>_sgprbase): Likwise.
(gather<mode>_insn_1offset<exec>): Likewise.
(gather<mode>_insn_1offset_ds<exec>): Likewise.
(gather<mode>_insn_2offsets<exec>): Likewise.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise.
(scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
(gather<mode>_insn_1offset<exec>): Remove TARGET_FLAT_OFFSETS
conditionals.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(add<mode>3<exec_clobber>): Use "_co" instead of "%^".
(add<mode>3_dup<exec_clobber>): Likewise.
(add<mode>3_vcc<exec_vcc>): Likewise.
(add<mode>3_vcc_dup<exec_vcc>): Likewise.
(addc<mode>3<exec_vcc>): Likewise.
(sub<mode>3<exec_clobber>): Likewise.
(sub<mode>3_vcc<exec_vcc>): Likewise.
(subc<mode>3<exec_vcc>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_flat_address_p): Remove TARGET_FLAT_OFFSETS
conditionals.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(print_operand_address): Likewise, and TARGET_GLOBAL_ADDRSPACE also.
(print_operand): Remove "%^" operand code.
Remove TARGET_GLOBAL_ADDRSPACE assertion.
* config/gcn/gcn.h (STACK_ADDR_SPACE): Remove GCN5 conditional.
* config/gcn/gcn.md (gcn_version): Rename attribute ...
(cdna): ... to this, and remove the gcn3 and gcn5 values.
(enabled): Replace old "gcn_version" logic with new "cdna" logic.
(*mov<mode>_insn): Rename "gcn_version" to "cdna".
(*movti_insn): Likewise.
(addsi3): Use "_co" instead of "%^".
(addsi3_scalar_carry): Likewise.
(addsi3_scalar_carry_cst): Likewise.
(addcsi3_scalar): Likewise.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(subsi3): Likewise.
(<su>mulsi3_highpart): Remove TARGET_MULTIPLY_IMMEDIATE conditions.
(<su>mulsi3_highpart_reg): Remove "gcn_version" attribute.
(muldi3): Likewise.
(atomic_fetch_<bare_mnemonic><mode>): Likewise.
(atomic_<bare_mnemonic><mode>): Likewise.
(sync_compare_and_swap<mode>_insn): Likewise.
(atomic_load<mode>): Likewise.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
(<su>mulsi3_highpart_imm): Remove both TARGET_MULTIPLY_IMMEDIATE and
"gcn_version".
(<su>mulsidi3): Likewise.
(<su>mulsidi3_imm): Likewise.

11 months agoamdgcn: Remove TARGET_GCN3
Andrew Stubbs [Tue, 6 Aug 2024 15:37:36 +0000 (15:37 +0000)] 
amdgcn: Remove TARGET_GCN3

The only GCN3 ISA device was remove (Fiji, gfx803) so all the GCN3-specific
code and features can be removed from the back-end.

gcc/ChangeLog:

* config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3.
(TARGET_GCN3): Delete.
(TARGET_GCN3_PLUS): Delete.
(TARGET_M0_LDS_LIMIT): Delete.
* config/gcn/gcn-valu.md
(gather<mode>_insn_1offset<exec>): Remove TARGET_GCN3 from conditions.
(*<reduc_op>_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (enum gcn_isa): Change default to ISA_GCN5.
(gcn_expand_prologue): Remove TARGET_M0_LDS_LIMIT feature.
(gcn_expand_reduc_scalar): Remove TARGET_GCN3 conditions.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Remove TARGET_GCN3.

11 months agoamdgcn: remove gfx803 "Fiji" support
Andrew Stubbs [Mon, 5 Aug 2024 15:14:17 +0000 (15:14 +0000)] 
amdgcn: remove gfx803 "Fiji" support

The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and
hasn't worked properly with the drivers since about ROCm 4.

This patch removes the device from GCC options and documentation, and removes
the direct mentions from the internals.

The TARGET_GCN3 support in the back-end is now unused and can be removed (in a
follow-up patch).

gcc/ChangeLog:

* config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks.
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative.
(NO_XNACK): Likewise.
(NO_SRAM_ECC): Likewise.
(ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC.
* config/gcn/gcn-opts.h (enum processor_type): Remove PROCESSOR_FIJI.
(TARGET_FIJI): Delete.
* config/gcn/gcn.cc (gcn_option_override): Remove Fiji.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise.
* config/gcn/gcn.opt (gpu_type): Likewise.
(march, mtune): Change default to PROCESSOR_VEGA10.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete.
(copy_early_debug_info): Remove elf_flags_actual.
Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally.
(get_arch): Remove Fiji.
(main): Remove gfx803.
* config/gcn/t-omp-device
(omp-device-properties-gcn): Remove fiji and gfx803.
* doc/install.texi (amdgcn*-*-*): Remove fiji and special instructions.
* doc/invoke.texi: Remove fiji.

libgomp/ChangeLog:

* libgomp.texi: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4-fiji.c: Removed.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed.

11 months agoPR modula2/116557 Remove physical address from the GPL header comment
Gaius Mulley [Mon, 2 Sep 2024 12:29:25 +0000 (13:29 +0100)] 
PR modula2/116557 Remove physical address from the GPL header comment

This patch removes the physical address from all the header comments
in the m2 subdirectory.  The physical address is replaced with the
text "You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>." instead.

gcc/m2/ChangeLog:

PR modula2/116557
* gm2-lang.cc: Replace physical address with URL in GPL header.
* gm2-lang.h: Ditto.
* images/LICENSE.IMG: Ditto.
* m2-tree.def: Ditto.
* mc-boot/GIndexing.cc: Ditto.
* mc-boot/Gkeyc.cc: Ditto.
* mc-boot/Glists.cc: Ditto.
* mc-boot/GmcComp.cc: Ditto.
* mc-boot/GmcDebug.cc: Ditto.
* mc-boot/GmcFileName.cc: Ditto.
* mc-boot/GmcMetaError.cc: Ditto.
* mc-boot/GmcOptions.cc: Ditto.
* mc-boot/GmcPreprocess.cc: Ditto.
* mc-boot/GmcPretty.cc: Ditto.
* mc-boot/GmcPrintf.cc: Ditto.
* mc-boot/GmcQuiet.cc: Ditto.
* mc-boot/GmcReserved.cc: Ditto.
* mc-boot/GmcSearch.cc: Ditto.
* mc-boot/GmcStack.cc: Ditto.
* mc/Indexing.mod: Ditto.
* mc/keyc.mod: Ditto.
* mc/lists.mod: Ditto.
* mc/mcComp.mod: Ditto.
* mc/mcDebug.mod: Ditto.
* mc/mcFileName.mod: Ditto.
* mc/mcMetaError.mod: Ditto.
* mc/mcOptions.mod: Ditto.
* mc/mcPreprocess.mod: Ditto.
* mc/mcPretty.mod: Ditto.
* mc/mcPrintf.mod: Ditto.
* mc/mcQuiet.mod: Ditto.
* mc/mcReserved.mod: Ditto.
* mc/mcSearch.mod: Ditto.
* mc/mcStack.mod: Ditto.
* tools-src/buildpg: Ditto.
* tools-src/calcpath: Ditto.
* tools-src/checkmeta.py: Ditto.
* tools-src/def2doc.py: Ditto.
* tools-src/makeSystem: Ditto.
* tools-src/tidydates.py: Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
11 months agolibsupc++: Fix handling of m68k extended real in <compare>
Andreas Schwab [Mon, 2 Sep 2024 08:43:20 +0000 (10:43 +0200)] 
libsupc++: Fix handling of m68k extended real in <compare>

PR libstdc++/116513
* libsupc++/compare (_S_fp_bits) [__fmt == _M68k_80bit]: Shift
padding out of exponent word.

11 months agotestsuite: Rename scanltranstree.exp -> scanltrans.exp
Alex Coplan [Wed, 28 Aug 2024 13:53:11 +0000 (14:53 +0100)] 
testsuite: Rename scanltranstree.exp -> scanltrans.exp

Since r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
added scan-ltrans-rtl* variants to scanltranstree.exp, it no longer
makes sense to have "tree" in the name.  This renames the file
accordingly and updates users.

libatomic/ChangeLog:

* testsuite/lib/libatomic.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libgomp/ChangeLog:

* testsuite/lib/libgomp.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libitm/ChangeLog:

* testsuite/lib/libitm.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libphobos/ChangeLog:

* testsuite/lib/libphobos-dg.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libvtv/ChangeLog:

* testsuite/lib/libvtv.exp: Load scanltrans.exp instead of
scanltranstree.exp.

gcc/testsuite/ChangeLog:

* gcc.dg-selftests/dg-final.exp: Load scanltrans.exp instead of
scanltranstree.exp.
* lib/gcc-dg.exp: Likewise.
* lib/scanltranstree.exp: Rename to ...
* lib/scanltrans.exp: ... this.

11 months agoRename gimple_asm_input_p to gimple_asm_basic_p
Richard Sandiford [Mon, 2 Sep 2024 08:56:56 +0000 (09:56 +0100)] 
Rename gimple_asm_input_p to gimple_asm_basic_p

Following on from the earlier tree rename, this patch renames
gimple_asm_input_p to gimple_asm_basic_p, and similarly for
related names.

gcc/
* doc/gimple.texi (gimple_asm_basic_p): Document.
(gimple_asm_set_basic): Likewise.
* gimple.h (GF_ASM_INPUT): Rename to...
(GF_ASM_BASIC): ...this.
(gimple_asm_set_input): Rename to...
(gimple_asm_set_basic): ...this.
(gimple_asm_input_p): Rename to...
(gimple_asm_basic_p): ...this.
* cfgexpand.cc (expand_asm_stmt): Update after above renaming.
* gimple.cc (gimple_asm_clobbers_memory_p): Likewise.
* gimplify.cc (gimplify_asm_expr): Likewise.
* ipa-icf-gimple.cc (func_checker::compare_gimple_asm): Likewise.
* tree-cfg.cc (stmt_can_terminate_bb_p): Likewise.

11 months agoRename ASM_INPUT_P to ASM_BASIC_P
Richard Sandiford [Mon, 2 Sep 2024 08:56:56 +0000 (09:56 +0100)] 
Rename ASM_INPUT_P to ASM_BASIC_P

ASM_INPUT_P is so named because it causes the eventual rtl insn
pattern to be a top-level ASM_INPUT rather than an ASM_OPERANDS.
However, this name has caused confusion, partly due to earlier
documentation.  The name also sounds related to ASM_INPUTS but
is for a different piece of state.

This patch renames it to ASM_BASIC_P, with the inverse meaning
an extended asm.  ("Basic asm" is the term used in extend.texi.)

gcc/
* doc/generic.texi (ASM_BASIC_P): Document.
* tree.h (ASM_INPUT_P): Rename to...
(ASM_BASIC_P): ...this.
(ASM_VOLATILE_P, ASM_INLINE_P): Reindent.
* gimplify.cc (gimplify_asm_expr): Update after above renaming.
* tree-core.h (tree_base): Likewise.

gcc/c/
* c-typeck.cc (build_asm_expr): Rename ASM_INPUT_P to ASM_BASIC_P.

gcc/cp/
* pt.cc (tsubst_stmt): Rename ASM_INPUT_P to ASM_BASIC_P.
* parser.cc (cp_parser_asm_definition): Likewise.

gcc/d/
* toir.cc (IRVisitor): Rename ASM_INPUT_P to ASM_BASIC_P.

gcc/jit/
* jit-playback.cc (playback::block::add_extended_asm):  Rename
ASM_INPUT_P to ASM_BASIC_P.

gcc/m2/
* gm2-gcc/m2block.cc (flush_pending_note): Rename ASM_INPUT_P
to ASM_BASIC_P.
* gm2-gcc/m2statement.cc (m2statement_BuildAsm): Likewise.

11 months agolto/lto.cc: Fix build with not HAVE_WORKING_FORK
Tobias Burnus [Mon, 2 Sep 2024 08:29:36 +0000 (10:29 +0200)] 
lto/lto.cc: Fix build with not HAVE_WORKING_FORK

gcc/lto/ChangeLog:

* lto.cc: Add missing HAVE_WORKING_FORK.

11 months agolto-wrapper: Honor -save-temps for ltrans' makefile
Tobias Burnus [Mon, 2 Sep 2024 08:28:29 +0000 (10:28 +0200)] 
lto-wrapper: Honor -save-temps for ltrans' makefile

gcc/ChangeLog:

* lto-wrapper.cc (run_gcc): Honor -save-temps for
makefile name.

11 months agoada: Diagnose too large size clause on floating-point type
Eric Botcazou [Tue, 20 Aug 2024 15:40:41 +0000 (17:40 +0200)] 
ada: Diagnose too large size clause on floating-point type

The problem is that the size clause changes the floating-point format used
for the type, but it must not when this format is the widest format that is
supported in hardware on the target.  Instead a padding type must be built
and the associated warning given.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity): Cap the Esize of a
floating-point type to the size of the widest format supported in
hardware if it is explicity defined.

11 months agoada: Create usage entry for -gnatw_l
Viljar Indus [Wed, 21 Aug 2024 08:57:58 +0000 (11:57 +0300)] 
ada: Create usage entry for -gnatw_l

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst: update
documentation for the -gnatw_l switch.
* usage.adb: Add -gnatw_l entry.
* gnat_ugn.texi: Regenerate.

11 months agoada: Fix standard output stream for gnatcmd output
Ronan Desplanques [Wed, 21 Aug 2024 13:10:38 +0000 (15:10 +0200)] 
ada: Fix standard output stream for gnatcmd output

Before this patch, the gnat command sent to standard error pieces of
information that are a better match for standard output. This patch
makes this information go to standard output.

gcc/ada/

* gnatcmd.adb (GNATCmd): Fix standard output stream.

11 months agoada: Fix minor issues in -gnaty0's documentation
Ronan Desplanques [Tue, 20 Aug 2024 13:46:22 +0000 (15:46 +0200)] 
ada: Fix minor issues in -gnaty0's documentation

Before this patch, the documentation of -gnaty0 used 0-based indexing
for column numbers while 1-based indexing is used everywhere else. This
patch makes this documentation use 1-based indexing, and also adds a
missing parenthesis.

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix
minor issues.
* gnat_ugn.texi: Regenerate.

11 months agoada: Documentation for generic type inference
Bob Duff [Sun, 18 Aug 2024 23:13:46 +0000 (19:13 -0400)] 
ada: Documentation for generic type inference

...plus minor improvements to existing documentation.

gcc/ada/

* doc/gnat_rm/gnat_language_extensions.rst: I assume "extended set
of extensions" was a typo for "experimental set of extensions",
because "extended extensions" is repetitive and redundant. "in
addition" clarifies that the one subsumes the other. Add a
reminder at the start of each subsection about what switch/pragma
enables what extensions. Add new section about "Inference of
Dependent Types in Generic Instantiations".
* gnat_rm.texi: Regenerate.

11 months agoada: Small fixes for FreeBSD
Patrick Bernardi [Fri, 16 Aug 2024 23:00:23 +0000 (19:00 -0400)] 
ada: Small fixes for FreeBSD

Size of pthread data types now need to be defined for FreeBSD ports.
Traceback support for AArch64 FreeBSD is now defined.

gcc/ada/

* s-oscons-tmplt.c: Define sizes of pthread data types on FreeBSD.
* tracebak.c: Use GCC unwinder and adjust PC appropriately on
aarch64-freebsd.

11 months agoada: Also reset scope for some nested declaration
Marc Poulhiès [Thu, 8 Aug 2024 11:36:37 +0000 (13:36 +0200)] 
ada: Also reset scope for some nested declaration

When changing the scope for entities found in the entry body that is
mutated into a procedure, the compiler needs to look deeper than only
the top level entities as expansion may produce object declarations
which scopes are also the entry. For example, the tree after expansion
may look like:

  procedure This_Is_An_Entry_Proc is
     ...
     O1 : Typ := do
       TMP1 : OTyp := ...;
       ...
     in TMP1;

O1's scope needs to be reset to This_Is_An_Entry_Proc, but so does
TMP1's scope.

This change also fix a small oversight where
N_Implicit_Label_Declaration scope must be reset and its content
skipped.

gcc/ada/

* exp_ch9.adb (Reset_Scopes_To): Adjust comment.
(Reset_Scopes_To.Reset_Scope): Adjust the scope reset for object
declaration. In particular, visit the children nodes if any. Also
extend the handling of other declarations to
N_Implicit_Label_Declaration.

11 months agoada: Cleanup expansion of object declarations
Piotr Trojanek [Thu, 8 Aug 2024 15:06:09 +0000 (17:06 +0200)] 
ada: Cleanup expansion of object declarations

Replace repeated calls to Sloc with uses of local constant Loc.

Code cleanup; behavior is unaffected.

gcc/ada/

* exp_ch3.adb (Expand_N_Object_Declaration): Replace calls to Sloc
with uses of Loc; turn variable Prag into constant.

11 months agoada: Remove repeated guards in validity checks
Piotr Trojanek [Thu, 8 Aug 2024 08:59:57 +0000 (10:59 +0200)] 
ada: Remove repeated guards in validity checks

Routine Insert_Valid_Check only applies checks when Expr_Known_Valid
query returns False; there is no need to call this query before
inserting checks.

Code cleanup; behavior is unaffected.

gcc/ada/

* exp_imgv.adb (Expand_User_Defined_Enumeration_Image)
(Expand_Image_Attribute): Remove redundant guards.

11 months agoranger: Fix up range computation for CLZ [PR116486]
Jakub Jelinek [Mon, 2 Sep 2024 07:44:09 +0000 (09:44 +0200)] 
ranger: Fix up range computation for CLZ [PR116486]

The initial CLZ gimple-range-op.cc implementation handled just the
case where second argument to .CLZ is equal to prec, but in
r15-1014 I've added also handling of the -1 case.  As the following
testcase shows, incorrectly though for the case where the first argument
has [0,0] range.  If the second argument is prec, then the result should
be [prec,prec] and that was handled correctly, but when the second argument
is -1, the result should be [-1,-1] but instead it was incorrectly computed
as [prec-1,prec-1] (when second argument is prec, mini is 0 and maxi is
prec, while when second argument is -1, mini is -1 and maxi is prec-1).

Fixed thusly (the actual handling is then similar to the CTZ [0,0] case).

2024-09-02  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/116486
* gimple-range-op.cc (cfn_clz::fold_range): If lh is [0,0]
and mini is -1, return [-1,-1] range rather than [prec-1,prec-1].

* gcc.dg/bitint-109.c: New test.

11 months agoload and store-lanes with SLP
Richard Biener [Fri, 5 Jul 2024 08:35:08 +0000 (10:35 +0200)] 
load and store-lanes with SLP

The following is a prototype for how to represent load/store-lanes
within SLP.  I've for now settled with having a single load node
with multiple permute nodes acting as selection, one for each loaded lane
and a single store node fed from all stored lanes.  For

  for (int i = 0; i < 1024; ++i)
    {
      a[2*i] = b[2*i] + 7;
      a[2*i+1] = b[2*i+1] * 3;
    }

you have the following SLP graph where I explain how things are set
up and code-generated:

t.c:23:21: note:   SLP graph after lowering permutations:
t.c:23:21: note:   node 0x50dc8b0 (max_nunits=1, refcnt=1) vector(4) int
t.c:23:21: note:   op template: *_6 = _7;
t.c:23:21: note:        stmt 0 *_6 = _7;
t.c:23:21: note:        stmt 1 *_12 = _13;
t.c:23:21: note:        children 0x50dc488 0x50dc6e8

This is the store node, it's marked with ldst_lanes = true during
SLP discovery.  This node code-generates

  vect_array.65[0] = vect__7.61_29;
  vect_array.65[1] = vect__13.62_28;
  MEM <int[8]> [(int *)vectp_a.63_27] = .STORE_LANES (vect_array.65);

...
t.c:23:21: note:   node 0x50dc520 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note:   op: VEC_PERM_EXPR
t.c:23:21: note:        stmt 0 _5 = *_4;
t.c:23:21: note:        lane permutation { 0[0] }
t.c:23:21: note:        children 0x50dc948
t.c:23:21: note:   node 0x50dc780 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note:   op: VEC_PERM_EXPR
t.c:23:21: note:        stmt 0 _11 = *_10;
t.c:23:21: note:        lane permutation { 0[1] }
t.c:23:21: note:        children 0x50dc948

These are the selection nodes, marked with ldst_lanes = true.
They code generate nothing.

t.c:23:21: note:   node 0x50dc948 (max_nunits=4, refcnt=3) vector(4) int
t.c:23:21: note:   op template: _5 = *_4;
t.c:23:21: note:        stmt 0 _5 = *_4;
t.c:23:21: note:        stmt 1 _11 = *_10;
t.c:23:21: note:        load permutation { 0 1 }

This is the load node, marked with ldst_lanes = true (the load
permutation is only accurate when taking into account the lane permute
in the selection nodes).  It code generates

  vect_array.58 = .LOAD_LANES (MEM <int[8]> [(int *)vectp_b.56_33]);
  vect__5.59_31 = vect_array.58[0];
  vect__5.60_30 = vect_array.58[1];

This scheme allows to leave code generation in vectorizable_load/store
mostly as-is.

While this should support both load-lanes and (masked) store-lanes
the decision to do either is done during SLP discovery time and
cannot be reversed without altering the SLP tree - as-is the SLP
tree is not usable for non-store-lanes on the store side, the
load side is OK representation-wise but will very likely fail
permute handling as the lowering to deal with the two input vector
restriction isn't done - but of course since the permute node is
marked as to be ignored that doesn't work out.  So I've put
restrictions in place that fail vectorization if a load/store-lane
SLP tree is later classified differently by get_load_store_type.

I'll note that for example gcc.target/aarch64/sve/mask_struct_store_3.c
will not get SLP store-lanes used because the full store SLPs just
fine though we then fail to handle the "splat" load-permutation

t2.c:5:21: note:   node 0x4db2630 (max_nunits=4, refcnt=2) vector([4,4]) int
t2.c:5:21: note:   op template: _6 = *_5;
t2.c:5:21: note:        stmt 0 _6 = *_5;
t2.c:5:21: note:        stmt 1 _6 = *_5;
t2.c:5:21: note:        stmt 2 _6 = *_5;
t2.c:5:21: note:        stmt 3 _6 = *_5;
t2.c:5:21: note:        load permutation { 0 0 0 0 }

the load permute lowering code currently doesn't consider it worth
lowering single loads from a group (or in this case not grouped loads).
The expectation is the target can handle this by two interleaves with
itself.

So what we see here is that while the explicit SLP representation is
helpful in some cases, in cases like this it would require changing
it when we make decisions how to vectorize.  My idea is that this
all will change a lot when we re-do SLP discovery (for loops) and
when we get rid of non-SLP as I think vectorizable_* should be
allowed to alter the SLP graph during analysis.

The patch also removes the code cancelling SLP if we can use
load/store-lanes from the main loop vector analysis code and
re-implements it as re-discovering the SLP instance with
forced single-lane splits so SLP load/store-lanes scheme can be
used.

This is now done after SLP discovery and SLP pattern recog are
complete to not disturb the latter but per SLP instance instead
of being a global decision on the whole loop.

This is a behavioral change that for example shows in
gcc.dg/vect/slp-perm-6.c on ARM where we formerly used SLP permutes
but now a mix of SLP without permutes and load/store lanes.  The
previous flaky heuristic is now flaky in a different way.

Testing on RISC-V and aarch64 reveal several testcases that require
adjustment as to now expect SLP even when load/store lanes are being
used.  If in doubt I've adjusted them to the final expectation which
will lead to one or two new FAILs where we still do the SLP cancelling.
I have a followup that implements that while remaining in SLP that's
in final testing.

Note that gcc.dg/vect/slp-42.c and gcc.dg/vect/pr68445.c will FAIL
on aarch64 with SVE because for some odd reason vect_stridedN
is true for any N for check_effective_target_vect_fully_masked
targets but SVE cannot do ld8 while risc-v can.

I have not bothered to adjust target tests that now fail assembly-scan.

* tree-vectorizer.h (_slp_tree::ldst_lanes): New flag to mark
load, store and permute nodes.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize ldst_lanes.
(vect_build_slp_instance): For stores iff the target prefers
store-lanes discover single-lane sub-groups, do not perform
interleaving lowering but mark the node with ldst_lanes.
Also allow i == 0 - fatal failure - for splitting up a store group
when we're not doing single-lane discovery already.
(vect_lower_load_permutations): When the target supports
load lanes and the loads all fit the pattern split out
a single level of permutes only and mark the load and
permute nodes with ldst_lanes.
(vectorizable_slp_permutation_1): Handle the load-lane permute
forwarding of vector defs.
(vect_analyze_slp): After SLP pattern recog is finished see if
there are any SLP instances that would benefit from using
load/store-lanes and re-discover those with forced single lanes.
* tree-vect-stmts.cc (get_group_load_store_type): Support
load/store-lanes for SLP.
(vectorizable_store): Support SLP code generation for store-lanes.
(vectorizable_load): Support SLP code generation for load-lanes.
* tree-vect-loop.cc (vect_analyze_loop_2): Do not cancel SLP
when store-lanes can be used.

* gcc.dg/vect/slp-55.c: New testcase.
* gcc.dg/vect/slp-56.c: Likewise.
* gcc.dg/vect/slp-11c.c: Adjust.
* gcc.dg/vect/slp-53.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/vect-complex-5.c: Likewise.
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-54.c: Remove riscv XFAIL.
* gcc.dg/vect/slp-perm-5.c: Adjust.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-perm-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Remove expected SLP fail due to
three-vector permute.
* gcc.dg/vect/slp-perm-6.c: Remove XFAIL.
* gcc.dg/vect/slp-perm-1.c: Adjust.
* gcc.dg/vect/slp-perm-2.c: Likewise.
* gcc.dg/vect/slp-perm-3.c: Likewise.
* gcc.dg/vect/slp-perm-4.c: Likewise.
* gcc.dg/vect/pr68445.c: Likewise.
* gcc.dg/vect/slp-11b.c: Likewise.
* gcc.dg/vect/slp-2.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-33.c: Likewise.
* gcc.dg/vect/slp-42.c: Likewise.
* gcc.dg/vect/slp-46.c: Likewise.
* gcc.dg/vect/slp-perm-10.c: Likewise.

11 months agolower SLP load permutation to interleaving
Richard Biener [Mon, 13 May 2024 12:57:01 +0000 (14:57 +0200)] 
lower SLP load permutation to interleaving

The following emulates classical interleaving for SLP load permutes
that we are unlikely handling natively.  This is to handle cases
where interleaving (or load/store-lanes) is the optimal choice for
vectorizing even when we are doing that within SLP.  An example
would be

void foo (int * __restrict a, int * b)
{
  for (int i = 0; i < 16; ++i)
    {
      a[4*i + 0] = b[4*i + 0] * 3;
      a[4*i + 1] = b[4*i + 1] + 3;
      a[4*i + 2] = (b[4*i + 2] * 3 + 3);
      a[4*i + 3] = b[4*i + 3] * 3;
    }
}

where currently the SLP store is merging four single-lane SLP
sub-graphs but none of the loads in it can be code-generated
with V4SImode vectors and a VF of four as the permutes would need
three vectors.

The patch introduces a lowering phase after SLP discovery but
before SLP pattern recognition or permute optimization that
analyzes all loads from the same dataref group and creates an
interleaving scheme starting from an unpermuted load.

What can be handled is power-of-two group size and a group size of
three.  The possibility for doing the interleaving with a load-lanes
like instruction is done as followup.

For a group-size of three this is done by using
the non-interleaving fallback code which then creates at VF == 4 from
{ { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
to produce { c0, c1, c2, c3 }.  This turns out to be more effective
than the scheme implemented for non-SLP for SSE and only slightly
worse for AVX512 and a bit more worse for AVX2.  It seems to me that
this would extend to other non-power-of-two group-sizes though (but
the patch does not).  Optimal schemes are likely difficult to lay out
in VF agnostic form.

I'll note that while the lowering assumes even/odd extract is
generally available for all vector element sizes (which is probably
a good assumption), it doesn't in any way constrain the other
permutes it generates based on target availability.  Again difficult
to do in a VF agnostic way (but at least currently the vector type
is fixed).

I'll also note that the SLP store side merges lanes in a way
producing three-vector permutes for store group-size of three, so
the testcase uses a store group-size of four.

The patch has a fallback for when there are multi-lane groups
and the resulting permutes to not fit interleaving.  Code
generation is not optimal when this triggers and might be
worse than doing single-lane group interleaving.

The patch handles gaps by representing them with NULL
entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node.
The SLP discovery changes could be elided if we manually build the
load node instead.

SLP load nodes covering enough lanes to not need intermediate
permutes are retained as having a load-permutation and do not
use the single SLP load node for each dataref group.  That's
something we might want to change, making load-permutation
something purely local to SLP discovery (but then SLP discovery
could do part of the lowering).

The patch misses CSEing intermediate generated permutes and
registering them with the bst_map which is possibly required
for SLP pattern detection in some cases - this re-spin of the
patch moves the lowering after SLP pattern detection.

* tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt.
(vect_build_slp_tree_2): Likewise.  Release load permutation
when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's
no actual permutation in that case.
(vllp_cmp): New function.
(vect_lower_load_permutations): Likewise.
(vect_analyze_slp): Call it.

* gcc.dg/vect/slp-11a.c: Expect SLP.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-51.c: New testcase.
* gcc.dg/vect/slp-52.c: New testcase.

11 months ago[PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.
Xianmiao Qu [Mon, 2 Sep 2024 04:28:13 +0000 (22:28 -0600)] 
[PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.

Currently, in RV32, even with the D extension enabled, the cost of DFmode
register moves is still set to 'COSTS_N_INSNS (2)'. This results in the
'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG
register moves, leading to the generation of many redundant instructions.

As an example, consider the following test case:
  double foo (int t, double a, double b)
  {
    if (t > 0)
      return a;
    else
      return b;
  }

When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated:
          .cfi_startproc
          addi    sp,sp,-32
          .cfi_def_cfa_offset 32
          fsd     fa0,8(sp)
          fsd     fa1,16(sp)
          lw      a4,8(sp)
          lw      a5,12(sp)
          lw      a2,16(sp)
          lw      a3,20(sp)
          bgt     a0,zero,.L1
          mv      a4,a2
          mv      a5,a3
  .L1:
          sw      a4,24(sp)
          sw      a5,28(sp)
          fld     fa0,24(sp)
          addi    sp,sp,32
          .cfi_def_cfa_offset 0
          jr      ra
          .cfi_endproc

After adjust the DFmode register move's cost to 'COSTS_N_INSNS (1)', the
generated code is as follows, with a significant reduction in the number
of instructions.
          .cfi_startproc
          ble     a0,zero,.L5
          ret
  .L5:
          fmv.d   fa0,fa1
          ret
          .cfi_endproc

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the
DFmode register move for RV32.

gcc/testsuite/
* gcc.target/riscv/rv32-movdf-cost.c: New test.

11 months ago[committed][PR rtl-optimization/116544] Fix test for promoted subregs
Jeff Law [Mon, 2 Sep 2024 04:16:04 +0000 (22:16 -0600)] 
[committed][PR rtl-optimization/116544] Fix test for promoted subregs

This is a small bug in the ext-dce code's handling of promoted subregs.

Essentially when we see a promoted subreg we need to make additional bit groups
live as various parts of the RTL path know that an extension of a suitably
promoted subreg can be trivially eliminated.

When I added support for dealing with this quirk I failed to account for the
larger modes properly and it ignored the case when the size of the inner object
was > 32 bits.  Opps.

This does _not_ fix the outstanding x86 issue.  That's caused by something
completely different and more concerning ;(

Bootstrapped and regression tested on x86.  Obviously fixes the testcase on
riscv as well.

Pushing to the trunk.

PR rtl-optimization/116544
gcc/
* ext-dce.cc (ext_dce_process_uses): Fix thinko in promoted subreg
handling.

gcc/testsuite/
* gcc.dg/torture/pr116544.c: New test.

11 months agoi386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2
Levy Hsu [Mon, 2 Sep 2024 02:24:49 +0000 (10:24 +0800)] 
i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode
for int mask cmp.
* config/i386/sse.md (vec_cmp<mode><avx512fmaskmodelower>): New
vec_cmp expand for VBF modes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: New test.
* gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Ditto.

11 months agoi386: Support vectorized BF16 sqrt with AVX10.2 instruction
Levy Hsu [Mon, 2 Sep 2024 02:24:48 +0000 (10:24 +0800)] 
i386: Support vectorized BF16 sqrt with AVX10.2 instruction

gcc/ChangeLog:

* config/i386/sse.md: Expand VF2H to VF2HB with VBF modes.

11 months agoi386: Support vectorized BF16 smaxmin with AVX10.2 instructions
Levy Hsu [Mon, 2 Sep 2024 02:24:47 +0000 (10:24 +0800)] 
i386: Support vectorized BF16 smaxmin with AVX10.2 instructions

gcc/ChangeLog:

* config/i386/sse.md
(<code><mode>3): New define expand pattern for BF smaxmin.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test.
* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test.

11 months agoi386: Support vectorized BF16 FMA with AVX10.2 instructions
Levy Hsu [Mon, 2 Sep 2024 02:24:46 +0000 (10:24 +0800)] 
i386: Support vectorized BF16 FMA with AVX10.2 instructions

gcc/ChangeLog:

* config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test.
* gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test.

11 months agoi386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions
Levy Hsu [Mon, 2 Sep 2024 02:24:45 +0000 (10:24 +0800)] 
i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions

AVX10.2 introduces several non-exception instructions for BF16 vector.
Enable vectorized BF add/sub/mul/div operation by supporting standard
optab for them.

gcc/ChangeLog:

* config/i386/sse.md (div<mode>3): New expander for BFmode div.
(VF_BHSD): New mode iterator with vector BFmodes.
(<insn><mode>3<mask_name><round_name>): Change mode to VF_BHSD.
(mul<mode>3<mask_name><round_name>): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: New test.
* gcc.target/i386/avx10_2-bf-vector-operations-1.c: Ditto.

11 months agoi386: Optimize generate insn for AVX10.2 compare
Hu, Lin1 [Mon, 2 Sep 2024 02:24:36 +0000 (10:24 +0800)] 
i386: Optimize generate insn for AVX10.2 compare

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_fp_compare): Add UNSPEC to
support the optimization.
* config/i386/i386.cc (ix86_fp_compare_code_to_integer): Add NE/EQ.
* config/i386/i386.md (*cmpx<unord><MODEF:mode>): New define_insn.
(*cmpx<unord>hf): Ditto.
* config/i386/predicates.md (ix86_trivial_fp_comparison_operator):
Add ne/eq.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-compare-1b.c: New test.

11 months agoi386: Optimize ordered and nonequal
Hu, Lin1 [Mon, 2 Sep 2024 02:24:31 +0000 (10:24 +0800)] 
i386: Optimize ordered and nonequal

Currently, when we input !__builtin_isunordered (a, b) && (a != b), gcc
will emit
  ucomiss %xmm1, %xmm0
  movl $1, %ecx
  setp %dl
  setnp %al
  cmovne %ecx, %edx
  andl %edx, %eax
  movzbl %al, %eax

In fact,
  xorl %eax, %eax
  ucomiss %xmm1, %xmm0
  setne %al
is better.

gcc/ChangeLog:

* match.pd: Optimize (and ordered non-equal) to
(not (or unordered  equal))

gcc/testsuite/ChangeLog:

* gcc.target/i386/optimize_one.c: New test.

11 months agoi386: Auto vectorize sdot_prod, usdot_prod, udot_prod with AVX10.2 instructions
Haochen Jiang [Mon, 2 Sep 2024 02:24:29 +0000 (10:24 +0800)] 
i386: Auto vectorize sdot_prod, usdot_prod, udot_prod with AVX10.2 instructions

gcc/ChangeLog:

* config/i386/sse.md (VI1_AVX512VNNIBW): New.
(VI2_AVX10_2): Ditto.
(sdot_prod<mode>): Add AVX10.2
to auto vectorize and combine 512 bit part.
(udot_prod<mode>): Ditto.
(sdot_prodv64qi): Removed.
(udot_prodv64qi): Ditto.
(usdot_prod<mode>): Add AVX10.2 to auto vectorize.
(udot_prod<mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vnniint16-auto-vectorize-2.c: Only define
TEST when not defined.
* gcc.target/i386/vnniint8-auto-vectorize-2.c: Ditto.
* gcc.target/i386/vnniint16-auto-vectorize-3.c: New test.
* gcc.target/i386/vnniint16-auto-vectorize-4.c: Ditto.
* gcc.target/i386/vnniint8-auto-vectorize-3.c: Ditto.
* gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.

11 months agoRISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 3
Pan Li [Sun, 18 Aug 2024 06:08:21 +0000 (14:08 +0800)] 
RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 3

This patch would like to add test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 3.  Aka:

Form 3:
  #define DEF_SAT_U_TRUC_FMT_3(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
  {                                        \
    WT max = (WT)(NT)-1;                   \
    return x <= max ? (NT)x : (NT) max;    \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_3 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_trunc-16.c: New test.
* gcc.target/riscv/sat_u_trunc-17.c: New test.
* gcc.target/riscv/sat_u_trunc-18.c: New test.
* gcc.target/riscv/sat_u_trunc-run-16.c: New test.
* gcc.target/riscv/sat_u_trunc-run-17.c: New test.
* gcc.target/riscv/sat_u_trunc-run-18.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 months agoRISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Pan Li [Sun, 18 Aug 2024 04:49:47 +0000 (12:49 +0800)] 
RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

This patch would like to add test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 2.  Aka:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT)     \
  NT __attribute__((noinline))             \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {                                        \
    WT max = (WT)(NT)-1;                   \
    return x > max ? (NT) max : (NT)x;     \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_trunc-10.c: New test.
* gcc.target/riscv/sat_u_trunc-11.c: New test.
* gcc.target/riscv/sat_u_trunc-12.c: New test.
* gcc.target/riscv/sat_u_trunc-run-10.c: New test.
* gcc.target/riscv/sat_u_trunc-run-11.c: New test.
* gcc.target/riscv/sat_u_trunc-run-12.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 months agoRISC-V: Add testcases for form 4 of unsigned vector .SAT_ADD IMM
Pan Li [Fri, 30 Aug 2024 03:01:37 +0000 (11:01 +0800)] 
RISC-V: Add testcases for form 4 of unsigned vector .SAT_ADD IMM

This patch would like to add test cases for the unsigned vector .SAT_ADD
when one of the operand is IMM.

Form 4:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_4(T, IMM)                               \
  T __attribute__((noinline))                                               \
  vec_sat_u_add_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)      \
  {                                                                         \
    unsigned i;                                                             \
    T ret;                                                                  \
    for (i = 0; i < limit; i++)                                             \
      {                                                                     \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) == 0 ? ret : -1; \
      }                                                                     \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_4(uint64_t, 123)

The below test are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-16.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 months agoRISC-V: Add testcases for form 3 of unsigned vector .SAT_ADD IMM
Pan Li [Fri, 30 Aug 2024 00:36:45 +0000 (08:36 +0800)] 
RISC-V: Add testcases for form 3 of unsigned vector .SAT_ADD IMM

This patch would like to add test cases for the unsigned vector .SAT_ADD
when one of the operand is IMM.

Form 3:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    T ret;                                                             \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      }                                                                \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 123)

The below test are passed for this patch.
* The rv64gcv fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-9.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 months agoRISC-V: Refactor gen zero_extend rtx for SAT_* when expand SImode in RV64
Pan Li [Fri, 30 Aug 2024 06:07:12 +0000 (14:07 +0800)] 
RISC-V: Refactor gen zero_extend rtx for SAT_* when expand SImode in RV64

In previous, we have some specially handling for both the .SAT_ADD and
.SAT_SUB for unsigned int.  There are similar to take care of SImode
in RV64 for zero extend.  Thus refactor these two helper function
into one for possible code duplication.

The below test suite are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Merge
the zero_extend handing from func riscv_gen_unsigned_xmode_reg.
(riscv_gen_unsigned_xmode_reg): Remove.
(riscv_expand_ussub): Leverage riscv_gen_zero_extend_rtx
instead of riscv_gen_unsigned_xmode_reg.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_sub-11.c: Adjust asm check.
* gcc.target/riscv/sat_u_sub-15.c: Ditto.
* gcc.target/riscv/sat_u_sub-19.c: Ditto.
* gcc.target/riscv/sat_u_sub-23.c: Ditto.
* gcc.target/riscv/sat_u_sub-27.c: Ditto.
* gcc.target/riscv/sat_u_sub-3.c: Ditto.
* gcc.target/riscv/sat_u_sub-31.c: Ditto.
* gcc.target/riscv/sat_u_sub-35.c: Ditto.
* gcc.target/riscv/sat_u_sub-39.c: Ditto.
* gcc.target/riscv/sat_u_sub-43.c: Ditto.
* gcc.target/riscv/sat_u_sub-47.c: Ditto.
* gcc.target/riscv/sat_u_sub-7.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-15_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: Ditto.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 months agoDaily bump.
GCC Administrator [Mon, 2 Sep 2024 00:16:51 +0000 (00:16 +0000)] 
Daily bump.

11 months agoslsr: Use simple_dce_from_worklist in SLSR [PR116554]
Andrew Pinski [Sun, 1 Sep 2024 00:23:19 +0000 (17:23 -0700)] 
slsr: Use simple_dce_from_worklist in SLSR [PR116554]

While working on a phiopt patch, it was noticed that
SLSR would leave around some unused ssa names. Let's
add simple_dce_from_worklist usage to SLSR to remove
the dead statements. This should give a small improvemnent
for passes afterwards.

Boostrapped and tested on x86_64.

gcc/ChangeLog:

PR tree-optimization/116554
* gimple-ssa-strength-reduction.cc: Include tree-ssa-dce.h.
(replace_mult_candidate): Add sdce_worklist argument, mark
the rhs1/rhs2 for maybe dceing.
(replace_unconditional_candidate): Add sdce_worklist argument,
Update call to replace_mult_candidate.
(replace_conditional_candidate): Add sdce_worklist argument,
update call to replace_mult_candidate.
(replace_uncond_cands_and_profitable_phis): Add sdce_worklist argument,
update call to replace_conditional_candidate,
replace_unconditional_candidate, and replace_uncond_cands_and_profitable_phis.
(replace_one_candidate): Add sdce_worklist argument, mark
the orig_rhs1/orig_rhs2 for maybe dceing.
(replace_profitable_candidates): Add sdce_worklist argument,
update call to replace_one_candidate and replace_profitable_candidates.
(analyze_candidates_and_replace): Call simple_dce_from_worklist and
update calls to replace_profitable_candidates, and
replace_uncond_cands_and_profitable_phis.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
11 months agotestsuite: Prune compilation messages for modules tests
Hans-Peter Nilsson [Sun, 18 Aug 2024 05:01:06 +0000 (07:01 +0200)] 
testsuite: Prune compilation messages for modules tests

All testsuite compiler-calls pass default_target_compile in the
dejagnu installation (typically /usr/share/dejagnu/target.exp) which
also calls the dejagnu-installed prune_warnings.

Normally, tests using the dg framework (most or all tests these days)
compile and link by calling various wrappers that end up calling
dg-test in the dejagnu installation, typically installed as
/usr/share/dejagnu/dg.exp.  That, besides the compiler call, also
calls ${tool}-dg-prune (g++-dg-prune) on the messages, which in turn
ends up calling prune_gcc_output in gcc/testsuite/lib/prune.exp.  That
gcc-specific "pruning" function handles more cases than the dejagnu
prune_warnings, and also has updated patterns.

But, module_do_it in modules.exp calls the lower-level
${tool}_target_compile "directly", i.e. g++_target_compile defined in
gcc/testsuite/lib/g++.exp.  That does not call ${tool}-dg-prune,
meaning those test-cases miss the gcc-specific pruning.

Noticed while testing a dejagnu update that handled the miniscule "in"
in the warning (line-breaks added below besides the original one after
"(void*)':")

"/path/to/cris-elf/bin/ld:
/gccobj/cris-elf/./libstdc++-v3/src/.libs/libstdc++.a(random.o): in
function `std::(anonymous namespace)::__libc_getentropy(void*)':
/gccsrc/libstdc++-v3/src/c++11/random.cc:183: warning: _getentropy is
not implemented and will always fail"

The line saying "in function" rather than "In function" (from the
binutils linker since 2018) is pruned by prune_gcc_output. The
prune_warnings in dejagnu-1.6.3 and earlier handles the second line
separately.  It's an unfortunate wart that neither consumes the
delimiting line-break, leaving to the callers to prune residual empty
lines.  See prune_warnings in dejagnu (default_target_compile and
dg-test) for those other line-break fixups, as alluded in the comment.

* g++.dg/modules/modules.exp (module_do_it): Prune compilation
messages.

11 months agoDaily bump.
GCC Administrator [Sun, 1 Sep 2024 00:25:25 +0000 (00:25 +0000)] 
Daily bump.

11 months agoi386: Support read-modify-write memory operands in STV.
Roger Sayle [Sat, 31 Aug 2024 20:17:18 +0000 (14:17 -0600)] 
i386: Support read-modify-write memory operands in STV.

This patch enables STV when the first operand of a TImode binary
logic operand (AND, IOR or XOR) is a memory operand, which is commonly
the case with read-modify-write instructions.

A different motivating example from the one given previously is:

__int128 m, p, q;
void foo() {
    m ^= (p & q);
}

Currently with -O2 -mavx the RMW instructions are rejected by STV,
resulting in scalar code:

foo: movq    p(%rip), %rax
        movq    p+8(%rip), %rdx
        andq    q(%rip), %rax
        andq    q+8(%rip), %rdx
        xorq    %rax, m(%rip)
        xorq    %rdx, m+8(%rip)
        ret

With this patch they become scalar-to-vector candidates:

foo: vmovdqa p(%rip), %xmm0
        vpand   q(%rip), %xmm0, %xmm0
        vpxor   m(%rip), %xmm0, %xmm0
        vmovdqa %xmm0, m(%rip)
        ret

2024-08-31  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (timode_scalar_to_vector_candidate_p):
Support the first operand of AND, IOR and XOR being MEM_P, i.e. a
read-modify-write insn.

gcc/testsuite/ChangeLog
* gcc.target/i386/movti-2.c: Change dg-options to -Os.
* gcc.target/i386/movti-4.c: Expected output of original movti-2.c.

11 months agolibobjc: Add cast to void* to disable warning for casting between incompatible functi...
Andrew Pinski [Sat, 31 Aug 2024 18:57:32 +0000 (11:57 -0700)] 
libobjc: Add cast to void* to disable warning for casting between incompatible function types [PR89586]

Even though __objc_get_forward_imp returns an IMP type, it will be casted to a compatable function
type before calling it. So we adding a cast to `void*` will disable warning about the incompatible type.

Pushed after bootstrap/test on x86_64.

libobjc/ChangeLog:

PR libobjc/89586
* sendmsg.c (__objc_get_forward_imp): Add cast to `void*` before casting to IMP.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
11 months agoAVR: Run pass avr-fuse-add a second time after pass_cprop_hardreg.
Georg-Johann Lay [Fri, 30 Aug 2024 17:38:30 +0000 (19:38 +0200)] 
AVR: Run pass avr-fuse-add a second time after pass_cprop_hardreg.

gcc/
* config/avr/avr-passes.cc (avr_pass_fuse_add) <clone>: Override.
* config/avr/avr-passes.def (avr_pass_fuse_add): Run again
after pass_cprop_hardreg.

11 months agoAVR: Tidy pass avr-fuse-add.
Georg-Johann Lay [Fri, 30 Aug 2024 17:38:30 +0000 (19:38 +0200)] 
AVR: Tidy pass avr-fuse-add.

gcc/
* config/avr/avr-protos.h (avr_split_tiny_move): Rename to
avr_split_fake_addressing_move.
* config/avr/avr-passes.def: Same.
* config/avr/avr-passes.cc: Same.
(avr_pass_data_fuse_add) <tv_id>: Set to TV_MACH_DEP.
* config/avr/avr.md (split-lpmx): Remove a define_split.  Such
splits are performed by avr_split_fake_addressing_move.

11 months agotestsuite, c++, coroutines: Avoid 'unused' warnings [NFC].
Iain Sandoe [Sat, 31 Aug 2024 11:53:40 +0000 (12:53 +0100)] 
testsuite, c++, coroutines: Avoid 'unused' warnings [NFC].

The 'torture' section of the coroutine tests is primarily about checking
correct operation of the generated code.  It should, ideally, be possible
to run this part of the testsuite with '-Wall' and expect no fails.  In
the case that we wish to test for a specific diagnostic (and that it does
not appear over a range of optimisation/debug conditions) then we should
make that explict (as done, for example, in pr109867.C).

The tests amended here have warnings because of unused entities; in many
cases those are relevant to the test, and so we just mark them with
__attribute__((__unused__)).

We amend the debug output in coro.h to avoid similar warnings when print
output is disabled (the default).

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro.h: Use a variadic macro for PRINTF to
avoid unused warnings when output is disabled.
* g++.dg/coroutines/torture/co-await-04-control-flow.C: Avoid
unused warnings.
* g++.dg/coroutines/torture/co-ret-13-template-2.C: Likewise.
* g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C: Likewise.
* g++.dg/coroutines/torture/local-var-04-hiding-nested-scopes.C:
Likewise.
* g++.dg/coroutines/torture/pr109867.C: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
11 months agotestsuite, c++, coroutines: Correct a test intent.
Iain Sandoe [Sat, 31 Aug 2024 11:42:36 +0000 (12:42 +0100)] 
testsuite, c++, coroutines: Correct a test intent.

The intention of the series of tests numberef pr95615-* is to
verify that entities created by the ramp and potentially needing
destruction are correctly handled when exceptions are thrown.
Because of a typo, one case was not being checked correctly (the
return object).  This patch amends the check to test that the
returned object is properly deleted.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr95615.inc: Check tha the
task object produced by get_return_object is correctly
deleted on exception.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
11 months agoc++, coroutines: Make and use a frame access helper.
Iain Sandoe [Tue, 27 Aug 2024 13:52:26 +0000 (14:52 +0100)] 
c++, coroutines: Make and use a frame access helper.

In the review of earlier patches it was suggested that we might make
use of finish_class_access_expr instead of doing a lookup for the
member and then a build_class_access_expr call.

finish_class_access_expr does a lot more work than we need and ends
up calling build_class_access_expr anyway.  So, instead, this patch
makes a new helper to do the lookup and build and uses that helper
everywhere except instances in the ramp function that we are going
to handle separately.

gcc/cp/ChangeLog:

* coroutines.cc (coro_build_frame_access_expr): New.
(transform_await_expr): Use coro_build_frame_access_expr.
(transform_local_var_uses): Likewise.
(build_actor_fn): Likewise.
(build_destroy_fn): Likewise.
(cp_coroutine_transform::build_ramp_function): Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
11 months agohppa: Enable PA 2.0 symbolic operands on ELF32 targets
John David Anglin [Sat, 31 Aug 2024 16:20:14 +0000 (12:20 -0400)] 
hppa: Enable PA 2.0 symbolic operands on ELF32 targets

The GNU ELF32 linker has been fixed and it can now handle PA 2.0
symbolic relocations.

This only affects non-pic code generation.

2024-08-31  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/pa.cc (pa_emit_move_sequence): Remove symbolic
memory work arounds for TARGET_ELF32.
(pa_legitimate_address_p): Likewise.  Allow symbolic
operands.  Adjust comment.
* config/pa/pa.md: Replace reg_or_0_or_nonsymb_mem_operand
with reg_or_0_or_mem_operand predicate in various unnamed
move insns.
* config/pa/predicates.md (floating_point_store_memory_operand):
Update comment.  Remove symbolic memory work arounds for
TARGET_ELF32.
(nonsymb_mem_operand): Rename to mem_operand.  Allow
symbolic memory operands.
(reg_or_0_or_nonsymb_mem_operand): Rename to
reg_or_0_or_mem_operand.  Allow symbolic memory operands.

11 months agophiopt: Ignore some nop statements in heursics [PR116098]
Andrew Pinski [Fri, 30 Aug 2024 17:36:24 +0000 (10:36 -0700)] 
phiopt: Ignore some nop statements in heursics [PR116098]

The heurstics that was added for PR71016, try to search to see
if the conversion was being moved away from its definition. The problem
is the heurstics would stop if there was a non GIMPLE_ASSIGN (and already ignores
debug statements) and in this case we would have a GIMPLE_LABEL that was not
being ignored. So we should need to ignore GIMPLE_NOP, GIMPLE_LABEL and GIMPLE_PREDICT.
Note this is now similar to how gimple_empty_block_p behaves.

Note this fixes the wrong code that was reported by moving the VCE (conversion) out before
the phiopt/match could convert it into an bit_ior and move the VCE out with the VCE being
conditionally valid.

Bootstrapped and tested on x86_64-linux-gnu.
Also built and tested for aarch64-linux-gnu.

PR tree-optimization/116098

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Ignore
nops, labels and predicts for heuristic for conversion with a constant.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr116098-1.c: New test.
* gcc.target/aarch64/csel-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
11 months agotestsuite: Change what is being tested for pr66726-2.c
Andrew Pinski [Fri, 30 Aug 2024 16:53:01 +0000 (09:53 -0700)] 
testsuite: Change what is being tested for pr66726-2.c

r14-575-g6d6c17e45f62cf changed the debug dump message but the testcase
pr66726-2.c was not updated for the change. The testcase was searching to
make sure we didn't factor out a conversion but the testcase was no longer
testing that so we needed to update what was being searched for.

Tested on x86_64-linux.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr66726-2.c: Update scan dump message.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
11 months agoFortran: downgrade use associated namelist group name to legacy extension
Harald Anlauf [Fri, 30 Aug 2024 19:15:43 +0000 (21:15 +0200)] 
Fortran: downgrade use associated namelist group name to legacy extension

The Fortran standard disallows use associated names as namelist group name
(e.g. F2003:C581, but also later standards).  This feature is a gfortran
legacy extension, and we should give a warning even for -std=gnu.

gcc/fortran/ChangeLog:

* match.cc (gfc_match_namelist): Downgrade feature from GNU to
legacy extension.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr88169_3.f90: Adjust pattern.

11 months agoc++: Add unsequenced C++ testcase
Jakub Jelinek [Sat, 31 Aug 2024 14:03:20 +0000 (16:03 +0200)] 
c++: Add unsequenced C++ testcase

This is the testcase I wrote originally and which on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659154.html
patch didn't behave the way I wanted (no warning and no optimizations of
[[unsequenced]] function templates which don't have pointer/reference
arguments.
Posting this separately, because it depends on the above mentioned
patch as well as the PR116175
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659157.html
patch.

2024-08-31  Jakub Jelinek  <jakub@redhat.com>

* g++.dg/ext/attr-unsequenced-1.C: New test.

11 months agoc: Add support for unsequenced and reproducible attributes
Jakub Jelinek [Sat, 31 Aug 2024 13:58:23 +0000 (15:58 +0200)] 
c: Add support for unsequenced and reproducible attributes

C23 added in N2956 ( https://open-std.org/JTC1/SC22/WG14/www/docs/n2956.htm )
two new attributes, which are described as similar to GCC const and pure
attributes, but they aren't really same and it seems that even the paper
is missing some of the differences.
The paper says unsequenced is the same as const on functions without pointer
arguments and reproducible is the same as pure on such functions (except
that they are function type attributes rather than function
declaration ones), but it seems the paper doesn't consider the finiteness GCC
relies on (aka non-DECL_LOOPING_CONST_OR_PURE_P) - the paper only talks
about using the attributes for CSE etc., not for DCE.

The following patch introduces (for now limited) support for those
attributes, both as standard C23 attributes and as GNU extensions (the
difference is that the patch is then less strict on where it allows them,
like other function type attributes they can be specified on function
declarations as well and apply to the type, while C23 standard ones must
go on the function declarators (i.e. after closing paren after function
parameters) or in type specifiers of function type.

If function doesn't have any pointer/reference arguments, the patch
adds additional internal attribute with " noptr" suffix which then is used
by flags_from_decl_or_type to handle those easy cases as
ECF_CONST|ECF_LOOPING_CONST_OR_PURE or
ECF_PURE|ECF_LOOPING_CONST_OR_PURE
The harder cases aren't handled right now, I'd hope they can be handled
incrementally.

I wonder whether we shouldn't emit a warning for the
gcc.dg/c23-attr-{reproducible,unsequenced}-5.c cases, while the standard
clearly specifies that composite types should union the attributes and it
is what GCC implements for decades, for ?: that feels dangerous for the
new attributes, it would be much better to be conservative on say
(cond ? unsequenced_function : normal_function) (args)

There is no diagnostics on incorrect [[unsequenced]] or [[reproducible]]
function definitions, while I think diagnosing non-const static/TLS
declarations in the former could be easy, the rest feels hard.  E.g. the
const/pure discovery can just punt on everything it doesn't understand,
but complete diagnostics would need to understand it.

2024-08-31  Jakub Jelinek  <jakub@redhat.com>

PR c/116130
gcc/
* doc/extend.texi (unsequenced, reproducible): Document new function
type attributes.
* calls.cc (flags_from_decl_or_type): Handle "unsequenced noptr" and
"reproducible noptr" attributes.
gcc/c-family/
* c-attribs.cc (c_common_gnu_attributes): Add entries for
"unsequenced", "reproducible", "unsequenced noptr" and
"reproducible noptr" attributes.
(handle_unsequenced_attribute): New function.
(handle_reproducible_attribute): Likewise.
* c-common.h (handle_unsequenced_attribute): Declare.
(handle_reproducible_attribute): Likewise.
* c-lex.cc (c_common_has_attribute): Return 202311 for standard
unsequenced and reproducible attributes.
gcc/c/
* c-decl.cc (handle_std_unsequenced_attribute): New function.
(handle_std_reproducible_attribute): Likewise.
(std_attributes): Add entries for "unsequenced" and "reproducible"
attributes.
(c_warn_type_attributes): Add TYPE argument.  Allow unsequenced
or reproducible attributes if it is FUNCTION_TYPE.
(groktypename): Adjust c_warn_type_attributes caller.
(grokdeclarator): Likewise.
(finish_declspecs): Likewise.
* c-parser.cc (c_parser_declaration_or_fndef): Likewise.
* c-tree.h (c_warn_type_attributes): Add TYPE argument.
gcc/testsuite/
* c-c++-common/attr-reproducible-1.c: New test.
* c-c++-common/attr-reproducible-2.c: New test.
* c-c++-common/attr-unsequenced-1.c: New test.
* c-c++-common/attr-unsequenced-2.c: New test.
* gcc.dg/c23-attr-reproducible-1.c: New test.
* gcc.dg/c23-attr-reproducible-2.c: New test.
* gcc.dg/c23-attr-reproducible-3.c: New test.
* gcc.dg/c23-attr-reproducible-4.c: New test.
* gcc.dg/c23-attr-reproducible-5.c: New test.
* gcc.dg/c23-attr-reproducible-5-aux.c: New file.
* gcc.dg/c23-attr-unsequenced-1.c: New test.
* gcc.dg/c23-attr-unsequenced-2.c: New test.
* gcc.dg/c23-attr-unsequenced-3.c: New test.
* gcc.dg/c23-attr-unsequenced-4.c: New test.
* gcc.dg/c23-attr-unsequenced-5.c: New test.
* gcc.dg/c23-attr-unsequenced-5-aux.c: New file.
* gcc.dg/c23-has-c-attribute-2.c: Add tests for unsequenced
and reproducible attributes.

11 months agoAVR: Don't print a space after , when printing instructions.
Georg-Johann Lay [Sat, 31 Aug 2024 08:58:12 +0000 (10:58 +0200)] 
AVR: Don't print a space after , when printing instructions.

gcc/
* config/avr/avr.cc: Follow the convention to not add a space
after comma when printing instructions.

11 months agoOptimize initialization of small padded objects
Alexandre Oliva [Sat, 31 Aug 2024 09:03:12 +0000 (06:03 -0300)] 
Optimize initialization of small padded objects

When small objects containing padding bits (or bytes) are fully
initialized, we will often store them in registers, and setting
bitfields and other small fields will attempt to preserve the
uninitialized padding bits, which tends to be expensive.
Zero-initializing registers, OTOH, tends to be cheap.

So, if we're optimizing, zero-initialize such small padded objects
even if that's not needed for correctness.  We can't zero-initialize
all such padding objects, though: if there's no padding whatsoever,
and all fields are initialized with nonzero, the zero initialization
would be flagged as dead.  That's why we introduce machinery to detect
whether objects have padding bits.  I considered distinguishing
between bitfields, units and larger padding elements, but I didn't
pursue that distinction.

Since the object's zero-initialization subsumes fields'
zero-initialization, the empty string test in builtin-snprintf-6.c's
test_assign_aggregate would regress without the addition of
native_encode_constructor.

for  gcc/ChangeLog

* expr.cc (categorize_ctor_elements_1): Change p_complete to
int, to distinguish complete initialization in presence or
absence of uninitialized padding bits.
(categorize_ctor_elements): Likewise.  Adjust all callers...
* expr.h (categorize_ctor_elements): ... and declaration.
(type_has_padding_at_level_p): New.
* gimple-fold.cc (type_has_padding_at_level_p): New.
* fold-const.cc (native_encode_constructor): New.
(native_encode_expr): Call it.
* gimplify.cc (gimplify_init_constructor): Clear small
non-addressable non-volatile objects with padding or
other uninitialized fields as an optimization.

for  gcc/testsuite/ChangeLog

* gcc.dg/init-pad-1.c: New.

11 months agoDaily bump.
GCC Administrator [Sat, 31 Aug 2024 00:18:22 +0000 (00:18 +0000)] 
Daily bump.

11 months agoc++: add fixed test [PR101099]
Marek Polacek [Fri, 30 Aug 2024 21:09:19 +0000 (17:09 -0400)] 
c++: add fixed test [PR101099]

-fconcepts-ts is no longer supported so this can't be made to ICE
anymore.

PR c++/101099

gcc/testsuite/ChangeLog:

* g++.dg/concepts/pr101099.C: New test.

11 months agoc++: add fixed test [PR115616]
Marek Polacek [Fri, 30 Aug 2024 20:34:11 +0000 (16:34 -0400)] 
c++: add fixed test [PR115616]

This got fixed by r15-2120.

PR c++/115616

gcc/testsuite/ChangeLog:

* g++.dg/template/friend83.C: New test.

11 months agoc++: fix used but not defined warning for friend
Jason Merrill [Thu, 29 Aug 2024 17:27:13 +0000 (13:27 -0400)] 
c++: fix used but not defined warning for friend

Here limit_bad_template_recursion avoids instantiating foo, and then we
wrongly warn that it isn't defined, because as a non-template (but
templated) friend DECL_TEMPLATE_INSTANTIATION is false.

gcc/cp/ChangeLog:

* decl2.cc (c_parse_final_cleanups): Also check
DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/used-inline1.C: New test.

11 months agoFortran: default-initialization of derived-type function results [PR98454]
Harald Anlauf [Thu, 29 Aug 2024 20:17:07 +0000 (22:17 +0200)] 
Fortran: default-initialization of derived-type function results [PR98454]

gcc/fortran/ChangeLog:

PR fortran/98454
* resolve.cc (resolve_symbol): Add default-initialization of
non-allocatable, non-pointer derived-type function results.

gcc/testsuite/ChangeLog:

PR fortran/98454
* gfortran.dg/alloc_comp_class_4.f03: Remove bogus pattern.
* gfortran.dg/pdt_26.f03: Adjust expected count.
* gfortran.dg/derived_result_3.f90: New test.

11 months agogdbhooks: Fix printing of vec with vl_ptr layout
Alex Coplan [Fri, 30 Aug 2024 14:29:34 +0000 (15:29 +0100)] 
gdbhooks: Fix printing of vec with vl_ptr layout

As it stands, the pretty printing of GCC's vecs by gdbhooks.py only
handles vectors with vl_embed layout.  As such, when encountering a vec
with vl_ptr layout, GDB would print a diagnostic like:

  gdb.error: There is no member or method named m_vecpfx.

when (e.g.) any such vec occurred in a backtrace.  This patch extends
VecPrinter.children to also handle vl_ptr vectors.

gcc/ChangeLog:

* gdbhooks.py (VEC_KIND_EMBED): New.
(VEC_KIND_PTR): New.
(get_vec_kind): New.
(VecPrinter.children): Also handle vectors with vl_ptr layout.

11 months agoDon't remove /usr/lib and /lib from when passing to the linker [PR97304/104707]
Andrew Pinski [Tue, 16 Apr 2024 19:06:51 +0000 (12:06 -0700)] 
Don't remove /usr/lib and /lib from when passing to the linker [PR97304/104707]

With newer ld, the default search library path does not include /usr/lib nor /lib
but the driver decides to not pass -L down to the link for these and then in some/most
cases libc is not found.
This code dates from at least 1992 and it is done in a way which is not safe and
does not make sense. So let's remove it.

Bootstrapped and tested on x86_64-linux-gnu (which defaults to being a multilib).

gcc/ChangeLog:

PR driver/104707
PR driver/97304

* gcc.cc (is_directory): Don't not include /usr/lib and /lib
for library directory pathes. Remove library argument.
(add_to_obstack): Update call to is_directory.
(driver_handle_option): Likewise.
(spec_path): Likewise.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
11 months agomiddle-end: Remove integer_three_node [PR116537]
Andrew Pinski [Thu, 29 Aug 2024 18:01:56 +0000 (11:01 -0700)] 
middle-end: Remove integer_three_node [PR116537]

After the small expansion patch for __builtin_prefetch, the
only use of integer_three_node is inside tree-ssa-loop-prefetch.cc so let's
remove it as the loop prefetch pass is not enabled these days by default and
having a tree node around just for that pass is a little wasteful. Integer
constants are also shared these days so calling build_int_cst will use the cached
node anyways.

Bootstrapped and tested on x86_64-linux.

PR middle-end/116537

gcc/ChangeLog:

* tree-core.h (enum tree_index): Remove TI_INTEGER_THREE
* tree-ssa-loop-prefetch.cc (issue_prefetch_ref): Call build_int_cst
instead of using integer_three_node.
* tree.cc (build_common_tree_nodes): Remove initialization
of integer_three_node.
* tree.h (integer_three_node): Delete.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
11 months agoexpand: Small speed up expansion of __builtin_prefetch
Andrew Pinski [Thu, 29 Aug 2024 17:58:41 +0000 (10:58 -0700)] 
expand: Small speed up expansion of __builtin_prefetch

This is a small speed up of the expansion of __builtin_prefetch.
Basically for the optional arguments, no reason to call expand_normal
on a constant integer that we know the value, just replace it with
GEN_INT/const0_rtx instead.

Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* builtins.cc (expand_builtin_prefetch): Rewrite expansion of the optional
arguments to not expand known constants.

11 months agoPR modula2/116181: m2rts fix -Wodr warning
Gaius Mulley [Fri, 30 Aug 2024 13:22:01 +0000 (14:22 +0100)] 
PR modula2/116181: m2rts fix -Wodr warning

This patch fixes the -Wodr warning seen in pge-boot/m2rts.h
when building pge.

gcc/m2/ChangeLog:

PR modula2/116181
* pge-boot/GM2RTS.h: Regenerate.
* pge-boot/m2rts.h: Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
11 months agoAvoid division by zero via constant_multiple_p
Richard Biener [Fri, 30 Aug 2024 07:50:32 +0000 (09:50 +0200)] 
Avoid division by zero via constant_multiple_p

With recent SLP vectorization patches I see RISC-V divison by zero
for gfortran.dg/matmul_10.f90 and others in get_group_load_store_type
which does

              && can_div_trunc_p (group_size
                                  * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
                                  nunits, &tem, &remain)
              && (known_eq (remain, 0u)
                  || (constant_multiple_p (nunits, remain, &num)
                      && (vector_vector_composition_type (vectype, num,
                                                          &half_vtype)
                          != NULL_TREE))))
            overrun_p = false;

where for [2, 2] / [0, 2] the condition doesn't reflect what we
are trying to test - that, when remain is zero or, when non-zero,
nunits is a multiple of remain, we can avoid touching a gap via
loading smaller pieces and vector composition.

It isn't safe to change the known_eq to maybe_eq so instead
require known_ne (remain, 0u) before doing constant_multiple_p.
There's the corresponding code in vectorizable_load that's known
to have a latent similar issue, so sync that up as well.

* tree-vect-stmts.cc (get_group_load_store_type): Check
known_ne (remain, 0u) before doing constant_multiple_p.
(vectorizable_load): Likewise.