Richard Biener [Thu, 18 Jul 2024 11:35:33 +0000 (13:35 +0200)]
middle-end/115641 - invalid address construction
fold_truth_andor_1 via make_bit_field_ref builds an address of
a CALL_EXPR which isn't valid GENERIC and later causes an ICE.
The following simply avoids the folding for f ().a != 1 || f ().b != 2
as it is a premature optimization anyway. The alternative would
have been to build a TARGET_EXPR around the call. To get this far
f () has to be const as otherwise the two calls are not semantically
equivalent for the optimization.
PR middle-end/115641
* fold-const.cc (decode_field_reference): If the inner
reference isn't something we can take the address of, fail.
Fortran: Fix Explicit cobounds of a procedures parameter not respected [PR78466]
Explicit cobounds of class array procedure parameters were not taken
into account. Furthermore, different cobounds in distinct procedure
parameter lists were mixed up, i.e. the last definition was taken
for all of them. The bounds are now regenerated when the tree's and
the expr's bounds do not match.
PR fortran/78466
PR fortran/80774
gcc/fortran/ChangeLog:
* array.cc (gfc_compare_array_spec): Take cotype into account.
* class.cc (gfc_build_class_symbol): Coarrays are also arrays.
* gfortran.h (IS_CLASS_COARRAY_OR_ARRAY): New macro to detect
regular and coarray class arrays.
* interface.cc (compare_components): Take codimension into
account.
* resolve.cc (resolve_symbol): Improve error message.
* simplify.cc (simplify_bound_dim): Remove duplicate.
* trans-array.cc (gfc_trans_array_cobounds): Coarrays are also
arrays.
(gfc_trans_array_bounds): Same.
(gfc_trans_dummy_array_bias): Same.
(get_coarray_as): Get the as having a non-zero codim.
(is_explicit_coarray): Detect explicit coarrays.
(gfc_conv_expr_descriptor): Create a new descriptor for explicit
coarrays.
* trans-decl.cc (gfc_build_qualified_array): Coarrays are also
arrays.
(gfc_build_dummy_array_decl): Same.
(gfc_get_symbol_decl): Same.
(gfc_trans_deferred_vars): Same.
* trans-expr.cc (class_scalar_coarray_to_class): Get the
descriptor from the correct location.
(gfc_conv_variable): Pick up the descriptor when needed.
* trans-types.cc (gfc_is_nodesc_array): Coarrays are also
arrays.
(gfc_get_nodesc_array_type): Indentation fix only.
(cobounds_match_decl): Match a tree's bounds to the expr's
bounds and return true, when they match.
(gfc_get_derived_type): Create a new type tree/descriptor, when
the cobounds of the existing declaration and expr do not
match. This happens for class arrays in parameter lists, when
there are different cobound declarations.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/poly_run_1.f90: Activate old test code.
* gfortran.dg/coarray/poly_run_2.f90: Activate test. It was
stopping before and passing without an error.
Paul Thomas [Thu, 18 Jul 2024 07:51:35 +0000 (08:51 +0100)]
Fortran: Suppress bogus used uninitialized warnings [PR108889].
2024-07-18 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/108889
* gfortran.h: Add bit field 'allocated_in_scope' to gfc_symbol.
* trans-array.cc (gfc_array_allocate): Set 'allocated_in_scope'
after allocation if not a component reference.
(gfc_alloc_allocatable_for_assignment): If 'allocated_in_scope'
not set, not a component ref and not allocated, set the array
bounds and offset to give zero length in all dimensions. Then
set allocated_in_scope.
gcc/testsuite/
PR fortran/108889
* gfortran.dg/pr108889.f90: New test.
gimple-fold: consistent dump of builtin call simplifications
Previously only simplifications of the `__st[xrp]cpy_chk`
were dumped. Now all call replacement simplifications are
dumped.
Examples of statements with corresponding dumpfile entries:
`printf("mystr\n");`:
optimized: simplified printf to __builtin_puts
`printf("%c", 'a');`:
optimized: simplified printf to __builtin_putchar
`printf("%s\n", "mystr");`:
optimized: simplified printf to __builtin_puts
The below test suites passed for this patch
* The x86 bootstrap test.
* Manual testing with some small example code manually
examining dump logs, outputting the lines mentioned above.
gcc/ChangeLog:
* gimple-fold.cc (dump_transformation): Moved definition.
(replace_call_with_call_and_fold): Calls dump_transformation.
(gimple_fold_builtin_stxcpy_chk): Removes call to
dump_transformation, now in replace_call_with_call_and_fold.
(gimple_fold_builtin_stxncpy_chk): Removes call to
dump_transformation, now in replace_call_with_call_and_fold.
Richard Biener [Wed, 17 Jul 2024 08:22:47 +0000 (10:22 +0200)]
tree-optimization/104515 - store motion and clobbers
The following addresses an old regression when end-of-object/storage
clobbers were introduced. In particular when there's an end-of-object
clobber in a loop but no corresponding begin-of-object we can still
perform store motion of may-aliased refs when we re-issue the
end-of-object/storage on the exits but elide it from the loop. This
should be the safest way to deal with this considering stack-slot
sharing and it should not cause missed dead store eliminations given
DSE can now follow multiple paths in case there are multiple exits.
Note when the clobber is re-materialized only on one exit but not
on another we are erring on the side of removing the clobber on
such path. This should be OK (removing clobbers is always OK).
Note there's no corresponding code to handle begin-of-object/storage
during the hoisting part of loads that are part of a store motion
optimization, so this only enables stored-only store motion or cases
without such clobber inside the loop.
PR tree-optimization/104515
* tree-ssa-loop-im.cc (execute_sm_exit): Add clobbers_to_prune
parameter and handle re-materializing of clobbers.
(sm_seq_valid_bb): end-of-storage/object clobbers are OK inside
an ordered sequence of stores.
(sm_seq_push_down): Refuse to push down clobbers.
(hoist_memory_references): Prune clobbers from the loop body
we re-materialized on an exit.
Roger Sayle [Thu, 18 Jul 2024 07:27:36 +0000 (08:27 +0100)]
Implement a -ftrapping-math/-fsignaling-nans TODO in match.pd.
I've been investigating some (float)i == CST optimizations for match.pd,
and noticed there's already a TODO comment in match.pd that's relatively
easy to implement. When CST is a NaN, we only need to worry about
exceptions with flag_trapping_math, and equality/inequality tests for
sNaN only behave differently to qNaN with -fsignaling-nans. These
issues are related to PR 57371 and PR 106805 in bugzilla.
2024-07-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd ((FTYPE) N CMP CST): Only worry about exceptions with
flag_trapping_math, and about signaling NaNs with HONOR_SNANS.
gcc/testsuite/ChangeLog
* c-c++-common/pr57371-4.c: Update comment.
* c-c++-common/pr57371-5.c: Add missing testcases from pr57371-4.c
and update for -fno-signaling-nans -fno-trapping-math.
Fortran: Use char* for deferred length character arrays [PR82904]
The IPA inline pass would randomly ICE during compilation. This was
caused by a saved deferred length string. The length variable was not
set, but the variable was used in the array's declaration. Now using a
character pointer to prevent this.
PR fortran/82904
gcc/fortran/ChangeLog:
* trans-types.cc (gfc_sym_type): Use type `char*` for saved
deferred length char arrays.
* trans.cc (get_array_span): Get `.span` also for `char*` typed
arrays, i.e. for those that have INTEGER_TYPE instead of
ARRAY_TYPE.
gcc/testsuite/ChangeLog:
* gfortran.dg/deferred_character_38.f90: New test.
Jakub Jelinek [Thu, 18 Jul 2024 07:22:10 +0000 (09:22 +0200)]
testsuite: Fix up builtin-clear-padding-3.c for -funsigned-char
As reported on gcc-regression, this test FAILs on aarch64, but my
r15-2090 change didn't change anything on the generated assembly,
just added the forgotten dg-do run directive to the test, so the
test has been failing forever, just we didn't know it.
I can actually reproduce it on x86_64 with -funsigned-char too,
s2.b.a has int type and -1 is stored to it, so we should compare
it against -1 rather than (char) -1; the latter is appropriate for
testing char fields into which we've stored -1.
2024-07-18 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/torture/builtin-clear-padding-3.c (main): Compare
s2.b.a against -1 rather than (char) -1.
For compile test, we should generate valid asm except for special purposes.
Fix the compile test that generates invalid asm.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-egprs-names.c: Use ax for short and
al for char instead of eax.
* gcc.target/i386/avx512bw-kandnq-1.c: Do not run the test
under -m32 since kmovq with register is invalid. Use long
long to use 64 bit register instead of 32 bit register for
kmovq.
* gcc.target/i386/avx512bw-kandq-1.c: Ditto.
* gcc.target/i386/avx512bw-knotq-1.c: Ditto.
* gcc.target/i386/avx512bw-korq-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
* gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
* gcc.target/i386/avx512bw-kxorq-1.c: Ditto.
[aarch64] Document rewriting of -march=native to -mcpu=native
Commit dd9e5f4db2debf1429feab7f785962ccef6e0dbd changed -march=native to
treat it as -mcpu=native if no other mcpu or mtune option was given.
It would make sense to document this, especially if we try to persuade
compilers like LLVM to take the same approach.
This patch documents that behaviour.
Bootstrapped and tested on aarch64-none-linux-gnu.
Andi Kleen [Tue, 21 May 2024 14:01:57 +0000 (07:01 -0700)]
Give better error messages for musttail
When musttail is set, make tree-tailcall give error messages
when it cannot handle a call. This avoids vague "other reasons"
error messages later at expand time when it sees a musttail
function not marked tail call.
In various cases this requires delaying the error until
the call is discovered.
Also print more information on the failure to the dump file.
gcc/ChangeLog:
PR c/83324
* tree-tailcall.cc (maybe_error_musttail): New function.
(suitable_for_tail_opt_p): Report error reason.
(suitable_for_tail_call_opt_p): Report error reason.
(find_tail_calls): Accept basic blocks with abnormal edges.
Delay reporting of errors until the call is discovered.
Move top level suitability checks to here.
(tree_optimize_tail_calls_1): Remove top level checks.
Andi Kleen [Thu, 16 May 2024 02:57:22 +0000 (19:57 -0700)]
Enable musttail tail conversion even when not optimizing
Enable the tailcall optimization for non optimizing builds,
but in this case only checks calls that have the musttail attribute set.
This makes musttail work without optimization.
This is done with a new late musttail pass that is only active when
not optimizing. The new pass relies on tree-cfg to discover musttails.
This avoids a ~0.8% compiler run time penalty at -O0.
Andi Kleen [Sun, 2 Jun 2024 05:04:41 +0000 (22:04 -0700)]
Fix pro_and_epilogue for sibcalls at -O0 (PR115255)
Some of the cfg fixups in pro_and_epilogue for sibcalls were dependent on "optimize".
Make them check cfun->tail_call_marked instead to handle the -O0 musttail
case. This fixes the musttail test cases on arm targets.
Andi Kleen [Wed, 24 Jan 2024 07:42:08 +0000 (23:42 -0800)]
Improve must tail in RTL backend
- Give error messages for all causes of non sibling call generation
- When giving error messages clear the musttail flag to avoid ICEs
- Error out when tree-tailcall failed to mark a must-tail call
sibcall. In this case it doesn't know the true reason and only gives
a vague message.
gcc/ChangeLog:
PR c/83324
* calls.cc (maybe_complain_about_tail_call): Clear must tail
flag on error.
(expand_call): Give error messages for all musttail failures.
c++/modules: Conditionally start timer during lazy load [PR115165]
While lazy loading, instantiation of pendings can sometimes recursively
perform name lookup and begin further lazy loading. When using the
'-ftime-report' functionality this causes ICEs as we could start an
already-running timer for the importing.
This patch fixes the issue by using the 'timevar_cond*' API instead to
support such recursive calls.
PR c++/115165
gcc/cp/ChangeLog:
* module.cc (lazy_load_binding): Use 'timevar_cond*' APIs.
(lazy_load_pendings): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/timevar-1_a.H: New test.
* g++.dg/modules/timevar-1_b.C: New test.
When partially instantiating a previously declared hidden template
friend definition (at class template scope) such as slot_allocated in
the first testcase below, tsubst_friend_function needs to go through
all existing specializations thereof and make them point to the new
definition.
But when the previous declaration was also at class template scope,
old_decl is not the most general template, instead it's the partial
instantiation, and since instantiations are relative to the most general
template, old_decl's DECL_TEMPLATE_INSTANTIATIONS is empty. So we need
to consistently use the most general template here. And when adjusting
DECL_TI_ARGS to match, only the innermost template arguments should be
preserved; the outer ones should correspond to the new definition.
Otherwise we fail a checking-only sanity check in instantiate_decl in
the first testcase, and in the second/third we end up emitting multiple
definitions of the template friend instantiation, resulting in a link
failure.
PR c++/112288
gcc/cp/ChangeLog:
* pt.cc (tsubst_friend_function): When adjusting existing
specializations after defining a previously declared template
friend, consider the most general template and correct
DECL_TI_ARGS adjustment.
gcc/testsuite/ChangeLog:
* g++.dg/template/friend80.C: New test.
* g++.dg/template/friend81.C: New test.
* g++.dg/template/friend81a.C: New test.
Patrick Palka [Thu, 18 Jul 2024 00:57:54 +0000 (20:57 -0400)]
c++: missing -Wunused-value for !<expr> [PR114104]
Here we're neglecting to issue a -Wunused-value warning for suitable !
operator expressions, and in turn for != operator expressions that are
rewritten as !(x == y), only because we don't call warn_if_unused_value
on TRUTH_NOT_EXPR since its class is tcc_expression. This patch makes
us also consider warning for TRUTH_NOT_EXPR and also for ADDR_EXPR.
PR c++/114104
gcc/cp/ChangeLog:
* cvt.cc (convert_to_void): Call warn_if_unused_value for
TRUTH_NOT_EXPR and ADDR_EXPR as well.
Patrick Palka [Thu, 18 Jul 2024 00:54:14 +0000 (20:54 -0400)]
c++: diagnose failed qualified lookup into current inst
When the scope of a qualified name is the current instantiation, and
qualified lookup finds nothing at template definition time, then we
know it'll find nothing at instantiation time (unless the current
instantiation has dependent bases). So such qualified name lookup
failure can be diagnosed ahead of time as per [temp.res.general]/6.
This patch implements that, for qualified names of the form (where
the current instantiation is A<T>):
It turns out we already optimistically attempt qualified lookup of
seemingly every qualified name, even when it's dependently scoped, and
then suppress issuing a lookup failure diagnostic after the fact.
So implementing this is mostly a matter of restricting the diagnostic
suppression to "dependentish" scopes (i.e. dependent scopes or the
current instantiation with dependent bases), rather than suppressing
for any dependently-typed scope as we currently do.
The cp_parser_conversion_function_id change is needed to avoid regressing
lookup/using8.C:
using A<T>::operator typename A<T>::Nested*;
When looking up A<T>::Nested we consider it not dependently scoped since
we entered A<T> from cp_parser_conversion_function_id earlier. But this
A<T> is the implicit instantiation A<T> not the primary template type A<T>,
and so the lookup fails which we now diagnose. This patch works around
this by not entering the template scope of a qualified conversion
function-id in this case, i.e. if we're in an expression vs declaration
context, by seeing if the type already went through finish_template_type
with entering_scope=true.
gcc/cp/ChangeLog:
* decl.cc (make_typename_type): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* error.cc (qualified_name_lookup_error): Improve diagnostic
when the scope is the current instantiation.
* parser.cc (cp_parser_diagnose_invalid_type_name): Likewise.
(cp_parser_conversion_function_id): Don't call push_scope on
a template scope unless we're in a declaration context.
(cp_parser_lookup_name): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* semantics.cc (finish_id_expression_1): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.
* g++.dg/cpp0x/alignas18.C: Expect name lookup error for U::X.
* g++.dg/cpp0x/forw_enum13.C: Expect name lookup error for
D3::A and D4<T>::A.
* g++.dg/parse/access13.C: Declare A::E::V to avoid name lookup
failure and preserve intent of the test.
* g++.dg/parse/enum11.C: Expect extra errors, matching the
non-template case.
* g++.dg/template/crash123.C: Avoid name lookup failure to
preserve intent of the test.
* g++.dg/template/crash124.C: Likewise.
* g++.dg/template/crash7.C: Adjust expected diagnostics.
* g++.dg/template/dtor6.C: Declare A::~A() to avoid name lookup
failure and preserve intent of the test.
* g++.dg/template/error22.C: Adjust expected diagnostics.
* g++.dg/template/static30.C: Avoid name lookup failure to
preserve intent of the test.
* g++.old-deja/g++.other/decl5.C: Adjust expected diagnostics.
* g++.dg/template/non-dependent34.C: New test.
lmap was introduced in tcl 8.6, and while it was released in 2012, lmap
does not really make too much of a difference to warrant the friction on
conservative (and relevant) systems.
gcc/testsuite/ChangeLog:
* lib/gcov.exp: Use foreach, not lmap, for tcl <= 8.5 compat.
rtl-ssa: Fix move range canonicalisation [PR115929]
In this PR, canonicalize_move_range walked off the end of a list
and triggered a null dereference. There are multiple ways of fixing
that, but I think the approach taken in the patch should be
relatively efficient.
gcc/
PR rtl-optimization/115929
* rtl-ssa/movement.h (canonicalize_move_range): Check for null prev
and next insns and create an invalid move range for them.
gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-2.c: New test.
One of the goals of the rtl-ssa representation was to allow a
group of consecutive clobbers to be skipped in constant time,
with amortised sublinear insertion and deletion. This involves
putting consecutive clobbers in groups. Splitting or joining
groups would be linear if we had to update every clobber on
each update, so the operation to query a clobber's group is
lazy and (again) amortised sublinear.
This means that, when splitting a group into two, we cannot
reuse the old group for one side. We have to invalidate it,
so that the lazy clobber_info::group query can tell that something
has changed. The ICE in the PR came from failing to do that.
gcc/
PR rtl-optimization/115928
* rtl-ssa/accesses.h (clobber_group): Add a new constructor that
takes the first, last and root clobbers.
* rtl-ssa/internals.inl (clobber_group::clobber_group): Define it.
* rtl-ssa/accesses.cc (function_info::split_clobber_group): Use it.
Allocate a new group for both sides and invalidate the previous group.
(function_info::add_def): After calling split_clobber_group,
remove the old group from the splay tree.
gcc/testsuite/
PR rtl-optimization/115928
* gcc.dg/torture/pr115928.c: New test.
genattrtab: Drop enum tags, consolidate type names
genattrtab printed an "enum" tag before references to attribute
enums, but that's redundant in C++. Removing it means that each
attribute type becomes a single token and can be easily stored
in the attr_desc structure.
gcc/
* genattrtab.cc (attr_desc::cxx_type): New field.
(write_attr_get, write_attr_value): Use it.
(gen_attr, find_attr, make_internal_attr): Initialize it,
dropping enum tags.
Marek Polacek [Wed, 17 Jul 2024 15:19:32 +0000 (11:19 -0400)]
c++: wrong error initializing empty class [PR115900]
In r14-409, we started handling empty bases first in cxx_fold_indirect_ref_1
so that we don't need to recurse and waste time.
This caused a bogus "modifying a const object" error. I'm appending my
analysis from the PR, but basically, cxx_fold_indirect_ref now returns
a different object than before, and we mark the wrong thing as const,
but since we're initializing an empty object, we should avoid setting
the object constness.
~~
Pre-r14-409: we're evaluating the call to C::C(), which is in the body of
B::B(), which is the body of D::D(&d):
C::C ((struct C *) this, NON_LVALUE_EXPR <0>)
It's a ctor so we get here:
3118 /* Remember the object we are constructing or destructing. */
3119 tree new_obj = NULL_TREE;
3120 if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
3121 {
3122 /* In a cdtor, it should be the first `this' argument.
3123 At this point it has already been evaluated in the call
3124 to cxx_bind_parameters_in_call. */
3125 new_obj = TREE_VEC_ELT (new_call.bindings, 0);
We proceed to evaluate the call, then we get here:
3317 /* At this point, the object's constructor will have run, so
3318 the object is no longer under construction, and its possible
3319 'const' semantics now apply. Make a note of this fact by
3320 marking the CONSTRUCTOR TREE_READONLY. */
3321 if (new_obj && DECL_CONSTRUCTOR_P (fun))
3322 cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/true,
3323 non_constant_p, overflow_p);
new_obj is still d.D.2656.D.2597, its type is "C", cxx_set_object_constness
doesn't set anything as const. This is fine.
After r14-409: on line 3125, new_obj is (struct C *) &d.D.2656 as before,
but we go to cxx_fold_indirect_ref_1:
type is C, which is an empty class; optype is "const D", and C is a base of D.
So we return the VAR_DECL 'd'. Then we get to cxx_set_object_constness with
object=d, which is const, so we mark the constructor READONLY.
Then we're evaluating A::A() which has
((A*)this)->data = 0;
we evaluate the LHS to d.D.2656.a, for which the initializer is
{.D.2656={.a={.data=}}} which is TREE_READONLY and 'd' is const, so we think
we're modifying a const object and fail the constexpr evaluation.
PR c++/115900
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_call_expression): Set new_obj to NULL_TREE
if cxx_fold_indirect_ref set empty_base to true.
Eikansh Gupta [Wed, 22 May 2024 17:58:48 +0000 (23:28 +0530)]
MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]
This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
`vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
The patch adds match pattern for a, b:
(a ? x : y) != (b ? x : y) --> (a^b) ? TRUE : FALSE
(a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
(a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
(a ? x : y) == (b ? y : x) --> (a^b) ? TRUE : FALSE
PR tree-optimization/111150
gcc/ChangeLog:
* match.pd (`(a ? x : y) eq/ne (b ? x : y)`): New pattern.
(`(a ? x : y) eq/ne (b ? y : x)`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr111150.c: New test.
* gcc.dg/tree-ssa/pr111150-1.c: New test.
* g++.dg/tree-ssa/pr111150.C: New test.
Andrew Pinski [Tue, 16 Jul 2024 16:53:20 +0000 (09:53 -0700)]
Add debug counter for ext_dce
Like r15-1610-gb6215065a5b143 (which adds one for late_combine),
adding one for ext_dce is useful to debug some issues with this pass.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* dbgcnt.def (ext_dce): New debug counter.
* ext-dce.cc (ext_dce_try_optimize_insn): Reject the insn
if the debug counter says so.
(ext_dce): Rename to ...
(ext_dce_execute): This.
(pass_ext_dce::execute): Update for the name of ext_dce.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
which requires at most 2 instructions. When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled).
gcc/
PR target/90616
* config/avr/predicates.md (const_0mod256_operand): New predicate.
* config/avr/constraints.md (Cp8): New constraint.
* config/avr/avr.md (*aligned_add_symbol): New insn.
* config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
When op2 is a multiple of 256, there is no need to add / subtract
the lo8 part.
(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
new insn *aligned_add_symbol as it applies.
Jakub Jelinek [Wed, 17 Jul 2024 15:32:21 +0000 (17:32 +0200)]
bitint: Use gsi_insert_on_edge rather than gsi_insert_on_edge_immediate [PR115887]
The following testcase ICEs on x86_64-linux, because we try to
gsi_insert_on_edge_immediate a statement on an edge which already has
statements queued with gsi_insert_on_edge, and the deferral has been
intentional so that we don't need to deal with cfg changes in between.
The following patch uses the delayed insertion as well.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115887
* gimple-lower-bitint.cc (gimple_lower_bitint): Use gsi_insert_on_edge
instead of gsi_insert_on_edge_immediate and set edge_insertions to
true.
Jakub Jelinek [Wed, 17 Jul 2024 15:30:24 +0000 (17:30 +0200)]
varasm: Shorten assembly of strings with larger zero regions
When not using .base64 directive, we emit for long sequences of zeros
.string "foobarbaz"
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
The following patch changes that to
.string "foobarbaz"
.zero 12
It keeps emitting .string "" if there is just one zero or two zeros where
the first one is preceded by non-zeros, so we can have
.string "foobarbaz"
.string ""
or
.base64 "VG8gYmUgb3Igbm90IHRvIGJlLCB0aGF0IGlzIHRoZSBxdWVzdGlvbg=="
.string ""
but not 2 .string "" in a row.
On a testcase I have with around 310440 0-255 unsigned char character
constants mostly derived from cc1plus start but with too long sequences of
0s which broke transformation to STRING_CST adjusted to have at most 126
consecutive 0s, I see:
1504498 bytes long assembly without this patch on i686-linux (without
.base64 support in binutils)
1155071 bytes long assembly with this patch on i686-linux (without .base64
support in binutils)
431390 bytes long assembly without this patch on x86_64-linux (with
.base64 support in binutils)
427593 bytes long assembly with this patch on x86_64-linux (with .base64
support in binutils)
All 4 assemble to identical *.o file when using x86_64-linux .base64
supporting gas, and the former 2 when using older x86_64-linux gas assemble
to identical content as well.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
* varasm.cc (default_elf_asm_output_ascii): Use ASM_OUTPUT_SKIP instead
of 2 or more default_elf_asm_output_limited_string (f, "") calls and
adjust base64 heuristics correspondingly.
Tamar Christina [Wed, 17 Jul 2024 15:22:14 +0000 (16:22 +0100)]
middle-end: fix 0 offset creation and folding [PR115936]
As shown in PR115936 SCEV and IVOPTS create an invalid IV when the IV is
a pointer type:
ivtmp.39_65 = ivtmp.39_59 + 0B;
where the IVs are DI mode and the offset is a pointer.
This comes from this weird candidate:
Candidate 8:
Var befor: ivtmp.39_59
Var after: ivtmp.39_65
Incr POS: before exit test
IV struct:
Type: sizetype
Base: 0
Step: 0B
Biv: N
Overflowness wrto loop niter: No-overflow
This IV was always created, it just ended up not being used.
This is created by SCEV.
simple_iv_with_niters in the case where no CHREC is found creates an IV with
base == ev, offset == 0;
However, in this case EV is a POINTER_PLUS_EXPR and so the type is a pointer.
It ends up creating an unusable expression.
gcc/ChangeLog:
PR tree-optimization/115936
* tree-scalar-evolution.cc (simple_iv_with_niters): Use sizetype for
pointers.
Patrick Palka [Wed, 17 Jul 2024 15:08:35 +0000 (11:08 -0400)]
c++: constrained partial spec type context [PR111890]
maybe_new_partial_specialization wasn't propagating TYPE_CONTEXT when
creating a new class type corresponding to a constrained partial spec,
which do_friend relies on via template_class_depth to distinguish a
template friend from a non-template friend, and so in the below testcase
we were incorrectly instantiating the non-template operator+ as if it
were a template leading to an ICE.
PR c++/111890
gcc/cp/ChangeLog:
* pt.cc (maybe_new_partial_specialization): Propagate TYPE_CONTEXT
to the newly created partial specialization.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-partial-spec15.C: New test.
Feng Xue [Wed, 29 May 2024 09:28:14 +0000 (17:28 +0800)]
vect: Optimize order of lane-reducing operations in loop def-use cycles
When transforming multiple lane-reducing operations in a loop reduction chain,
originally, corresponding vectorized statements are generated into def-use
cycles starting from 0. The def-use cycle with smaller index, would contain
more statements, which means more instruction dependency. For example:
int sum = 1;
for (i)
{
sum += d0[i] * d1[i]; // dot-prod <vector(16) char>
sum += w[i]; // widen-sum <vector(16) char>
sum += abs(s0[i] - s1[i]); // sad <vector(8) short>
sum += n[i]; // normal <vector(4) int>
}
For a higher instruction parallelism in final vectorized loop, an optimal
means is to make those effective vector lane-reducing ops be distributed
evenly among all def-use cycles. Transformed as the below, DOT_PROD,
WIDEN_SUM and SADs are generated into disparate cycles, instruction
dependency among them could be eliminated.
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
statements in an optimized order.
Feng Xue [Wed, 29 May 2024 09:22:36 +0000 (17:22 +0800)]
vect: Support multiple lane-reducing operations for loop reduction [PR114440]
For lane-reducing operations (dot-prod/widen-sum/sad) in loop reduction, the
current vectorizer can only handle the pattern if the reduction chain does not
contain any other operation, whether normal or lane-reducing.
This patch removes some constraints in reduction analysis to allow multiple
arbitrary lane-reducing operations with mixed input vectypes in a loop
reduction chain. For example:
int sum = 1;
for (i)
{
sum += d0[i] * d1[i]; // dot-prod <vector(16) char>
sum += w[i]; // widen-sum <vector(16) char>
sum += abs(s0[i] - s1[i]); // sad <vector(8) short>
}
The vector size is 128-bit, vectorization factor is 16. Reduction statements
would be transformed as:
vector<4> int sum_v0 = { 0, 0, 0, 1 };
vector<4> int sum_v1 = { 0, 0, 0, 0 };
vector<4> int sum_v2 = { 0, 0, 0, 0 };
vector<4> int sum_v3 = { 0, 0, 0, 0 };
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (vectorizable_lane_reducing): New function
declaration.
* tree-vect-stmts.cc (vect_analyze_stmt): Call new function
vectorizable_lane_reducing to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_model_reduction_cost): Remove cost computation
code related to emulated_mixed_dot_prod.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing.
(vect_transform_reduction): Adjust comments with updated example.
Vector stmts number of an operation is calculated based on output vectype.
This is over-estimated for lane-reducing operation, which would cause vector
def/use mismatched when we want to support loop reduction mixed with lane-
reducing and normal operations. One solution is to refit lane-reducing
to make it behave like a normal one, by adding new pass-through copies to
fix possible def/use gap. And resultant superfluous statements could be
optimized away after vectorization. For example:
int sum = 1;
for (i)
{
sum += d0[i] * d1[i]; // dot-prod <vector(16) char>
}
The vector size is 128-bit, vectorization factor is 16. Reduction
statements would be transformed as:
vector<4> int sum_v0 = { 0, 0, 0, 1 };
vector<4> int sum_v1 = { 0, 0, 0, 0 };
vector<4> int sum_v2 = { 0, 0, 0, 0 };
vector<4> int sum_v3 = { 0, 0, 0, 0 };
gcc/
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
Calculate effective vector stmts number with generic
vect_get_num_copies.
(vect_transform_reduction): Insert copies for lane-reducing so as to
fix over-estimated vector stmts number.
(vect_transform_cycle_phi): Calculate vector PHI number only based on
output vectype.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
adjustment on vector stmts number specific to slp reduction.
vect: Add a unified vect_get_num_copies for slp and non-slp
Extend original vect_get_num_copies (pure loop-based) to calculate number of
vector stmts for slp node regarding a generic vect region.
2024-07-12 Feng Xue <fxue@os.amperecomputing.com>
gcc/
* tree-vectorizer.h (vect_get_num_copies): New overload function.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Calculate
number of vector stmts for slp node with vect_get_num_copies.
(vect_slp_analyze_node_operations): Calculate number of vector elements
for constant/external slp node with vect_get_num_copies.
Richard Biener [Wed, 17 Jul 2024 09:42:13 +0000 (11:42 +0200)]
tree-optimization/115959 - ICE with SLP condition reduction
The following fixes how during reduction epilogue generation we
gather conditional compares for condition reductions, thereby
following the reduction chain via STMT_VINFO_REDUC_IDX. The issue
is that SLP nodes for COND_EXPRs can have either three or four
children dependent on whether we have legacy GENERIC expressions
in the transitional pattern GIMPLE for the COND_EXPR condition.
PR tree-optimization/115959
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Get at the REDUC_IDX child in a safer way for COND_EXPR
nodes.
Jakub Jelinek [Wed, 17 Jul 2024 09:40:58 +0000 (11:40 +0200)]
testsuite: Add dg-do run to another test
This is another test which clearly has been written with the assumption that
it will be executed, but it isn't.
It works fine when it is executed on both x86_64-linux and i686-linux.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/torture/builtin-convertvector-1.c: Add dg-do run
directive.
Jakub Jelinek [Wed, 17 Jul 2024 09:40:03 +0000 (11:40 +0200)]
varasm: Fix bootstrap after the .base64 changes [PR115958]
Apparently there is a -Wsign-compare warning if ptrdiff_t has precision of
int, then (t - s + 1 + 2) / 3 * 4 has int type while cnt unsigned int.
This doesn't warn if ptrdiff_t has larger precision, say on x86_64
it is 64-bit and so (t - s + 1 + 2) / 3 * 4 has long type and cnt unsigned
int. And it doesn't warn when using older binutils (in my tests I've
used new binutils on x86_64 and old binutils on i686).
Anyway, earlier condition guarantees that t - s is at most 256-ish and
t >= s by construction, so we can just cast it to (unsigned) to avoid
the warning.
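A minimal sketch of the type issue described above. The helper name and the direction of the comparison are assumptions made for illustration, not the code in varasm.cc; only the expression and the cast come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper.  Without the cast, (t - s + 1 + 2) / 3 * 4 has
   type ptrdiff_t; where ptrdiff_t has the precision of int, comparing
   it against the unsigned cnt is what -Wsign-compare reports.  Casting
   t - s to unsigned makes the whole expression unsigned.  */
int
encoded_len_less_than (const char *s, const char *t, unsigned int cnt)
{
  assert (t >= s);  /* the cast is only safe because t - s is non-negative */
  return ((unsigned) (t - s) + 1 + 2) / 3 * 4 < cnt;
}
```

The cast relies on the two facts stated above: t >= s holds by construction and t - s is small, so no value is lost in the conversion.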
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR other/115958
* varasm.cc (default_elf_asm_output_ascii): Cast t - s to unsigned
to avoid -Wsign-compare warnings.
Jakub Jelinek [Wed, 17 Jul 2024 09:38:33 +0000 (11:38 +0200)]
gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]
The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size. That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some. If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
including RMW cycles to something outside of the object:
if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
> (unsigned HOST_WIDE_INT) buf->sz)
{
gcc_assert (wordsize > 1);
wordsize /= 2;
i -= wordsize;
continue;
}
but if it is full==true flush in the middle, this doesn't happen, but we
still process just the buffer bytes before the current end. If that end
is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
might be true, so in some spots we just didn't emit any clearing in that
last chunk.
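The word-size halving at the end of an object can be modelled in isolation. The following is a simplified sketch, not GCC's code: plan_stores and its parameters are hypothetical, and it only records which store widths would be used so that no store touches a byte at or past the end of the object.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: cover SIZE bytes with power-of-two sized stores of
   at most WORDSIZE bytes, never writing at or past offset SIZE.  Each
   store width is recorded in STORES; the number of stores is returned.  */
size_t
plan_stores (size_t size, size_t wordsize, size_t *stores, size_t max_stores)
{
  assert (wordsize != 0 && (wordsize & (wordsize - 1)) == 0);
  size_t n = 0;
  size_t i = 0;
  while (i < size)
    {
      if (i + wordsize > size)
        {
          /* The store would run past the object: retry at half width.  */
          assert (wordsize > 1);
          wordsize /= 2;
          continue;
        }
      assert (n < max_stores);
      stores[n++] = wordsize;
      i += wordsize;
    }
  return n;
}
```

With an 18-byte object and 8-byte words this yields stores of 8, 8 and 2 bytes, which mirrors the 2-byte last chunk at offset 16 mentioned above.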
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115527
* gimple-fold.cc (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.
Kewen Lin [Wed, 17 Jul 2024 05:19:30 +0000 (00:19 -0500)]
rs6000: Change optab for ibm128 and ieee128 conversion
Currently for 128 bit floating-point ibm128 and ieee128
formats conversion, the corresponding libcalls are:
ibm128 -> ieee128 "__trunctfkf2"
ieee128 -> ibm128 "__extendkftf2"
, and generic code handling (like convert_mode_scalar) also
adopts sext_optab for ieee128 -> ibm128 while trunc_optab
for ibm128 -> ieee128. But in rs6000 port as function
rs6000_expand_float128_convert and init_float128_ieee show,
we adopt sext_optab for ibm128 -> ieee128 with "__trunctfkf2"
while trunc_optab for ieee128 -> ibm128 with "__extendkftf2".
To make them consistent and avoid some surprises, this patch
is to adjust rs6000 internal handlings by adopting trunc_optab
for ibm128 -> ieee128 with "__trunctfkf2" while sext_optab for
ieee128 -> ibm128 with "__extendkftf2".
gcc/ChangeLog:
* config/rs6000/rs6000.cc (init_float128_ieee): Use trunc_optab rather
than sext_optab for converting FLOAT128_IBM_P mode to FLOAT128_IEEE_P
mode, and use sext_optab rather than trunc_optab for converting
FLOAT128_IEEE_P mode to FLOAT128_IBM_P mode.
(rs6000_expand_float128_convert): Likewise.
Kewen Lin [Wed, 17 Jul 2024 05:19:00 +0000 (00:19 -0500)]
tree: Remove KFmode workaround [PR112993]
The fix for PR112993 makes KFmode have 128 bit mode precision,
we don't need this workaround to fix up the type precision any
more, and just go with mode precision. So this patch is to
remove KFmode workaround.
PR target/112993
gcc/ChangeLog:
* tree.cc (build_common_tree_nodes): Drop the workaround for rs6000
KFmode precision adjustment.
Kewen Lin [Wed, 17 Jul 2024 05:17:42 +0000 (00:17 -0500)]
ranger: Revert the workaround introduced in PR112788 [PR112993]
This reverts commit r14-6478-gfda8e2f8292a90 "range:
Workaround different type precision between _Float128 and
long double [PR112788]" as the fixes for PR112993 make
all 128 bits scalar floating point have the same 128 bit
precision, this workaround isn't needed any more.
PR target/112993
gcc/ChangeLog:
* value-range.h (range_compatible_p): Remove the workaround on
different type precision between _Float128 and long double.
Kewen Lin [Wed, 17 Jul 2024 05:16:59 +0000 (00:16 -0500)]
fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]
Previously effective target fortran_real_c_float128 never
passed on Power, regardless of whether the default 128-bit long
double is ibmlongdouble or ieeelongdouble. This is because TF
mode is always used for kind 16 real, which has precision
127, while the node float128_type_node for c_float128 has
128 type precision, so get_real_kind_from_node can't find a
match as it only checks gfc_real_kinds[i].mode_precision
and type precision.
With changing TFmode/IFmode/KFmode to have the same mode
precision 128, now fortran_real_c_float128 can pass with
ieeelongdouble enabled by default and test cases guarded
with it get tested accordingly. But with ibmlongdouble
enabled by default, since TFmode has precision 128 which
is the same as type precision 128 of float128_type_node,
get_real_kind_from_node considers kind for TFmode matches
float128_type_node, but it's wrong as at this time point
TFmode is with ibm extended format. So this patch is to
teach get_real_kind_from_node to check one more field which
can differentiate the underlying real format, so that it
can avoid the unexpected match when more than one
mode has the same precision.
PR target/112993
gcc/fortran/ChangeLog:
* trans-types.cc (get_real_kind_from_node): Consider the case where
more than one mode has the same precision.
Kewen Lin [Wed, 17 Jul 2024 05:14:43 +0000 (00:14 -0500)]
rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]
On rs6000, there are three 128 bit scalar floating point
modes TFmode, IFmode and KFmode. For historical
reasons, we define them with different mode precisions,
that is KFmode 126, TFmode 127 and IFmode 128. But in
fact all of them should have the same mode precision 128,
this special setting has caused some issues like some
unexpected failures mentioned in [1] and also made us have
to introduce some workarounds, such as: the workaround in
build_common_tree_nodes for KFmode 126, the workaround in
range_compatible_p for same mode but different precision
issue.
This patch is to make these three 128 bit scalar floating
point modes TFmode, IFmode and KFmode have 128 bit mode
precision, and keep the order same as previous in order
to make machine independent parts of the compiler not try
to widen IFmode to TFmode. Besides, build_common_tree_nodes
adopts the newly added hook mode_for_floating_type so we
don't need to worry about unexpected mode for long double
type node.
In function convert_mode_scalar, with the proposed change,
it adopts sext_optab for converting ieee128 format mode to
ibm128 format mode while trunc_optab for converting ibm128
format mode to ieee128 format mode. Thus this patch removes
useless extend and trunc optab supports, supplements new
define_expands extendkftf2 and trunctfkf2 to align with
convert_mode_scalar implementation. It also unnames two
define_insn_and_split to avoid conflicts and make them more
clear. Considering the current implementation that there is
no chance to have KF <-> IF conversion (since either of them
would be TF already), it adds two dummy define_expands to
assert this.
Kewen Lin [Wed, 17 Jul 2024 05:14:18 +0000 (00:14 -0500)]
expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format
For historical reasons, rs6000 defines KFmode, TFmode
and IFmode to have different mode precisions, but this causes
some issues and needs some workarounds such as PR112993.
So we are going to make all rs6000 128 bit scalar FP modes
have 128 bit precision. To prepare for that, this patch
makes function convert_mode_scalar allow same precision
FP modes conversion if their underlying formats are
ibm_extended_format and ieee_quad_format respectively, just
like the existing special treatment on arm_bfloat_half_format
<-> ieee_half_format. It also factors out all the relevant
checks into a lambda function. Besides, similar to ieee fp16
-> bfloat conversion, it adopts trunc_optab rather than
sext_optab for ibm128 to ieee128 conversion.
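The rule described above can be sketched as a small predicate. This is a hypothetical model, not the lambda in expr.cc: the enum and both function names are invented for illustration and stand in for GCC's real_format objects and optab selection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the floating point formats involved.  */
enum fp_format
{
  FMT_IEEE_HALF,
  FMT_ARM_BFLOAT_HALF,
  FMT_IBM_EXTENDED,
  FMT_IEEE_QUAD,
  FMT_OTHER
};

/* True if {a, b} is the unordered pair {x, y}.  */
static bool
is_pair (enum fp_format a, enum fp_format b,
         enum fp_format x, enum fp_format y)
{
  return (a == x && b == y) || (a == y && b == x);
}

/* Same-precision conversions are accepted only between the two
   special-cased format pairs.  */
bool
acceptable_same_precision_formats (enum fp_format from, enum fp_format to)
{
  return is_pair (from, to, FMT_ARM_BFLOAT_HALF, FMT_IEEE_HALF)
         || is_pair (from, to, FMT_IBM_EXTENDED, FMT_IEEE_QUAD);
}

/* For an accepted pair, ieee fp16 -> bfloat and ibm128 -> ieee128 are
   treated as truncations; the opposite directions as extensions.  */
bool
uses_trunc_optab (enum fp_format from, enum fp_format to)
{
  assert (acceptable_same_precision_formats (from, to));
  return (from == FMT_IEEE_HALF && to == FMT_ARM_BFLOAT_HALF)
         || (from == FMT_IBM_EXTENDED && to == FMT_IEEE_QUAD);
}
```

Because the precisions are equal, direction cannot be derived from precision alone, which is why the model keys on the formats.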
PR target/112993
gcc/ChangeLog:
* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes if their underlying formats are
ibm_extended_format and ieee_quad_format, and refactor assertion
with new lambda function acceptable_same_precision_modes. Use
trunc_optab rather than sext_optab for ibm128 to ieee128 conversion.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Use trunc_optab rather
than sext_optab for ibm128 to ieee128 conversion.
Peter Bergner [Mon, 15 Jul 2024 21:57:32 +0000 (16:57 -0500)]
rs6000: Error on CPUs and ABIs that don't support the ROP protection insns [PR114759]
We currently silently ignore the -mrop-protect option for old CPUs we don't
support with the ROP hash insns, but we throw an error for unsupported ABIs.
This patch treats unsupported CPUs and ABIs similarly by throwing an error
for both. This matches clang behavior and allows us to simplify our tests
in the code that generates our prologue and epilogue code.
2024-06-26 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Disallow
CPUs and ABIs that do not support the ROP protection insns.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Remove now
unneeded tests.
(rs6000_emit_prologue): Likewise.
Remove unneeded gcc_assert.
(rs6000_emit_epilogue): Likewise.
* config/rs6000/rs6000.md: Likewise.
gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-3.c: New test.
Peter Bergner [Wed, 19 Jun 2024 21:07:29 +0000 (16:07 -0500)]
rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]
We currently only emit the ROP-protect hash* insns for Power10, where the
insns were added to the architecture. We want to emit them for earlier
cpus (where they operate as NOPs), so that if those older binaries are
ever executed on a Power10, then they'll be protected from ROP attacks.
Binutils accepts hashst and hashchk back to Power8, so change GCC to emit
them for Power8 and later. This matches clang's behavior.
c++/modules: Propagate BINDING_VECTOR_*_DUPS_P on realloc [PR99242]
When importing modules, when a binding vector for a name runs out of
slots it gets reallocated with a larger size, and existing bindings are
copied across. However, the flags to indicate whether deduping needs to
occur did not: this causes ICEs, as it allows a duplicate binding to be
added which then violates assumptions later on.
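The class of bug described above can be shown with a generic growable vector. This is a hypothetical sketch in C, not the C++ front end's data structure: struct binding_vec, its field names and both functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a binding vector: two dedup flags plus a
   variable number of slots.  */
struct binding_vec
{
  bool global_dups_p;
  bool partition_dups_p;
  size_t count;
  size_t capacity;
  int slots[];  /* flexible array member */
};

struct binding_vec *
binding_vec_new (size_t capacity)
{
  struct binding_vec *v
    = calloc (1, sizeof *v + capacity * sizeof v->slots[0]);
  assert (v != NULL);
  v->capacity = capacity;
  return v;
}

/* Reallocate with a larger size.  Copying only the slots would silently
   reset the flags; they have to be carried over as well.  */
struct binding_vec *
binding_vec_grow (struct binding_vec *old, size_t new_capacity)
{
  assert (new_capacity >= old->count);
  struct binding_vec *v = binding_vec_new (new_capacity);
  memcpy (v->slots, old->slots, old->count * sizeof old->slots[0]);
  v->count = old->count;
  v->global_dups_p = old->global_dups_p;
  v->partition_dups_p = old->partition_dups_p;
  free (old);
  return v;
}
```

Dropping the two flag assignments in binding_vec_grow reproduces the shape of the problem: the bindings survive the reallocation but the record that deduplication is needed does not.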
* g++.dg/modules/pr99242_a.H: New test.
* g++.dg/modules/pr99242_b.H: New test.
* g++.dg/modules/pr99242_c.H: New test.
* g++.dg/modules/pr99242_d.C: New test.
c++/contracts: ICE in C++ Contracts with '-fno-exceptions' [PR 110159]
We currently only initialise terminate_fn if exceptions are enabled.
However, contract handling requires terminate_fn when building the
contract because a contract failure may result in std::terminate call
regardless of whether the exceptions are enabled. Refactored
init_exception_processing to extract the initialisation of
terminate_fn. New function init_terminate_fn added that initialises
terminate_fn if it hasn't already been initialised. Call to terminate_fn
added in cxx_init_decl_processing if contracts are enabled.
PR c++/110159
gcc/cp/ChangeLog:
* cp-tree.h (init_terminate_fn): Declaration of a new function.
* decl.cc (cxx_init_decl_processing): If contracts are enabled,
call init_terminate_fn.
* except.cc (init_exception_processing): Function refactored to
call init_terminate_fn.
(init_terminate_fn): Added new function that initializes
terminate_fn if it hasn't already been initialised.
Iain Sandoe [Sat, 15 Jun 2024 16:47:33 +0000 (17:47 +0100)]
c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].
The current implementation of contracts emits the checks into function
bodies in three places; for pre-conditions at the start of the body,
for asserts in-line in the function body and for post-conditions as an
addition to return statements.
In general (at least with existing "2a" contract semantics) the in-line
contract asserts behave as expected.
However, the mechanism is not applicable to:
* Handling pre conditions in coroutines since, for those, the standard
specifies a wrapping of the original function body by functionality
implementing initial and final suspends (along with some housekeeping
to route exceptions). Thus for such transformed function bodies, the
preconditions then get actioned after the initial suspend, which does
not behave as intended.
* Handling post conditions in functions that do not have return
statements (which applies to coroutines and void functions).
In the following, we identify a potentially transformed function body
(in the case of coroutines, this is usually called the "ramp()" function).
The patch here re-implements the code insertion in one of the two
following ways (code for exposition only):
* For functions with no post-conditions we wrap the potentially
transformed function as follows:
This implements the intent that the preconditions are processed after
the function parameters are initialised but before any other actions.
* For functions with post-conditions:
if (preconditions_exist)
handle_pre_condition_checking ();
try
{
potentially_transformed_function_body ();
}
finally
{
handle_post_condition_checking ();
}
else [only if the function is not marked noexcept(true) ]
{
;
}
In this, post-conditions [that might apply to the return value etc.]
are evaluated on every non-exceptional edge out of the function.
At present, the model here is that exceptions thrown by the function
propagate upwards as if there were no contracts present. If the desired
semantic becomes that an exception is counted as equivalent to a contract
violation - then we can add a second handler in place of the empty
statement.
This patch specifically does not address changes to code-gen and constexpr
handling that are contained in P2900.
PR c++/115434
PR c++/110871
PR c++/110872
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR.
* contracts.cc (finish_contract_attribute): Remove excess line.
(build_contract_condition_function): Post condition handlers are
void now.
(emit_postconditions_cleanup): Remove.
(emit_postconditions): New.
(add_pre_condition_fn_call): New.
(add_post_condition_fn_call): New.
(apply_preconditions): New.
(apply_postconditions): New.
(maybe_apply_function_contracts): New.
(apply_postcondition_to_return): Remove.
* contracts.h (apply_postcondition_to_return): Remove.
(maybe_apply_function_contracts): Add.
* coroutines.cc (coro_build_actor_or_destroy_function): Do not
copy contracts to coroutine helpers.
* decl.cc (finish_function): Handle wrapping a possibly
transformed function body in contract checks.
* typeck.cc (check_return_expr): Remove handling of post
conditions on return expressions.
gcc/ChangeLog:
* gimplify.cc (struct gimplify_ctx): Add a flag to show we are
expanding a handler.
(gimplify_expr): When we are expanding a handler, and the body
transforms might have re-written DECL_RESULT into a gimple var,
ensure that handler references to DECL_RESULT are also re-written
to refer to the gimple var. When we are processing an EH_ELSE
expression, then add it if either of the cleanup slots is in
use.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/pr115434.C: New test.
* g++.dg/coroutines/pr110871.C: New test.
* g++.dg/coroutines/pr110872.C: New test.
AVR: testsuite - Add noipa function attribute to noclone functions.
Many functions under test have the noinline and noclone function
attributes attached so that no (constant) values are propagated
into the functions, so that we actually are testing what's supposed
to be tested. In order to enforce that, noipa may also be required
when inter-procedural analysis / optimizations are on.
Paul Thomas [Tue, 16 Jul 2024 14:56:44 +0000 (15:56 +0100)]
Fortran: Simplify len_trim with array ref and fix mapping bug[PR84868].
2024-07-16 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/84868
* simplify.cc (gfc_simplify_len_trim): If the argument is an
element of a parameter array, simplify all the elements and
build a new parameter array to hold the result, after checking
that it doesn't already exist.
* trans-expr.cc (gfc_get_interface_mapping_array): If a string
length is available, use it for the typespec.
(gfc_add_interface_mapping): Supply the se string length.
gcc/testsuite/
PR fortran/84868
* gfortran.dg/pr84868.f90: New test.
order_nodes are used to implement ordered comparisons between
two insns with the same program point number. remove_insn would
remove an order_node from its splay tree, but didn't remove it
from the insn. This caused confusion if the insn was later
reinserted somewhere else that also needed an order_node.
gcc/
PR rtl-optimization/115929
* rtl-ssa/insns.cc (function_info::remove_insn): Remove an
order_node from the instruction as well as from the splay tree.
gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-1.c: New test.
recog: restrict paradoxical mode punning in insn_propagation [PR115901]
In g:44fc801e97a8dc626a4806ff4124439003420b20 I'd extended
insn_propagation to handle simple cases of hard-reg mode punning.
One of the checks was that the new use mode occupied the same
number of registers as the original definition mode. However,
as PR115901 shows, we need to avoid increasing the size of any
registers in the punned "to" expression as well.
Specifically, the test includes a DImode move from GPR x0 to
a vector register, followed by a V2DI use of the vector register.
The simplification would then create a V2DI spanning x0 and x1,
manufacturing a new, unwanted use of x1.
Checking for that kind of thing directly seems too cumbersome,
and is not related to the original motivation (which was to improve
handling of shared vector zeros on aarch64). This patch therefore
restricts the paradoxical case to constants.
gcc/
PR rtl-optimization/115901
* recog.cc (insn_propagation::apply_to_rvalue_1): Restrict
paradoxical mode punning to cases where "to" is constant.
gcc/testsuite/
PR rtl-optimization/115901
* gcc.dg/torture/pr115901.c: New test.
rtl-ssa: Enforce earlyclobbers on hard-coded clobbers [PR115891]
The asm in the testcase has a memory operand and also clobbers ax.
The clobber means that ax cannot be used to hold inputs, which
extends to the address of the memory.
I think I had an implicit assumption that constrain_operands
would enforce this, but in hindsight, that clearly wasn't going
to be true. constrain_operands only looks at constraints, and
these clobbers are by definition outside the constraint system.
(And that's why they have to be handled conservatively, since there's
no way to distinguish the earlyclobber and non-earlyclobber cases.)
The semantics of hard-coded clobbers are generic enough that I think
they should be handled directly by rtl-ssa, rather than by consumers.
And in the context of rtl-ssa, the easiest way to check for a clash is
to walk the list of input registers, which we already have to hand.
It therefore seemed better not to push this down to a more generic
rtl helper.
The patch detects hard-coded clobbers in the same way as regrename:
by temporarily stubbing out the operands with pc_rtx.
gcc/
PR rtl-optimization/115891
* rtl-ssa/changes.cc (find_clobbered_access): New function.
(recog_level2): Use it to check for overlap between input
registers and hard-coded clobbers. Conditionally reset
recog_data.insn after changing the insn code.
gcc/testsuite/
PR rtl-optimization/115891
* gcc.target/i386/pr115891.c: New test.
where "extend" may be a sign-extend or zero-extend,
and the integer modes are SImode >= M > L >= QImode.
The existing patterns are now represented in terms of insns
with mode iterators and a code iterator over any_extend,
and these new insns support all valid combinations of M and L
(which previously was not the case).
gcc/
* config/avr/avr.cc (avr_out_minus): Assimilate into...
(avr_out_plus_ext): ...this new function.
(avr_adjust_insn_length) [ADJUST_LEN_PLUS_EXT]: Handle case.
(avr_rtx_costs_1) [PLUS, MINUS]: Adjust RTX costs.
* config/avr/avr.md (adjust_len) <plus_ext>: Add new attribute value.
(*addpsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_zero_extend.qi_split): Assimilate...
(*addsi3_zero_extend_split): Assimilate...
(*addsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_sign_extend.hi_split): Assimilate...
(*addhi3.sign_extend1_split): Assimilate...
(*add<PSISI:mode>3.<code>.<QIPSI:mode>_split): ...into this
new insn-and-split.
(*addpsi3_zero_extend.hi): Assimilate...
(*addpsi3_zero_extend.qi): Assimilate...
(*addsi3_zero_extend): Assimilate...
(*addsi3_zero_extend.hi): Assimilate...
(*addpsi3_sign_extend.hi): Assimilate...
(*addhi3.sign_extend1): Assimilate...
(*add<PSISI:mode>3.<code>.<QIPSI:mode>): ...into this new insn.
(*subpsi3_sign_extend.hi_split): Assimilate...
(*subhi3.sign_extend2_split): Assimilate...
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>_split): Assimilate...
(*sub<HISI:mode>3.<code><QIPSI:mode>_split): ...into this new
insn-and-split.
(*subpsi3_sign_extend.hi): Assimilate...
(*subhi3.sign_extend2): Assimilate...
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>): Assimilate...
(*sub<HISI:mode>3.<code>.<QIPSI:mode>): ...into this new insn.
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>): Use avr_out_plus_ext
for asm out.
* config/avr/avr-protos.h (avr_out_minus): Remove.
(avr_out_plus_ext): New proto.
gcc/testsuite/
* gcc.target/avr/torture/add-extend.c: New test.
* gcc.target/avr/torture/sub-extend.c: New test.
Gaius Mulley [Tue, 16 Jul 2024 14:27:21 +0000 (15:27 +0100)]
PR modula2/115957 ICE on procedure local const declaration
An ICE would occur if a constant was declared using a variable term.
This fix catches variable terms in constant expressions and generates
an unrecoverable error.
gcc/m2/ChangeLog:
PR modula2/115957
* gm2-compiler/M2StackAddress.mod (PopAddress): Detect tail=NIL
and generate an internal error.
* gm2-compiler/PCBuild.bnf (InConstParameter): New variable.
(InConstBlock): New variable.
(ErrorString): Rewrite using MetaErrorStringT0.
(ErrorArrayAt): Rewrite using MetaErrorStringT0.
(WarnMissingToken): Use MetaErrorStringT0.
(CompilationUnit): Set seenError FALSE.
(init): Initialize InConstParameter and InConstBlock.
(ConstantDeclaration): Set InConstBlock.
(ConstSetOrQualidentOrFunction): Call CheckNotVar if not
InConstParameter and InConstBlock.
(ConstActualParameters): Set InConstParameter TRUE and restore
value at the end.
* gm2-compiler/PCSymBuild.def (CheckNotVar): New procedure.
Remove all unnecessary export qualified list.
* gm2-compiler/PCSymBuild.mod (CheckNotVar): New procedure.
gcc/testsuite/ChangeLog:
PR modula2/115957
* gm2/errors/fail/badconst.mod: New test.
* gm2/pim/fail/tinyadr.mod: New test.
When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge). The code
currently disregards this situation.
With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.
PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.
AVR: Allow more combinations of XOR / IOR with byte-shifts.
This patch takes some existing patterns that have QImode as one
input and uses a mode iterator to allow for more modes to match.
These insns are split after reload into *xorqi3 resp. *iorqi3 insn(s).
gcc/
* config/avr/avr-protos.h (avr_emit_xior_with_shift): New proto.
* config/avr/avr.cc (avr_emit_xior_with_shift): New function.
* config/avr/avr.md (any_lshift): New code iterator.
(*<xior:code><mode>.<any_lshift:code>): New insn-and-split.
(<code><HISI:mode><QIPSI:mode>.0): Replaces...
(*<code_stdname><mode>qi.byte0): ...this one.
(*<xior:code><HISI:mode><QIPSI:mode>.<any_lshift:code>): Replaces...
(*<code_stdname><mode>qi.byte1-3): ...this one.
Andrew Burgess [Sat, 10 Feb 2024 11:22:13 +0000 (11:22 +0000)]
libiberty/buildargv: handle input consisting of only white space
GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.
I have recently been working to improve this area of GDB, and noticed
some unexpected behaviour in the libiberty function buildargv, when
the input is a string consisting only of white space.
What I observe is that if the input to buildargv is a string
containing only white space, then buildargv will return an argv list
containing a single empty argument, e.g.:
We get the same output from buildargv if the input is a single space,
or multiple spaces. Other white space characters give the same
results.
This doesn't seem right to me, and in fact, there appears to be a work
around for this issue in expandargv where we have this code:
/* If the file is empty or contains only whitespace, buildargv would
return a single empty argument. In this context we want no arguments,
instead. */
if (only_whitespace (buffer))
{
file_argv = (char **) xmalloc (sizeof (char *));
file_argv[0] = NULL;
}
else
/* Parse the string. */
file_argv = buildargv (buffer);
I think that the correct behaviour in this situation is to return an
empty argv array, e.g.:
And it turns out that this is a trivial change to buildargv. The diff
does look big, but this is because I've re-indented a block. Check
with 'git diff -b' to see the minimal changes. I've also removed the
work around from expandargv.
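The intended behaviour for whitespace-only input can be sketched with a much simplified splitter. This is not libiberty's buildargv: split_on_whitespace is a hypothetical helper that handles neither quotes nor escapes, and it only demonstrates returning an argv whose first element is NULL when no argument is found.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an argument splitter.  Returns a
   NULL-terminated, malloc'd vector of malloc'd strings.  */
char **
split_on_whitespace (const char *input)
{
  assert (input != NULL);
  size_t cap = 4, argc = 0;
  char **argv = malloc (cap * sizeof *argv);
  assert (argv != NULL);
  const char *p = input;
  for (;;)
    {
      while (isspace ((unsigned char) *p))
        p++;
      if (*p == '\0')
        break;  /* only whitespace remained: emit no argument */
      const char *start = p;
      while (*p != '\0' && !isspace ((unsigned char) *p))
        p++;
      if (argc + 1 >= cap)
        {
          /* Keep room for the new entry and the terminating NULL.  */
          cap *= 2;
          argv = realloc (argv, cap * sizeof *argv);
          assert (argv != NULL);
        }
      size_t len = (size_t) (p - start);
      argv[argc] = malloc (len + 1);
      assert (argv[argc] != NULL);
      memcpy (argv[argc], start, len);
      argv[argc][len] = '\0';
      argc++;
    }
  argv[argc] = NULL;
  return argv;
}
```

Skipping leading whitespace before deciding whether to start an argument is what makes the empty string, a single space and a run of mixed whitespace all produce the same empty vector.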
When testing this sort of thing I normally write the tests first, and
then fix the code. In this case test-expandargv.c has sort-of been
used as a mechanism for testing the buildargv function (expandargv
does call buildargv most of the time), however, for this particular
issue the work around in expandargv (mentioned above) masked the
buildargv bug.
I did consider adding a new test-buildargv.c file, however, this would
have basically been a copy & paste of test-expandargv.c (with some
minor changes to call buildargv). This would be fine now, but feels
like we would eventually end up with one file not being updated as
much as the other, and so test coverage would suffer.
Instead, I have added some explicit buildargv testing to the
test-expandargv.c file, this reuses the test input that is already
defined for expandargv.
Of course, once I removed the work around from expandargv then we now
do always call buildargv from expandargv, and so the bug I'm fixing
would impact both expandargv and buildargv, so maybe the new testing
is redundant? I tend to think more testing is always better, so I've
left it in for now.
2024-07-16 Andrew Burgess <aburgess@redhat.com>
libiberty/
* argv.c (buildargv): Treat input of only whitespace as an empty
argument list.
(expandargv): Remove work around for input that is only
whitespace.
* testsuite/test-expandargv.c: Add new tests 10, 11, and 12.
Extend testing to call buildargv in more cases.
Andrew Burgess [Wed, 6 Dec 2023 16:45:31 +0000 (16:45 +0000)]
libiberty/buildargv: POSIX behaviour for backslash handling
GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.
I have recently been working to improve this area of GDB, and have
tracked down some of the unexpected behaviour to the libiberty
function buildargv, and how it handles backslash escapes.
1. Backslashes within single quotes should not be treated as an
escape, thus: '\a' should split to \a, retaining the backslash.
2. Backslashes within double quotes should only act as an escape if
they are immediately before one of the characters $ (dollar),
` (backtick), " (double quote), \ (backslash), or \n (newline). In
all other cases a backslash should not be treated as an escape
character. Thus: "\a" should split to \a, but "\$" should split to
$.
3. A backslash-newline sequence should be treated as a line
continuation, both the backslash and the newline should be removed.
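The three rules above can be sketched as a small unquoting routine. This is a hypothetical helper written for illustration, not the code in argv.c: posix_unquote processes a single string without splitting it into arguments. Treating a backslash outside any quotes as escaping the next character is an assumption taken from POSIX shell quoting rather than from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy IN to OUT with shell quoting removed.  OUT must have room for
   strlen (IN) + 1 bytes.  */
void
posix_unquote (const char *in, char *out)
{
  assert (in != NULL && out != NULL);
  char quote = '\0';  /* '\0' (unquoted), '\'' or '"' */
  for (; *in != '\0'; in++)
    {
      char c = *in;
      if (quote == '\'')
        {
          /* Rule 1: inside single quotes only the closing quote is
             special, so a backslash is copied literally.  */
          if (c == '\'')
            quote = '\0';
          else
            *out++ = c;
        }
      else if (c == '\\' && in[1] != '\0')
        {
          char next = in[1];
          if (next == '\n')
            in++;            /* Rule 3: drop backslash and newline.  */
          else if (quote == '"' && strchr ("$`\"\\", next) == NULL)
            *out++ = c;      /* Rule 2: backslash stays literal.  */
          else
            {
              *out++ = next; /* Escaped character; backslash dropped.  */
              in++;
            }
        }
      else if (quote == '"' && c == '"')
        quote = '\0';
      else if (quote == '\0' && (c == '\'' || c == '"'))
        quote = c;
      else
        *out++ = c;
    }
  *out = '\0';
}
```

The input '\a' and the input "\a" both come out as \a, while "\$" comes out as $, matching the examples given in rules 1 and 2.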
I've updated libiberty and also added some tests. All the existing
libiberty tests continue to pass, but I'm not sure if there is more
testing that should be done. buildargv is used within lto-wrapper.cc,
so maybe there's some testing folk can suggest that I run?
2024-07-16 Andrew Burgess <aburgess@redhat.com>
libiberty/
* argv.c (buildargv): Backslashes within single quotes are
literal, backslashes only escape POSIX defined special characters
within double quotes, and backslashed newlines should act as line
continuations.
* testsuite/test-expandargv.c: Add new tests 7, 8, and 9.
Code attribute bhfgq is missing a mapping for TF. This results in
unresolved iterators in assembler templates for *bswaptf.
With the TF mapping added the base mnemonics vlbr and vstbr are not
"used" anymore but only the extended mnemonics (vlbr<bhfgq> was
interpreted as vlbr; likewise for vstbr). Therefore, remove the base
mnemonics from the scheduling description, otherwise, genattrtab would
error about unknown mnemonics.
Similarly, we end up with unresolved iterators in assembler templates
for mulfprx23 since code attribute xdee is missing a mapping for FPRX2.
Richard Biener [Tue, 16 Jul 2024 08:45:27 +0000 (10:45 +0200)]
Fixup unaligned load/store cost for znver5
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply copied from the bogus znver4 costs. The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.
* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
load and store cost from the aligned costs.
In preparation of dropping vcond{,u,eq} optabs
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654690.html
enable 128-bit operands for vcond_mask---including integer as well as
floating point.
This partially fixes PR115519 w.r.t. autovec-long-double-signaling-*.c
tests.
gcc/ChangeLog:
* config/s390/vector.md: Enable vcond_mask for 128-bit ops.
s390: Emulate vec_cmp{eq,gt,gtu} for 128-bit integers
Mode iterator V_HW enables V1TI for target VXE which means
vec_cmpv1tiv1ti becomes available which leads to an ICE since there is
no corresponding insn.
Fixed by emulating comparisons and enabling mode V1TI unconditionally
for V_HW. For the sake of symmetry, I also added TI mode to V_HW since
TF mode is already included. As a consequence the consumers of V_HW
vec_{splat,slb,sld,sldw,sldb,srdb,srab,srb,test_mask_int,test_mask}
also become available for 128-bit integers.
This fixes gcc.c-torture/execute/pr105613.c and gcc.dg/pr106063.c.
* gcc.target/s390/vector/vec-cmp-emu-1.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-2.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-3.c: New test.
Richard Biener [Mon, 15 Jul 2024 11:50:58 +0000 (13:50 +0200)]
tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling
When AVX512 uses a fully masked loop together with peeling, we fail in
some cases to create the correct initial loop mask when the mask is
composed of multiple components. The following fixes this by properly
applying the bias for the component to the shift amount.
PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.
Richard Biener [Mon, 15 Jul 2024 11:01:24 +0000 (13:01 +0200)]
Fixup unaligned load/store cost for znver4
Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply left untouched from znver3, where they equal the aligned
costs, when the aligned costs were tweaked for znver4. The following makes
the unaligned costs equal to the aligned costs.
This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there. But it makes it qualify
as a regression fix.
PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.
Roger Sayle [Tue, 16 Jul 2024 06:58:28 +0000 (07:58 +0100)]
PR tree-optimization/114661: Generalize MULT_EXPR recognition in match.pd.
This patch resolves PR tree-optimization/114661, by generalizing the set
of expressions that we canonicalize to multiplication. This extends the
optimization(s) contributed (by me) back in July 2021.
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
allowed. A subtlety is that for non-wrapping integer types, we
actually fold this into (int)((unsigned)X*C3) so that we don't
introduce an undefined overflow that wasn't in the original.
Unfortunately, this transformation confuses itself, as the type-cast
multiplication isn't recognized when further combining bit operations.
Fixed here by allowing optional useless type conversions in transforms
to turn (int)((unsigned)X*C1)^(X<<C2) into (int)((unsigned)X*C3) so
that match.pd and EVRP can continue to construct multiplications.
For the example given in the PR:
unsigned mul(unsigned char c) {
if (c > 3) __builtin_unreachable();
return c << 18 | c << 15 |
c << 12 | c << 9 |
c << 6 | c << 3 | c;
}
mul: movzbl %dil, %eax
imull $299593, %eax, %eax
ret
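The constant in the imull can be checked by hand: 299593 is
1 + 2^3 + 2^6 + 2^9 + 2^12 + 2^15 + 2^18, and because c is at most 3
(two bits) while the shifts are three bits apart, the shifted copies
never overlap, so OR-ing them is the same as adding them. A small
sketch of that equivalence follows; the function names are made up for
illustration and are not part of the patch or the testcase.

```c
#include <assert.h>

/* The shift-and-or form from the PR, with the precondition that the
   original expresses via __builtin_unreachable.  */
static unsigned
mul_shifts (unsigned char c)
{
  assert (c <= 3);
  return c << 18 | c << 15 | c << 12 | c << 9 | c << 6 | c << 3 | c;
}

/* The single multiplication that the shifts are canonicalized to.  */
static unsigned
mul_const (unsigned char c)
{
  assert (c <= 3);
  return c * 299593u;
}
```

Both functions agree for every permitted input 0..3.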
2024-07-16 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/114661
* match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
type conversions around multiplications, such as those inserted
by this transformation.
gcc/testsuite/ChangeLog
PR tree-optimization/114661
* gcc.dg/pr114661.c: New test case.
Based on actual usage, trunc{128}2{16,32,64} use some instructions from
sse/sse3, so relax their enabling conditions to widen the scope of the
optimization.
gcc/ChangeLog:
PR target/107432
* config/i386/sse.md
(PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI.
(PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI.
(trunc<mode><pmov_dst_3_lower>2): Change constraint from TARGET_AVX2 to
TARGET_SSSE3.
(trunc<mode><pmov_dst_4_lower>2): Ditto.
(truncv2div2si2): Change constraint from TARGET_AVX2 to TARGET_SSE.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-10.c: New test.
Jeff Law [Tue, 16 Jul 2024 00:15:33 +0000 (18:15 -0600)]
Fix liveness computation for shift/rotate counts in ext-dce
So as I've noted before I believe the control flow in ext-dce.cc is horribly
messy. While investigating a fix for 115877 I came across another problem
related to control flow handling.
Specifically, if we have a binary op which implies the 2nd operand is fully
live, then we'd actually fail to mark that operand as live.
We essentially broke out of the loop which was supposed to be safe. But Y was
a REG and if Y is a REG or CONST_INT we skip sub-rtxs and thus failed to
process that operand (the shift count) at all.
Rather than muck around with control flow, we can just set all the bits as live
in DST_MASK and let normal processing continue. With all the bits live in
DST_MASK all the bits implied by the mode of the argument will also be live.
No testcase.
Bootstrapped and regression tested on x86. Pushing to the trunk.
gcc/
* ext-dce.cc (ext_dce_process_uses): Simplify control flow and fix
liveness computation for shift/rotate counts.
Jeff Law [Mon, 15 Jul 2024 22:57:44 +0000 (16:57 -0600)]
Fix sign/carry bit handling in ext-dce.
My change to fix a ubsan issue broke handling propagation of the carry/sign bit
down through a right shift. Thanks to Andreas for the analysis and proposed
fix and Sergei for the testcase.
PR rtl-optimization/115876
PR rtl-optimization/115916
gcc/
* ext-dce.cc (carry_backpropagate): Make return type unsigned as well.
Cast to signed for right shift to preserve sign bit.
gcc/testsuite/
* g++.dg/torture/pr115916.C: New test.
Co-author: Andreas Schwab <schwab@linux-m68k.org>
Co-author: Sergei Trofimovich <slyfox at gentoo dot org>
Patrick Palka [Mon, 15 Jul 2024 22:07:55 +0000 (18:07 -0400)]
c++: alias template with dependent attributes [PR115897]
Here we're prematurely stripping the dependent alias template-id A<T> to
its defining-type-id T when used as a template argument, which in turn
causes us to essentially ignore A's vector_size attribute in the outer
template-id.
This has always been a problem for class template-ids it seems, and after
r14-2170 variable template-ids are affected as well.
This patch marks alias templates that have a dependent attribute as
complex (as with e.g. constrained alias templates) so that we don't look
through them prematurely.
PR c++/115897
gcc/cp/ChangeLog:
* pt.cc (complex_alias_template_p): Return true for an alias
template with attributes.
(get_underlying_template): Don't look through an alias template
with attributes.
RISC-V: Allow adding enabled extension via target arch attributes
The set of enabled extensions can be extended via target arch function
attributes by listing each extension with a '+' prefix and a comma as
list separator. E.g.:
__attribute__((target("arch=+zba,+zbb"))) void foo();
The programmer intends to ensure that one or more extensions
are enabled when building the code. This is independent of the arch
string that is passed at build time via the -march= option.
Therefore, it is reasonable to allow target arch attributes to enable
extensions that have already been enabled via the -march= string.
The subset list code already supports such duplication for implied
extensions. This patch adds an interface so the subset list
parser can be switched into a mode where duplication is allowed.
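As a rough illustration of such a mode switch, here is a minimal
sketch in C. It is not the riscv_subset_list code: the struct and
function names are hypothetical, and only the allow_adding_dup flag is
modelled on the m_allow_adding_dup member named in the ChangeLog below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_EXTS 16

/* Hypothetical stand-in for a subset list.  */
struct subset_list
{
  const char *exts[MAX_EXTS];
  int count;
  bool allow_adding_dup;  /* Accept re-adding an enabled extension.  */
};

static bool
subset_list_contains (const struct subset_list *l, const char *ext)
{
  for (int i = 0; i < l->count; i++)
    if (strcmp (l->exts[i], ext) == 0)
      return true;
  return false;
}

/* Return false where a real parser would report a duplicate (or a full
   list); a duplicate is never stored twice.  */
static bool
subset_list_add (struct subset_list *l, const char *ext)
{
  assert (l != NULL && ext != NULL);

  if (subset_list_contains (l, ext))
    return l->allow_adding_dup;
  if (l->count == MAX_EXTS)
    return false;
  l->exts[l->count++] = ext;
  return true;
}
```

In this sketch a second add of "zbb" is rejected by default and
silently accepted once allow_adding_dup is set, leaving the list
unchanged.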
This commit fixes the following regressed test cases:
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_subset_list::add):
Allow adding enabled extension if m_allow_adding_dup is set.
* config/riscv/riscv-subset.h: Add m_allow_adding_dup and setter.
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Allow adding enabled extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr115554.c: Change expected fail to expected pass.
* gcc.target/riscv/target-attr-16.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
The target-arch attribute handling in RISC-V is only a few months old,
but already saw a rewrite (9941f0295a14), which addressed an important
issue. This rewrite introduced a hash table in the backend, which is
used to keep track of target-arch attributes of all functions.
The index of this hash table is the pointer to the function declaration
object (fndecl). However, objects like these don't have the lifetime
that is assumed here, which resulted in observing two fndecl objects
with the same address for different objects (triggering the assertion
in riscv_func_target_put() -- see also PR115562).
This patch removes the hash table approach in favor of storing target
specific options using the DECL_FUNCTION_SPECIFIC_TARGET() macro, which
is also used by other backends and is specifically designed for this
purpose (https://gcc.gnu.org/onlinedocs/gccint/Function-Properties.html).
To have an accessible field in the target options, we need to
adjust riscv.opt and introduce the field riscv_arch_string
(for the already existing option '-march=').
Using this macro allows removing much code from riscv-common.cc, which
controls access to the objects 'func_target_table' and 'current_subset_list'.
One thing to mention is that we had two subset lists:
current_subset_list and cmdline_subset_list, with the latter being
introduced recently for target attribute handling.
This patch reduces them back to one (cmdline_subset_list) which
contains the list of extensions that have been enabled by the command
line arguments.
Note that the patch keeps the existing behavior of rejecting
duplications of extensions when added via the '+' operator in a function
target attribute. E.g. "-march=rv64gc_zbb" and "arch=+zbb" will trigger
an error (see pr115554.c). However, at the same time this patch breaks
the acceptance of adding implied extensions, which causes the following
six regressions (with the error "extension 'EXT' appear more than one time"):
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c
New tests were added to document the behavior and to ensure it won't
regress. This patch did not show any regressions for rv32/rv64
and fixes the ICEs from PR115554 and PR115562.
PR target/115554
PR target/115562
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (struct riscv_func_target_info):
Remove.
(struct riscv_func_target_hasher): Likewise.
(riscv_func_decl_hash): Likewise.
(riscv_func_target_hasher::hash): Likewise.
(riscv_func_target_hasher::equal): Likewise.
(riscv_current_subset_list): Likewise.
(riscv_cmdline_subset_list): Remove obsolete space.
(riscv_func_target_table_lazy_init): Remove.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
(riscv_arch_str): Generate from cmdline_subset_list.
(riscv_set_arch_by_subset_list): Don't set current_subset_list.
(riscv_parse_arch_string): Remove current_subset_list.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Get subset list via riscv_cmdline_subset_list().
* config/riscv/riscv-subset.h (riscv_current_subset_list):
Remove prototype.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Build base arch string from existing target options, if any.
(riscv_target_attr_parser::update_settings): Store new arch
string in target options.
(riscv_process_one_target_attr): Whitespace fix.
(riscv_process_target_attr): Drop opts argument.
(riscv_option_valid_attribute_p): Properly save, change and restore
target options.
* config/riscv/riscv.cc (get_arch_str): New function.
(riscv_declare_function_name): Get arch string for option-arch
directive from function's target options.
* config/riscv/riscv.opt: Add riscv_arch_string variable to
march option.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/target-attr-01.c: Add test for option-arch directive.
* gcc.target/riscv/target-attr-02.c: Likewise.
* gcc.target/riscv/target-attr-03.c: Likewise.
* gcc.target/riscv/target-attr-04.c: Likewise.
* gcc.target/riscv/target-attr-05.c: Fix formatting.
* gcc.target/riscv/target-attr-06.c: Likewise.
* gcc.target/riscv/target-attr-07.c: Likewise.
* gcc.target/riscv/pr115554.c: New test.
* gcc.target/riscv/pr115562.c: New test.
* gcc.target/riscv/target-attr-08.c: New test.
* gcc.target/riscv/target-attr-09.c: New test.
* gcc.target/riscv/target-attr-10.c: New test.
* gcc.target/riscv/target-attr-11.c: New test.
* gcc.target/riscv/target-attr-12.c: New test.
* gcc.target/riscv/target-attr-13.c: New test.
* gcc.target/riscv/target-attr-14.c: New test.
* gcc.target/riscv/target-attr-15.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
RISC-V: Attribute parser: Use alloca() instead of new + std::unique_ptr
Allocating an object on the heap with new, wrapping it in a
std::unique_ptr and finally getting the buffer via buf.get()
is a correct way to allocate a buffer that is automatically
freed on return. However, a simple invocation of alloca()
does the same with less overhead.
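The pattern can be sketched in plain C as below. This is only an
illustration, not the code in riscv-target-attr.cc: count_tokens is a
hypothetical helper, and <alloca.h> is assumed to be available (as on
glibc). Note that alloca() takes its memory from the stack frame, so it
only suits inputs of modest, bounded length.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the comma-separated tokens in STR without
   modifying the caller's string.  */
static int
count_tokens (const char *str)
{
  assert (str != NULL);

  size_t len = strlen (str) + 1;
  /* Scratch copy on the stack; released automatically on return.  */
  char *buf = (char *) alloca (len);
  memcpy (buf, str, len);

  int n = 0;
  for (char *tok = strtok (buf, ","); tok != NULL;
       tok = strtok (NULL, ","))
    n++;
  return n;
}
```

No explicit free or smart pointer is needed; the buffer goes away with
the function's stack frame.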
gcc/ChangeLog:
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Replace new + std::unique_ptr by alloca().
(riscv_process_one_target_attr): Likewise.
(riscv_process_target_attr): Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
[i386] adjust flag_omit_frame_pointer in a single function [PR113719]
The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target. The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:
- ix86_recompute_optlev_based_flags computes and sets a
-f[no-]omit-frame-pointer default depending on
USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size
- ix86_option_override_internal enables flag_omit_frame_pointer for
-momit-leaf-frame-pointer to take effect
ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer. It is called during global process_options.
But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes. If they
differ, the testcase fails.
In order to fix this, we need to process all overriders of this option
whenever we process any of them. Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.
Edwin Lu [Fri, 12 Jul 2024 18:31:16 +0000 (11:31 -0700)]
RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark
The following testcase was not properly testing anything due to an
uninitialized variable. As a result, the loop was not iterating through
the testing data, but instead on undefined values which could cause an
unexpected abort.