Jan Hubicka [Thu, 6 Jul 2023 14:19:15 +0000 (16:19 +0200)]
updat_bb_profile_for_threading TLC
Apply some TLC to update_bb_profile_for_threading. The function resales
probabilities by:
FOR_EACH_EDGE (c, ei, bb->succs)
c->probability /= prob;
which is correct but in case prob is 0 (took all execution counts to the newly
constructed path), this leads to undefined results which do not sum to 100%.
In several other plpaces we need to change probability of one edge and rescale
remaining to sum to 100% so I decided to break this off to helper function
set_edge_probability_and_rescale_others
For jump threading the probability of edge is always reduced, so division is right
update, however in general case we also may want to increase probability of the edge
which needs different scalling. This is bit hard to do staying with probabilities
in range 0...1 for all temporaries.
For this reason I decided to add profile_probability::apply_scale which is symmetric
to what we already have in profile_count::apply_scale and does right thing in
both directions.
Finally I added few early exits so we do not produce confused dumps when
profile is missing and special case the common situation where edges out of BB
are precisely two. In this case we can set the other edge to inverter probability
which. Saling drop probability quality from PRECISE to ADJUSTED.
Bootstrapped/regtested x86_64-linux. The patch has no effect on in count mismatches
in tramp3d build and improves out-count. Will commit it shortly.
gcc/ChangeLog:
* cfg.cc (set_edge_probability_and_rescale_others): New function.
(update_bb_profile_for_threading): Use it; simplify the rest.
* cfg.h (set_edge_probability_and_rescale_others): Declare.
* profile-count.h (profile_probability::apply_scale): New.
Richard Biener [Thu, 6 Jul 2023 11:51:55 +0000 (13:51 +0200)]
tree-optimization/110556 - tail merging still pre-tuples
The stmt comparison function for GIMPLE_ASSIGNs for tail merging
still looks like it deals with pre-tuples IL. The following
attempts to fix this, not only comparing the first operand (sic!)
of stmts but all of them plus also compare the operation code.
PR tree-optimization/110556
* tree-ssa-tail-merge.cc (gimple_equal_p): Check
assign code and all operands of non-stores.
Claire Dross [Thu, 15 Jun 2023 14:22:11 +0000 (16:22 +0200)]
ada: Refactor the proof of the Value and Image runtime units
The aim of this refactoring is to avoid unnecessary dependencies
between Image and Value units even though they share the same
specification functions. These functions are grouped inside ghost
packages which are then withed by Image and Value units.
gcc/ada/
* libgnat/s-vs_int.ads: Instance of Value_I_Spec for Integer.
* libgnat/s-vs_lli.ads: Instance of Value_I_Spec for
Long_Long_Integer.
* libgnat/s-vsllli.ads: Instance of Value_I_Spec for
Long_Long_Long_Integer.
* libgnat/s-vs_uns.ads: Instance of Value_U_Spec for Unsigned.
* libgnat/s-vs_llu.ads: Instance of Value_U_Spec for
Long_Long_Unsigned.
* libgnat/s-vslllu.ads: Instance of Value_U_Spec for
Long_Long_Long_Unsigned.
* libgnat/s-imagei.ads: Take instances of Value_*_Spec as
parameters.
* libgnat/s-imagei.adb: Idem.
* libgnat/s-imageu.ads: Idem.
* libgnat/s-imageu.adb: Idem.
* libgnat/s-valuei.ads: Idem.
* libgnat/s-valuei.adb: Idem.
* libgnat/s-valueu.ads: Idem.
* libgnat/s-valueu.adb: Idem.
* libgnat/s-imgint.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-imglllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imgllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-imguns.ads: Adapt instance to new ghost parameters.
* libgnat/s-valint.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallli.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllli.ads: Adapt instance to new ghost parameters.
* libgnat/s-vallllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valllu.ads: Adapt instance to new ghost parameters.
* libgnat/s-valuns.ads: Adapt instance to new ghost parameters.
* libgnat/s-vaispe.ads: Take instance of Value_U_Spec as parameter
and remove unused declaration.
* libgnat/s-vaispe.adb: Idem.
* libgnat/s-vauspe.ads: Remove unused declaration.
* libgnat/s-valspe.ads: Factor out the specification part of
Val_Util.
* libgnat/s-valspe.adb: Idem.
* libgnat/s-valuti.ads: Move specification to Val_Spec.
* libgnat/s-valuti.adb: Idem.
* libgnat/s-valboo.ads: Use Val_Spec.
* libgnat/s-valboo.adb: Idem.
* libgnat/s-imgboo.adb: Idem.
* libgnat/s-imagef.adb: Adapt instances to new ghost parameters.
* Makefile.rtl: List new files.
Viljar Indus [Wed, 21 Jun 2023 13:22:37 +0000 (16:22 +0300)]
ada: Evaluate static expressions in Range attributes
Gigi assumes that the value of range expressions is an integer literal.
Force evaluation of such expressions since static non-literal expressions
are not always evaluated to a literal form by gnat.
gcc/ada/
* sem_attr.adb (analyze_attribute.check_array_type): Replace valid
indexes with their staticly evaluated values.
Viljar Indus [Tue, 20 Jun 2023 14:29:41 +0000 (17:29 +0300)]
ada: Refer to non-Ada binding limitations in user guide
The limitation of resetting the FPU mode for non 80-bit
precision was not referenced from "Creating a Stand-alone
Library to be used in a non-Ada context". Reference it the same
way it is already referenced from "Interfacing to C".
gcc/ada/
* doc/gnat_ugn/the_gnat_compilation_model.rst: Reference "Binding
with Non-Ada Main Programs" from "Creating a Stand-alone Library
to be used in a non-Ada context".
* gnat_ugn.texi: Regenerate.
Viljar Indus [Wed, 14 Jun 2023 20:19:49 +0000 (23:19 +0300)]
ada: Avoid crash in Find_Optional_Prim_Op
Find_Optional_Prim_Op can crash when the Underlying_Type is Empty.
This can happen when you are dealing with a structure type with a
private part that does not have its Full_View set yet.
gcc/ada/
* exp_util.adb (Find_Optional_Prim_Op): Stop deriving primitive
operation if there is no underlying type to derive it from.
Steve Baird [Tue, 6 Jun 2023 19:44:00 +0000 (12:44 -0700)]
ada: Finalization not performed for component of protected type
In some cases involving a discriminated protected type with an array
component that is subject to a discriminant-dependent index constraint,
where the element type of the array requires finalization and the array
type has not yet been frozen at the point of the declaration of the protected
type, finalization of an object of the protected type may incorrectly omit
finalization of the array component. One case where this scenario can arise
is an instantiation of Ada.Containers.Bounded_Synchronized_Queues, passing in
an Element type that requires finalization.
gcc/ada/
* exp_ch7.adb (Make_Final_Call): Add assertion that if no
finalization call is generated, then the type of the object being
finalized does not require finalization.
* freeze.adb (Freeze_Entity): If freezing an already-frozen
subtype, do not assume that nothing needs to be done. In the case
of a frozen subtype of a non-frozen type or subtype (which is
possible), freeze the non-frozen entity.
The following consolidates an assert that now hits for ppc64le
with an earlier check we already do, simplifying
vect_determine_partial_vectors_and_peeling and getting rid of
its now redundant argument.
PR tree-optimization/110563
* tree-vectorizer.h (vect_determine_partial_vectors_and_peeling):
Remove second argument.
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Remove for_epilogue_p argument. Merge assert ...
(vect_analyze_loop_2): ... with check done before determining
partial vectors by moving it after.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust.
Thomas Schwinge [Wed, 5 Jul 2023 13:34:56 +0000 (15:34 +0200)]
GGC, GTY: Tighten up a few things re 'reorder' option and strings
..., which doesn't make sense in combination.
This, again, is primarily preparational for another change.
gcc/
* ggc-common.cc (gt_pch_note_reorder, gt_pch_save): Tighten up a
few things re 'reorder' option and strings.
* stringpool.cc (gt_pch_p_S): This is now 'gcc_unreachable'.
Thomas Schwinge [Tue, 4 Jul 2023 20:47:48 +0000 (22:47 +0200)]
GTY: Clean up obsolete parametrized structs remnants
Support removed in 2014 with
commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators".
Thomas Schwinge [Tue, 4 Jul 2023 20:47:48 +0000 (22:47 +0200)]
GTY: Clean up obsolete 'bool needs_cast_p' field of 'gcc/gengtype.cc:struct walk_type_data'
Last use disappeared in 2014 with
commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators".
gcc/
* gengtype.cc (struct walk_type_data): Remove 'needs_cast_p'.
Adjust all users.
--- gcc/ggc-tests.cc
+++ gcc/ggc-tests.cc
@@ -258 +258 @@ class GTY((tag("1"))) some_subclass : public example_base
-class GTY((tag("2"))) some_other_subclass : public example_base
+class GTY((tag(user))) some_other_subclass : public example_base
@@ -384 +384 @@ test_chain_next ()
-struct GTY((user)) user_struct
+struct GTY((user user)) user_struct
..., we get unexpected "have a param<N>_is option" diagnostics:
[...]
build/gengtype \
-S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state
[...]/source-gcc/gcc/ggc-tests.cc:258: parse error: expected a string constant, have a param<N>_is option
[...]/source-gcc/gcc/ggc-tests.cc:384: parse error: expected ')', have a param<N>_is option
make[2]: *** [Makefile:2888: s-gtype] Error 1
[...]
This traces back to 2012 "Support garbage-collected C++ templates", which got
incorporated in commit 0823efedd0fb8669b7e840954bc54c3b2cf08d67
(Subversion r190402), which did add 'USER_GTY' to what nowadays is known as
'enum gty_token', but didn't accordingly update
'gcc/gengtype-parse.c:token_names', leaving those out of sync. Updating
'gcc/gengtype-parse.c:token_value_format' wasn't necessary, as:
/* print_token assumes that any token >= FIRST_TOKEN_WITH_VALUE may have
a meaningful value to be printed. */
FIRST_TOKEN_WITH_VALUE = PARAM_IS
This, in turn, got further confused -- or "fixed" -- by later changes:
2014 commit 63f5d5b818319129217e41bcb23db53f99ff11b0 (Subversion r218558)
"remove gengtype support for param_is use_param, if_marked and splay tree allocators",
which reciprocally missed corresponding clean-up.
With that addressed via adding the missing '"user"' to 'token_names', and,
until that is properly fixed, a temporary 'UNUSED_PARAM_IS' (re-)added for use
with 'FIRST_TOKEN_WITH_VALUE', we then get the expected:
[...]/source-gcc/gcc/ggc-tests.cc:258: parse error: expected a string constant, have 'user'
[...]/source-gcc/gcc/ggc-tests.cc:384: parse error: expected ')', have 'user'
gcc/
* gengtype-parse.cc (token_names): Add '"user"'.
* gengtype.h (gty_token): Add 'UNUSED_PARAM_IS' for use with
'FIRST_TOKEN_WITH_VALUE'.
Thomas Schwinge [Wed, 5 Jul 2023 06:38:49 +0000 (08:38 +0200)]
GTY: Enhance 'string_length' option documentation
We're (currently) not aware of any actual use of 'ht_identifier's with NUL
characters embedded; its 'len' field appears to exist for optimization
purposes, since "forever". Before 'struct ht_identifier' was added in
commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we had in
'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or earlier 'length',
earlier in 'gcc/cpphash.h:struct hashnode': 'unsigned short length', earlier
'size_t length' with comment: "length of token, for quick comparison", earlier
'int length', ever since the 'gcc/cpp*' files were added in
commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191).
--- gcc/varasm.cc
+++ gcc/varasm.cc
@@ -66 +66 @@ along with GCC; see the file COPYING3. If not see
-extern GTY(()) const char *first_global_object_name;
+extern GTY((string_length("strlen(%h.first_global_object_name) + 1"))) const char *first_global_object_name;
..., we get:
[...]
build/gengtype \
-S [...]/source-gcc/gcc -I gtyp-input.list -w tmp-gtype.state
/bin/sh [...]/source-gcc/gcc/../move-if-change tmp-gtype.state gtype.state
build/gengtype \
-r gtype.state
[...]/source-gcc/gcc/varasm.cc:66: global `first_global_object_name' has unknown option `string_length'
[...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: field `hex_str' of global `lazy_hex_fp_values[0]' has unknown option `string_length'
make[2]: *** [Makefile:2890: s-gtype] Error 1
[...]
These errors occur when writing "GC roots", where -- per my understanding --
'string_length' isn't relevant for actual GC purposes. However, like
elsewhere, it is for PCH purposes, and simply accepting 'string_length' here
isn't sufficient: we'll still get '(gt_pointer_walker) >_pch_n_S' used in the
'struct ggc_root_tab' instances, and there's no easy way to change that to
instead use 'gt_pch_n_S2' with explicit 'size_t string_len' argument. (At
least not sufficiently easy to justify spending any further time on, given that
I don't have an actual use for that feature.)
So, until an actual need arises, and/or to avoid the next person looking into
this having to figure out the same thing again, let's just document this
limitation:
[...]/source-gcc/gcc/varasm.cc:66: option `string_length' not supported for global `first_global_object_name'
[...]/source-gcc/gcc/c-family/c-cppbuiltin.cc:1789: option `string_length' not supported for field `hex_str' of global `lazy_hex_fp_values[0]'
Roger Sayle [Thu, 6 Jul 2023 08:58:17 +0000 (09:58 +0100)]
[Committed] Handle COPYSIGN in dwarf2out.cc's mem_loc_descriptor.
Many thanks to Hans-Peter Nilsson for reminding me that new RTX codes
need to be added to dwarf2out.cc's mem_loc_descriptor, and for doing
this for BITREVERSE. This patch does the same for the recently added
COPYSIGN. I'd been testing these on a target that doesn't use DWARF
(nvptx-none) and so didn't exhibit the issue, and my additional testing
on x86_64-pc-linux-gnu to double check that changes were safe, doesn't
(yet) trigger the problematic assert in dwarf2out.cc's mem_loc_descriptor.
2023-07-06 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener [Wed, 5 Jul 2023 13:57:49 +0000 (15:57 +0200)]
tree-optimization/110515 - wrong code with LIM + PRE
In this PR we face the issue that LIM speculates a load when
hoisting it out of the loop (since it knows it cannot trap).
Unfortunately this exposes undefined behavior when the load
accesses memory with the wrong dynamic type. This later
makes PRE use that representation instead of the original
which accesses the same memory location but using a different
dynamic type leading to a wrong disambiguation of that
original access against another and thus a wrong-code transform.
Fortunately there already is code in PRE dealing with a similar
situation for code hoisting but that left a small gap which
when fixed also fixes the wrong-code transform in this bug even
if it doesn't address the underlying issue of LIM speculating
that load.
The upside is this fix is trivially safe to backport and chances
of code generation regressions are very low.
PR tree-optimization/110515
* tree-ssa-pre.cc (compute_avail): Make code dealing
with hoisting loads with different alias-sets more
robust.
Richard Biener [Thu, 6 Jul 2023 06:52:46 +0000 (08:52 +0200)]
Fix expectation on gcc.dg/vect/pr71264.c
With the recent change to more reliably not vectorize code already
using vector types we run into FAILs of gcc.dg/vect/pr71264.c
The testcase was added for fixing an ICE and possible (re-)vectorization
of the code isn't really supported and I suspect might even go
wrong for non-bitops.
The following leaves the testcase as just testing for an ICE.
PR tree-optimization/110544
* gcc.dg/vect/pr71264.c: Remove scan for vectorization.
Hongyu Wang [Fri, 30 Jun 2023 01:44:56 +0000 (09:44 +0800)]
i386: Inline function with default arch/tune to caller
For function with different target attributes, current logic rejects to
inline the callee when any arch or tune is mismatched. Relax the
condition to allow callee with default arch/tune to be inlined.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_can_inline_p): If callee has
default arch=x86-64 and tune=generic, do not block the
inlining to its caller. Also allow callee with different
arch= to be inlined if it has always_inline attribute and
it's ISA is subset of caller's.
gcc/testsuite/ChangeLog:
* gcc.target/i386/inline_attr_arch.c: New test.
* gcc.target/i386/inline_target_clones.c: Ditto.
So the problem is vector generic decided to do comparisons in signed-boolean:32
types but the rest of the middle-end was not ready for that. Since we are building
the comparison which will feed into a cond_expr here, using boolean_type_node is
better and also correct. The rest of the compiler thinks the ranges for
comparison is always [0,1] too.
Note this code does not currently lowers bigger vector sizes into smaller
vector sizes so using boolean_type_node here is better.
OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR middle-end/110554
* tree-vect-generic.cc (expand_vector_condition): For comparisons,
just build using boolean_type_node instead of the cond_type.
For non-comparisons/non-scalar-bitmask, build a ` != 0` gimple
that will feed into the COND_EXPR.
rax is used to save and restore DFmode value. In RA both GENERAL_REGS
and SSE_REGS cost zero since we didn't disparage the
alternative in movdf_internal pattern, according to register
allocation order, GENERAL_REGS is allocated. The patch add ? for
alternative (r,v) and (v,r) just like we did for movsf/hf/bf_internal
pattern, after that we get optimal RA.
PR106907 has few warnings spotted from cppcheck. In that addressing
redundant initialization issue. Here the initialized value of 'new_addr'
was overwritten before it was read. Updated the source by removing the
unnecessary initialization of 'new_addr'.
Hao Liu [Thu, 6 Jul 2023 02:03:47 +0000 (10:03 +0800)]
tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop
If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both main and epilog loop are enlarged. The epilog vect loop is
specific for a loop with small iteration counts, so a large VF may hurt
performance.
This patch unscales the main loop VF by suggested_unroll_factor while selecting
the epilog loop VF, so that it will be the same as vectorized loop without
unrolling (i.e. suggested_unroll_factor = 1).
gcc/ChangeLog:
PR tree-optimization/110474
* tree-vect-loop.cc (vect_analyze_loop_2): unscale the VF by suggested
unroll factor while selecting the epilog vect loop VF.
Andrew MacLeod [Wed, 5 Jul 2023 17:36:27 +0000 (13:36 -0400)]
Simplify compute_operand_range for op1 and op2 case.
Move the check for co-dependency between 2 operands into
compute_operand_range, resulting in a much cleaner
compute_operand1_and_operand2_range routine.
* gimple-range-gori.cc (compute_operand_range): Check for
operand interdependence when both op1 and op2 are computed.
(compute_operand1_and_operand2_range): No checks required now.
Andrew MacLeod [Tue, 4 Jul 2023 15:28:52 +0000 (11:28 -0400)]
Move relation discovery into compute_operand_range
compute_operand1_range and compute_operand2_range were both doing
relation discovery between the 2 operands... move it into a common area.
* gimple-range-gori.cc (compute_operand_range): Check for
a relation between op1 and op2 and use that instead.
(compute_operand1_range): Don't look for a relation override.
(compute_operand2_range): Ditto.
Filip Kastl [Wed, 5 Jul 2023 15:36:02 +0000 (17:36 +0200)]
value-prof.cc: Correct edge prob calculation.
The mod-subtract optimization with ncounts==1 produced incorrect edge
probabilities due to incorrect conditional probability calculation. This
patch fixes the calculation.
Signed-off-by: Filip Kastl <filip.kastl@gmail.com>
gcc/ChangeLog:
sched: Change return type of predicate functions from int to bool
Also change some internal variables to bool.
gcc/ChangeLog:
* sched-int.h (struct haifa_sched_info): Change can_schedule_ready_p,
scehdule_more_p and contributes_to_priority indirect frunction
type from int to bool.
(no_real_insns_p): Change return type from int to bool.
(contributes_to_priority): Ditto.
* haifa-sched.cc (no_real_insns_p): Change return type from
int to bool and adjust function body accordingly.
* modulo-sched.cc (try_scheduling_node_in_cycle): Change "success"
variable type from int to bool.
(ps_insn_advance_column): Change return type from int to bool.
(ps_has_conflicts): Ditto. Change "has_conflicts"
variable type from int to bool.
* sched-deps.cc (deps_may_trap_p): Change return type from int to bool.
(conditions_mutex_p): Ditto.
* sched-ebb.cc (schedule_more_p): Ditto.
(ebb_contributes_to_priority): Change return type from
int to bool and adjust function body accordingly.
* sched-rgn.cc (is_cfg_nonregular): Ditto.
(check_live_1): Ditto.
(is_pfree): Ditto.
(find_conditional_protection): Ditto.
(is_conditionally_protected): Ditto.
(is_prisky): Ditto.
(is_exception_free): Ditto.
(haifa_find_rgns): Change "unreachable" and "too_large_failure"
variables from int to bool.
(extend_rgns): Change "rescan" variable from int to bool.
(check_live): Change return type from
int to bool and adjust function body accordingly.
(can_schedule_ready_p): Ditto.
(schedule_more_p): Ditto.
(contributes_to_priority): Ditto.
Robin Dapp [Wed, 28 Jun 2023 13:48:55 +0000 (15:48 +0200)]
gimple-isel: Recognize vec_extract pattern.
In gimple-isel we already deduce a vec_set pattern from an
ARRAY_REF(VIEW_CONVERT_EXPR). This patch does the same for a
vec_extract.
The code is largely similar to the vec_set one
including the addition of a can_vec_extract_var_idx_p function
in optabs.cc to check if the backend can handle a register
operand as index. We already have can_vec_extract in
optabs-query but that one checks whether we can extract
specific modes.
With the introduction of an internal function for vec_extract
the expander must not FAIL. For vec_set this has already been
the case so adjust the documentation accordingly.
Additionally, clarify the wording of the vector-vector case for
vec_extract.
gcc/ChangeLog:
* doc/md.texi: Document that vec_set and vec_extract must not
fail.
* gimple-isel.cc (gimple_expand_vec_set_expr): Rename this...
(gimple_expand_vec_set_extract_expr): ...to this.
(gimple_expand_vec_exprs): Call renamed function.
* internal-fn.cc (vec_extract_direct): Add.
(expand_vec_extract_optab_fn): New function to expand
vec_extract optab.
(direct_vec_extract_optab_supported_p): Add.
* internal-fn.def (VEC_EXTRACT): Add.
* optabs.cc (can_vec_extract_var_idx_p): New function.
* optabs.h (can_vec_extract_var_idx_p): Declare.
Pan Li [Tue, 4 Jul 2023 12:26:11 +0000 (20:26 +0800)]
RISC-V: Use FRM_DYN when add the rounding mode operand
This patch would like to take FRM_DYN const rtx as the rounding mode
operand according to the RVV spec, which takes the dyn as the only
rounding mode for floating-point.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_exact_insn): Use FRM_DYN instead of const0. Signed-off-by: Pan Li <pan2.li@intel.com>
Robin Dapp [Wed, 5 Jul 2023 12:42:21 +0000 (14:42 +0200)]
RISC-V: Change truncate to float_truncate in narrowing patterns.
This fixes a bug in the autovect FP narrowing patterns which resulted in
a combine ICE. It would try to e.g. simplify a unary operation by
simplify_const_unary_operation which obviously expects a float_truncate
and not a truncate for a floating-point mode.
VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer
Hi, Richard and Richi.
Address comments from Richi.
Make gs_info.ifn = LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE.
I have fully tested these 4 format:
length = vf is a dummpy length,
mask = {-1,-1, ... } is a dummy mask.
1. no length, no mask
LEN_MASK_GATHER_LOAD (..., length = vf, mask = {-1,-1,...})
2. exist length, no mask
LEN_MASK_GATHER_LOAD (..., len, mask = {-1,-1,...})
3. exist mask, no length
LEN_MASK_GATHER_LOAD (..., length = vf, mask)
4. both mask and length exist
LEN_MASK_GATHER_LOAD (..., length, mask)
All of these work fine in this patch.
Here is the example:
void
f (int *restrict a,
int *restrict b, int n,
int base, int step,
int *restrict cond)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i * 4] = b[i];
}
}
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
YunQiang Su [Wed, 31 May 2023 09:55:50 +0000 (17:55 +0800)]
MIPS: Use unaligned access to expand block_move on r6
MIPSr6 support unaligned memory access with normal lh/sh/lw/sw/ld/sd
instructions, and thus lwl/lwr/ldl/ldr and swl/swr/sdl/sdr is removed.
For microarchitecture, these memory access instructions issue 2
operation if the address is not aligned, which is like what lwl family
do.
For some situation (such as accessing boundary of pages) on some
microarchitectures, the unaligned access may not be good enough,
then the kernel should trap&emu it: the kernel may need
-mno-unalgined-access option.
gcc/
* config/mips/mips.cc (mips_expand_block_move): don't expand for
r6 with -mno-unaligned-access option if one or both of src and
dest are unaligned. restruct: return directly if length is not const.
(mips_block_move_straight): emit_move if ISA_HAS_UNALIGNED_ACCESS.
gcc/testsuite/
* gcc.target/mips/expand-block-move-r6-no-unaligned.c: new test.
* gcc.target/mips/expand-block-move-r6.c: new test.
Richard Biener [Wed, 5 Jul 2023 07:59:44 +0000 (09:59 +0200)]
adjust testcase for now happening epilogue vectorization
gcc.dg/vect/slp-perm-9.c is reported to FAIL with -march=cascadelake
now which is because we now vectorize the epilogue with V2HImode
vectors after the recent change to not scrap too large vector
epilogues during transform but during analysis time.
The following adjusts the testcase to always use the existing alternate
N which avoids epilogue vectorization.
* gcc.dg/vect/slp-perm-9.c: Always use alternate N.
Jan Beulich [Wed, 5 Jul 2023 07:52:41 +0000 (09:52 +0200)]
x86: suppress avx512f-copysign.c testcase for 32-bit
The test installed by "x86: make VPTERNLOG* usable on less than 512-bit
operands with just AVX512F" won't succeed on 32-bit, for floating point
operations being done there (by default) without using SIMD insns.
gcc/testsuite/
* gcc.target/i386/avx512f-copysign.c: Suppress for 32-bit.
Jan Beulich [Wed, 5 Jul 2023 07:49:16 +0000 (09:49 +0200)]
x86: yet more PR target/100711-like splitting
Following two-operand bitwise operations, add another splitter to also
deal with not followed by broadcast all on its own, which can be
expressed as simple embedded broadcast instead once a broadcast operand
is actually permitted in the respective insn. While there also permit
a broadcast operand in the corresponding expander.
gcc/
PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate as a singular vpternlog.
(one_cmpl<mode>2): Allow broadcast for operand 1.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Likewise.
gcc/testsuite/
PR target/100711
* gcc.target/i386/pr100711-6.c: New test.
Jan Beulich [Wed, 5 Jul 2023 07:48:47 +0000 (09:48 +0200)]
x86: further PR target/100711-like splitting
With respective two-operand bitwise operations now expressable by a
single VPTERNLOG, add splitters to also deal with ior and xor
counterparts of the original and-only case. Note that the splitters need
to be separate, as the placement of "not" differs in the final insns
(*iornot<mode>3, *xnor<mode>3) which are intended to pick up one half of
the result.
gcc/
PR target/100711
* config/i386/sse.md: New splitters to simplify
not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.
gcc/testsuite/
PR target/100711
* gcc.target/i386/pr100711-4.c: New test.
* gcc.target/i386/pr100711-5.c: New test.
Richard Biener [Wed, 5 Jul 2023 06:53:01 +0000 (08:53 +0200)]
middle-end/110541 - VEC_PERM_EXPR documentation is off
The following adjusts the tree.def documentation about VEC_PERM_EXPR
which wasn't adjusted when the restrictions of permutes with constant
mask were relaxed.
PR middle-end/110541
* tree.def (VEC_PERM_EXPR): Adjust documentation to reflect
reality.
Jan Beulich [Wed, 5 Jul 2023 07:41:09 +0000 (09:41 +0200)]
x86: use VPTERNLOG also for certain andnot forms
When it's the memory operand which is to be inverted, using VPANDN*
requires a further load instruction. The same can be achieved by a
single VPTERNLOG*. Add two new alternatives (for plain memory and
embedded broadcast), adjusting the predicate for the first operand
accordingly.
Two pre-existing testcases actually end up being affected (improved) by
the change, which is reflected in updated expectations there.
gcc/
PR target/93768
* config/i386/sse.md (*andnot<mode>3): Add new alternatives
for memory form operand 1.
gcc/testsuite/
PR target/93768
* gcc.target/i386/avx512f-andn-di-zmm-2.c: New test.
* gcc.target/i386/avx512f-andn-si-zmm-2.c: Adjust expecations
towards generated code.
* gcc.target/i386/pr100711-3.c: Adjust expectations for 32-bit
code.
* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.
Andrew Pinski [Sat, 1 Jul 2023 02:22:48 +0000 (19:22 -0700)]
PR 110487: `(a !=/== CST1 ? CST2 : CST3)` pattern for type safety
The problem here is we might produce some values out of the type's
min/max (and/or valid values, e.g. signed booleans). The fix is to
use an integer type which has the same precision and signedness
as the original type.
Note two_value_replacement in phiopt had the same issue in previous
versions; though I don't know if a problem will show up there.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110487
* match.pd (a !=/== CST1 ? CST2 : CST3): Always
build a nonstandard integer and use that.
Andrew Pinski [Sat, 1 Jul 2023 00:50:08 +0000 (17:50 -0700)]
Fix PR 110487: invalid signed boolean value
This fixes the first part of this bug where `a ? -1 : 0`
would cause a value of 1 into the signed boolean value.
It fixes the problem by casting to an integer type of
the same size/signedness before doing the negative and
then casting to the type of expression.
OK? Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* match.pd (a?-1:0): Cast type an integer type
rather the type before the negative.
(a?0:-1): Likewise.
* config/xtensa/xtensa.cc (machine_function, xtensa_expand_prologue):
Change to use HARD_REG_BIT and its macros.
* config/xtensa/xtensa.md
(peephole2: regmove elimination during DFmode input reload):
Likewise.
Richard Biener [Tue, 4 Jul 2023 10:52:27 +0000 (12:52 +0200)]
tree-optimization/110491 - PHI-OPT and undefs
The following makes sure to not make conditional undefs in PHI arguments
unconditional by folding cond ? arg1 : arg2.
PR tree-optimization/110491
* tree-ssa-phiopt.cc (match_simplify_replacement): Check
whether the PHI args are possibly undefined before folding
the COND_EXPR.
Pan Li [Wed, 21 Jun 2023 07:58:24 +0000 (15:58 +0800)]
Streamer: Fix out of range memory access of machine mode
We extend the machine mode from 8 to 16 bits already. But there still
one placing missing from the streamer. It has one hard coded array
for the machine code like size 256.
In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
value of the MAX_MACHINE_MODE will grow as more and more modes are
added. While the machine mode array in tree-streamer still leave 256 as is.
Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
lto_output_init_mode_table will touch the memory out of range unexpected.
This patch would like to take the MAX_MACHINE_MODE as the size of the
array in streamer, to make sure there is no potential unexpected
memory access in future. Meanwhile, this patch also adjust some place
which has MAX_MACHINE_MODE <= 256 assumption.
Care is taken that for offload compilation, we interpret the stream-in
data in terms of the host 'MAX_MACHINE_MODE' ('file_data->mode_bits'),
which very likely is different from the offload device
'MAX_MACHINE_MODE'.
gcc/
* lto-streamer-in.cc (lto_input_mode_table): Stream in the mode
bits for machine mode table.
* lto-streamer-out.cc (lto_write_mode_table): Stream out the
HOST machine mode bits.
* lto-streamer.h (struct lto_file_decl_data): New fields mode_bits.
* tree-streamer.cc (streamer_mode_table): Take MAX_MACHINE_MODE
as the table size.
* tree-streamer.h (streamer_mode_table): Ditto.
(bp_pack_machine_mode): Take 1 << ceil_log2 (MAX_MACHINE_MODE)
as the packing limit.
(bp_unpack_machine_mode): Ditto with 'file_data->mode_bits'.
gcc/lto/
* lto-common.cc (lto_file_finalize) [!ACCEL_COMPILER]: Initialize
'file_data->mode_bits'.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
Thomas Schwinge [Thu, 29 Jun 2023 19:33:06 +0000 (21:33 +0200)]
LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'
... instead of just 'unsigned char *mode_table'. Preparation for a forthcoming
change, where we need to capture an additional 'file_data' item, so it seems
easier to just capture that one proper.
Richard Biener [Tue, 4 Jul 2023 08:46:35 +0000 (10:46 +0200)]
Use mark_ssa_maybe_undefs in PHI-OPT
The following removes gimple_uses_undefined_value_p and instead
uses the conservative mark_ssa_maybe_undefs in PHI-OPT, the last
user of the other API.
* tree-ssa-phiopt.cc (pass_phiopt::execute): Mark SSA undefs.
(empty_bb_or_one_feeding_into_p): Check for them.
* tree-ssa.h (gimple_uses_undefined_value_p): Remove.
* tree-ssa.cc (gimple_uses_undefined_value_p): Likewise.
Hao Liu [Tue, 4 Jul 2023 09:17:50 +0000 (17:17 +0800)]
PR tree-optimization/110531 - Vect: avoid using uninitialized variable
slp_done_for_suggested_uf is used directly in vect_analyze_loop_2
without initialization, which is undefined behavior. Initialize it to false
according to the discussion.
gcc/ChangeLog:
PR tree-optimization/110531
* tree-vect-loop.cc (vect_analyze_loop_1): initialize
slp_done_for_suggested_uf to false.
Richard Biener [Tue, 4 Jul 2023 08:29:26 +0000 (10:29 +0200)]
tree-optimization/110228 - avoid undefs in ifcombine more thoroughly
The following replaces the simplistic gimple_uses_undefined_value_p
with the conservative mark_ssa_maybe_undefs approach as already
used by LIM and IVOPTs. This is to avoid exposing an unconditional
uninitialized read on a path from entry by if-combine.
PR tree-optimization/110228
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute):
Mark SSA may-undefs.
(bb_no_side_effects_p): Check stmt uses for undefs.
* gcc.dg/torture/pr110228.c: New testcase.
* gcc.dg/uninit-pr101912.c: Un-XFAIL.
Richard Biener [Tue, 4 Jul 2023 07:51:05 +0000 (09:51 +0200)]
tree-optimization/110436 - bogus live/relevant for unused pattern
When we compute liveness and relevantness we have to make sure to
handle live but not relevant stmts in a way we can later vectorize
them. When the stmt uses only operands that do not need vectorization
we can just leave such stmts in place - but not in the case they
are recognized as patterns. Since we don't have a way to cancel
pattern recognition we have to force mark such stmts as relevant.
PR tree-optimization/110436
* tree-vect-stmts.cc (vect_mark_relevant): Expand dumping,
force live but not relevant pattern stmts relevant.
Eric Botcazou [Mon, 26 Jun 2023 18:33:53 +0000 (20:33 +0200)]
ada: Do not unnecessarily use component-wise loop for slice assignment
This relaxes the condition under which Expand_Assign_Array leaves the
assignment to or from an array slice untouched. The main prerequisite
for the code generator is that everything be aligned on byte boundaries
and Is_Possibly_Unaligned_Slice is too strong a predicate for this, so
it is replaced by the combination of Possible_Bit_Aligned_Component and
Is_Bit_Packed_Array, modulo a change to Possible_Bit_Aligned_Component
to take into account the specific case of slices.
gcc/ada/
* exp_ch5.adb (Expand_Assign_Array): Adjust comment above the
calls to Possible_Bit_Aligned_Component on the LHS and RHS. Do not
call Is_Possibly_Unaligned_Slice in the slice case.
* exp_util.ads (Component_May_Be_Bit_Aligned): Add For_Slice
boolean parameter.
(Possible_Bit_Aligned_Component): Likewise.
* exp_util.adb (Component_May_Be_Bit_Aligned): Do not return False
for the slice of a small record or bit-packed array component.
(Possible_Bit_Aligned_Component): Pass For_Slice in recursive
calls, except in the slice case where True is passed, as well as
in call to Component_May_Be_Bit_Aligned.
Eric Botcazou [Sat, 24 Jun 2023 17:30:55 +0000 (19:30 +0200)]
ada: Small adjustments to new procedure Expand_Unchecked_Union_Equality
The procedure is not stable under repeated invocation. Now it may be called
twice on the same node, for example during the expansion of the renaming of
the predefined equality operator after the unchecked union type is frozen.
gcc/ada/
* exp_ch4.ads (Expand_Unchecked_Union_Equality): Only take a
single parameter.
* exp_ch4.adb (Expand_Unchecked_Union_Equality): Add guard against
repeated invocation on the same node.
* exp_ch6.adb (Expand_Call): Only pass a single actual parameter
in the call to Expand_Unchecked_Union_Equality.
Yannick Moy [Tue, 20 Jun 2023 13:30:35 +0000 (15:30 +0200)]
ada: Fix list of inherited subprograms in query for GNATprove
The query Inherited_Subprograms was returning a list containing
some subprograms whose overridding was also in the list, when
interfaces was present. This was an issue for GNATprove. Now propose
a mode for this function to filter out overridden primitives.
gcc/ada/
* sem_disp.adb (Inherited_Subprograms): Add parameter to filter
out results.
* sem_disp.ads: Likewise.
Richard Biener [Mon, 3 Jul 2023 08:28:10 +0000 (10:28 +0200)]
middle-end/110495 - avoid associating constants with (VL) vectors
When trying to associate (v + INT_MAX) + INT_MAX we are using
the TREE_OVERFLOW bit to check for correctness. That isn't
working for VECTOR_CSTs and it can't in general when one considers
VL vectors. It looks like it should work for COMPLEX_CSTs but
I didn't try to single out _Complex int in this change.
The following makes sure that for vectors we use the fallback of
using unsigned arithmetic when associating the above to
v + (INT_MAX + INT_MAX).
PR middle-end/110495
* tree.h (TREE_OVERFLOW): Do not mention VECTOR_CSTs
since we do not set TREE_OVERFLOW on those since the
introduction of VL vectors.
* match.pd (x +- CST +- CST): For VECTOR_CST do not look
at TREE_OVERFLOW to determine validity of association.
Richard Biener [Mon, 3 Jul 2023 11:59:33 +0000 (13:59 +0200)]
tree-optimization/110310 - move vector epilogue disabling to analysis phase
The following removes late deciding to elide vectorized epilogues to
the analysis phase and also avoids altering the epilogues niter.
The costing part from vect_determine_partial_vectors_and_peeling is
moved to vect_analyze_loop_costing where we use the main loop
analysis to constrain the epilogue scalar iterations.
I have not tried to integrate this with vect_known_niters_smaller_than_vf.
It seems the for_epilogue_p parameter in
vect_determine_partial_vectors_and_peeling is largely useless and
we could compute that in the function itself.
PR tree-optimization/110310
* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
Move costing part ...
(vect_analyze_loop_costing): ... here. Integrate better
estimate for epilogues from ...
(vect_analyze_loop_2): Call vect_determine_partial_vectors_and_peeling
with actual epilogue status.
* tree-vect-loop-manip.cc (vect_do_peeling): ... here and
avoid cancelling epilogue vectorization.
(vect_update_epilogue_niters): Remove. No longer update
epilogue LOOP_VINFO_NITERS.
* gcc.target/i386/pr110310.c: New testcase.
* gcc.dg/vect/slp-perm-12.c: Disable epilogue vectorization.
Base one the review comments from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
I change len_mask_gather_load/len_mask_scatter_store order into:
{len,bias,mask}
We adjust adding len and mask using using add_len_and_mask_args
which is same as partial_load/parial_store.
Now, the codes become more reasonable and easier maintain.
This patch is adding LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider this following case:
void
f (uint8_t *restrict a,
uint8_t *restrict b, int n,
int base, int step,
int *restrict cond)
{
for (int i = 0; i < n; ++i)
{
if (cond[i])
a[i * step + base] = b[i * step + base];
}
}
We hope RVV can vectorize such case into following IR:
We failed to elide insn 2 (vsetvl insn) since insn 3 is modifying "a3" AVL.
Actually, we don't really care about insn 3 since we should only check and make sure
there is no insn between insn 1 and insn 2 that modifies "a3" AVL. Then, we can propgate
AVL "a3" from insn 1 to insn 2. Finally, insn 2 is eliminated.
This seems to have just been overlooked when introducing
BITREVERSE. Note that the function name mem_loc_descriptor
is a misnomer; it'd better be called rtx_loc_descriptor or
any_loc_descriptor, because "anything" RTX can end up here.
To wit, when introducing new RTL that ends up as code or for
other reasons appear in debug expressions, don't forget to
update this function. This was observed by building
libstdc+++ for cris-elf with a patch replacing the
CRIS_UNSPEC_SWAP_BITS by bitreverse, as hitting the
internal-error-generating default case.
Looking at the BSWAP, POPCOUNT and ROTATE cases, BITREVERSE
can probably be fully expressed as DWARF code if need be,
but let's start with not throwing an internal error.
Jonathan Wakely [Mon, 3 Jul 2023 18:33:18 +0000 (19:33 +0100)]
libstdc++: Fix <iosfwd> synopsis test
The <syncstream> header is only supported for the cxx11 ABI. The
declarations of basic_syncbuf, basic_osyncstream, syncbuf and
osyncstream were already correctly guarded by a check for
_GLIBCXX_USE_CXX11_ABI, but the wsyncbuf and wosyncstream declarations
were not.
libstdc++-v3/ChangeLog:
* testsuite/27_io/headers/iosfwd/synopsis.cc: Make wsyncbuf and
wosyncstream depend on _GLIBCXX_USE_CXX11_ABI.
* include/pstl/pstl_config.h (_PSTL_PRAGMA_SIMD_SCAN,
_PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN, _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN):
Define to OpenMP 5.0 pragmas even for GCC 10.0+.
(_PSTL_UDS_PRESENT): Define to 1 for GCC 10.0+.
This series adds basic support for the vector crypto extensions:
* Zvbb
* Zvbc
* Zvkg
* Zvkned
* Zvkhn[a,b]
* Zvksed
* Zvksh
* Zvkn
* Zvknc
* Zvkng
* Zvks
* Zvksc
* Zvksg
* Zvkt
This patch is based on the v20230620 version of the Vector Cryptography
specification. The specification is frozen and can be found here:
https://github.com/riscv/riscv-crypto/releases/tag/v20230620
* gcc.target/riscv/zvbb.c: New test.
* gcc.target/riscv/zvbc.c: New test.
* gcc.target/riscv/zvkg.c: New test.
* gcc.target/riscv/zvkn-1.c: New test.
* gcc.target/riscv/zvkn.c: New test.
* gcc.target/riscv/zvknc-1.c: New test.
* gcc.target/riscv/zvknc-2.c: New test.
* gcc.target/riscv/zvknc.c: New test.
* gcc.target/riscv/zvkned.c: New test.
* gcc.target/riscv/zvkng-1.c: New test.
* gcc.target/riscv/zvkng-2.c: New test.
* gcc.target/riscv/zvkng.c: New test.
* gcc.target/riscv/zvknha.c: New test.
* gcc.target/riscv/zvknhb.c: New test.
* gcc.target/riscv/zvks-1.c: New test.
* gcc.target/riscv/zvks.c: New test.
* gcc.target/riscv/zvksc-1.c: New test.
* gcc.target/riscv/zvksc-2.c: New test.
* gcc.target/riscv/zvksc.c: New test.
* gcc.target/riscv/zvksed.c: New test.
* gcc.target/riscv/zvksg-1.c: New test.
* gcc.target/riscv/zvksg-2.c: New test.
* gcc.target/riscv/zvksg.c: New test.
* gcc.target/riscv/zvksh.c: New test.
* gcc.target/riscv/zvkt.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Andrew Pinski [Sat, 1 Jul 2023 17:52:48 +0000 (10:52 -0700)]
Use chain_next on eh_landing_pad_d for GTY (PR middle-end/110510)
The backtrace in the bug report suggest there is a running out of
stack during GC collection, because of a long chain of eh_landing_pad_d.
This might fix that by adding chain_next onto eh_landing_pad_d's GTY marker.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Iain Sandoe [Sun, 19 Jun 2022 19:47:43 +0000 (20:47 +0100)]
testsuite, Darwin: Remove an unnecessary flags addition.
The addition of the multiply_defined suppress flag has been handled for some
considerable time now in the Darwin specs; remove it from the testsuite libs.
Avoid duplicates in the specs.
tree+ggc: Change return type of predicate functions from int to bool
Also change internal variable from int to bool.
gcc/ChangeLog:
* tree.h (tree_int_cst_equal): Change return type from int to bool.
(operand_equal_for_phi_arg_p): Ditto.
(tree_map_base_marked_p): Ditto.
* tree.cc (contains_placeholder_p): Update function body
for bool return type.
(type_cache_hasher::equal): Ditto.
(tree_map_base_hash): Change return type
from int to void and adjust function body accordingly.
(tree_int_cst_equal): Ditto.
(operand_equal_for_phi_arg_p): Ditto.
(get_narrower): Change "first" variable to bool.
(cl_option_hasher::equal): Update function body for bool return type.
* ggc.h (ggc_set_mark): Change return type from int to bool.
(ggc_marked_p): Ditto.
* ggc-page.cc (gt_ggc_mx): Change return type
from int to void and adjust function body accordingly.
(ggc_set_mark): Ditto.
Middle-end: Change order of LEN_MASK_LOAD/LEN_MASK_STORE arguments
Hi, Richard. I fix the order as you suggeted.
Before this patch, the order is {len,mask,bias}.
Now, after this patch, the order becomes {len,bias,mask}.
Since you said we should not need 'internal_fn_bias_index', the bias index should always be the len index + 1.
I notice LEN_STORE order is {len,vector,bias}, to make them consistent, I reorder into LEN_STORE {len,bias,vector}.
Just like MASK_STORE {mask,vector}.
Ok for trunk ?
gcc/ChangeLog:
* config/riscv/autovec.md: Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
* config/riscv/riscv-v.cc (expand_load_store): Ditto.
* doc/md.texi: Ditto.
* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(add_len_and_mask_args): New function.
(expand_partial_load_optab_fn): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
(expand_partial_store_optab_fn): Ditto.
(internal_fn_len_index): New function.
(internal_fn_mask_index): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* internal-fn.h (internal_fn_len_index): New function.
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Change order of
LEN_MASK_LOAD/LEN_MASK_STORE/LEN_LOAD/LEN_STORE arguments.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
Eric Botcazou [Fri, 23 Jun 2023 17:01:05 +0000 (19:01 +0200)]
ada: Fix renaming of predefined equality operator for unchecked union types
The problem is that the predefined equality operator for unchecked union
types is implemented out of line by invoking a function that takes more
parameters than the two operands, which means that the renaming is not
seen as type conforming with this function and, therefore, is rejected.
The way out is to implement these additional parameters as "extra" formal
parameters, since this kind of parameters is not taken into account for
semantic checks. The change also factors out the duplicated generation
of actuals for these additional parameters into a single procedure.
gcc/ada/
* exp_ch3.ads (Build_Variant_Record_Equality): Add Spec_Id as second
parameter.
* exp_ch3.adb (Build_Variant_Record_Equality): For unchecked union
types, build the additional parameters as extra formal parameters.
(Expand_Freeze_Record_Type.Build_Variant_Record_Equality): Pass
Empty as Spec_Id in call to Build_Variant_Record_Equality.
* exp_ch4.ads (Expand_Unchecked_Union_Equality): New procedure.
* exp_ch4.adb (Expand_Composite_Equality): In the presence of a
function implementing composite equality, do not special case the
unchecked union types, and only convert the operands if the base
types are not the same like in Build_Equality_Call.
(Build_Equality_Call): Do not special case the unchecked union types
and relocate the operands only once.
(Expand_N_Op_Eq): Do not special case the unchecked union types.
(Expand_Unchecked_Union_Equality): New procedure implementing the
specific expansion of calls to the predefined equality function.
* exp_ch6.adb (Is_Unchecked_Union_Equality): New predicate.
(Expand_Call): Call Is_Unchecked_Union_Equality to determine whether
to call Expand_Unchecked_Union_Equality or Expand_Call_Helper.
* exp_ch8.adb (Build_Body_For_Renaming): Set Has_Delayed_Freeze flag
earlier on Id and pass Id in call to Build_Variant_Record_Equality.