Jan Hubicka [Fri, 30 Jun 2023 14:27:27 +0000 (16:27 +0200)]
Fix handling of __builtin_expect_with_probability and improve first-match heuristics
While looking into the std::vector _M_realloc_insert codegen I noticed that
call of __throw_bad_alloc is predicted with 10% probability. This is because
the conditional guarding it has __builtin_expect (cond, 0) on it. This
incorrectly takes precedence over more reliable heuristics predicting that call
to cold noreturn is likely not going to happen.
So I reordered the predictors so __builtin_expect_with_probability comes first
after predictors that never makes a mistake (so user can use it to always
specify the outcome by hand). I also downgraded malloc predictor since I do
think user-defined malloc functions & new operators may behave funny ways and
moved usual __builtin_expect after the noreturn cold predictor.
This triggered latent bug in expr_expected_value_1 where
if (*predictor < predictor2)
*predictor = predictor2;
should be:
if (predictor2 < *predictor)
*predictor = predictor2;
which eventually triggered an ICE on combining heuristics. This made me notice
that we can do slightly better while combining expected values in case only
one of the parameters (such as in a*b when we expect a==0) can determine
overall result.
Note that the new code may pick weaker heuristics in case that both values are
predicted. Not sure if this scenario is worth the extra CPU time: there is
not correct way to combine the probabilities anyway since we do not know if
the predictions are independent, so I think users should not rely on it.
Fixing this issue uncovered another problem. In 2018 Martin Liska added
code predicting that MALLOC returns non-NULL but instead of that he predicts
that it returns true (boolean 1). This sort of works for testcase testing
malloc (10) != NULL
but, for example, we will predict
malloc (10) == malloc (10)
as true, which is not right and such comparsion may happen in real code
I think proper way is to update expr_expected_value_1 to work with value
ranges, but that needs greater surgery so I decided to postpone this and
only add FIXME and fill PR110499.
gcc/ChangeLog:
PR middle-end/109849
* predict.cc (estimate_bb_frequencies): Turn to static function.
(expr_expected_value_1): Fix handling of binary expressions with
predicted values.
* predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue.
(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the priority
queue.
* predict.h (estimate_bb_frequencies): No longer declare it.
Iain Sandoe [Sat, 25 Feb 2023 23:18:13 +0000 (23:18 +0000)]
modula-2: Amend the handling of failed select() calls in RTint [PR108835].
When we make a select() that fails, there is an attempt to (a) diagnose
why and (b) make a fallback. These actions are causing some tests to
hang on some Darwin versions, this is because the first action that is
tried to assist in diagnosis/fallback handling is to replace the set
timeout with NIL (which causes select to wait forever, modulo other
reasons it might complete).
To fix this, call select with a zero timeout when checking for error
conditions. Also, as we check the possible failure conditions, if we
find a change that succeeds, then stop looking for errors.
Jonathan Wakely [Fri, 30 Jun 2023 13:37:59 +0000 (14:37 +0100)]
libstdc++: Make std::random_device throw more std::system_error [PR105081]
In r14-289-gf9412cedd6c0e7 I made the std::random_device constructor
throw std::system_error for unrecognized tokens. But it still throws
std::runtime_error for a token such as "rdseed" that is recognized but
not supported at runtime by the CPU the program is running on.
With this change we throw std::system_error for those cases too. This
fixes the following failures on Intel CPUs withour rdseed support:
FAIL: 26_numerics/random/random_device/94087.cc execution test
FAIL: 26_numerics/random/random_device/cons/token.cc execution test
FAIL: 26_numerics/random/random_device/entropy.cc execution test
libstdc++-v3/ChangeLog:
PR libstdc++/105081
* src/c++11/random.cc (random_device::_M_init): Throw
std::system_error when the requested device is a valid token but
not available at runtime.
Uros Bizjak [Fri, 30 Jun 2023 14:04:02 +0000 (16:04 +0200)]
fold-const+optabs: Change return type of predicate functions from int to bool
Also change some internal variables and function argument from int to bool.
gcc/ChangeLog:
* fold-const.h (multiple_of_p): Change return type from int to bool.
* fold-const.cc (split_tree): Change negl_p, neg_litp_p,
neg_conp_p and neg_var_p variables to bool.
(const_binop): Change sat_p variable to bool.
(merge_ranges): Change no_overlap variable to bool.
(extract_muldiv_1): Change same_p variable to bool.
(tree_swap_operands_p): Update function body for bool return type.
(fold_truth_andor): Change commutative variable to bool.
(multiple_of_p): Change return type
from int to void and adjust function body accordingly.
* optabs.h (expand_twoval_unop): Change return type from int to bool.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
* optabs.cc (add_equal_note): Ditto.
(widen_operand): Change no_extend argument from int to bool.
(expand_binop): Ditto.
(expand_twoval_unop): Change return type
from int to void and adjust function body accordingly.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(vec_widen_<su>abdl_lo_<mode>, vec_widen_<su>abdl_hi_<mode>):
Expansions for abd vec widen optabs.
(aarch64_<su>abdl<mode>_insn): VQW based abdl RTL.
* config/aarch64/iterators.md (USMAX_EXT): Code attributes
that give the appropriate extend RTL for the max RTL.
This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).
gcc/ChangeLog:
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
New optabs.
* doc/md.texi: Document them.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
to build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceeding an extension, replace the two with a
VEC_WIDEN_ABD.
Pan Li [Fri, 30 Jun 2023 11:12:22 +0000 (19:12 +0800)]
RISC-V: Refactor vxrm_mode attr for type attr equal
This patch would like to refactor the vxrm_mode attr for duplicated
eq_attr condition. The common condition of attr is extraced to one
place instead of many places.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/vector.md: Refactor the common condition.
Richard Biener [Fri, 30 Jun 2023 10:03:26 +0000 (12:03 +0200)]
tree-optimization/110496 - TYPE_PRECISION issue with store-merging
When store-merging looks for bswap opportunities we also handle
BIT_FIELD_REFs where we verify the refed object is of scalar
type but we don't check for the result type we eventually use.
That's done later but after we eventually query TYPE_PRECISION.
The following re-orders this.
PR tree-optimization/110496
* gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Re-order
verifying and TYPE_PRECISION query for the BIT_FIELD_REF case.
Richard Biener [Fri, 30 Jun 2023 07:46:48 +0000 (09:46 +0200)]
middle-end/110489 - avoid useless work on statistics
When we call statistics_fini_pass we unconditionally allocate
the statistics hash and traverse it. When a TU has many small
functions this can take considerable time. The following avoids
this by never allocating the hash from this function.
PR middle-end/110489
* statistics.cc (curr_statistics_hash): Add argument
indicating whether we should allocate the hash.
(statistics_fini_pass): If the hash isn't allocated
only print the summary header.
... understanding that "turn on LRA" is an exaggeration here, given that nvptx
isn't actually doing register allocation ('TARGET_NO_REGISTER_ALLOCATION').
Jovan Dmitrovic [Mon, 26 Jun 2023 15:00:20 +0000 (17:00 +0200)]
mips: Fix overaligned function arguments [PR109435]
This patch changes alignment for typedef types when passed as
arguments, making the alignment equal to the alignment of
original (aliased) types.
This change makes it impossible for a typedef type to have
alignment that is less than its size.
2023-06-27 Jovan Dmitrović <jovan.dmitrovic@syrmia.com>
gcc/ChangeLog:
PR target/109435
* config/mips/mips.cc (mips_function_arg_alignment): Returns
the alignment of function argument. In case of typedef type,
it returns the aligment of the aliased type.
(mips_function_arg_boundary): Relocated calculation of the
aligment of function arguments.
gcc/testsuite/ChangeLog:
* gcc.target/mips/align-1-n64.c: New test.
* gcc.target/mips/align-1-o32.c: New test.
g++.dg/analyzer/PR100244.C was failing after a patch of PR109439.
The reason was a spurious preemptive return of get_store_value upon
out-of-bounds read that was preventing further checks. Now instead,
a boolean value check_poisoned goes to false when a OOB is detected,
and is later on given to get_or_create_initial_value.
gcc/analyzer/ChangeLog:
PR analyzer/110198
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Take an
optional boolean value to bypass poisoning checks
* region-model-manager.h: Update declaration of the above function.
* region-model.cc (region_model::get_store_value): No longer returns
on OOB, but rather gives a boolean to get_or_create_initial_value.
(region_model::check_region_access): Update docstring.
(region_model::check_region_for_write): Update docstring.
Signed-off-by: benjamin priour <priour.be@gmail.com>
So there is check for allocated block size being greater than max_size which is
wrapper in __builtin_expect. This makes ipa-fnsummary to give up analyzing
predicates and it will miss the fact that the two different calls to __throw
will be optimized out if __n is larady smaller than 1152921504606846975 which
it is after _M_check_len.
This patch extends ipa-fnsummary to understand functions that return their
parameter.
gcc/ChangeLog:
PR tree-optimization/109849
* ipa-fnsummary.cc (decompose_param_expr): Skip
functions returning its parameter.
(set_cond_stmt_execution_predicate): Return early
if predicate was constructed.
gcc/testsuite/ChangeLog:
PR tree-optimization/109849
* gcc.dg/ipa/pr109849.c: New test.
Marek Polacek [Thu, 29 Jun 2023 18:57:48 +0000 (14:57 -0400)]
testsuite: Use -fno-report-bug in gcc.dg/plugin/
Certain downstream compilers (for example, in Fedora) default to
-freport-bug. The extra output breaks the following tests. We can use
-fno-report-bug to fix that. Patch verified with:
$ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\} plugin.exp'
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/crash-test-ice-sarif.c: Use -fno-report-bug. Adjust
scan-sarif-file.
* gcc.dg/plugin/crash-test-ice-stderr.c: Use -fno-report-bug.
* gcc.dg/plugin/crash-test-write-though-null-sarif.c: Use
-fno-report-bug. Adjust scan-sarif-file.
* gcc.dg/plugin/crash-test-write-though-null-stderr.c: Use
-fno-report-bug.
Patrick Palka [Thu, 29 Jun 2023 20:10:18 +0000 (16:10 -0400)]
c++: NSDMI instantiation during overload resolution [PR110468]
Here we find ourselves instantiating the NSDMI for A<1>::m when
computing argument conversions during overload resolution, and
thus tf_conv is set. The flag causes mark_used for the constructor
used in the NSDMI to exit early and not instantiate its noexcept-spec,
which eventually leads to an ICE from nothrow_spec_p.
This patch fixes this by clearing any special tsubst flags during
instantiation of an NSDMI, since the result should be independent of
the context that requires the instantiation.
PR c++/110468
gcc/cp/ChangeLog:
* init.cc (maybe_instantiate_nsdmi_init): Mask out all
tsubst flags except for tf_warning_or_error.
Here we're incorrectly accepting the mutable member accesses because
cp_fold neglects to propagate CONSTRUCTOR_MUTABLE_POISON when folding a
CONSTRUCTOR.
Qing Zhao [Thu, 29 Jun 2023 17:07:26 +0000 (17:07 +0000)]
Update documentation to clarify a GCC extension [PR c/77650]
on a structure with a C99 flexible array member being nested in
another structure.
"The GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.
There are two situations:
* A structure containing a C99 flexible array member, or a union
containing such a structure, is the last field of another structure,
for example:
struct flex { int length; char data[]; };
union union_flex { int others; struct flex f; };
struct out_flex_struct { int m; struct flex flex_data; };
struct out_flex_union { int n; union union_flex flex_data; };
In the above, both 'out_flex_struct.flex_data.data[]' and
'out_flex_union.flex_data.f.data[]' are considered as flexible
arrays too.
* A structure containing a C99 flexible array member, or a union
containing such a structure, is not the last field of another structure,
for example:
struct flex { int length; char data[]; };
struct mid_flex { int m; struct flex flex_data; int n; };
In the above, accessing a member of the array 'mid_flex.flex_data.data[]'
might have undefined behavior. Compilers do not handle such a case
consistently, Any code relying on this case should be modified to ensure
that flexible array members only end up at the ends of structures.
Please use the warning option '-Wflex-array-member-not-at-end' to
identify all such cases in the source code and modify them. This extension
is now deprecated.
"
PR c/77650
gcc/c-family/ChangeLog:
* c.opt: New option -Wflex-array-member-not-at-end.
gcc/c/ChangeLog:
* c-decl.cc (finish_struct): Issue warnings for new option.
gcc/ChangeLog:
* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.
gcc/testsuite/ChangeLog:
* gcc.dg/variable-sized-type-flex-array.c: New test.
Qing Zhao [Thu, 29 Jun 2023 17:07:08 +0000 (17:07 +0000)]
Introduce IR bit TYPE_INCLUDES_FLEXARRAY for the GCC extension
on a structure with a C99 flexible array member being nested in
another structure
GCC extension accepts the case when a struct with a flexible array member
is embedded into another struct or union (possibly recursively) as the last
field.
This patch is to introduce the IR bit TYPE_INCLUDES_FLEXARRAY (reuse the
existing IR bit TYPE_NO_NAMED_ARGS_SATDARG_P), set it correctly in C FE,
stream it correctly in Middle-end, and print it during IR dumping.
gcc/c/ChangeLog:
* c-decl.cc (finish_struct): Set TYPE_INCLUDES_FLEXARRAY for
struct/union type.
gcc/lto/ChangeLog:
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDES_FLEXARRAY properly
for its corresponding type.
gcc/ChangeLog:
* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDES_FLEXARRAY): New macro TYPE_INCLUDES_FLEXARRAY.
Aldy Hernandez [Thu, 15 Jun 2023 15:28:21 +0000 (17:28 +0200)]
Tidy up the range normalization code.
There's a few spots where a range is being altered in-place, but we
fail to call normalize the range. This patch makes sure we always
call normalize_kind(), and that normalize_kind in turn calls
verify_range to make sure verything is canonical.
gcc/ChangeLog:
* value-range.cc (frange::set): Do not call verify_range.
(frange::normalize_kind): Verify range.
(frange::union_nans): Do not call verify_range.
(frange::union_): Same.
(frange::intersect): Same.
(irange::irange_single_pair_union): Call normalize_kind if
necessary.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_range_from_nonzero_bits): Verify range.
(irange::set_nonzero_bits): Call normalize_kind if necessary.
(irange::get_nonzero_bits): Tweak comment.
(irange::intersect_nonzero_bits): Call normalize_kind if
necessary.
(irange::union_nonzero_bits): Same.
* value-range.h (irange::normalize_kind): Verify range.
Uros Bizjak [Thu, 29 Jun 2023 15:29:03 +0000 (17:29 +0200)]
cselib+expr+bitmap: Change return type of predicate functions from int to bool
gcc/ChangeLog:
* cselib.h (rtx_equal_for_cselib_1):
Change return type from int to bool.
(references_value_p): Ditto.
(rtx_equal_for_cselib_p): Ditto.
* expr.h (can_store_by_pieces): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(safe_from_p): Ditto.
* sbitmap.h (bitmap_equal_p): Ditto.
* cselib.cc (references_value_p): Change return type
from int to void and adjust function body accordingly.
(rtx_equal_for_cselib_1): Ditto.
* expr.cc (is_aligning_offset): Ditto.
(can_store_by_pieces): Ditto.
(mostly_zeros_p): Ditto.
(all_zeros_p): Ditto.
(safe_from_p): Ditto.
(is_aligning_offset): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(store_constructor): Change "need_to_clear" and
"const_bounds_p" variables to bool.
* sbitmap.cc (bitmap_equal_p): Change return type from int to bool.
Jonathan Wakely [Thu, 29 Jun 2023 10:40:32 +0000 (11:40 +0100)]
libstdc++: Fix src/c++20/tzdb.cc for non-constexpr std::mutex
Building libstdc++ reportedly fails for targets without lock-free
std::atomic<T*> which don't define __GTHREAD_MUTEX_INIT:
src/c++20/tzdb.cc:110:21: error: 'constinit' variable 'std::chrono::{anonymous}::list_mutex' does not have a constant initializer
src/c++20/tzdb.cc:110:21: error: call to non-'constexpr' function 'std::mutex::mutex()'
The solution implemented by this commit is to use a local static mutex
when it can't be constinit, so that it's constructed on first use.
With this change, we can also simplify the preprocessor logic for
defining USE_ATOMIC_SHARED_PTR. It now depends on the same conditions as
USE_ATOMIC_LIST_HEAD, so in theory we could have a single macro. Keeping
them separate would allow us to replace the use of atomic<shared_ptr<T>>
with a mutex if that performs better, without having to give up on the
lock-free cache for fast access to the list head.
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (USE_ATOMIC_SHARED_PTR): Define consistently
with USE_ATOMIC_LIST_HEAD.
(list_mutex): Replace global object with function. Use local
static object when std::mutex constructor isn't constexpr.
Jonathan Wakely [Wed, 28 Jun 2023 18:10:29 +0000 (19:10 +0100)]
libstdc++: Do not use off64_t in calls to copy_file_range [PR110462]
Although the copy_file_range(2) man page shows the arguments as off64_t*
that is not portable. For musl there is no off64_t type, as off_t is
always 64-bit. Use the loff_t type which is always 64-bit even if off_t
isn't. We could just use off_t because the filesystem library is
compiled with _FILE_OFFSET_BITS=64, but loff_t is the more correct type
for this interface.
libstdc++-v3/ChangeLog:
PR libstdc++/110462
* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check that
copy_file_range can be called with loff_t* arguments.
* configure: Regenerate.
* src/filesystem/ops-common.h (copy_file_copy_file_range):
Use loff_t for offsets.
There's currently no cheap way to obtain the partial template
specialization (and arguments relative to it) that was selected for a
class or variable template specialization. Our only option is to
compute the result from scratch via most_specialized_partial_spec.
For class templates this isn't really an issue because we usually need
this information just once, upon instantiation. But for variable
templates we need it upon specialization and also later upon instantiation.
We could implement an ad-hoc cache for variable templates only, but it'd
be nice for this information to be readily available in general.
To that end, this patch adds a TI_PARTIAL_INFO field to TEMPLATE_INFO
that holds another TEMPLATE_INFO consisting of the partial template and
arguments relative to it, which most_specialized_partial_spec then
uses to transparently cache its (now TEMPLATE_INFO) result.
Similarly, there's no easy way to go from the DECL_TEMPLATE_RESULT of a
partial TEMPLATE_DECL back to that TEMPLATE_DECL. (Our best option is to
walk the DECL_TEMPLATE_SPECIALIZATIONS list of the primary TEMPLATE_DECL.)
So this patch also uses this new field to link these entities in both
directions.
gcc/cp/ChangeLog:
* cp-tree.h (tree_template_info::partial): New data member.
(TI_PARTIAL_INFO): New tree accessor.
(most_specialized_partial_spec): Add defaulted bool parameter.
* module.cc (trees_out::core_vals) <case TEMPLATE_INFO>: Stream
TI_PARTIAL_INFO.
(trees_in::core_vals) <case TEMPLATE_INFO>: Likewise.
* parser.cc (specialization_of): Adjust after making
most_specialized_partial_spec return TEMPLATE_INFO instead
of TREE_LIST.
* pt.cc (process_partial_specialization): Set TI_PARTIAL_INFO
of 'decl' to point back to the partial TEMPLATE_DECL. Likewise
(and pass rechecking=true to most_specialization_partial_spec).
(instantiate_class_template): Likewise.
(instantiate_template): Set TI_PARTIAL_INFO to the result of
most_specialization_partial_spec after forming a variable
template specialization.
(most_specialized_partial_spec): Add 'rechecking' parameter.
Exit early if the template is not primary. Use the TI_PARTIAL_INFO
of the corresponding TEMPLATE_INFO as a cache unless 'rechecking'
is true. Don't bother setting TREE_TYPE of each TREE_LIST.
(instantiate_decl): Adjust after making
most_specialized_partial_spec return TEMPLATE_INFO instead of
TREE_LIST.
* ptree.cc (cxx_print_xnode) <case TEMPLATE_INFO>: Dump
TI_PARTIAL_INFO.
Tom Tromey [Thu, 22 Jun 2023 13:35:02 +0000 (07:35 -0600)]
Relax type-printer regexp in libstdc++ test suite
The libstdc++ test suite checks whether gdb type printers are
available like so:
set do_whatis_tests [gdb_batch_check "python print(gdb.type_printers)" \
"\\\[\\\]"]
This regexp assumes that the list of printers is empty. However,
sometimes it's convenient to ship a gdb that comes with some default
printers, causing this to erroneously report that gdb is "too old".
I believe the intent of this check is to ensure that gdb.type_printers
exists -- not to check its starting value. This patch changes the
check to accept any Python list as output.
Note that the patch doesn't look for the trailing "]". I tried this
but in my case the output was too long for expect. It seemed fine to
just check the start, as the point really is to reject the case where
the command prints an error message.
Robin Dapp [Thu, 29 Jun 2023 09:35:02 +0000 (11:35 +0200)]
tree-ssa-math-opts: Use element_precision.
The recent TYPE_PRECISION changes to detect improper usage
cause an ICE in divmod_candidate_p for RVV when called with
a vector type. Therefore, use element_precision instead.
gcc/ChangeLog:
* tree-ssa-math-opts.cc (divmod_candidate_p): Use
element_precision.
Roger Sayle [Thu, 29 Jun 2023 10:44:04 +0000 (11:44 +0100)]
[Committed] Add -mmove-max=128 -mstore-max=128 to pieces-memcmp-2.c
Adding -mmove-max=128 and -mstore-max=128 to the dg-options of the
recently added gcc.target/i386/pieces-memcmp-2.c avoids changing the
intent of this testcase when adding -march=cascadelake to RUNTESTFLAGS.
Committed as obvious.
2023-06-29 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: Specify that 128-bit
comparisons are desired, to see if 256-bit instructions are
generated inappropriately (fixes test on -march=cascadelake).
Richard Biener [Thu, 29 Jun 2023 08:23:29 +0000 (10:23 +0200)]
tree-optimization/110460 - fend off vector types from vectorizer
The following makes fending off existing vector types from vectorization
also apply to word_mode vector types. I've chosen to add a positive
list of allowed scalar types here for clarity.
PR tree-optimization/110460
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
Only allow integral, pointer and scalar float type scalar_type.
Lili Cui [Thu, 29 Jun 2023 06:51:56 +0000 (06:51 +0000)]
Avoid adding loop-carried ops to long chains
Avoid adding loop-carried ops to long chains, otherwise the whole chain will
have dependencies across the loop iteration. Just keep loop-carried ops in a
separate chain.
E.g.
x_1 = phi(x_0, x_2)
y_1 = phi(y_0, y_2)
Alexandre Oliva [Thu, 29 Jun 2023 09:03:24 +0000 (06:03 -0300)]
[testsuite] tolerate enabled but missing language frontends
When a language is enabled but we run the testsuite against a tree in
which the frontend compiler is not present, help.exp fails. It
recognizes the output pattern for a disabled language, but not a
missing frontend. Extend the pattern so that it covers both cases.
for gcc/testsuite/ChangeLog
* lib/options.exp (check_for_options_with_filter): Handle
missing frontend compiler like disabled language.
when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.
Richard Biener [Thu, 29 Jun 2023 07:15:27 +0000 (09:15 +0200)]
middle-end/110461 - pattern applying wrongly to vectors
The following guards a match.pd pattern that wasn't supposed to
apply to vectors and thus runs into TYPE_PRECISION checking. For
vector support the constant case is lacking and the pattern would
have missing optab support checking for the result operation.
It uses the generic gt_pch_nx routines (with gt_pch_nx being the
“note pointers” hooks), such as:
template<typename T, typename A>
void
gt_pch_nx (vec<T, A, vl_embed> *v)
{
extern void gt_pch_nx (T &);
for (unsigned i = 0; i < v->length (); i++)
gt_pch_nx ((*v)[i]);
}
It then defines gt_pch_nx routines for Entity_Id &.
The problem is that if we wanted to take the same approach for
an array of unsigned ints, we'd need to define:
inline void gt_pch_nx (unsigned int &) { }
which would then be ambiguous with:
inline void gt_pch_nx (unsigned int) { }
The point of va_gc_atomic is that the elements don't need to be GCed,
and so we have:
template<typename T>
void
gt_ggc_mx (vec<T, va_gc_atomic, vl_embed> *v ATTRIBUTE_UNUSED)
{
/* Nothing to do. Vectors of atomic types wrt GC do not need to
be traversed. */
}
I think it's therefore reasonable to assume that no pointers will
need to be processed for PCH either.
The patch also relaxes the array_slice constructor for vec<T, va_gc> *
so that it handles all embedded vectors.
gcc/
* vec.h (gt_pch_nx): Add overloads for va_gc_atomic.
(array_slice): Relax va_gc constructor to handle all vectors
with a vl_embed layout.
gcc/ada/
* gcc-interface/decl.cc (gt_pch_nx): Remove overloads for Entity_Id.
Pan Li [Thu, 29 Jun 2023 01:25:36 +0000 (09:25 +0800)]
RISC-V: Support vfadd static rounding mode by mode switching
This patch would like to support the vfadd static round mode similar to
the fixed-point. Then the related fsrm instructions will be inserted
correlatively.
Please *NOTE* this PATCH doesn't cover anything about FRM dynamic mode,
it will be implemented in the underlying PATCH(s).
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_emit_mode_set): Add emit for FRM.
(riscv_mode_needed): Likewise.
(riscv_entity_mode_after): Likewise.
(riscv_mode_after): Likewise.
(riscv_mode_entry): Likewise.
(riscv_mode_exit): Likewise.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Add number
for FRM.
* config/riscv/riscv.md: Add FRM register.
* config/riscv/vector-iterators.md: Add FRM type.
* config/riscv/vector.md (frm_mode): Define new attr for FRM mode.
(fsrm): Define new insn for fsrm instruction.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-3.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-5.c: New test.
For tracking and development friendly, We will take some steps to support
all rounding modes for the RVV floating-point rounding modes.
1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
2. Support static rounding mode control by mode switch, like fixed-point.
3. Support dynamice round mode control by mode switch.
4. Support the rest floating-point instructions for frm.
Please *NOTE* this patch only allow the rounding mode control for the
vfadd intrinsic API, and the related frm will be coverred by step 2.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
Add macro for static frm min and max.
* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): New class for floating-point with frm.
(BASE): Add vfadd for frm.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfadd_frm): Likewise.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct alu_frm_def): New struct for alu with frm.
(SHAPE): Add alu with frm.
* config/riscv/riscv-vector-builtins-shapes.h: Likewise.
* config/riscv/riscv-vector-builtins.cc
(function_checker::report_out_of_range_and_not): New function
for report out of range and not val.
(function_checker::require_immediate_range_or): New function
for checking in range or one val.
* config/riscv/riscv-vector-builtins.h: Add function decl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm.c: New test.
Eugene Rozenfeld [Tue, 27 Jun 2023 20:58:14 +0000 (13:58 -0700)]
Fix collection and processing of autoprofile data for target libs
cc1, cc1plus, and lto built during STAGEautoprofile need to be built with
debug info since they are used to build target libs. -gtoggle was
turning off debug info for this stage.
create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto
instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when
processing profile data collected while building target libraries.
Tested on x86_64-pc-linux-gnu.
ChangeLog:
* Makefile.in: Remove -gtoggle for STAGEautoprofile
* Makefile.tpl: Remove -gtoggle for STAGEautoprofile
gcc/c/ChangeLog:
* Make-lang.in: Pass correct stage cc1 when processing
profile data collected while building target libraries
gcc/cp/ChangeLog:
* Make-lang.in: Pass correct stage cc1plus when processing
profile data collected while building target libraries
gcc/lto/ChangeLog:
* Make-lang.in: Pass correct stage lto when processing
profile data collected while building target libraries
CRIS: Don't apply PATTERN to insn before validation (PR 110144)
Oops. The validation was there, but PATTERN was applied
before that. Noticeable only with rtl-checking (for example
as in the report: "--enable-checking=yes,rtl") as this
statement was only a (one of many) straggling olde-C
declare-and-initialize-at-beginning-of-block thing.
PR target/110144
* config/cris/cris.cc (cris_postdbr_cmpelim): Don't apply PATTERN
to insn before validating it.
Jan Hubicka [Wed, 28 Jun 2023 21:05:28 +0000 (23:05 +0200)]
Enable early inlining into always_inline functions
Early inliner currently skips always_inline functions and moreover we ignore
calls from always_inline in ipa_reverse_postorder. This leads to disabling
most of propagation done using early optimization that is quite bad when
early inline functions are not leaf functions, which is now quite common
in libstdc++.
This patch instead of fully disabling the inline checks calls in callee.
I am quite conservative about what can be inlined as this patch is bit
touchy anyway. To avoid problems with always_inline being optimized
after early inline I extended inline_always_inline_functions to lazilly
compute fnsummary when needed.
gcc/ChangeLog:
PR middle-end/110334
* ipa-fnsummary.h (ipa_fn_summary): Add
safe_to_inline_to_always_inline.
* ipa-inline.cc (can_early_inline_edge_p): ICE
if SSA is not built; do cycle checking for
always_inline functions.
(inline_always_inline_functions): Be recrusive;
watch for cycles; do not updat overall summary.
(early_inliner): Do not give up on always_inlines.
* ipa-utils.cc (ipa_reverse_postorder): Do not skip
always inlines.
gcc/testsuite/ChangeLog:
PR middle-end/110334
* g++.dg/opt/pr66119.C: Disable early inlining.
* gcc.c-torture/compile/pr110334.c: New test.
* gcc.dg/tree-ssa/pr110334.c: New test.
Harald Anlauf [Wed, 28 Jun 2023 20:16:18 +0000 (22:16 +0200)]
Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]
gcc/fortran/ChangeLog:
PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): For non-constant string
argument passed to CHARACTER(LEN=1),VALUE dummy, ensure proper
dereferencing and truncation of string to length 1.
gcc/testsuite/ChangeLog:
PR fortran/110360
* gfortran.dg/value_9.f90: Add tests for intermediate regression.
Patrick Palka [Wed, 28 Jun 2023 19:43:33 +0000 (15:43 -0400)]
c++: ahead of time variable template-id coercion [PR89442]
This patch makes us coerce the arguments of a variable template-id ahead
of time, as we do for class template-ids, which causes us to immediately
diagnose template parm/arg kind mismatches and arity mismatches.
Unfortunately this causes a regression in cpp1z/constexpr-if20.C: coercing
the variable template-id m<ar, as> ahead of time means we strip it of
typedefs, yielding m<typename C<i>::q, typename C<j>::q>, but in this
stripped form we're directly using 'i' and so we expect to have captured
it. This is a variable template version of PR107437.
PR c++/89442
PR c++/107437
gcc/cp/ChangeLog:
* cp-tree.h (lookup_template_variable): Add complain parameter.
* parser.cc (cp_parser_template_id): Pass tf_warning_or_error
to lookup_template_variable.
* pt.cc (lookup_template_variable): Add complain parameter.
Coerce template arguments here ...
(finish_template_variable): ... instead of here.
(lookup_and_finish_template_variable): Check for error_mark_node
result from lookup_template_variable.
(tsubst_copy) <case TEMPLATE_ID_EXPR>: Pass complain to
lookup_template_variable.
(instantiate_template): Use build2 instead of
lookup_template_variable to build a TEMPLATE_ID_EXPR
for most_specialized_partial_spec.
gcc/testsuite/ChangeLog:
* g++.dg/cpp/pr64127.C: Expect "expected unqualified-id at end
of input" error.
* g++.dg/cpp0x/alias-decl-ttp1.C: Fix template parameter/argument
kind mismatch for variable template has_P_match_V.
* g++.dg/cpp1y/pr72759.C: Expect "template argument 1 is invalid"
error.
* g++.dg/cpp1z/constexpr-if20.C: XFAIL test due to bogus "'i' is
not captured" error.
* g++.dg/cpp1z/noexcept-type21.C: Fix arity of variable template d.
* g++.dg/diagnostic/not-a-function-template-1.C: Add default
template argument to variable template A so that A<> is valid.
* g++.dg/parse/error56.C: Don't expect "ISO C++ forbids
declaration with no type" error.
* g++.dg/parse/template30.C: Don't expect "parse error in
template argument list" error.
* g++.dg/cpp1y/var-templ82.C: New test.
Iain Buclaw [Wed, 28 Jun 2023 16:30:31 +0000 (18:30 +0200)]
d: Fix wrong code-gen when returning structs by value.
Since r13-1104, structs have have compute_record_mode called too early
on them, causing them to return differently depending on the order that
types are generated in, and whether there are forward references.
This patch moves the call to compute_record_mode into its own function,
and calls it after all fields have been given a size.
PR d/106977
PR target/110406
gcc/d/ChangeLog:
* types.cc (finish_aggregate_mode): New function.
(finish_incomplete_fields): Call finish_aggregate_mode.
(finish_aggregate_type): Replace call to compute_record_mode with
finish_aggregate_mode.
Iain Buclaw [Wed, 28 Jun 2023 15:38:16 +0000 (17:38 +0200)]
d: Fix d_signed_or_unsigned_type is invoked for vector types (PR110193)
This function can be invoked on VECTOR_TYPE, but the implementation
assumes it works on integer types only. To fix, added a check whether
the type passed is any `__vector(T)' or non-integral type, and return
early by calling `signed_or_unsigned_type_for()' instead.
Problem was found by instrumenting TYPE_PRECISION and ICEing when
applied on VECTOR_TYPEs.
PR d/110193
gcc/d/ChangeLog:
* types.cc (d_signed_or_unsigned_type): Handle being called with any
vector or non-integral type.
Here we get the "error reporting routines re-entered" ICE because
of an unguarded use of warning_at. While at it, I added a check
for a warning_at just above it.
PR c++/110175
gcc/cp/ChangeLog:
* typeck.cc (cp_build_unary_op): Check tf_warning before warning.
Uros Bizjak [Wed, 28 Jun 2023 08:22:20 +0000 (10:22 +0200)]
final+varasm: Change return type of predicate functions from int to bool
Also change some internal variables to bool and change return type of
compute_alignments to void.
gcc/ChangeLog:
* output.h (leaf_function_p): Change return type from int to bool.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
(maybe_assemble_visibility): Ditto.
* varasm.h (supports_one_only): Ditto.
* rtl.h (compute_alignments): Change return type from int to void.
* final.cc (app_on): Change return type from int to bool.
(compute_alignments): Change return type from int to void
and adjust function body accordingly.
(shorten_branches): Change "something_changed" variable
type from int to bool.
(leaf_function_p): Change return type from int to bool
and adjust function body accordingly.
(final_forward_branch_p): Ditto.
(only_leaf_regs_used): Ditto.
* varasm.cc (contains_pointers_p): Change return type from
int to bool and adjust function body accordingly.
(compare_constant): Ditto.
(maybe_assemble_visibility): Ditto.
(supports_one_only): Ditto.
Fixes: 6a2e8dcbbd4bab3
Propagation for the stack pointer in regcprop was enabled in 6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).
This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds an check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.
PR debug/110308
gcc/ChangeLog:
* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(maybe_copy_reg_attrs): New function.
(find_oldest_value_reg): Use maybe_copy_reg_attrs.
(copyprop_hardreg_forward_1): Ditto.
gcc/testsuite/ChangeLog:
* g++.dg/torture/pr110308.C: New test.
Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Richard Biener [Wed, 28 Jun 2023 09:27:45 +0000 (11:27 +0200)]
tree-optimization/110434 - avoid <retval> ={v} {CLOBBER} from NRV
When NRV replaces a local variable with <retval> it also replaces
occurences in clobbers. This leads to <retval> being clobbered
before the return of it which is strictly invalid but harmless in
practice since there's no pass after NRV which would remove
earlier stores.
The following fixes this nevertheless.
PR tree-optimization/110434
* tree-nrv.cc (pass_nrv::execute): Remove CLOBBERs of
VAR we replace with <retval>.
Christophe Lyon [Wed, 28 Jun 2023 09:19:25 +0000 (09:19 +0000)]
Make mve_fp_fpu[12].c accept single or double precision FPU
This tests currently expect a directive containing .fpu fpv5-sp-d16
and thus may fail if the test is executed for instance with
-march=armv8.1-m.main+mve.fp+fp.dp
This patch accepts either fpv5-sp-d16 or fpv5-d16 to avoid the failure.
Christophe Lyon [Wed, 28 Jun 2023 09:09:04 +0000 (09:09 +0000)]
Make nomve_fp_1.c require arm_fp
If GCC is configured with the default (soft) -mfloat-abi, and we don't
override the target_board test flags appropriately,
gcc.target/arm/mve/general-c/nomve_fp_1.c fails for lack of
-mfloat-abi=softfp or -mfloat-abi=hard, because it doesn't use
dg-add-options arm_v8_1m_mve (on purpose, see comment in the test).
Require and use the options needed for arm_fp to fix this problem.
Richard Biener [Wed, 28 Jun 2023 11:36:59 +0000 (13:36 +0200)]
tree-optimization/110451 - hoist invariant compare after interchange
The following adjusts the cost model of invariant motion to consider
[VEC_]COND_EXPRs and comparisons producing a data value as expensive.
For 503.bwaves_r this avoids an unnecessarily high vectorization
factor because of an integer comparison besides data operations on
double.
PR tree-optimization/110451
* tree-ssa-loop-im.cc (stmt_cost): [VEC_]COND_EXPR and
tcc_comparison are expensive.
Paul Thomas [Wed, 28 Jun 2023 11:38:58 +0000 (12:38 +0100)]
Fortran: Enable class expressions in structure constructors [PR49213]
2023-06-28 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/49213
* expr.cc (gfc_is_ptr_fcn): Remove reference to class_pointer.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (get_symbol_decl): Remove extraneous line.
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
size of intrinsic and character expressions.
(gfc_trans_subcomponent_assign): Expand assignment to class
components to include intrinsic and character expressions.
gcc/testsuite/
PR fortran/49213
* gfortran.dg/pr49213.f90 : New test
Roger Sayle [Wed, 28 Jun 2023 10:11:34 +0000 (11:11 +0100)]
i386: Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).
This patch fixes some very odd (unanticipated) code generation by
compare_by_pieces with -m32 -mavx, since the recent addition of the
cbranchoi4 pattern. The issue is that cbranchoi4 is available with
TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT
which results in the odd behaviour (thanks to OPTAB_WIDEN) that with
-m32 -mavx, compare_by_pieces ends up (inefficiently) widening 128-bit
comparisons to 256-bits before performing PTEST.
This patch fixes this by providing a cbranchti4 pattern that's available
with either TARGET_64BIT or TARGET_SSE4_1.
2023-06-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
for TImode comparisons on 32-bit architectures.
* config/i386/i386.md (cbranch<mode>4): Change from SDWIM to
SWIM1248x to exclude/avoid TImode being conditional on -m64.
(cbranchti4): New define_expand for TImode on both TARGET_64BIT
and/or with TARGET_SSE4_1.
* config/i386/predicates.md (ix86_timode_comparison_operator):
New predicate that depends upon TARGET_64BIT.
(ix86_timode_comparison_operand): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: New test case.
Roger Sayle [Wed, 28 Jun 2023 10:07:47 +0000 (11:07 +0100)]
i386: Fix FAIL of gcc.target/i386/pr78794.c on ia32.
This patch fixes that FAIL of gcc.target/i386/pr78794.c on ia32, which
is caused by minor STV rtx_cost differences with -march=silvermont.
It turns out that generic tuning results in pandn, but the lack of
accurate parameterization for COMPARE in compute_convert_gain combined
with small differences in scalar<->SSE costs on silvermont results in
this DImode chain not being converted.
The solution is to provide more accurate costs/gains for converting
(DImode and SImode) comparisons.
I'd been holding off of doing this as I'd thought it would be possible
to turn pandn;ptestz into ptestc (for an even bigger scalar-to-vector
win) but I've recently realized that these optimizations (as I've
implemented them) occur in the wrong order (stv2 occurs after
combine), so it isn't easy for STV to convert CCZmode into CCCmode.
Doh! Perhaps something can be done in peephole2.
2023-06-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/78794
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate gains for conversion of scalar comparisons to
PTEST.
Jan Hubicka [Wed, 28 Jun 2023 09:45:15 +0000 (11:45 +0200)]
Add cold attribute to throw wrappers and terminate
PR middle-end/109849
* include/bits/c++config (std::__terminate): Mark cold.
* include/bits/functexcept.h: Mark everything as cold.
* libsupc++/exception: Mark terminate and unexpected as cold.
Richard Biener [Wed, 28 Jun 2023 08:16:57 +0000 (10:16 +0200)]
tree-optimization/110443 - prevent SLP splat of gathers
The following prevents non-grouped load SLP in case the element
to splat is from a gather operation. While it should be possible
to support this it is not similar to the single element interleaving
case I was trying to mimic here.
Haochen Gui [Wed, 28 Jun 2023 08:45:50 +0000 (16:45 +0800)]
rs6000: Add two peephole patterns for "mr." insn
When investigating the issue mentioned in PR87871#c30 - if compare
and move pattern benefits before RA, I checked the assembly generated
for SPEC2017 and found that certain insn sequences aren't converted to
"mr." instructions.
Following two sequence are never to be combined to "mr." pattern as
there is no register link between them. This patch adds two peephole2
patterns to convert them to "mr." instructions.
cmp 0,3,0
mr 4,3
mr 4,3
cmp 0,3,0
The patch also creates a new mode iterator which decided by
TARGET_POWERPC64. This mode iterator is used in "mr." and its split
pattern. The original P iterator is improper when -m32/-mpowerpc64 is
set. In this situation, the "mr." should compares the whole 64-bit
register with 0 other than the low 32-bit one.
gcc/
* config/rs6000/rs6000.md (peephole2 for compare_and_move): New.
(peephole2 for move_and_compare): New.
(mode_iterator WORD): New. Set the mode to SI/DImode by
TARGET_POWERPC64.
(*mov<mode>_internal2): Change the mode iterator from P to WORD.
(split pattern for compare_and_move): Likewise.
2. (set (reg) (fma (float_extend:reg)(reg)(reg)))
This pattern is the intermediate IR that enhances the combine optimizations.
Since for the complicate situation, combine pass can not combine both operands
of multiplication at the first time, it will try to first combine at the first
stage: (set (reg) (fma (float_extend:reg)(reg)(reg))). Then combine another
extension of the other operand at the second stage.
This can enhance combine optimization for the following case:
Haochen Gui [Wed, 28 Jun 2023 08:30:44 +0000 (16:30 +0800)]
rs6000: Splat vector small V2DI constants with vspltisw and vupkhsw
This patch adds a new insn for vector splat with small V2DI constants on P8.
If the value of constant is in RANGE (-16, 15) but not 0 or -1, it can be
loaded with vspltisw and vupkhsw on P8.
gcc/
PR target/104124
* config/rs6000/altivec.md (*altivec_vupkhs<VU_char>_direct): Rename
to...
(altivec_vupkhs<VU_char>_direct): ...this.
* config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
predicate to test if a constant can be loaded with vspltisw and
vupkhsw.
(easy_vector_constant): Call vspltisw_vupkhsw_constant_p to Check if
a vector constant can be synthesized with a vspltisw and a vupkhsw.
* config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p):
Declare.
* config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New
function to return true if OP mode is V2DI and can be synthesized
with vupkhsw and vspltisw.
* config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
constants with vspltisw and vupkhsw.
Richard Biener [Fri, 9 Jun 2023 07:31:14 +0000 (09:31 +0200)]
Prevent TYPE_PRECISION on VECTOR_TYPEs
The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
ICEs when tree checking is enabled. This should avoid wrong-code
in cases like PR110182 and instead ICE.
It also introduces a TYPE_PRECISION_RAW accessor and adjusts
places I found that are eligible to use that.
* tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
(TYPE_PRECISION_RAW): Provide raw access to the precision
field.
* tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
(gimple_canonical_types_compatible_p): Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Stream TYPE_PRECISION_RAW.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.
gcc/lto/
* lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.
Jason Merrill [Tue, 27 Jun 2023 09:15:01 +0000 (05:15 -0400)]
c++: inherited constructor attributes
Inherited constructors are like constructor clones; they don't exist from
the language perspective, so they should copy the attributes in the same
way. But it doesn't make sense to copy alias or ifunc attributes in either
case. Unlike handle_copy_attribute, we do want to copy inlining attributes.
The discussion of PR110334 pointed out that we weren't copying the
always_inline attribute, leading to poor inlining choices.
PR c++/110334
gcc/cp/ChangeLog:
* cp-tree.h (clone_attrs): Declare.
* method.cc (implicitly_declare_fn): Use it for inherited
constructor.
* optimize.cc (clone_attrs): New.
(maybe_clone_body): Use it.
Alexandre Oliva [Wed, 28 Jun 2023 04:25:59 +0000 (01:25 -0300)]
Add leafy mode for zero-call-used-regs
Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
nonleaf functions, respectively.
for gcc/ChangeLog
* doc/extend.texi (zero-call-used-regs): Document leafy and
variants thereof.
* flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
LEAFY and variants.
* function.cc (gen_call_ued_regs_seq): Set only_used for leaf
functions in leafy mode.
* opts.cc (zero_call_used_regs_opts): Add leafy and variants.
Alexandre Oliva [Wed, 28 Jun 2023 04:25:55 +0000 (01:25 -0300)]
[testsuite] note pitfall in how outputs.exp sets gld
This patch documents a glitch in gcc.misc-tests/outputs.exp: it checks
whether the linker is GNU ld, and uses that to decide whether to
expect collect2 to create .ld1_args files under -save-temps, but
collect2 bases that decision on whether HAVE_GNU_LD is set, which may
be false zero if the linker in use is GNU ld. Configuring
--with-gnu-ld fixes this misalignment. Without that, atsave tests are
likely to fail, because without HAVE_GNU_LD, collect2 won't use @file
syntax to run the linker (so it won't create .ld1_args files).
Long version: HAVE_GNU_LD is set when (i) DEFAULT_LINKER is set during
configure, pointing at GNU ld; (ii) --with-gnu-ld is passed to
configure; or (iii) config.gcc sets gnu_ld=yes. If a port doesn't set
gnu_ld, and the toolchain isn't configured so as to assume GNU ld,
configure and thus collect2 conservatively assume the linker doesn't
support @file arguments.
But outputs.exp can't see how configure set HAVE_GNU_LD (it may be
used to test an installed compiler), and upon finding that the linker
used by the compiler is GNU ld, it will expect collect2 to use @file
arguments when running the linker. If that assumption doesn't hold,
atsave tests will fail.
for gcc/testsuite/ChangeLog
* gcc.misc-tests/outputs.exp (gld): Note a known mismatch and
record a workaround.
Jason Merrill [Sat, 24 Jun 2023 09:15:02 +0000 (05:15 -0400)]
c++: C++26 constexpr cast from void* [PR110344]
P2768 allows static_cast from void* to ob* in constant evaluation if the
pointer does in fact point to an object of the appropriate type.
cxx_fold_indirect_ref already does the work of finding such an object if it
happens to be a subobject rather than the outermost object at that address,
as in constexpr-voidptr2.C.
liuhongt [Mon, 26 Jun 2023 03:46:49 +0000 (11:46 +0800)]
Issue a warning for conversion between short and __bf16 under TARGET_AVX512BF16.
__bfloat16 is redefined from typedef short to real __bf16 since GCC
V13. The patch issues an warning for potential silent implicit
conversion between __bf16 and short where users may only expect a
data movement.
To avoid too many false positive, warning is only under
TARGET_AVX512BF16.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_invalid_conversion): New function.
(TARGET_INVALID_CONVERSION): Define as
ix86_invalid_conversion.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c: New test.
Robin Dapp [Tue, 20 Jun 2023 11:07:23 +0000 (13:07 +0200)]
RISC-V: Add autovec FP widening/narrowing.
This patch adds FP widening and narrowing expanders as well as tests.
Conceptually similar to integer extension/truncation, we emulate
_Float16 -> double by two vfwcvts and double -> _Float16 by two vfncvts.
gcc/ChangeLog:
* config/riscv/autovec.md (extend<v_double_trunc><mode>2): New
expander.
(extend<v_quad_trunc><mode>2): Ditto.
(trunc<mode><v_double_trunc>2): Ditto.
(trunc<mode><v_quad_trunc>2): Ditto.
* config/riscv/vector-iterators.md: Add VQEXTF and HF to
V_QUAD_TRUNC and v_quad_trunc.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-template.h: New test.
* gcc.target/riscv/rvv/autovec/binop/copysign-zvfh-run.c: New test.
Robin Dapp [Thu, 22 Jun 2023 06:58:06 +0000 (08:58 +0200)]
RISC-V: Split VF iterators for Zvfh(min).
When working on FP widening/narrowing I realized the Zvfhmin handling
is not ideal right now: We use the "enabled" insn attribute to disable
instructions not available with Zvfhmin but only with Zvfh.
However, "enabled == 0" only disables insn alternatives, in our case all
of them when the mode is a HFmode. The insn itself remains available
(e.g. for combine to match) and we end up with an insn without alternatives
that reload cannot handle --> ICE.
The proper solution is to disable the instruction for the respective
mode altogether. This patch achieves this by splitting the VF as well
as VWEXTF iterators into variants with TARGET_ZVFH and
TARGET_VECTOR_ELEN_FP_16 (which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN are true). Also, VWCONVERTI, VHF and VHF_LMUL1 need
adjustments.
gcc/ChangeLog:
* config/riscv/autovec.md: VF_AUTO -> VF.
* config/riscv/vector-iterators.md: Introduce VF_ZVFHMIN,
VWEXTF_ZVFHMIN and use TARGET_ZVFH in VWCONVERTI, VHF and
VHF_LMUL1.
* config/riscv/vector.md: Use new iterators.
Robin Dapp [Mon, 26 Jun 2023 11:30:26 +0000 (13:30 +0200)]
match.pd: Use element_mode instead of TYPE_MODE.
This patch changes TYPE_MODE into element_mode in a match.pd
simplification. As the simplification can be also called with vector
types real_can_shorten_arithmetic would ICE in REAL_MODE_FORMAT which
expects a scalar mode. Therefore, use element_mode instead of
TYPE_MODE.
Additionally, check if the target supports the resulting operation. One
target that supports e.g. a float addition but not a _Float16 addition
is the RISC-V vector extension Zvfhmin.
gcc/ChangeLog:
* match.pd: Use element_mode and check if target supports
operation with new type.
Andrew Pinski [Tue, 27 Jun 2023 00:14:06 +0000 (17:14 -0700)]
Mark asm goto with outputs as volatile
The manual references asm goto as being implicitly volatile already
and that was done when asm goto could not have outputs. When outputs
were added to `asm goto`, only asm goto without outputs were still being
marked as volatile. Now some parts of GCC decide, removing the `asm goto`
is ok if the output is not used, though not updating the CFG (this happens
on both the RTL level and the gimple level). Since the biggest user of `asm goto`
is the Linux kernel and they expect them to be volatile (they use them to
copy to/from userspace), we should just mark the inline-asm as volatile.
Eric Botcazou [Tue, 20 Jun 2023 22:50:40 +0000 (00:50 +0200)]
ada: Fix bad interaction between inlining and thunk generation
This may cause the type of the RESULT_DECL of a function which returns by
invisible reference to be turned into a reference type twice.
gcc/ada/
* gcc-interface/trans.cc (Subprogram_Body_to_gnu): Add guard to the
code turning the type of the RESULT_DECL into a reference type.
(maybe_make_gnu_thunk): Use a more precise guard in the same case.
Eric Botcazou [Sat, 17 Jun 2023 21:46:54 +0000 (23:46 +0200)]
ada: Fix double finalization of case expression in concatenation
This streamlines the expansion of case expressions by not wrapping them in
an Expression_With_Actions node when the type is not by copy, which avoids
the creation of a temporary and the associated finalization issues.
That's the same strategy as the one used for the expansion of if expressions
when the type is by reference, unless Back_End_Handles_Limited_Types is set
to True. Given that it is never set to True, except by a debug switch, and
has never been implemented, this parameter is removed in the process.
gcc/ada/
* debug.adb (d.L): Remove documentation.
* exp_ch4.adb (Expand_N_Case_Expression): In the not-by-copy case,
do not wrap the case statement in an Expression_With_Actions node.
(Expand_N_If_Expression): Do not test
Back_End_Handles_Limited_Types
* gnat1drv.adb (Adjust_Global_Switches): Do not set it.
* opt.ads (Back_End_Handles_Limited_Types): Delete.
Eric Botcazou [Fri, 16 Jun 2023 08:13:46 +0000 (10:13 +0200)]
ada: Fix incorrect handling of iterator specifications in recent change
Unlike for loop parameter specifications where it references an index, the
defining identifier references an element in them.
gcc/ada/
* sem_ch12.adb (Check_Generic_Actuals): Check the component type
of constants and variables of an array type.
(Copy_Generic_Node): Fix bogus handling of iterator
specifications.
Eric Botcazou [Thu, 1 Jun 2023 13:43:41 +0000 (15:43 +0200)]
ada: Fix too late finalization and secondary stack release in iterator loops
Sem_Ch5 contains an entire machinery to deal with finalization actions and
secondary stack releases around iterator loops, so this removes a recent
fix that was made in a narrower case and instead refines the condition under
which this machinery is triggered.
As a side effect, given that finalization and secondary stack management are
still entangled in this machinery, this also fixes the counterpart of a leak
for the former, which is a finalization occurring too late.
gcc/ada/
* exp_ch4.adb (Expand_N_Quantified_Expression): Revert the latest
change as it is subsumed by the machinery in Sem_Ch5.
* sem_ch5.adb (Prepare_Iterator_Loop): Also wrap the loop
statement in a block in the name contains a function call that
returns on the secondary stack.
Eric Botcazou [Mon, 12 Jun 2023 07:52:42 +0000 (09:52 +0200)]
ada: Plug small loophole in the handling of private views in instances
This deals with nested instantiations in package bodies.
gcc/ada/
* sem_ch12.adb (Scope_Within_Body_Or_Same): New predicate.
(Check_Actual_Type): Take into account packages nested in bodies
to compute the enclosing scope by means of
Scope_Within_Body_Or_Same.
Viljar Indus [Thu, 8 Jun 2023 14:29:23 +0000 (17:29 +0300)]
ada: Update printing container aggregates for debugging
All N_Aggregate nodes were printed with parentheses "()". However
the new container aggregates (homogeneous N_Aggregate nodes) should
be printed with brackets "[]".
gcc/ada/
* sprint.adb (Print_Node_Actual): Print homogeneous N_Aggregate
nodes with brackets.