Paul Thomas [Tue, 3 Dec 2024 15:56:53 +0000 (15:56 +0000)]
Fortran: Fix class transformational intrinsic calls [PR102689]
2024-12-03 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/102689
* trans-array.cc (get_array_ref_dim_for_loop_dim): Use the arg1
class container carried in ss->info as the seed for a lhs in
class valued transformational intrinsic calls that are not the
rhs of an assignment. Otherwise, the lhs variable expression is
taken from the loop chain. For this latter case, the _vptr and
_len fields are set.
(gfc_trans_create_temp_array): Use either the lhs expression
seeds to build a class variable that will take the returned
descriptor as its _data field. In the case that the arg1 expr.
is used, 'atmp' must be marked as unused, a typespec built with
the correct rank and the _vptr and _len fields set. The element
size is provided for the temporary allocation and to set the
descriptor span.
(gfc_array_init_size): When an intrinsic type scalar expr3 is
used in allocation of a class array, use its element size in
the descriptor dtype.
* trans-expr.cc (gfc_conv_class_to_class): Class valued
transformational intrinsics return the pointer to the array
descriptor as the _data field of a class temporary. Extract
directly and return the address of the class temporary.
(gfc_conv_procedure_call): Store the expression for the first
argument of a class valued transformational intrinsic function
in the ss info class_container field. Later, use its type as
the element type in the call to gfc_trans_create_temp_array.
(fcncall_realloc_result): Add a dtype argument and use it in
the descriptor, when available.
(gfc_trans_arrayfunc_assign): For class lhs, build a dtype with
the lhs rank and the rhs element size and use it in the call to
fcncall_realloc_result.
gcc/testsuite/
PR fortran/102689
* gfortran.dg/class_transformational_1.f90: New test for class-
valued reshape.
* gfortran.dg/class_transformational_2.f90: New test for other
class-valued transformational intrinsics.
Joseph Myers [Tue, 3 Dec 2024 13:01:58 +0000 (13:01 +0000)]
preprocessor: Adjust C rules on UCNs for C23 [PR117162]
As noted in bug 117162, C23 changed some rules on UCNs to match C++
(this was a late change agreed in the resolution to CD2 comment
US-032, implementing changes from N3124), which we need to implement.
Allow UCNs below 0xa0 outside identifiers for C, with a
pedwarn-if-pedantic before C23 (and a warning with -Wc11-c23-compat)
except for the always-allowed cases of UCNs for $ @ `. Also as part
of that change, do not allow \u0024 in identifiers as equivalent to $
for C23.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117162
libcpp/
* include/cpplib.h (struct cpp_options): Add low_ucns.
* init.cc (struct lang_flags, lang_defaults): Add low_ucns.
(cpp_set_lang): Set low_ucns.
* charset.cc (_cpp_valid_ucn): For C, allow UCNs below 0xa0
outside identifiers, with a pedwarn if pedantic before C23 or a
warning with -Wc11-c23-compat. Do not allow \u0024 in identifiers
for C23.
Richard Biener [Tue, 3 Dec 2024 07:56:35 +0000 (08:56 +0100)]
tree-optimization/117874 - optimize SLP discovery budget use
The following tries to avoid eating into the SLP discovery limit
when we can do cheaper checks first. Together with the previous
patch this allows the use of two-lane SLP discovery for mult_su3_an
in 433.milc.
PR tree-optimization/117874
* tree-vect-slp.cc (vect_build_slp_tree_2): Perform early
reassoc checks before eating into discovery limit.
Richard Biener [Tue, 3 Dec 2024 07:52:48 +0000 (08:52 +0100)]
Use the number of relevant stmts to limit SLP build
The following removes scalar stmt counting from loop vectorization
and its use as the basis to limit both the SLP tree final size and
discovery. Instead use the number of relevant stmts for that,
which is conveniently the number of stmt_vec_infos we create and
which in turn includes things like pattern stmts.
PR tree-optimization/117874
* tree-vectorizer.h (vec_info_shared::n_stmts): Remove.
(LOOP_VINFO_N_STMTS): Likewise.
* tree-vectorizer.cc (vec_info_shared::vec_info_shared): Adjust.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Do not
count stmts.
(vect_analyze_loop_2): Adjust. Pass stmt_vec_info.length ()
to vect_analyze_slp as SLP tree size limit.
The previous version of the patch was based on the mistaken assumption that
features in /proc/cpuinfo had matching names to the feature names that gcc and
gas accept.
This patch enables the fp8 feature when the f8cvt feature is enabled, under the
assumption that fpmr is always enabled when f8cvt is.
Jonathan Wakely [Mon, 2 Dec 2024 15:13:52 +0000 (15:13 +0000)]
libstdc++: Make std::vector<bool> constructor noexcept (LWG 3778)
LWG 3778 was approved in November 2022. We already implement all the
changes except for one, which this commit implements.
The new test verifies all the changes from LWG 3778, not just the one
implemented here.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (vector(const allocator_type&)):
Add noexcept, as per LWG 3778.
* testsuite/23_containers/vector/bool/cons/lwg3778.cc: New test.
Jakub Jelinek [Tue, 3 Dec 2024 10:17:49 +0000 (11:17 +0100)]
tree-ssanames, match.pd: get_nonzero_bits/with_*_nonzero_bits* cleanups and improvements [PR117420]
The following patch implements the with_*_nonzero_bits* cleanups and
improvements I was talking about.
get_nonzero_bits is extended to also handle BIT_AND_EXPR (as a tree or
as SSA_NAME with BIT_AND_EXPR def_stmt), new function is added for the
bits known to be set (get_known_nonzero_bits) and the match.pd predicates
are renamed and adjusted, so that there is no confusion on which one to
use (one is named and documented to be internal), changed so that it can be
used only as a simple predicate, not match some operands, and that it doesn't
try to match twice for the GIMPLE case (where SSA_NAME with integral or pointer
type matches, but SSA_NAME with BIT_AND_EXPR def_stmt matched differently).
Furthermore, get_nonzero_bits now just returns the all-bits-set fallback
(or, for get_known_nonzero_bits, no bits set) if the argument isn't an
SSA_NAME (nor INTEGER_CST or whatever else the functions handle explicitly).
2024-12-03 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117420
* tree-ssanames.h (get_known_nonzero_bits): Declare.
* tree-ssanames.cc (get_nonzero_bits): New wrapper function. Move old
definition to ...
(get_nonzero_bits_1): ... here, add static. Change widest_int in
function comment to wide_int.
(get_known_nonzero_bits_1, get_known_nonzero_bits): New functions.
* match.pd (with_possible_nonzero_bits2): Rename to ...
(with_possible_nonzero_bits): ... this. Guard the bit_and case with
#if GENERIC. Change to a normal match predicate without parameters.
Rename the old with_possible_nonzero_bits match to ...
(with_possible_nonzero_bits_1): ... this.
(with_certain_nonzero_bits2): Remove.
(with_known_nonzero_bits_1, with_known_nonzero_bits): New match
predicates.
(X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0):
Use with_known_nonzero_bits@0 instead of
(with_certain_nonzero_bits2 @1), use with_possible_nonzero_bits@0
instead of (with_possible_nonzero_bits2 @0) and
get_known_nonzero_bits (@1) instead of wi::to_wide (@1).
Jakub Jelinek [Tue, 3 Dec 2024 10:16:37 +0000 (11:16 +0100)]
bitintlower: Fix up ?ROTATE_EXPR lowering [PR117847]
In the ?ROTATE_EXPR lowering I forgot to handle rotation by 0 correctly.
An INTEGER_CST count of 0 is very unlikely, as it would probably be folded
away, but a non-constant count can't use just p - n, because then the shift
count is out of bounds when n is zero.
In the FE I use n == 0 ? x : (x << n) | (x >> (p - n)), but bitintlower
isn't prepared at this point to have the bb split, and I'm not sure if
using COND_EXPR is a good idea either, so the patch uses (p - n) % p.
Perhaps I should just disable lowering the rotate in the FE for the
non-mode precision BITINT_TYPEs too.
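The modulo trick can be sketched at the source level like this (a fixed 32-bit width is used here instead of a _BitInt, and the helper name is made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the rotate-count issue (hypothetical helper, not GCC code):
// for a left rotate of a p-bit value, x >> (p - n) has an out-of-range
// shift count when n == 0.  Using (p - n) % p instead yields x >> 0 in
// that case, which ORed with x << 0 still gives the correct result x.
static uint32_t rotl32(uint32_t x, unsigned n) // requires n < 32
{
    const unsigned p = 32;
    return (x << n) | (x >> ((p - n) % p));
}
```

Since the two halves are combined with OR rather than ADD, the n == 0 case degenerates harmlessly to x | x.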
2024-12-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117847
* gimple-lower-bitint.cc (gimple_lower_bitint) <case LROTATE_EXPR>:
Use m = (p - n) % p instead of m = p - n for the other shift count.
Tobias Burnus [Tue, 3 Dec 2024 10:02:03 +0000 (11:02 +0100)]
OpenMP: 'allocate' directive - fixes for 'alignof' and [[omp::decl]]
Fixed a check to permit [[omp::decl(allocate,...)]] parsing in C.
Additionally, we discussed that 'allocate align' should not affect
'alignof' to avoid issues like with:
int a;
_Alignas(_Alignof(a)) int b;
#pragma omp allocate(a) align(128)
_Alignas(_Alignof(a)) int c;
Thus, the alignment is no longer set in the C and Fortran front ends,
but for static variables now in varpool_node::finalize_decl.
(For stack variables, the alignment is handled in gimplify_bind_expr.)
NOTE: 'omp allocate' is not yet supported in C++.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_allocate): Only check scope if
not in_omp_decl_attribute. Remove setting the alignment.
gcc/ChangeLog:
* cgraphunit.cc (varpool_node::finalize_decl): Set alignment
based on OpenMP's 'omp allocate' attribute/directive.
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_finish_var_decl): Remove setting the alignment.
libgomp/ChangeLog:
* libgomp.texi (Memory allocation): Mention (non-)effect of 'align'
on _Alignof.
* testsuite/libgomp.c/allocate-7.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-18.c: Check that alignof is unaffected
by 'omp allocate'.
* c-c++-common/gomp/allocate-19.c: Likewise.
aarch64: Add flags field to aarch64-simd-pragma-builtins.def
This patch adds a flags field to aarch64-simd-pragma-builtins.def
and uses it to add attributes to the function declaration.
gcc/
* config/aarch64/aarch64-simd-pragma-builtins.def: Add a flags
field to each entry.
* config/aarch64/aarch64-builtins.cc: Update includes accordingly.
(aarch64_pragma_builtins_data): Add a flags field.
(aarch64_init_pragma_builtins): Use the flags field to add attributes
to the function declaration.
Saurabh Jha [Tue, 3 Dec 2024 09:54:01 +0000 (09:54 +0000)]
aarch64: Add support for AdvSIMD lut
The AArch64 FEAT_LUT extension is optional from Armv9.2-A and mandatory
from Armv9.5-A. It introduces instructions for lookup table reads with
bit indices.
This patch adds support for AdvSIMD lut intrinsics. The intrinsics for
this extension are implemented as the following builtin functions:
* vluti2{q}_lane{q}_{u8|s8|p8}
* vluti2{q}_lane{q}_{u16|s16|p16|f16|bf16}
* vluti4q_lane{q}_{u8|s8|p8}
* vluti4q_lane{q}_{u16|s16|p16|f16|bf16}_x2
We also introduced a new approach to do lane checks for AdvSIMD.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_signatures): Add binary_lane.
(aarch64_fntype): Handle it.
(simd_types): Add 16-bit x2 types.
(aarch64_pragma_builtins_checker): New class.
(aarch64_general_check_builtin_call): Use it.
(aarch64_expand_pragma_builtin): Add support for lut unspecs.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Add lut option.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_BINARY_LANE): Modify to use new ENTRY macro.
(ENTRY_TERNARY_VLUT8): Macro to declare lut intrinsics.
(ENTRY_TERNARY_VLUT16): Macro to declare lut intrinsics.
(REQUIRED_EXTENSIONS): Declare lut intrinsics.
* config/aarch64/aarch64-simd.md
(@aarch64_<vluti_uns_op><VLUT:mode><VB:mode>): Instruction
pattern for luti2 and luti4 intrinsics.
(@aarch64_lutx2<VLUT:mode><VB:mode>): Instruction pattern for
luti4x2 intrinsics.
* config/aarch64/aarch64.h
(TARGET_LUT): lut flag.
* config/aarch64/iterators.md: Iterators and attributes for lut.
* doc/invoke.texi: Document extension in AArch64 Options.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/lut-incorrect-range.c: New test.
* gcc.target/aarch64/simd/lut-no-flag.c: New test.
* gcc.target/aarch64/simd/lut.c: New test.
Co-authored-by: Vladimir Miloserdov <vladimir.miloserdov@arm.com> Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Saurabh Jha [Tue, 3 Dec 2024 09:54:00 +0000 (09:54 +0000)]
aarch64: Refactor AdvSIMD intrinsics
Refactor AdvSIMD intrinsics defined using the new pragma-based approach
so that it is more extensible.
Introduce a new struct, simd_type, which defines types using a mode and
qualifiers, and use objects of this struct in the declaration of intrinsics
in the aarch64-simd-pragma-builtins.def file.
Change aarch64_pragma_builtins_data struct to support return type and
argument types.
Refactor aarch64_fntype and aarch64_expand_pragma_builtin so that they
initialise the corresponding vectors in a loop. As we add intrinsics with
more arguments, these functions won't need to change to support those.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(ENTRY): Modify to add support of return and argument types.
(struct simd_type): New struct to declare types using mode and
qualifiers.
(struct aarch64_pragma_builtins_data): Replace mode with
the array of types to support return and argument types.
(aarch64_fntype): Modify to handle different signatures.
(aarch64_expand_pragma_builtin): Modify to handle different
signatures.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_VHSDF): Rename to ENTRY_BINARY_VHSDF.
(ENTRY_BINARY): New macro to declare binary intrinsics.
(ENTRY_BINARY_VHSDF): Remove signature argument and use
ENTRY_BINARY.
Co-authored-by: Vladimir Miloserdov <vladimir.miloserdov@arm.com> Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
We had a function called aarch64_vq_mode, where "vq" stood for "vector
quadword". It was used by aarch64_simd_container_mode (from which it
originated) and in preparation for various SVE ...Q instructions.
It's useful for follow-on patches if we also split out the handling
of 64-bit modes from aarch64_simd_container_mode. Keeping to the
same naming scheme would replace "q" with "d", but that has
unfortunate connotations, and doesn't AFAIK correspond to any
actual SVE mnemonics.
This patch therefore splits the handling out into a function called
aarch64_v64_mode and renames aarch64_vq_mode to aarch64_v128_mode for
consistency. I didn't rename the "vq" local variables, since I think
those names make sense in context.
aarch64: Move some diagnostic functions to aarch64.cc
Some of the diagnostics reported for SVE builtins would also be
useful for Advanced SIMD builtins, so this patch moves them from
aarch64-sve-builtins.cc to aarch64.cc. I put them in a new aarch64
namespace for now -- perhaps in future they should be generic.
gcc/
* config/aarch64/aarch64-sve-builtins.cc (report_non_ice)
(report_out_of_range, report_neither_nor, report_not_one_of)
(report_not_enum): Move to...
* config/aarch64/aarch64.cc: ...here, putting them in the aarch64
namespace, and...
* config/aarch64/aarch64-protos.h: ...declare them here.
Pan Li [Fri, 29 Nov 2024 12:33:19 +0000 (20:33 +0800)]
Match: Refactor the unsigned SAT_SUB match patterns [NFC]
This patch would like to refactor all of the unsigned SAT_SUB patterns, namely:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
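For reference, one common source-level form of unsigned SAT_SUB that such patterns target (a sketch for illustration, not taken from match.pd itself):

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction: the result clamps at zero instead
// of wrapping around, which is the operation SAT_SUB represents.
static uint32_t sat_sub(uint32_t x, uint32_t y)
{
    return x > y ? x - y : 0;
}
```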
gcc/ChangeLog:
* match.pd: Refactor sorts of unsigned SAT_SUB match patterns.
Pan Li [Tue, 3 Dec 2024 06:08:07 +0000 (14:08 +0800)]
RISC-V: Fix incorrect optimization options passing to reduc and ternop
Like the strided load/store tests, the testcases of vector reduce and
ternop are designed to pick up different sorts of optimization options,
but these options are actually ignored according to the execution log
in gcc.log.
This patch would like to make it correct, in almost the same way as what
we fixed for strided load/store.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.
Heiko Eißfeldt [Tue, 3 Dec 2024 08:47:59 +0000 (09:47 +0100)]
replace atoi with strtoul in varasm.cc (decode_reg_name_and_count) [PR114540]
The function uses atoi, which can silently return valid-looking numbers
even for some too-large numbers in the string.
Furthermore, the verification that all the characters in asmspec are
decimal digits can be simplified when using strtoul: we can check just the
first digit and whether the end pointer points to '\0'.
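The check described above can be sketched as a stand-alone helper (a hypothetical function for illustration, not the actual varasm.cc code):

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Accept only a non-empty, all-decimal-digit string that fits in an
// unsigned long: check the first character, then let strtoul consume
// the rest and verify the end pointer reached the terminating '\0'.
// ERANGE (via errno) catches too-large numbers that atoi would miss.
static bool decode_reg_number(const char *asmspec, unsigned long *out)
{
    if (!isdigit((unsigned char) asmspec[0]))
        return false;
    errno = 0;
    char *end;
    *out = strtoul(asmspec, &end, 10);
    return *end == '\0' && errno == 0;
}
```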
2024-12-03 Heiko Eißfeldt <heiko@hexco.de>
PR middle-end/114540
* varasm.cc (decode_reg_name_and_count): Use strtoul instead of atoi
and simplify verification that the whole asmspec contains just decimal
digits.
* gcc.dg/pr114540.c: New test.
Signed-off-by: Heiko Eißfeldt <heiko@hexco.de> Co-authored-by: Jakub Jelinek <jakub@redhat.com>
With SLP forced we fail to consider using single-lane SLP for a case
that we still end up discovering as hybrid (in the PR in question
this is because we run into the SLP discovery limit due to excessive
association).
Pan Li [Mon, 2 Dec 2024 13:57:53 +0000 (21:57 +0800)]
RISC-V: Fix incorrect optimization options passing to cond and builtin
Like the strided load/store tests, the testcases of vector cond and
builtin are designed to pick up different sorts of optimization options,
but these options are actually ignored according to the execution log
in gcc.log.
This patch would like to make it correct, in almost the same way as what
we fixed for strided load/store.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.
These files only still exist upstream; they should have been removed as
part of commit 104cc285533e742726ae18a7d3d4f384dd20c350
"gccrs: Refactor TypeResolution to be a simple query based system".
Jonathan Wakely [Thu, 28 Nov 2024 12:32:59 +0000 (12:32 +0000)]
libstdc++: Simplify std::_Destroy using 'if constexpr'
This is another place where we can use 'if constexpr' to replace
dispatching to a specialized class template, improving compile times and
avoiding a function call.
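The shape of the change looks like this (a minimal sketch with assumed names, not libstdc++'s actual code): the compile-time branch replaces the old dispatch to a specialized class template, and the trivially destructible case compiles to nothing.

```cpp
#include <cassert>
#include <iterator>
#include <new>
#include <type_traits>

// Destroy a range of objects; with 'if constexpr' the triviality check
// is decided at compile time, with no helper class template or extra
// function call.
template <typename ForwardIt>
void destroy_range(ForwardIt first, ForwardIt last)
{
    using T = typename std::iterator_traits<ForwardIt>::value_type;
    if constexpr (!std::is_trivially_destructible<T>::value)
        for (; first != last; ++first)
            first->~T();
}

// Demo type that counts destructor calls.
static int g_destroyed = 0;
struct Noisy { ~Noisy() { ++g_destroyed; } };
```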
libstdc++-v3/ChangeLog:
* include/bits/stl_construct.h (_Destroy(FwdIter, FwdIter)): Use
'if constexpr' instead of dispatching to a member function of a
class template.
(_Destroy_n(FwdIter, Size)): Likewise.
(_Destroy_aux, _Destroy_n_aux): Only define for C++98.
Patrick Palka [Mon, 2 Dec 2024 15:58:50 +0000 (10:58 -0500)]
c++: some further concepts cleanups
This patch further cleans up the concepts code following the removal of
Concepts TS support:
* concept-ids are now the only kind of "concept check", so we can
simplify some code accordingly. In particular resolve_concept_check
seems like a no-op and can be removed.
* In turn, deduce_constrained_parameter doesn't seem to do anything
interesting.
* In light of the above we might as well inline finish_type_constraints
into its only caller.
* Introduce and use a helper for obtaining the prototype parameter of
a concept, i.e. its first template parameter.
* placeholder_extract_concept_and_args is only ever called on a
concept-id, so it's simpler to inline it into its callers.
* There's no such thing as a template-template-parameter with a
type-constraint, so we can remove such handling from the parser.
This means is_constrained_parameter is currently equivalent to
declares_constrained_type_template_parameter, so let's prefer
to use the latter.
* Remove WILDCARD_DECL and instead use the concept's prototype parameter
as the dummy first argument of a type-constraint during template
argument coercion.
* Remove a redundant concept_definition_p overload.
gcc/cp/ChangeLog:
* constraint.cc (resolve_concept_check): Remove.
(deduce_constrained_parameter): Remove.
(finish_type_constraints): Inline into its only caller
cp_parser_placeholder_type_specifier and remove.
(build_concept_check_arguments): Coding style tweaks.
(build_standard_check): Inline into its only caller ...
(build_concept_check): ... here.
(build_type_constraint): Use the prototype parameter as the
first template argument.
(finish_shorthand_constraint): Remove function concept
handling. Use concept_prototype_parameter.
(placeholder_extract_concept_and_args): Inline into its
callers and remove.
(equivalent_placeholder_constraints): Adjust after
placeholder_extract_concept_and_args removal.
(iterative_hash_placeholder_constraint): Likewise.
* cp-objcp-common.cc (cp_common_init_ts): Remove WILDCARD_DECL
handling.
* cp-tree.def (WILDCARD_DECL): Remove.
* cp-tree.h (WILDCARD_PACK_P): Remove.
(type_uses_auto_or_concept): Remove declaration of nonexistent
function.
(append_type_to_template_for_access_check): Likewise.
(finish_type_constraints): Remove declaration.
(placeholder_extract_concept_and_args): Remove declaration.
(deduce_constrained_parameter): Remove declaration.
(resolve_constraint_check): Remove declaration.
(valid_requirements_p): Remove declaration of nonexistent
function.
(finish_concept_name): Likewise.
(concept_definition_p): Remove redundant overload.
(concept_prototype_parameter): Define.
* cxx-pretty-print.cc (pp_cxx_constrained_type_spec): Adjust
after placeholder_extract_concept_and_args.
* error.cc (dump_decl) <case WILDCARD_DECL>: Remove.
(dump_expr) <case WILDCARD_DECL>: Likewise.
* parser.cc (is_constrained_parameter): Inline into
declares_constrained_type_template_parameter and remove.
(cp_parser_check_constrained_type_parm): Declare static.
(finish_constrained_template_template_parm): Remove.
(cp_parser_constrained_template_template_parm): Remove.
(finish_constrained_parameter): Remove dead code guarded by
cp_parser_constrained_template_template_parm.
(declares_constrained_type_template_parameter): Adjust after
is_constrained_parameter removal.
(declares_constrained_template_template_parameter): Remove.
(cp_parser_placeholder_type_specifier): Adjust after
finish_type_constraints removal. Check the prototype parameter
earlier, before build_type_constraint.
Use concept_prototype_parameter.
(cp_parser_parameter_declaration): Remove dead code guarded by
declares_constrained_template_template_parameter.
* pt.cc (convert_wildcard_argument): Remove.
(convert_template_argument): Remove WILDCARD_DECL handling.
(coerce_template_parameter_pack): Likewise.
(tsubst) <case TEMPLATE_TYPE_PARM>: Likewise.
(type_dependent_expression_p): Likewise.
(make_constrained_placeholder_type): Remove function concept
handling.
(placeholder_type_constraint_dependent_p): Remove WILDCARD_DECL
handling.
Andreas Schwab [Thu, 21 Nov 2024 14:35:01 +0000 (15:35 +0100)]
m68k: don't allow o/o in movdi, movdf, movxf
The movdi, movdf and movxf patterns allow both operands to be offsettable
memory, but output_move_double cannot handle overlapping objects. This is
visible in the failure of gcc.c-torture/execute/pr97073.c when compiled
with LTO (where cprop optimizes out the AND operation; the failure also
occurs without LTO when the AND is removed). Split the constraints so
that the operands cannot both be "o" in the same insn.
* config/m68k/m68k.md (movdi+1, movdf+1, movxf+2): Split
constraints so that the operands cannot both be "o".
Jakub Jelinek [Mon, 2 Dec 2024 13:51:57 +0000 (14:51 +0100)]
Add trailing newlines where needed
Especially in the recent CRC commits, I see
\ No newline at end of file
in almost every second file. So, I went through
the diff between r15-1 and current trunk in gcc/, looking for
additions of such problems which don't intentional (e.g.
Wtrailing-whitespace* tests had it there intentionally) and
just added the missing newline elsewhere.
Andre Vieira [Mon, 2 Dec 2024 13:35:03 +0000 (13:35 +0000)]
arm, mve: Adding missing Runtime Library Exception to header files
Add missing Runtime Library Exception to mve header files to bring them into
line with other similar headers. Not adding it in the first place was an
oversight.
Richard Biener [Mon, 2 Dec 2024 10:07:46 +0000 (11:07 +0100)]
tree-optimization/116352 - SLP scheduling and stmt order
The PR uncovers unchecked constraints on the ability to code-generate
with SLP but also latent issues with regard to stmt order checking
since loop (early-break) and BB (for quite some time) vectorization
are no longer constrained to single BBs. In particular get_later_stmt
simply compares UIDs of stmts, but that's only reliable when they
are in the same BB.
For the PR in question the problematical case is demoting a SLP node
to external which fails to check we can actually code generate this
in the way we do (using get_later_stmt). The following thus adds
checking that we demote to external only when all defs are from
the same BB.
We no longer vectorize gcc.dg/vect/bb-slp-49.c but the testcase was
for a wrong-code issue and the vectorization done is a no-op.
Jakub Jelinek [Mon, 2 Dec 2024 12:55:02 +0000 (13:55 +0100)]
testsuite: Adjust rs6000-ldouble-2.c for switch to -std=gnu23 by default [PR117663]
-std=gnu23/-std=c23 changes LDBL_EPSILON for IBM long double, see r13-3029 and
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602738.html
for details.
That change even had a note:
"and when we move to a C2x
default, gcc.target/powerpc/rs6000-ldouble-2.c will need an
appropriate option added to keep using an older language version"
The following patch just implements it to fix rs6000-ldouble-2.c regression.
2024-12-02 Jakub Jelinek <jakub@redhat.com>
PR testsuite/117663
* gcc.target/powerpc/rs6000-ldouble-2.c: Add -std=gnu17 to dg-options.
yulong [Mon, 2 Dec 2024 01:31:53 +0000 (09:31 +0800)]
RISC-V: Add intrinsics support for SiFive Xsfvfnrclipxfqf extensions.
This commit adds intrinsics support for the SiFive Xsfvfnrclipxfqf
extension. We also redefine the enum type frm_op_type in the
riscv-vector-builtins-bases.h file, because it is used in the
sifive-vector-builtins-bases.cc file.
Pan Li [Fri, 29 Nov 2024 03:57:34 +0000 (11:57 +0800)]
RISC-V: Fix incorrect optimization options passing to widen
Like the strided load/store tests, the testcases of vector widen are
designed to pick up different sorts of optimization options, but these
options are actually ignored according to the execution log in gcc.log.
This patch would like to make it correct, in almost the same way as what
we fixed for strided load/store.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.
This patch would like to fix the testcase failures of strided
load/store after the various optimization options passed to the testcases.
* Add no strict align for vector option.
* Adjust dg-final by any-opts and/or no-opts if the rtl dump changes
on different optimization options (like O2, O3, zvl).
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
Slava Barinov [Sun, 1 Dec 2024 18:59:13 +0000 (11:59 -0700)]
[PATCH] gcc: configure: Fix the optimization flags cleanup
Currently the sed command in the flag cleanup removes all -O[0-9] flags,
ignoring the context. This leads to issues when an optimization flag is
passed to the linker:
CFLAGS="-Os -Wl,-O1 -Wl,--hash-style=gnu"
is converted into
CFLAGS="-Os -Wl,-Wl,--hash-style=gnu"
This leads to a configure failure with ld: unrecognized option '-Wl,-Wl'.
gcc/
* configure.ac: Only remove -O[0-9] if not preceded by a comma.
* configure: Regenerated
Jovan Vukic [Sun, 1 Dec 2024 18:57:41 +0000 (11:57 -0700)]
Thanks for the feedback on the first version of the patch. Accordingly:
I have corrected the code formatting as requested. I added new tests to
the existing file phi-opt-11.c, instead of creating a new one.
I performed testing before and after applying the patch on the x86
architecture, and I confirm that there are no new regressions.
The logic and general code of the patch itself have not been changed.
> So the A EQ/NE B expression, we can reverse A and B in the expression
> and still get the same result. But don't we have to be more careful for
> the TRUE/FALSE arms of the ternary? For BIT_AND we need ? a : b for
> BIT_IOR we need ? b : a.
>
> I don't see that gets verified in the existing code or after your
> change. I suspect I'm just missing something here. Can you clarify how
> we verify that BIT_AND gets ? a : b for the true/false arms and that
> BIT_IOR gets ? b : a for the true/false arms?
I did not communicate this clearly last time, but the existing optimization
simplifies the expression "(cond & (a == b)) ? a : b" to the simpler "b".
Similarly, the expression "(cond & (a == b)) ? b : a" simplifies to "a".
Thus, the existing and my optimization perform the following
simplifications:
(cond & (a == b)) ? a : b -> b
(cond & (a == b)) ? b : a -> a
(cond | (a != b)) ? a : b -> a
(cond | (a != b)) ? b : a -> b
For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to
confirm that one operand matches the true/false arm and the other matches
the false/true arm. In both cases, we simplify the expression to the third
operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to OP2).
This is achieved in the value_replacement function after successfully
setting the value of *code within the rhs_is_fed_for_value_replacement
function to EQ_EXPR.
For BIT_IOR_EXPR, the same check is performed for A NE B, except now
*code remains NE_EXPR, and then value_replacement returns the second
operand (i.e., OP0 ? OP1 : OP2 simplifies to OP1).
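At the source level, the simplifications listed above correspond to the following (a sketch for illustration; the actual transform operates on GIMPLE PHIs in tree-ssa-phiopt.cc):

```cpp
#include <cassert>

// (cond & (a == b)) ? a : b  always equals b: if the condition holds
// then a == b, so picking a is the same as picking b.
static int and_form(int cond, int a, int b)
{
    return (cond & (a == b)) ? a : b;
}

// (cond | (a != b)) ? a : b  always equals a: if the condition fails
// then a == b, so picking b is the same as picking a.
static int ior_form(int cond, int a, int b)
{
    return (cond | (a != b)) ? a : b;
}
```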
2024-10-30 Jovan Vukic <Jovan.Vukic@rt-rk.com>
gcc/ChangeLog:
* tree-ssa-phiopt.cc (rhs_is_fed_for_value_replacement): Add a new
optimization opportunity for BIT_IOR_EXPR and a != b.
(operand_equal_for_value_replacement): Ditto.
Mariam Arutunian [Mon, 11 Nov 2024 20:01:19 +0000 (13:01 -0700)]
[PATCH v7 11/12] Replace the original CRC loops with a faster CRC calculation
After the loop exit, an internal function call (CRC, CRC_REV) is added, and its
result is assigned to the output CRC variable (the variable where the
calculated CRC is stored after the loop execution). The removal of the loop is
left to CFG cleanup and DCE.
gcc/
* gimple-crc-optimization.cc (optimize_crc_loop): New function.
(execute): Add optimize_crc_loop function call.
Mariam Arutunian [Mon, 11 Nov 2024 20:00:37 +0000 (13:00 -0700)]
[PATCH v7 10/12] Verify detected CRC loop with symbolic execution and LFSR matching
Symbolically execute potential CRC loops and check whether the loop actually
calculates CRC (uses LFSR matching). Calculated CRC and created LFSR are
compared on each iteration of the potential CRC loop.
gcc/
* Makefile.in (OBJS): Add crc-verification.o.
* crc-verification.cc: New file.
* crc-verification.h: New file.
* gimple-crc-optimization.cc (loop_calculates_crc): New function.
(is_output_crc): Likewise.
(swap_crc_and_data_if_needed): Likewise.
(validate_crc_and_data): Likewise.
(optimize_crc_loop): Likewise.
(get_output_phi): Likewise.
(execute): Add check whether potential CRC loop calculates CRC.
* sym-exec/sym-exec-state.cc (create_reversed_lfsr): New function.
(create_forward_lfsr): Likewise.
(last_set_bit): Likewise.
(create_lfsr): Likewise.
* sym-exec/sym-exec-state.h (is_bit_vector): Reorder, make the function public and static.
(create_reversed_lfsr) New static function declaration.
(create_forward_lfsr) New static function declaration.
This gives an opportunity to execute the code at the bit level, assigning
symbolic values to the variables which don't have initial values.
It supports only CRC-specific operations.
Example:
uint8_t crc;
uint8_t pol = 1;
crc = crc ^ pol;
during symbolic execution crc's value will be:
crc(7), crc(6), ... crc(1), crc(0) ^ 1
gcc/
* Makefile.in (OBJS): Add sym-exec/sym-exec-expression.o,
sym-exec/sym-exec-state.o, sym-exec/sym-exec-condition.o.
* configure (sym-exec): New subdir.
* sym-exec/sym-exec-condition.cc: New file.
* sym-exec/sym-exec-condition.h: New file.
* sym-exec/sym-exec-expr-is-a-helper.h: New file.
* sym-exec/sym-exec-expression.cc: New file.
* sym-exec/sym-exec-expression.h: New file.
* sym-exec/sym-exec-state.cc: New file.
* sym-exec/sym-exec-state.h: New file.
Mariam Arutunian [Mon, 11 Nov 2024 19:59:04 +0000 (12:59 -0700)]
[PATCH v7 08/12] Add a new pass for naive CRC loops detection
This patch adds a new compiler pass aimed at identifying naive CRC
implementations, characterized by the presence of a loop calculating
a CRC (polynomial long division). Upon detection of a potential CRC,
the pass prints an informational message.
Performs the CRC optimization if the optimization level is >= 2 and
-foptimize-crc is given.
This pass is added for the detection and optimization of naive CRC
implementations, improving the efficiency of CRC-related computations.
This patch includes only the initial fast checks for filtering out
non-CRCs; the verification and optimization parts for detected CRC
candidates will be provided in subsequent patches.
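For reference, a minimal sketch of the kind of naive bit-by-bit CRC loop (polynomial long division) such a pass targets; the CRC-8 polynomial 0x07 and the function name are illustrative choices, not taken from the patch:

```cpp
#include <cstdint>

// Naive CRC-8 over one input byte: classic polynomial long division,
// one loop iteration per bit.  Polynomial 0x07 (x^8 + x^2 + x + 1)
// is an arbitrary illustrative choice.
uint8_t crc8_naive(uint8_t crc, uint8_t data) {
  crc ^= data;
  for (int i = 0; i < 8; i++) {
    if (crc & 0x80)
      crc = (uint8_t)((crc << 1) ^ 0x07);  // shift out MSB, XOR in poly
    else
      crc = (uint8_t)(crc << 1);
  }
  return crc;
}
```

The loop's result flows into an output variable via a PHI at the loop exit, which is what the pass's `get_output_phi` and `is_output_crc` checks look for.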
gcc/
* Makefile.in (OBJS): Add gimple-crc-optimization.o.
* common.opt (foptimize-crc): New option.
* common.opt.urls: Regenerate to add foptimize-crc.
* doc/invoke.texi (-foptimize-crc): Add documentation.
* gimple-crc-optimization.cc: New file.
* opts.cc (default_options_table): Add OPT_foptimize_crc.
(enable_fdo_optimizations): Enable optimize_crc.
* passes.def (pass_crc_optimization): Add new pass.
* timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
* tree-pass.h (make_pass_crc_optimization): New extern function
declaration.
Mark Harmstone [Thu, 7 Nov 2024 03:59:18 +0000 (03:59 +0000)]
Write binary annotations for CodeView S_INLINESITE symbols
Add "binary annotations" at the end of CodeView S_INLINESITE symbols,
which are a series of compressed integers that represent how line
numbers map to addresses.
This requires assembler support; you will need commit b3aa594d ("gas:
add .cv_ucomp and .cv_scomp pseudo-directives") in binutils.
gcc/
* configure.ac (HAVE_GAS_CV_UCOMP): New check.
* configure: Regenerate.
* config.in: Regenerate.
* dwarf2codeview.cc (enum binary_annotation_opcode): Define.
(struct codeview_function): Add htab_next and inline_loc;
(struct cv_func_hasher): Define.
(cv_func_htab): New global variable.
(new_codeview_function): Add new codeview_function to hash table.
(codeview_begin_block): Record location of inline block.
(codeview_end_block): Add dummy source line at end of inline block.
(find_line_function): New function.
(write_binary_annotations): New function.
(write_s_inlinesite): Call write_binary_annotations.
(codeview_debug_finish): Delete cv_func_htab.
testsuite: Silence gcc.dg/pr117806.c for default_packed
On default_packed targets like PRU, spurious warnings are emitted:
...workspace/gcc/gcc/testsuite/gcc.dg/pr117806.c:5:3: warning: 'packed' attribute ignored for field of type 'double' [-Wattributes]
Fix by annotating the excess warnings for default_packed targets.
gcc/testsuite/ChangeLog:
* gcc.dg/pr117806.c: Allow excess errors for default_packed
targets.
Andrew Pinski [Sat, 30 Nov 2024 22:09:48 +0000 (14:09 -0800)]
VN: Don't recurse for the same value of `a != 0` [PR117859]
Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case.
After adding vn_valueize to handle the `a ==/!= 0` case
of insert_predicates_for_cond, it would go into an infinite loop
as the Value number for a could be the same as what it
is for the whole expression. This avoids that recursion so there is
no infinite loop here.
Note lim was originally introducing `bool_var2 = bool_var1 != 0`, but
with the gimple testcase in -2 there is no dependency on what earlier
passes will do.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117859
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
valueization for the new lhs for `lhs != 0`
is the same as the old ones, don't recurse.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr117859-1.c: New test.
* gcc.dg/torture/pr117859-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 30 Nov 2024 21:12:13 +0000 (13:12 -0800)]
gimple-lim: Reuse boolean var when moving PHI
While looking into PR 117859, I noticed that LIM
sometimes would produce `bool_var2 = bool_var1 != 0` instead
of just using bool_var2. This patch allows LIM to reuse bool_var1
in the place where bool_var2 was going to be used.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-loop-im.cc (move_computations_worker): While moving
phi, reuse the lhs of the conditional if it is a boolean type.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 05:11:42 +0000 (21:11 -0800)]
testsuite: Fix aarch64/sve/acle/general-c/gnu_vectors_[12].c for taking address of vector element
After the recent changes that make SVE vectors usable as GNU vector extensions,
you can now access each of the elements as if the vector were an array. There is
no reason why taking the address of such an element should be invalid, especially
since we are limiting access to the first N elements (where N is the minimum
number of elements the architecture supports for these types).
So this removes the error message on these 2 lines and fixes the testcase.
Pushed as obvious after a quick test for these tests for aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c: Remove
error message on taking address of an element of a vector.
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 05:04:10 +0000 (21:04 -0800)]
testsuite: Fix aarch64/sve/acle/general-c++/gnu_vectors_[12].C for taking address of vector element
After the recent changes that make SVE vectors usable as GNU vector extensions,
you can now access each of the elements as if the vector were an array. There is
no reason why taking the address of such an element should be invalid, especially
since we are limiting access to the first N elements (where N is the minimum
number of elements the architecture supports for these types).
So this removes the error message on these 2 lines and fixes the testcase.
Pushed as obvious after a quick test for these tests for aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C: Remove
error message on taking address of an element of a vector.
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 04:58:14 +0000 (20:58 -0800)]
testsuite: Fix sve-sizeless-[12].C for C++98
In C++98, `{ a }` for aggregates can only mean constructing element by
element rather than making a copy. This adds the expected error
message for SVE vectors for C++98.
Pushed as obvious after a test for aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.dg/ext/sve-sizeless-1.C: Add error message for line 164
for C++98 only.
* g++.dg/ext/sve-sizeless-2.C: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 04:40:13 +0000 (20:40 -0800)]
testsuite: Fix another issue with sve-sizeless-[12].C
There is a different error message expected on line 165 (for both files).
It was expecting:
error: cannot convert 'svint16_t' to 'svint8_t' in initialization
But now we get:
error: cannot convert 'svint16_t' to 'signed char' in initialization
This is because we now support constructing scalable vectors, which we did not before.
So just update the expected error message.
Pushed as obvious after a quick test for aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.dg/ext/sve-sizeless-1.C: Update error message for line 165.
* g++.dg/ext/sve-sizeless-2.C: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This patch adds optimization of the following patterns:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M))) mask)
... where the mask is GET_MODE_MASK (N).
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M))) could be simplified to just (X:M)
and the whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M) mask)
gcc/ChangeLog:
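The equivalence can be sanity-checked with a small C++ analogue, taking M as 32 bits and N as 8 bits (illustrative choices, not from the patch):

```cpp
#include <cstdint>

// (zero_extend:SI (subreg:QI (not:SI x)))  is the same value as
// (xor:SI (zero_extend:SI (subreg:QI x)) 0xff):
// complementing then truncating to 8 bits equals truncating then
// XORing with the 8-bit mask.
uint32_t extend_of_not(uint32_t x) { return (uint32_t)(uint8_t)~x; }
uint32_t xor_of_extend(uint32_t x) { return (uint32_t)(uint8_t)x ^ 0xff; }
```

For values with no non-zero bits outside the low 8, `xor_of_extend(x)` further reduces to `x ^ 0xff`, matching the simplified form.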
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
* gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su.
* gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka.
Jonathan Wakely [Mon, 25 Nov 2024 13:52:19 +0000 (13:52 +0000)]
libstdc++: Move std::monostate to <utility> for C++26 (P0472R2)
Another C++26 paper just approved in Wrocław. The std::monostate class
is defined in <variant> since C++17, but for C++26 it should also be
available in <utility>.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add bits/monostate.h.
* include/Makefile.in: Regenerate.
* include/std/utility: Include <bits/monostate.h>.
* include/std/variant (monostate, hash<monostate>): Move
definitions to ...
* include/bits/monostate.h: New file.
* testsuite/20_util/headers/utility/synopsis.cc: Add monostate
and hash<monostate> declarations.
* testsuite/20_util/monostate/requirements.cc: New test.
Lewis Hyatt [Mon, 28 Oct 2024 16:52:31 +0000 (12:52 -0400)]
Support for 64-bit location_t: Internal parts
Several of the selftests in diagnostic-show-locus.cc and input.cc are
sensitive to linemap internals. Adjust them here so they will support 64-bit
location_t if configured.
Likewise, handle 64-bit location_t in the support for
-fdump-internal-locations. As was done with the analyzer, convert to
(unsigned long long) explicitly so that 32- and 64-bit can be handled with
the same printf formats.
gcc/ChangeLog:
* diagnostic-show-locus.cc
(test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
effectively test 7-bit ranges instead of 5-bit ranges.
(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
* input.cc (get_end_location): Adjust types to support 64-bit
location_t.
(write_digit_row): Likewise.
(dump_location_range): Likewise.
(dump_location_info): Likewise.
(class line_table_case): Likewise.
(test_accessing_ordinary_linemaps): Replace some hard-coded
constants with the values defined in line-map.h.
(for_each_line_table_case): Likewise.
Lewis Hyatt [Tue, 26 Nov 2024 16:53:36 +0000 (11:53 -0500)]
Support for 64-bit location_t: toplev parts
With the upcoming move from 32-bit to 64-bit location_t, the recommended
number of range bits will change from 5 to 7. line-map.h now exports the
recommended setting, so use that instead of hard-coding 5.
gcc/ChangeLog:
* toplev.cc (general_init): Replace hard-coded constant with
line_map_suggested_range_bits.
Lewis Hyatt [Sat, 26 Oct 2024 15:13:30 +0000 (11:13 -0400)]
Support for 64-bit location_t: Backend parts
A few targets have been using "unsigned int" function arguments that need to
receive a "location_t". Change to "location_t" to prepare for the
possibility that location_t can be configured to be a different type.
Joseph Myers [Sat, 30 Nov 2024 16:15:51 +0000 (16:15 +0000)]
gimplify: Handle void expression as asm input [PR100501, PR100792]
As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C
tests involving a statement expression not returning a value as an asm
input; this includes the variant bug 100792 where the statement
expression ends with another asm statement.
The expected diagnostic for this case (as seen for C++ input) is one
coming from the gimplifier and so it seems reasonable to fix the
gimplifier to handle the GENERIC generated for this case by the C
front end, rather than trying to make the C front end detect it
earlier. Thus the gimplifier to handle a void
expression like other non-lvalues for such a memory input.
Bootstrapped with no regressions for x86_64-pc-linux-gnu. OK to commit?
PR c/100501
PR c/100792
gcc/
* gimplify.cc (gimplify_asm_expr): Handle void expressions for
memory inputs like other non-lvalues.
Mark Harmstone [Tue, 19 Nov 2024 00:55:25 +0000 (00:55 +0000)]
Write S_INLINEELINES CodeView subsection
When outputting the .debug$S CodeView section, also write an
S_INLINEELINES subsection, which records the filename and line number of
the start of each inlined function.
gcc/
* dwarf2codeview.cc (DEBUG_S_INLINEELINES): Define.
(CV_INLINEE_SOURCE_LINE_SIGNATURE): Define.
(struct codeview_inlinee_lines): Define.
(struct inlinee_lines_hasher): Define.
(func_htab, inlinee_lines_htab): New global variables.
(get_file_id): New function.
(codeview_source_line): Move file_id logic to get_file_id.
(write_inlinee_lines_entry): New function.
(write_inlinee_lines): New function.
(codeview_debug_finish): Call write_inlinee_lines, and free func_htab
and inlinee_lines_htab.
(get_func_id): New function.
(add_function): Move func_id logic to get_func_id.
(codeview_abstract_function): New function.
* dwarf2codeview.h (codeview_abstract_function): Add declaration.
* dwarf2out.cc (dwarf2out_abstract_function): Call
codeview_abstract_function if outputting CodeView debug info.
Mark Harmstone [Tue, 19 Nov 2024 00:52:55 +0000 (00:52 +0000)]
Don't output CodeView line numbers for inlined functions
If we encounter an inlined function, treat it as another
codeview_function, and skip over these when outputting line numbers.
This information will instead be output as part of the S_INLINESITE
symbols.
gcc/
* dwarf2codeview.cc (struct codeview_function): Add parent and
inline_block fields.
(cur_func): New global variable.
(new_codeview_function): New function.
(codeview_source_line): Call new_codeview_function, and use cur_func
instead of last_func.
(codeview_begin_block): New function.
(codeview_end_block): New function.
(write_line_numbers): No longer free data as we go along.
(codeview_switch_text_section): Call new_codeview_function, and use
cur_func instead of last_func.
(codeview_end_epilogue): Use cur_func instead of last_func.
(codeview_debug_finish): Free funcs list and its contents.
* dwarf2codeview.h (codeview_begin_block): Add declaration.
(codeview_end_block): Add declaration.
* dwarf2out.cc (dwarf2out_begin_block): Call codeview_begin_block if
outputting CodeView debug info.
(dwarf2out_end_block): Call codeview_end_block if outputting CodeView
debug info.
Mark Harmstone [Mon, 28 Oct 2024 22:32:29 +0000 (22:32 +0000)]
Add block parameter to begin_block debug hook
Add a parameter to the begin_block debug hook that is a pointer to the
tree_node of the block in question. CodeView needs this as it records
line numbers of inlined functions in a different manner, so we need to
be able to tell if the block is actually the start of an inlined
function.
gcc/
* debug.cc (do_nothing_debug_hooks): Change begin_block
function pointer.
(debug_nothing_int_int_tree): New function.
* debug.h (struct gcc_debug_hooks): Add tree parameter to begin_block.
(debug_nothing_int_int_tree): Add declaration.
* dwarf2out.cc (dwarf2out_begin_block): Add tree parameter.
(dwarf2_lineno_debug_hooks): Use new dummy function for begin_block.
* final.cc (final_scan_insn_1): Pass insn block through to
debug_hooks->begin_block.
* vmsdbgout.cc (vmsdbgout_begin_block): Add tree parameter.
Georg-Johann Lay [Sat, 30 Nov 2024 13:58:05 +0000 (14:58 +0100)]
AVR: ad target/84211 - Split MOVW into MOVs in try_split_any.
When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out to be profitable. When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.
gcc/
PR target/84211
* config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer
reg=reg move over reg=const when splitting a reg=reg insn.
Jakub Jelinek [Sat, 30 Nov 2024 10:30:08 +0000 (11:30 +0100)]
strlen: Handle vector CONSTRUCTORs [PR117057]
The following patch handles VECTOR_TYPE_P CONSTRUCTORs in
count_nonzero_bytes, including handling them if they have some elements
non-constant.
If there are still some constant elements before it (in the range queried),
we derive info at least from those bytes and consider the rest as unknown.
The first 3 hunks just punt in IMHO problematic cases: the spaghetti code
treats byte_size 0 as "unknown size, determine it yourself", so if offset
is equal to the exp size, there are 0 bytes to consider (so nothing useful
to determine), but using byte_size 0 would mean using any size.
Similarly, native_encode_expr uses the int type for offset (and size), so
passing it an offset larger than INT_MAX could be a silent miscompilation.
I've guarded the test to just a couple of targets known to handle it,
because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR
into 4 BIT_FIELD_REF stores and I haven't figured out on what exactly
that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec
-mno-vsx, even -m32).
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117057
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also
when byte_size is equal to offset or nchars. Punt if offset is bigger
than INT_MAX. Handle vector CONSTRUCTOR with some elements constant,
possibly followed by non-constant.
* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign
specific scan-tree-dump-times directive.
* gcc.dg/strlenopt-96.c: New test.
Jakub Jelinek [Sat, 30 Nov 2024 10:19:12 +0000 (11:19 +0100)]
openmp: Add crtoffloadtableS.o and use it [PR117851]
Unlike crtoffload{begin,end}.o which just define some symbols at the start/end
of the various .gnu.offload* sections, crtoffloadtable.o contains
const void *const __OFFLOAD_TABLE__[]
__attribute__ ((__visibility__ ("hidden"))) =
{
&__offload_func_table, &__offload_funcs_end,
&__offload_var_table, &__offload_vars_end,
&__offload_ind_func_table, &__offload_ind_funcs_end,
};
The problem is that linking this into PIEs or shared libraries doesn't
work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC
code is put into .rodata section, but it really needs relocations, so for
PIC it should go into .data.rel.ro/.data.rel.ro.local.
As I think we don't want .data.rel.ro section in non-PIE binaries, this patch
follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o
next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic.
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR libgomp/117851
gcc/
* lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument,
search for crtoffloadtableS.o rather than crtoffloadtable.o if
true.
(run_gcc): Add pie_or_shared variable. If OPT_pie or OPT_shared or
OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is
seen, set pie_or_shared to false. Pass it to find_crtoffloadtable.
libgcc/
* configure.ac (extra_parts): Add crtoffloadtableS.o.
* Makefile.in (crtoffloadtableS$(objext)): New goal.
* configure: Regenerated.
Jinyang He [Thu, 28 Nov 2024 01:26:25 +0000 (09:26 +0800)]
LoongArch: Mask shift offset when emit {xv, v}{srl, sll, sra} with sameimm vector
For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes an
overflow when emitting the {w,h,b} variants. Since the number of bits
shifted is the remainder of the register value modulo the unit bit
width, it is actually unnecessary to constrain the range. Simply mask
the shift amount with the unit bit width, without any constraint on the
shift range.
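The modulo behaviour being relied on can be modelled in scalar C++ (this models a single 8-bit lane; the function name and widths are illustrative):

```cpp
#include <cstdint>

// The hardware uses only the low bits of the shift amount, so masking
// the amount with (element width - 1) is equivalent to constraining
// its range.  Scalar model of one 8-bit lane of a vsll.b-style shift.
uint8_t vsll_b_lane(uint8_t v, unsigned n) {
  return (uint8_t)(v << (n & 7));  // shift modulo the 8-bit lane width
}
```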
gcc/ChangeLog:
* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6
_operand to reg_or_vector_same_uimm_operand.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
In r15-5327, change the default language version for C compilation from
-std=gnu17 to -std=gnu23.
ISO C99 and C11 allow ceil, floor, round and trunc, and their float and
long double variants, to raise the “inexact” exception,
but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as
integrated into ISO C23, does not allow these functions to do so.
So add '-ffp-int-builtin-inexact' to this test case.
Andrew Pinski [Fri, 29 Nov 2024 23:29:41 +0000 (15:29 -0800)]
gimplefe: Error recovery for invalid declarations [PR117749]
c_parser_declarator can return null if there was an error,
but c_parser_gimple_declaration was not ready for that.
This fixes that oversight so we don't get an ICE after the error.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/117749
gcc/c/ChangeLog:
* gimple-parser.cc (c_parser_gimple_declaration): Check
declarator to be non-null.
gcc/testsuite/ChangeLog:
* gcc.dg/gimplefe-55.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek [Sat, 30 Nov 2024 00:51:24 +0000 (01:51 +0100)]
ext-dce: Fix SIGN_EXTEND handling and cleanups [PR117360]
This is mostly a blind attempt to fix the PR + various cleanups.
The PR is about a shift of a HOST_WIDE_INT by 127 invoking UB.
Most of carry_backpropagate works on GET_MODE_INNER of the operand,
mode is assigned
enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
at the beginning and everything is done using that mode, so for
vector modes (or complex even?) we work with the element modes
rather than vector/complex modes.
But the SIGN_EXTEND handling does that inconsistently, it looks
at mode of the operand and uses GET_MODE_INNER in GET_MODE_MASK,
but doesn't use it in the shift.
The following patch, apart from the cleanups, fixes it by doing
essentially:
mode = GET_MODE (XEXP (x, 0));
if (mask & ~GET_MODE_MASK (GET_MODE_INNER (mode)))
- mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+ mask |= 1ULL << (GET_MODE_BITSIZE (GET_MODE_INNER (mode)).to_constant () - 1);
i.e. also shifting by GET_MODE_BITSIZE of the GET_MODE_INNER of the
operand's mode. We don't need to check if it is at most 64 bits,
at the start of the function we've already verified the result mode
is at most 64 bits and SIGN_EXTEND by definition extends from a narrower
mode.
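A plain-integer sketch of the corrected SIGN_EXTEND mask back-propagation (a simplification: the real code works on rtx modes, and `inner_bits` stands for GET_MODE_BITSIZE of the inner mode of the operand):

```cpp
#include <cstdint>

// If any requested bit lies outside the inner (narrow) mode, the sign
// bit of the *inner* mode becomes live, because sign-extension copies
// it into all the upper bits.  The bug was shifting by the bit size of
// the wrong (possibly 128-bit vector) mode, which is UB for a 64-bit
// host integer.
uint64_t sign_extend_backprop(uint64_t mask, unsigned inner_bits) {
  uint64_t inner_mask =
      inner_bits >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << inner_bits) - 1;
  if (mask & ~inner_mask)
    mask |= UINT64_C(1) << (inner_bits - 1);  // inner mode's sign bit
  return mask;
}
```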
The rest of the patch are cleanups. For HOST_WIDE_INT we have the
HOST_WIDE_INT_{UC,1U} macros, a HWI isn't necessarily unsigned long long,
so using ULL suffixes for it is weird.
More importantly, the function does
scalar_int_mode smode;
if (!is_a <scalar_int_mode> (mode, &smode)
|| GET_MODE_BITSIZE (smode) > HOST_BITS_PER_WIDE_INT)
return mmask;
early, so we don't need to use GET_MODE_BITSIZE (mode) which is
a poly_int but can use GET_MODE_BITSIZE (smode) with the same value
but in unsigned short, so we don't need to use known_lt or .to_constant ()
everywhere.
Plus some formatting issues.
What I've left around is
if (!GET_MODE_BITSIZE (GET_MODE (x)).is_constant ()
|| !GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0))).is_constant ())
return -1;
at the start of SIGN_EXTEND or ZERO_EXTEND; I'm afraid I don't know enough
about aarch64/riscv VL vectors to know why this is done (though even that
return -1; is weird, the rest of the code does return mmask; if it wants to
punt).
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117360
* ext-dce.cc (ext_dce_process_sets): Use HOST_WIDE_INT_UC
macro instead of ULL suffixed constants.
(carry_backpropagate): Likewise. Use HOST_WIDE_INT_1U instead of
1ULL. Use GET_MODE_BITSIZE (smode) instead of
GET_MODE_BITSIZE (mode) and with that avoid having to use
known_lt instead of < or use .to_constant (). Formatting fixes.
(case SIGN_EXTEND): Set mode to GET_MODE_INNER (GET_MODE (XEXP (x, 0)))
rather than GET_MODE (XEXP (x, 0)) and don't use GET_MODE_INNER (mode).
(ext_dce_process_uses): Use HOST_WIDE_INT_UC macro instead of ULL
suffixed constants.
Jakub Jelinek [Sat, 30 Nov 2024 00:49:21 +0000 (01:49 +0100)]
c++: Implement C++26 P3176R1 - The Oxford variadic comma
While we are already in stage3, I wonder if implementing this small paper
wouldn't be useful even for GCC 15, so that we have in the GCC world one
extra year of deprecation of variadic ellipsis without preceding comma.
The paper just deprecates something, I'd hope most of the C++ code in the
wild when it uses variadic functions at all uses the comma before the
ellipsis.
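The two spellings side by side, as a minimal C++ sketch (function names invented; with this patch, C++26 warns on the first declaration via -Wdeprecated-variadic-comma-omission):

```cpp
#include <cstdarg>

// Deprecated in C++26 (P3176R1): ellipsis without a preceding comma.
int sum_old(int n...);

// Preferred spelling: comma before the ellipsis.
int sum_new(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int s = 0;
  for (int i = 0; i < n; i++)
    s += va_arg(ap, int);  // sum n trailing int arguments
  va_end(ap);
  return s;
}
```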
2024-11-30 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c.opt (Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc: Implement C++26 P3176R1 - The Oxford variadic comma.
(cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.
Ian Lance Taylor [Thu, 28 Nov 2024 21:14:34 +0000 (13:14 -0800)]
compiler: increase buffer size to avoid warning
GCC has a new -Wformat-truncation warning that triggers on this code:
../../gcc/go/gofrontend/go-encode-id.cc: In function 'std::string go_encode_id(const std::string&)':
../../gcc/go/gofrontend/go-encode-id.cc:176:48: error: '%02x' directive output may be truncated writing between 2 and 8 bytes into a region of size 6 [-Werror=format-truncation=]
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ^~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:45: note: directive argument in the range [128, 4294967295]
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ^~~~~~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:27: note: 'snprintf' output between 5 and 11 bytes into a destination of size 8
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The code is safe, because the value of c is known to be >= 0 && <= 0xff.
But it's difficult for the compiler to know that.
Bump the buffer size to avoid the warning.
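A sketch of the shape of the fix outside the Go frontend (names invented; buffer size 16 is one sufficient choice, since the worst case is "_x" plus up to 8 hex digits plus the terminating NUL, i.e. 11 bytes):

```cpp
#include <cstdio>
#include <string>

// Encode a character value as "_xNN".  With an 8-byte buffer,
// -Wformat-truncation warns because an arbitrary unsigned int could
// need up to 11 bytes; a larger buffer silences the warning even
// though the caller guarantees c <= 0xff.
std::string encode_char(unsigned int c) {
  char buf[16];
  snprintf(buf, sizeof buf, "_x%02x", c);
  return std::string(buf);
}
```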
Andrew Pinski [Thu, 21 Nov 2024 18:59:59 +0000 (10:59 -0800)]
aarch64: add attributes to the prefetch_builtins
This adds the attributes associated with prefetch to the builtins.
Just call aarch64_get_attributes with FLAG_PREFETCH_MEMORY to get the attributes.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_prefetch_builtin):
Update call to aarch64_general_add_builtin in AARCH64_INIT_PREFETCH_BUILTIN.
Add new variable prefetch_attrs.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Thu, 21 Nov 2024 18:51:38 +0000 (10:51 -0800)]
aarch64: Fix up flags for vget_low_*, vget_high_* and vreinterpret intrinsics
These 3 intrinsics will not raise an FP exception or read the FPCR. These
intrinsics will be folded into a VIEW_CONVERT_EXPR or a BIT_FIELD_REF, which
are already set to be const expressions too.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VREINTERPRET_BUILTIN): Use
FLAG_NONE instead of FLAG_AUTO_FP.
(VGET_LOW_BUILTIN): Likewise.
(VGET_HIGH_BUILTIN): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Tue, 19 Nov 2024 08:19:57 +0000 (00:19 -0800)]
aarch64: Mark __builtin_aarch64_im_lane_boundsi as leaf and nothrow [PR117665]
__builtin_aarch64_im_lane_boundsi is known not to throw or call back into another
function since it will either be folded into a NOP or will produce a compiler error.
This fixes the ICE by fixing the missed optimization. It does not fix the underlying
issue with fold_marked_statements; which I filed as PR 117668.
Built and tested for aarch64-linux-gnu.
PR target/117665
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_builtin_functions):
Pass nothrow and leaf as attributes to aarch64_general_add_builtin for
__builtin_aarch64_im_lane_boundsi.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/lane-bound-1.C: New test.
* gcc.target/aarch64/lane-bound-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
[PR117770][LRA]: Check hard regs corresponding insn operands for hard reg clobbers
When LRA processes early clobbered hard regs explicitly present in the
insn description, it checks that the hard reg is also used as input.
If the hard reg is not an input also, it is marked as dying. For the
check LRA processed only input hard reg also explicitly present in the
insn description. For given PR, the hard reg is used as input as the
operand and is not present explicitly in the insn description and
therefore LRA marked the hard reg as dying. This results in wrong
allocation and wrong code. The patch solves the problem by processing
hard regs used as the insn operand.
gcc/ChangeLog:
PR rtl-optimization/117770
* lra-lives.cc: Include ira-int.h.
(process_bb_lives): Check hard regs corresponding insn operands
for dying hard wired reg clobbers.
Yury Khrustalev [Fri, 29 Nov 2024 11:09:23 +0000 (11:09 +0000)]
aarch64: Fix build failure due to missing header
Including the "arm_acle.h" header in aarch64-unwind.h requires
stdint.h to be present, and it may not be available during the
first stage of a cross-compilation of GCC.
When cross-building GCC for the aarch64-none-linux-gnu target
(on any supporting host) using the 3-stage bootstrap build
process in which we build the native compiler from source, libgcc
fails to compile due to a missing header that has not been installed yet.
This could be worked around, but it's better to fix the issue.
Andre Vieira [Fri, 29 Nov 2024 10:18:57 +0000 (10:18 +0000)]
arm, mve: Detect uses of vctp_vpr_generated inside subregs
Address a problem where we were failing to detect uses of
vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform'
because the use was inside a SUBREG and rtx_equal_p does not catch
that. Using reg_overlap_mentioned_p is much more robust.
gcc/ChangeLog:
PR target/117814
* config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.
gcc/testsuite/ChangeLog:
PR target/117814
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.