Joseph Myers [Fri, 15 Nov 2024 14:08:42 +0000 (14:08 +0000)]
tree-nested: Do not inline or clone functions with nested functions with VM return type [PR117164]
Bug 117164 is an ICE on an existing test with -std=gnu23 involving a
nested function returning a variable-size structure (and, I think, the
last bug needing to be resolved before switching to -std=gnu23 as the
default, since without this fix the switch would be a clear regression
from a change in default).
The problem is a GIMPLE verification failure where (after type
remapping from inlining / cloning) the return type of the function no
longer exactly matches the type to which it is assigned (these types
use structural equality, which means GIMPLE verification can't use
TYPE_CANONICAL and expects an exact match). Specifically, the nested
function itself is *not* inlined (because of -fno-inline-small-functions
in the original test nested-func-12.c, I think, or the noinline attribute
in some of my variant tests), but the function containing it is either
cloned (because of --param ipa-cp-eval-threshold=0 in the original test) or
inlined. (I'm not sure what role -fno-guess-branch-probability plays
in getting the right situation for the ICE; maybe affecting when
inlining or cloning is considered profitable?)
There is in fact existing code in tree-nested.cc to prevent inlining
of a function containing a nested function with variably modified
*argument* types. I think the same issue of ensuring consistency of
types means such prevention should also apply for a variably modified
return type. Furthermore, exactly the same problem applies for
cloning for other reasons as it does for inlining. Thus, change the
logic to include variably modified return types for nested functions
alongside those for arguments of those functions as a reason not to
inline, and also add the noclone attribute in these cases.
Bootstrapped with no regressions for x86-64-pc-linux-gnu.
PR c/117164
gcc/
* tree-nested.cc: Include "attribs.h".
(check_for_nested_with_variably_modified): Also return true for
variably modified return type.
(create_nesting_tree): If check_for_nested_with_variably_modified
returns true, also add noclone attribute.
gcc/testsuite/
* gcc.dg/nested-func-13.c, gcc.dg/nested-func-14.c,
gcc.dg/nested-func-15.c, gcc.dg/nested-func-16.c,
gcc.dg/nested-func-17.c: New tests.
Christophe Lyon [Thu, 3 Oct 2024 13:37:16 +0000 (13:37 +0000)]
testsuite: Fix tail_call and musttail effective targets [PR116080]
Some of the musttail tests (eg musttail7.c) fail on arm-eabi because
check_effective_target_musttail passes, but the actual code in the test
is rejected.
The reason is that on arm-eabi with the default configuration, the
compiler targets armv4t for which TARGET_INTERWORK is true, making
arm_function_ok_for_sibcall reject a tail-call candidate if
TREE_ASM_WRITTEN (decl) is false.
For more recent architecture versions, TARGET_INTERWORK is false,
hence the problem was not seen on all arm configurations.
musttail7.c is in turn rejected because f2 is recursive, so
TREE_ASM_WRITTEN is false.
However, the same code used in check_effective_target_musttail is not
recursive and the function body for foo has TREE_ASM_WRITTEN == true.
The simplest fix is to remove the (empty) body for foo () in
check_effective_target_musttail. For consistency, do the same with
check_effective_target_tail_call.
Richard Biener [Tue, 12 Nov 2024 13:48:23 +0000 (14:48 +0100)]
Remove dead code related to VEC_COND_EXPR expansion from ISEL
ISEL was introduced to translate vector comparison and vector
condition combinations back to internal function calls mapping to
one of the vcond[u][_eq][_mask] and vec_cmp[_eq] optabs. With
removing the legacy non-mask vcond expanders we now rely on all
vector comparisons and vector conditions to be directly expandable.
The following keeps the intermediate internal function rewrite, given
that gimple_expand_vec_cond_expr still performs some optimizations
which eventually should move to vector lowering or match.pd, but
simplifies it down to always expand VEC_COND_EXPR to .VCOND_MASK.
* gimple-isel.cc (gimple_expand_vec_cond_expr): If not
simplifying or lowering, always expand to .VCOND_MASK.
(pass_gimple_isel::execute): Simplify.
Richard Biener [Tue, 12 Nov 2024 14:07:34 +0000 (15:07 +0100)]
Streamline vector lowering of VEC_COND_EXPR and vector comparisons
The following makes sure to lower all VEC_COND_EXPRs that we cannot
trivially expand, likewise for comparisons. In particular no longer
try to combine both in fancy ways.
* tree-vect-generic.cc (expand_vector_comparison): Lower
vector comparisons that we cannot trivially expand. Remove
code dealing with uses in VEC_COND_EXPRs.
(expand_vector_condition): Lower vector conditions that we
cannot trivially expand. Remove code dealing with comparison
mask definitions.
(expand_vector_operation): Drop dce_ssa_names.
(expand_vector_operations_1): Likewise.
Pan Li [Fri, 15 Nov 2024 10:43:36 +0000 (18:43 +0800)]
RISC-V: Rearrange the test files for scalar SAT_SUB [NFC]
The test files of scalar SAT_SUB only have numbers as the suffix.
Rename the files to -{form number}-{target-type}.  For example,
test form 3 for uint32_t SAT_SUB will have -3-u32.c for the asm check
and -run-3-u32.c for the run test.
Meanwhile, all related test files are moved to riscv/sat/.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
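For reference, a scalar unsigned SAT_SUB for uint32_t can be sketched as below (a minimal sketch; which of the several tested forms this corresponds to is not specified here):

```c
#include <stdint.h>

/* Unsigned saturating subtraction: the result clamps to 0 instead of
   wrapping around on underflow.  */
uint32_t
sat_sub_u32 (uint32_t a, uint32_t b)
{
  return a >= b ? a - b : 0;
}
```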
Florian Weimer [Thu, 14 Nov 2024 11:42:25 +0000 (12:42 +0100)]
c: Introduce -Wmissing-parameter-name
Empirically, omitted parameter names are difficult to catch in code
review. With this change, projects can build with
-Werror=missing-parameter-name, to avoid this unnecessary
incompatibility with older GCC versions. The existing
-pedantic-errors option is too broad for that because it also flags
widely used and widely available GCC extensions. Likewise for
-Werror=c11-c23-compat.
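A small hypothetical illustration of what the new warning targets: omitting a name in a declaration is valid in every C standard, while omitting one in a definition is only valid from C23 on, which is what -Wmissing-parameter-name flags.

```c
/* Omitted names in a prototype declaration are fine in every C standard.  */
int scale (int, int);

/* A definition with all names present compiles everywhere and is not
   flagged; writing "int scale (int x, int)" would be C23-only and would
   trigger -Wmissing-parameter-name.  */
int
scale (int x, int factor)
{
  return x * factor;
}
```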
gcc/c/
* c-decl.cc (store_parm_decls_newstyle): Use
OPT_Wmissing_parameter_name for missing parameter name
warning.
* c-errors.cc (pedwarn_c11): Enable fine-grained warning
control via the option_id argument.
Pan Li [Fri, 15 Nov 2024 07:05:58 +0000 (15:05 +0800)]
RISC-V: Remove unnecessary option for scalar SAT_ADD testcase
After we create an isolated folder to hold all scalar SAT tests,
we have full control of which optimization options are passed to
the testcases.  Thus, it is better to remove the unnecessary
workaround for the -flto option, as well as the -O3 option for
each case.  The riscv.exp will pass several different optimization
options for each case.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
Jakub Jelinek [Fri, 15 Nov 2024 07:43:48 +0000 (08:43 +0100)]
c: Add _Decimal64x support
The following patch adds _Decimal64x type support. Our dfp libraries (dpd &
libbid) can only handle decimal32, decimal64 and decimal128 formats and I
don't see that changing any time soon, so the following patch just hardcodes
that _Decimal64x has the same mode as _Decimal128 (but is a distinct type).
In the unlikely event some target would introduce something different,
that can of course be changed with target hooks, but it would be an ABI
change.
_Decimal128x is optional and we don't have a wider decimal type, so that
type isn't added.
2024-11-15 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum tree_index): Add TI_DFLOAT64X_TYPE.
* tree.h (dfloat64x_type_node): Define.
* tree.cc (build_common_tree_nodes): Initialize dfloat64x_type_node.
* builtin-types.def (BT_DFLOAT64X): New DEF_PRIMITIVE_TYPE.
(BT_FN_DFLOAT64X): New DEF_FUNCTION_TYPE_0.
(BT_FN_DFLOAT64X_CONST_STRING, BT_FN_DFLOAT64X_DFLOAT64X): New
DEF_FUNCTION_TYPE_1.
* builtins.def (BUILT_IN_FABSD64X, BUILT_IN_INFD64X, BUILT_IN_NAND64X,
BUILT_IN_NANSD64X): New builtins.
* builtins.cc (expand_builtin): Handle BUILT_IN_FABSD64X.
(fold_builtin_0): Handle BUILT_IN_INFD64X.
(fold_builtin_1): Handle BUILT_IN_FABSD64X.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NAND64X
and CFN_BUILT_IN_NANSD64X.
* ginclude/float.h (DEC64X_MANT_DIG, DEC64X_MIN_EXP, DEC64X_MAX_EXP,
DEC64X_MAX, DEC64X_EPSILON, DEC64X_MIN, DEC64X_TRUE_MIN,
DEC64X_SNAN): Redefine.
gcc/c-family/
* c-common.h (enum rid): Add RID_DFLOAT64X.
* c-common.cc (c_global_trees): Fix comment typo. Add
dfloat64x_type_node.
(c_common_nodes_and_builtins): Handle RID_DFLOAT64X.
* c-cppbuiltin.cc (c_cpp_builtins): Call
builtin_define_decimal_float_constants also for dfloat64x_type_node
if non-NULL.
* c-lex.cc (interpret_float): Handle d64x suffixes.
* c-pretty-print.cc (pp_c_floating_constant): Print d64x suffixes
on dfloat64x_type_node typed constants.
gcc/c/
* c-tree.h (enum c_typespec_keyword): Add cts_dfloat64x and adjust
comment.
* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
RID_DFLOAT64X.
(c_parser_postfix_expression): Handle _Decimal64x arguments in
__builtin_tgmath.
(warn_for_abs): Handle BUILT_IN_FABSD64X.
* c-decl.cc (declspecs_add_type): Handle cts_dfloat64x and
RID_DFLOAT64X.
(finish_declspecs): Handle cts_dfloat64x.
* c-typeck.cc (c_common_type): Handle dfloat64x_type_node.
gcc/testsuite/
* gcc.dg/dfp/c11-decimal64x-1.c: New test.
* gcc.dg/dfp/c11-decimal64x-2.c: New test.
* gcc.dg/dfp/c23-decimal64x-1.c: New test.
* gcc.dg/dfp/c23-decimal64x-2.c: New test.
* gcc.dg/dfp/c23-decimal64x-3.c: New test.
* gcc.dg/dfp/c23-decimal64x-4.c: New test.
libcpp/
* expr.cc (interpret_float_suffix): Handle d64x and D64x
suffixes, adjust comment.
Pan Li [Fri, 15 Nov 2024 03:42:13 +0000 (11:42 +0800)]
RISC-V: Move scalar SAT_ADD test cases to a isolated folder
Move the scalar SAT_ADD includes both the signed and unsigned
integer to the folder gcc.target/riscv/sat. According to the
implementation the below options will be appended for each
test cases.
* -O2
* -O3
* -Ofast
* -Os
* -Oz
Then we can see a test log similar to the below:
Executing on host: .../sat_s_add-1-i8.c ... -O2 -march=rv64gc -S -o sat_s_add-1-i8.s
Executing on host: .../sat_s_add-1-i8.c ... -O3 -march=rv64gc -S -o sat_s_add-1-i8.s
Executing on host: .../sat_s_add-1-i8.c ... -Ofast -march=rv64gc -S -o sat_s_add-1-i8.s
Executing on host: .../sat_s_add-1-i8.c ... -Oz -march=rv64gc -S -o sat_s_add-1-i8.s
Executing on host: .../sat_s_add-1-i8.c ... -Os -march=rv64gc -S -o sat_s_add-1-i8.s
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
Kewen Lin [Fri, 15 Nov 2024 03:46:33 +0000 (03:46 +0000)]
rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
The current handlings in rs6000_emit_vector_compare are a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly. So it's better
to refactor the handlings of vector integer comparison here.
This is part 5, it's to refactor all the handlings of vector
integer comparison to make it neat. This patch doesn't
introduce any functionality change.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
handlings of vector integer comparison.
Kewen Lin [Fri, 15 Nov 2024 03:46:33 +0000 (03:46 +0000)]
rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
The current handlings in rs6000_emit_vector_compare are a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly. So it's better
to refactor the handlings of vector integer comparison here.
This is part 4, it's to rework the handlings on GE/GEU/LE/LEU,
also make the function not recursive any more. This patch
doesn't introduce any functionality change.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
handlings for operators GE/GEU/LE/LEU.
Kewen Lin [Fri, 15 Nov 2024 03:46:33 +0000 (03:46 +0000)]
rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
The current handlings in rs6000_emit_vector_compare are a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly. So it's better
to refactor the handlings of vector integer comparison here.
This is part 3, it's to refactor the handlings on NE.
This patch doesn't introduce any functionality change.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
handlings for operator NE.
Kewen Lin [Fri, 15 Nov 2024 03:46:33 +0000 (03:46 +0000)]
rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
The current handlings in rs6000_emit_vector_compare are a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly. So it's better
to refactor the handlings of vector integer comparison here.
This is part 2, it's to refactor the handlings on LT and LTU.
This patch doesn't introduce any functionality change.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
handlings for operators LT and LTU.
Kewen Lin [Fri, 15 Nov 2024 03:46:33 +0000 (03:46 +0000)]
rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
The current handlings in rs6000_emit_vector_compare are a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly. So it's better
to refactor the handlings of vector integer comparison here.
This is part 1, it's to remove the helper function
rs6000_emit_vector_compare_inner and move the logics into
rs6000_emit_vector_compare. This patch doesn't introduce any
functionality change.
Kewen Lin [Fri, 15 Nov 2024 03:46:33 +0000 (03:46 +0000)]
rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.
This is part 4, it further checks for comparison operators
LT/UNGE.  In rs6000_emit_vector_compare, for the handling
of LT, it switches to code GT, swaps the operands and tries
again, which is exactly the same as what we have in vector.md:
; lt(a,b) = gt(b,a)
As to UNGE, in rs6000_emit_vector_compare, it uses reversed
code LT and further operates on the result with one_cmpl,
it's also the same as what's in vector.md:
; unge(a,b) = ~lt(a,b)
This patch should not introduce any functionality change either.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Emit rtx
comparison for operators LT/UNGE of MODE_VECTOR_FLOAT directly.
(rs6000_emit_vector_compare): Move assertion of no MODE_VECTOR_FLOAT to
function beginning.
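The two vector.md identities quoted above can be checked in scalar form (a hedged sketch; the actual transform operates on vector rtx, not scalars):

```c
#include <math.h>

/* lt(a,b) = gt(b,a): LT is GT with the operands swapped.  */
static int lt (double a, double b) { return a < b; }
static int gt (double a, double b) { return a > b; }

/* unge(a,b) = ~lt(a,b): UNGE is true when a >= b or when the operands
   are unordered (NaN), i.e. exactly when LT is false.  */
static int unge (double a, double b) { return !(a < b); }
```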
Kewen Lin [Fri, 15 Nov 2024 03:46:32 +0000 (03:46 +0000)]
rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.
This is part 3, it further checks for comparison operators
LE/UNGT. In rs6000_emit_vector_compare, UNGT is handled
with reversed code LE and inverting with one_cmpl_optab,
LE is handled with LT ior EQ, while in vector.md, we have
the support:
; le(a,b) = ge(b,a)
; ungt(a,b) = ~le(a,b)
The associated test case shows it's an improvement.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
comparison for operators LE/UNGT of MODE_VECTOR_FLOAT directly.
Kewen Lin [Fri, 15 Nov 2024 03:46:32 +0000 (03:46 +0000)]
rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.
This is part 2, it further checks for comparison operators
NE/UNLE/UNLT. In rs6000_emit_vector_compare, they are
handled with reversed code which is queried from function
reverse_condition_maybe_unordered and inverting with
one_cmpl_optab. It's the same as what we have in vector.md:
Kewen Lin [Fri, 15 Nov 2024 03:46:32 +0000 (03:46 +0000)]
rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.
This is part 1, it only handles the operators which are
already emitted with an rtx comparison previously in function
rs6000_emit_vector_compare_inner, they are EQ/GT/GE/ORDERED/
UNORDERED/UNEQ/LTGT. There is no functionality change.
With this change, rs6000_emit_vector_compare_inner would
only work for vector integer comparison handling, it would
be cleaned up later in vector integer comparison rework.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Move
MODE_VECTOR_FLOAT handlings out.
(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/GT/
GE/UNORDERED/ORDERED/UNEQ/LTGT of MODE_VECTOR_FLOAT directly, and
adjust one call site of rs6000_emit_vector_compare_inner to
rs6000_emit_vector_compare.
Jeff Law [Thu, 14 Nov 2024 23:57:50 +0000 (16:57 -0700)]
[RISC-V][V2] Fix type on vector move patterns
Updated version of my prior patch to fix type attributes on the
pre-allocation vector move pattern. This version just adds a suitable
set of attributes to a second pattern that was obviously wrong.
Passed on my tester for rv64 and rv32 crosses. Bootstrapped and
regression tested on riscv64-linux-gnu as well.
--
So I was looking into a horrific schedule for SAD a week or so ago and
came across this gem.
Basically we were treating a vector load as a vector move from a
scheduling standpoint during sched1. Naturally we didn't expose much
ILP during sched1. That in turn caused the register allocator to pack
the pseudos onto the physical vector registers tightly. regrename
didn't do anything useful and the resulting code had too many false
dependencies for sched2 to do anything useful.
As a result we were taking many load->use stalls in x264's SAD routine.
I'm confident the types are fine, but I'm a lot less sure about the
other attributes (mode, avl_type_index, mode_idx). If someone could
take a look at that, it'd be greatly appreciated.
There's other cases that may need similar treatment. But I didn't want
to muck with them until I understood those other attributes and how they
need adjustments.
In particular mov<VLS_AVL_REG:mode><P:mode>_lra appears to have the same
problem.
--
gcc/
* config/riscv/vector.md (mov<mode> pattern/splitter): Fix type and
other attributes.
(mov<VLS_AVL_REG:mode><P:mode>_lra): Likewise.
Harald Anlauf [Thu, 14 Nov 2024 20:38:04 +0000 (21:38 +0100)]
Fortran: fix passing of NULL() actual argument to character dummy [PR104819]
Ensure that character length is set and passed by the call to a procedure
when its dummy argument is NULL() with MOLD argument present, or set length
to either 0 or the callee's expected character length. For assumed-rank
dummies, use the rank of the MOLD argument. Generate temporaries for
passed arguments when needed.
PR fortran/104819
gcc/fortran/ChangeLog:
* trans-expr.cc (conv_null_actual): Helper function to handle
passing of NULL() to non-optional dummy arguments of non-bind(c)
procedures.
(gfc_conv_procedure_call): Use it for character dummies.
Denis Chertykov [Thu, 14 Nov 2024 20:50:36 +0000 (00:50 +0400)]
The fix for PR117191
Wrong code appears after the dse2 pass because it removes necessary
insns (i.e. insn 554, a store to a frame spill slot).
This happened because the LRA pass doesn't clean up the code exactly
like reload does; reload1.c has a special pass for such cleanup.
The reload removes CLOBBER insns with spill slots like this:
(insn 202 184 186 7 (clobber (mem/c:TI (plus:HI (reg/f:HI 28 r28)
(const_int 1 [0x1])) [3 %sfp+1 S16 A8])) -1
(nil))
Fragment from reload1.c:
--------------------------------------------------------------------------------
reload_completed = 1;
/* Make a pass over all the insns and delete all USEs which we inserted
only to tag a REG_EQUAL note on them. Remove all REG_DEAD and REG_UNUSED
notes. Delete all CLOBBER insns, except those that refer to the return
value and the special mem:BLK CLOBBERs added to prevent the scheduler
from misarranging variable-array code, and simplify (subreg (reg))
operands. Strip and regenerate REG_INC notes that may have been moved
around. */
for (insn = first; insn; insn = NEXT_INSN (insn))
if (INSN_P (insn))
{
rtx *pnote;
if (CALL_P (insn))
replace_pseudos_in (& CALL_INSN_FUNCTION_USAGE (insn),
VOIDmode, CALL_INSN_FUNCTION_USAGE (insn));
LRA has a similar place where it removes unnecessary insns, but not
CLOBBER insns with memory spill slots: the `lra_final_code_change'
function.
I just mark a CLOBBER insn with a pseudo spilled to memory for removal
later, together with the LRA temporary CLOBBER insns.
PR rtl-optimization/117191
gcc/
* lra-spills.cc (spill_pseudos): Mark a CLOBBER insn with pseudo
spilled to memory for removing it later together with LRA temporary
CLOBBER insns.
Jonathan Wakely [Thu, 14 Nov 2024 16:57:17 +0000 (16:57 +0000)]
libstdc++: Make equal and is_permutation short-circuit (LWG 3560)
We already implement short-circuiting for random access iterators, but
we also need to do so for ranges::equal and ranges::is_permutation when
given sized ranges that are not random access ranges (e.g. std::list).
libstdc++-v3/ChangeLog:
* include/bits/ranges_algo.h (__is_permutation_fn::operator()):
Short-circuit for sized ranges with different sizes, as per LWG
3560.
* include/bits/ranges_algobase.h (__equal_fn::operator()):
Likewise.
* include/bits/stl_algo.h (__is_permutation): Use if-constexpr
for random access iterator branches.
* include/bits/stl_algobase.h (__equal4): Likewise.
* testsuite/25_algorithms/equal/lwg3560.cc: New test.
* testsuite/25_algorithms/is_permutation/lwg3560.cc: New test.
Martin Jambor [Thu, 14 Nov 2024 19:55:06 +0000 (20:55 +0100)]
ipa: Rationalize IPA-VR computations across pass-through jump functions
Currently ipa_value_range_from_jfunc and
propagate_vr_across_jump_function contain similar but not same code
for dealing with pass-through jump functions. This patch puts these
common bits into one function which can also handle comparison
operations.
gcc/ChangeLog:
2024-11-01 Martin Jambor <mjambor@suse.cz>
PR ipa/114985
* ipa-cp.cc (ipa_vr_intersect_with_arith_jfunc): New function.
(ipa_value_range_from_jfunc): Move the common functionality to the
above new function, adjust the rest so that it works with it well.
(propagate_vr_across_jump_function): Likewise.
Jonathan Wakely [Mon, 26 Feb 2024 11:40:46 +0000 (11:40 +0000)]
libstdc++: Make _GLIBCXX_NODISCARD work for C++11 and C++14
The _GLIBCXX_NODISCARD macro only expands to [[__nodiscard__]] for C++17
and later, but all supported compilers will allow us to use that for
C++11 and C++14 too. Enable it for those older standards, to give
improved diagnostics for users of those older standards.
Jason Merrill [Thu, 14 Nov 2024 04:39:53 +0000 (23:39 -0500)]
libstdc++: stdc++.h and <coroutine>
r13-3036 moved #include <coroutine> into the new freestanding section, but
also moved it from a C++20 section to a C++23 section. This patch moves it
back.
Incidentally, I'm curious why a few headers were removed from the hosted
section (including <coroutine>), but most were left in place, so we have
redundant includes of most hosted headers.
libstdc++-v3/ChangeLog:
* include/precompiled/stdc++.h: <coroutine> is C++20.
Richard Ball [Thu, 14 Nov 2024 16:15:13 +0000 (16:15 +0000)]
aarch64: Add tests and docs for indirect_return attribute
This patch adds a new testcase and docs for indirect_return
attribute.
gcc/ChangeLog:
* doc/extend.texi: Add AArch64 docs for indirect_return
attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/indirect_return-1.c: New test.
* gcc.target/aarch64/indirect_return-2.c: New test.
* gcc.target/aarch64/indirect_return-3.c: New test.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:13 +0000 (16:15 +0000)]
aarch64: Introduce indirect_return attribute
Tail calls of indirect_return functions from non-indirect_return
functions are disallowed even if BTI is disabled, since the call
site may have BTI enabled.
Needed for swapcontext within the same function when GCS is enabled.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_gnu_attributes): Add
indirect_return.
(aarch64_gen_callee_cookie): Use indirect_return attribute.
(aarch64_callee_indirect_return): New.
(aarch_fun_is_indirect_return): New.
(aarch64_function_ok_for_sibcall): Disallow tail calls if caller
is non-indirect_return but callee is indirect_return.
(aarch64_function_arg): Add indirect_return to cookie.
(aarch64_init_cumulative_args): Record indirect_return in
CUMULATIVE_ARGS.
(aarch64_comp_type_attributes): Check indirect_return attribute.
(aarch64_output_mi_thunk): Add indirect_return to cookie.
* config/aarch64/aarch64.h (CUMULATIVE_ARGS): Add new field
indirect_return.
* config/aarch64/aarch64.md (tlsdesc_small_<mode>): Update.
* config/aarch64/aarch64-opts.h (AARCH64_NUM_ABI_ATTRIBUTES): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Update.
* config/arm/aarch-bti-insert.cc (call_needs_bti_j): New.
(rest_of_insert_bti): Use call_needs_bti_j.
* config/arm/aarch-common-protos.h
(aarch_fun_is_indirect_return): New.
* config/arm/arm.cc
(aarch_fun_is_indirect_return): New.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:11 +0000 (16:15 +0000)]
aarch64: Add GCS support to the unwinder
This follows the current Linux ABI, which uses a single signal entry token
and a shared shadow stack between the thread and the alt stack.
This could be behind an __ARM_FEATURE_GCS_DEFAULT ifdef (only doing
anything special with gcs compat codegen), but there is a runtime check
anyway.
Change affected tests to be compatible with -mbranch-protection=standard.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:09 +0000 (16:15 +0000)]
aarch64: Add non-local goto and jump tests for GCS
These are scan asm tests only, relying on existing execution tests
for runtime coverage.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/gcs-nonlocal-1.c: New test.
* gcc.target/aarch64/gcs-nonlocal-1-track-speculation.c: New test.
* gcc.target/aarch64/gcs-nonlocal-2.c: New test.
* gcc.target/aarch64/gcs-nonlocal-2-track-speculation.c: New test.
* gcc.target/aarch64/gcs-nonlocal-1.h: New header file.
* gcc.target/aarch64/gcs-nonlocal-2.h: New header file.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:08 +0000 (16:15 +0000)]
aarch64: Add GCS support for nonlocal stack save
Nonlocal stack save and restore has to also save and restore the GCS
pointer. This is used in __builtin_setjmp/longjmp and nonlocal goto.
The GCS specific code is only emitted if GCS branch-protection is
enabled and the code always checks at runtime if GCS is enabled.
The new -mbranch-protection=gcs and old -mbranch-protection=none code
are ABI compatible: jmpbuf for __builtin_setjmp has space for 5
pointers, the layout is
old layout: fp, pc, sp, unused, unused
new layout: fp, pc, sp, gcsp, unused
Note: the ILP32 code generation is wrong as it saves the pointers with
Pmode (i.e. 8 bytes per pointer), but the user-supplied buffer size is
for 5 pointers (4 bytes per pointer); this is not fixed.
The nonlocal goto has no ABI compatibility issues as the goto and its
destination are in the same translation unit.
We use CDImode to allow extra space for GCS without the effect of 16-byte
alignment.
gcc/ChangeLog:
* config/aarch64/aarch64.h (STACK_SAVEAREA_MODE): Make space for gcs.
* config/aarch64/aarch64.md (save_stack_nonlocal): New.
(restore_stack_nonlocal): New.
* tree-nested.cc (get_nl_goto_field): Updated.
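The jmpbuf contract can be exercised with the builtins themselves (a minimal sketch; the slot layout is as described above, and the GCS slot is managed entirely by compiler-emitted code):

```c
/* __builtin_setjmp's buffer has room for 5 pointers; with
   -mbranch-protection=gcs on aarch64 the fourth slot now carries the
   GCS pointer.  This sketch just exercises the builtin pair.  */
void *jmp_buffer[5];

__attribute__ ((noinline)) static void
do_jump (void)
{
  __builtin_longjmp (jmp_buffer, 1);   /* second argument must be 1 */
}

static int
run (void)
{
  if (__builtin_setjmp (jmp_buffer))
    return 42;                         /* resumed here after longjmp */
  do_jump ();
  return 0;                            /* never reached */
}
```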
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:08 +0000 (16:15 +0000)]
aarch64: Add __builtin_aarch64_gcs* and __gcs* tests
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/gcs-1.c: New test.
* gcc.target/aarch64/gcspopm-1.c: New test.
* gcc.target/aarch64/gcspr-1.c: New test.
* gcc.target/aarch64/gcsss-1.c: New test.
The builtins are always enabled, but should be used behind runtime
checks in case the target does not support GCS. They are thin
wrappers around the corresponding instructions.
The GCS pointer is modelled with the void * type (normal stores do not
work on GCS memory, but it is writable via the gcsss operation, or
via GCSSTR if enabled, so it is not const), and an entry on the GCS is
modelled with uint64_t (since it has a fixed size and can be a token
that is not a pointer).
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:06 +0000 (16:15 +0000)]
aarch64: Add GCS instructions
Add instructions for the Guarded Control Stack extension.
GCSSS1 and GCSSS2 are always used together in the compiler, and an extra
"mov xn, 0" should always be added before GCSSS2 to clear the output
register. This is needed to get a reasonable result when GCS is disabled,
when these instructions are NOPs. Since the instructions are expected
to be used behind runtime feature checks, this is mainly relevant if
GCS can be disabled asynchronously.
GCSPOPM does not have an embedded move, and code that emits this
instruction must first emit a zeroing of operand 1 to get a reasonable
result when GCS is not enabled.
The output of GCSPOPM is usually not needed, so a separate gcspopm_xzr
was added to model that. I did not do the same for GCSSS as it is a less
common operation.
The used mnemonics do not depend on updated assembler since these
instructions can be used without new -march setting behind a runtime
check.
Reading the GCSPR is modelled as unspec_volatile so it does not get
reordered wrt the other instructions changing the GCSPR.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:05 +0000 (16:15 +0000)]
aarch64: Add __builtin_aarch64_chkfeat
Builtin for chkfeat: the input argument is used to initialize x16 then
execute chkfeat and return the updated x16.
Note: the ACLE __chkfeat(x) will flip the bits to be more intuitive
(xor the input with the output), but for the builtin that seems an
unnecessary complication.
Jan Hubicka [Thu, 14 Nov 2024 16:01:12 +0000 (17:01 +0100)]
Remove allocations which are used only for NULL pointer check and free
Extend tree-ssa-dse to remove memory allocations that are used only
to check that return value is non-NULL and freed.
A new -fmalloc-dce flag can be used to control malloc/free removal.  I
ended up copying what -fallocation-dse does, so -fmalloc-dce=1 enables
malloc/free removal provided the return value is otherwise unused, and
-fmalloc-dce=2 additionally allows NULL pointer checks, which it folds
to the non-NULL direction.
I also added compensation for the gcc.dg/analyzer/pr101837.c testcase and
added a testcase showing that the std::nothrow variant of operator new is
now optimized away.
With -fmalloc-dce=n I can also add a level which emits a runtime check
for half of the address space and for calloc overflow if it seems useful,
but perhaps incrementally.  Adding size parameter tracking is not that
hard (I posted a WIP patch for that).
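The shape being removed can be sketched as below (a hypothetical example of the pattern the pass targets, not taken from the testsuite):

```c
#include <stdlib.h>

/* The allocation's only uses are the NULL check and the free, so with
   -fmalloc-dce=2 the malloc/free pair can be deleted and the check
   folded to the non-NULL direction.  */
int
can_allocate (size_t n)
{
  void *p = malloc (n);
  if (p == NULL)
    return 0;
  free (p);
  return 1;
}
```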
gcc/ChangeLog:
PR tree-optimization/117370
* common.opt: Add -fmalloc-dce.
* common.opt.urls: Update.
* doc/invoke.texi: Document it; also add missing -flifetime-dse entry.
* tree-ssa-dce.cc (is_removable_allocation_p): Break out from
...
(mark_stmt_if_obviously_necessary): ... here; also check that
operator new satisfies gimple_call_from_new_or_delete.
(checks_return_value_of_removable_allocation_p): New function.
(mark_all_reaching_defs_necessary_1): Add missing case for
STRDUP and STRNDUP.
(propagate_necessity): Use is_removable_allocation_p and
checks_return_value_of_removable_allocation_p.
(eliminate_unnecessary_stmts): Update conditionals that use
removed allocation; use is_removable_allocation_p.
Jonathan Wakely [Thu, 14 Nov 2024 09:58:41 +0000 (09:58 +0000)]
libstdc++: Use feature test macros consistently in <bits/stl_iterator.h>
Remove __cplusplus > 201703L checks that are redundant when used
alongside __glibcxx_concepts checks, because <version> already
guarantees that __glibcxx_concepts is only defined for C++20 and later.
Prefer to check __glibcxx_ranges for features such as move_sentinel that
were added by the One Ranges proposal (P0896R4), or for features which
depend on other components introduced by that proposal.
But prefer to check __glibcxx_concepts for constraints that only depend
on requires-clauses and concepts defined in <concepts>, even if those
constraints were added by the Ranges proposal (e.g. the constraints on
non-member operators for move_iterator).
Prefer #ifdef to #if when just testing for the presence of __glibcxx_foo
macros without caring about their value.
Also add/tweak some Doxygen comments.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h: Make use of feature test macros
more consistent. Improve doxygen comments.
Tobias Burnus [Thu, 14 Nov 2024 15:28:20 +0000 (16:28 +0100)]
libgomp.texi: Impl. Status - change TR13 to OpenMP 6.0 + fix routine typo
libgomp/
* libgomp.texi (OpenMP Implementation Status): Change TR13 to
OpenMP 6.0, now released. Fix a typo in the omp_target_memset_async
routine name.
Richard Biener [Thu, 14 Nov 2024 13:22:01 +0000 (14:22 +0100)]
Fix another thinko in peeling for gap compute of get_group_load_store_type
There's inconsistent handling of the cpart_size == cnunits case, which
currently avoids reporting that peeling for gaps is insufficient, but
the condition that is enough to trigger it,
cremain + group_size < cpart_size, is with cpart_size == cnunits
equal to the condition that brings us here in the first place,
maybe_lt (remain + group_size, nunits). The following fixes this
by not checking cpart_size against special values.
* tree-vect-stmts.cc (get_group_load_store_type): Do not
exempt cpart_size == cnunits from failing.
Steve Baird [Wed, 30 Oct 2024 23:20:51 +0000 (16:20 -0700)]
ada: Improved legality checking for deep delta aggregates.
Enforce deep delta legality rules about nonoverlapping choices. For example,
do not allow both Aaa.Bbb and Aaa.Bbb.Ccc as choices in one delta aggregate.
One special case impacts "regular" Ada2022 delta aggregates - the rule
preventing a record component from occurring twice as a choice in a delta
aggregate was previously not being enforced.
gcc/ada/ChangeLog:
* sem_aggr.adb (Resolve_Delta_Aggregate): The rule about
discriminant dependent component references in choices applies to
both array and record delta aggregates, so check for violations in
Resolve_Delta_Aggregate. Call a new procedure,
Check_For_Bad_Dd_Component_Choice, for each choice.
(Resolve_Delta_Record_Aggregate): Call a new procedure,
Check_For_Bad_Overlap, for each pair of choices.
Eric Botcazou [Fri, 1 Nov 2024 19:47:57 +0000 (20:47 +0100)]
ada: Fix spurious warning on representation clause for private discriminated type
This is the warning enabled by -gnatw.h for holes in record types that are
declared with a representation clause for their components.
When a discriminated type has a private declaration that also declares its
discriminants, the sibling discriminants present on the full declaration
are essentially ignored and, therefore, cannot be used in the computation
performed to give the warning.
gcc/ada/ChangeLog:
* sem_ch13.adb (Record_Hole_Check): Deal consistently with the base
type throughout the processing. Return if its declaration is not a
full type declaration. Assert that its record definition is either
a derived type definition or a record definition. If the type has a
private declaration that does not specify unknown discriminants, use
it as the source of discriminant specifications, if any.
(Check_Component_List): Process every N_Discriminant_Specification
but assert that its defining identifier is really a discriminant.
Eric Botcazou [Mon, 28 Oct 2024 10:19:10 +0000 (11:19 +0100)]
ada: Fix internal error on misplaced iterated component association
This happens for example in the others choice of a 1-dimensional array:
A : array (1 .. Length) of Integer
:= (others => for Each in 1 .. Length => Each);
or when it is used without parentheses for the array component of a record.
gcc/ada/ChangeLog:
PR ada/112524
PR ada/113781
* par-ch4.adb (P_Primary) <Tok_For>: Give an error about missing
parentheses in the (purported) iterated component case too.
(P_Unparen_Cond_Expr_Etc): Likewise.
* sem.adb (Analyze): Raise PE on N_Iterated_Component_Association.
* sem_util.ads (Diagnose_Iterated_Component_Association): Delete.
* sem_util.adb (Diagnose_Iterated_Component_Association): Likewise.
Martin Jambor [Thu, 14 Nov 2024 13:42:27 +0000 (14:42 +0100)]
ipa: Introduce a one jump function dumping function
I plan to introduce a verifier that, when it fails, prints a single
jump function using the function introduced in this patch. Because it
is a verifier, the risk that it would need to be reverted is non-zero,
and because the function can be useful on its own, it is introduced in
a separate patch.
gcc/ChangeLog:
2024-11-01 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_dump_jump_function): Declare.
* ipa-prop.cc (ipa_dump_jump_function): New function.
(ipa_print_node_jump_functions_for_edge): Move printing of
individual jump functions to the new function.
Martin Jambor [Thu, 14 Nov 2024 13:42:27 +0000 (14:42 +0100)]
ipa-cp: Fix constant dumping
Commit gcc-14-5368-ge0787da2633 removed an overloaded variant of the
function print_ipcp_constant_value for tree constants. That did not
break the build because the other overloaded variant, for polymorphic
contexts, has a parameter which is constructible from a tree, but it
prints polymorphic contexts, not tree constants, so dumps contained
polymorphic-context output where constant values were expected.
This commit re-adds the needed overloaded variant though it uses the
printing function added in the aforementioned commit instead of
printing it itself.
gcc/ChangeLog:
2024-11-13 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_print_constant_value): Declare.
* ipa-prop.cc (ipa_print_constant_value): Make public.
* ipa-cp.cc (print_ipcp_constant_value): Re-add this overloaded
function for printing tree constants.
gcc/testsuite/ChangeLog:
2024-11-14 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/ipcp-agg-1.c: Add a scan dump for a constant value in
the lattice dump.
* g++.dg/tree-ssa/pr96945.C: Clean up.
* g++.dg/tree-ssa/pr110819.C: New test.
* g++.dg/tree-ssa/pr116868.C: New test.
* g++.dg/tree-ssa/pr58483.C: New test.
Jonathan Wakely [Thu, 14 Nov 2024 01:14:44 +0000 (01:14 +0000)]
libstdc++: Add missing parts of LWG 3480 for directory iterators [PR117560]
It looks like I only read half the resolution of LWG 3480 and decided we
already supported it. As well as making the non-member overloads of end
take their parameters by value, we need some specializations of the
enable_borrowed_range and enable_view variable templates.
libstdc++-v3/ChangeLog:
PR libstdc++/117560
* include/bits/fs_dir.h (enable_borrowed_range, enable_view):
Define specializations for directory iterators, as per LWG 3480.
* testsuite/27_io/filesystem/iterators/lwg3480.cc: New test.
Richard Biener [Tue, 12 Nov 2024 12:55:14 +0000 (13:55 +0100)]
Remove last comparison-code expand_vec_cond_expr_p call from vectorizer
The following refactors the last remaining expand_vec_cond_expr_p
call with a comparison code to make it obvious we are no longer
relying on those.
* tree-vect-stmts.cc (vectorizable_condition): Refactor
target support check.
Richard Biener [Thu, 14 Nov 2024 09:17:23 +0000 (10:17 +0100)]
tree-optimization/117567 - make SLP reassoc resilient against NULL lanes
The following tries to make the SLP chain association code resilient
against not-present lanes (the other option would have been to disable
it in this case). Not-present lanes can now appear more widely as
part of mask load SLP discovery when gaps are involved. Requiring
a present first lane shouldn't be a restriction since in unpermuted
state all DR groups have their first lane not a gap.
PR tree-optimization/117567
* tree-vect-slp.cc (vect_build_slp_tree_2): Handle not present
lanes when doing re-association.
Christophe Lyon [Wed, 13 Nov 2024 21:20:13 +0000 (21:20 +0000)]
libgcc: Fix COPY_ARG_VAL initializer (PR 117537)
We recently forced -Werror when building libgcc for aarch64, to make
sure we'd catch and fix the kind of problem described in the PR.
In this case, when building for aarch64_be (so, big endian), gcc emits
this warning/error:
libgcc/config/libbid/bid_conf.h:847:25: error: missing braces around initializer [-Werror=missing-braces]
847 | UINT128 arg_name={ bid_##arg_name.w[1], bid_##arg_name.w[0]};
libgcc/config/libbid/bid_conf.h:871:8: note: in expansion of macro 'COPY_ARG_VAL'
871 | COPY_ARG_VAL(arg_name)
This patch fixes the problem by adding curly braces around the
initializer for COPY_ARG_VAL in the big endian case.
It seems that COPY_ARG_REF (just above COPY_ARG_VAL) has a similar
issue, but DECIMAL_CALL_BY_REFERENCE seems always defined to 0, so
COPY_ARG_REF is never used. The patch fixes it too, though.
Andrew Pinski [Wed, 13 Nov 2024 06:13:35 +0000 (22:13 -0800)]
cfgexpand: Skip doing conflicts if there is only 1 variable
This is a small speed-up. If there is only one known stack variable,
there is no reason to figure out the scope conflicts as there are none.
So don't go through all the live-range calculations just to see there
are none.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* cfgexpand.cc (add_scope_conflicts): Return right away
if there is only one stack variable.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Eikansh Gupta [Mon, 11 Nov 2024 11:36:04 +0000 (17:06 +0530)]
MATCH: Simplify `a rrotate (32-b) -> a lrotate b` [PR109906]
The pattern `a rrotate (32-b)` should be optimized to `a lrotate b`.
The same is also true for `a lrotate (32-b)`. It can be optimized to
`a rrotate b`.
This patch adds following patterns:
a rrotate (32-b) -> a lrotate b
a lrotate (32-b) -> a rrotate b
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/109906
gcc/ChangeLog:
* match.pd (a rrotate (32-b) -> a lrotate b): New pattern.
(a lrotate (32-b) -> a rrotate b): New pattern.
Richard Biener [Wed, 13 Nov 2024 10:32:13 +0000 (11:32 +0100)]
Do not consider overrun for VMAT_ELEMENTWISE
When we classify an SLP access as VMAT_ELEMENTWISE we still consider
overrun, though that setting is later overwritten anyway. The following
fixes this, resolving a few RISC-V FAILs with --param vect-force-slp=1.
* tree-vect-stmts.cc (get_group_load_store_type): For
VMAT_ELEMENTWISE there's no overrun.
In addition to a single DR we also require a single lane, not a splat.
PR tree-optimization/117554
* tree-vect-stmts.cc (get_group_load_store_type): We can
use gather/scatter only for a single-lane single element group
access.
Richard Biener [Wed, 13 Nov 2024 12:56:13 +0000 (13:56 +0100)]
tree-optimization/117559 - avoid hybrid SLP for masked load/store lanes
Hybrid analysis is confused by the mask_conversion pattern making a
uniform mask non-uniform. As load/store-lanes only uses a single
lane to mask all data lanes, the SLP graph doesn't cover the alternate
(redundant) mask lanes and thus their pattern defs. The following adds
a hack to mark them covered.
Fixes gcc.target/aarch64/sve/mask_struct_store_?.c with forced SLP.
PR tree-optimization/117559
* tree-vect-slp.cc (vect_mark_slp_stmts): Pass in vinfo,
mark all mask defs of a load/store-lane .MASK_LOAD/STORE
as pure.
(vect_make_slp_decision): Adjust.
(vect_slp_analyze_bb_1): Likewise.
Richard Biener [Wed, 13 Nov 2024 13:43:27 +0000 (14:43 +0100)]
tree-optimization/117556 - SLP of live stmts from load-lanes
The following fixes SLP live lane generation for load-lanes, which
fails to analyze for gcc.dg/vect/vect-live-slp-3.c because the
VLA division doesn't work out, but would also wrongly use the
transposed vector defs, I think. The following properly disables
the actual load-lanes SLP node from live lane processing and instead
relies on the SLP permute node representing the live lane where we
can use extract-last to extract the last lane. This also fixes
the reported Ada miscompile.
PR tree-optimization/117556
PR tree-optimization/117553
* tree-vect-stmts.cc (vect_analyze_stmt): Do not analyze
the SLP load-lanes node for live lanes, but only the
permute node.
(vect_transform_stmt): Likewise for the transform.
* gcc.dg/vect/vect-live-slp-3.c: Expect us to SLP even for
VLA vectors (in single-lane mode).
Pan Li [Thu, 14 Nov 2024 06:16:15 +0000 (14:16 +0800)]
RISC-V: Rearrange the test files for scalar SAT_ADD [NFC]
The test files of scalar SAT_ADD only have numbers as the suffix.
Rearrange the file names to -{form number}-{target-type}. For example,
test form 3 for uint32_t SAT_ADD will have -3-u32.c for the asm check
and -run-3-u32.c for the run test.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
cstorebf4 uses comparison_operator for the BFmode compare, which is
incorrect when ix86_expand_setcc is used directly, as it does not
canonicalize the input comparison by swapping operands to correct the
compare code.
The original code without AVX10.2 calls emit_store_flag_force, which
actually calls emit_store_flag_1 and recursively calls this expander
again with swapped operands and flag.
Therefore, we can avoid the redundant recursive call by changing
comparison_operator to ix86_fp_comparison_operator and calling
ix86_expand_setcc directly.
gcc/ChangeLog:
PR target/117495
* config/i386/i386.md (cstorebf4): Use ix86_fp_comparison_operator
and call ix86_expand_setcc directly.
gcc/testsuite/ChangeLog:
PR target/117495
* gcc.target/i386/pr117495.c: New test.