Andrew Pinski [Tue, 29 Oct 2024 16:16:18 +0000 (09:16 -0700)]
aarch64: Use canonicalize_comparison in ccmp expansion [PR117346]
While testing the patch for PR 85605 on aarch64, it was noticed that
imm_choice_comparison.c test failed. This was because canonicalize_comparison
was not being called in the ccmp case. This can be noticed without the patch
for PR 85605 as evidence of the new testcase.
Bootstrapped and tested on aarch64-linux-gnu.
PR target/117346
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_gen_ccmp_first): Call
canonicalize_comparison before figuring out the cmp_mode/cc_mode.
(aarch64_gen_ccmp_next): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/imm_choice_comparison-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andi Kleen [Fri, 25 Oct 2024 22:04:06 +0000 (15:04 -0700)]
Simplify switch bit test clustering algorithm
The current switch bit test clustering enumerates all possible case
clusters combinations to find ones that fit the bit test constrains
best. This causes performance problems with very large switches.
For bit test clustering which happens naturally in word sized chunks
I don't think such an expensive algorithm is really needed.
This patch implements a simple greedy algorithm that walks
the sorted list and examines word sized windows and tries
to cluster them.
Surprisingly the new algorithm gives consistly better clusters
for the examples I tried.
For example from the gcc bootstrap:
old: 0-15 16-31 96-175
new: 0-31 96-175
I'm not fully sure why that is, probably some bug in the old
algorithm? This shows even up in the test suite where if-to-switch-6
now can generate a switch, as well as a case in switch-1.c
I don't have a proof that the new algorithm is always as good or better,
but so far at least I don't see any counter examples.
It also fixes the excessive compile time in PR117091,
however this was already fixed by an earlier patch
that doesn't run clustering when no targets have multiple
values.
* gcc.dg/tree-ssa/if-to-switch-6.c: Allow condition chain.
* gcc.dg/tree-ssa/switch-1.c: Allow more bit tests.
* gcc.dg/pr21643.c: Use -fno-bit-tests
* gcc.target/aarch64/pr99988.c: Use -fno-bit-tests
Andi Kleen [Wed, 16 Oct 2024 21:07:18 +0000 (14:07 -0700)]
Only do switch bit test clustering when multiple labels point to same bb
The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.
This fixes the switch part of PR117091 where all case labels are unique
however it doesn't address the performance problems for non unique
cases.
gcc/ChangeLog:
PR middle-end/117091
* gimple-if-to-switch.cc (if_chain::is_beneficial): Update
find_bit_test call.
* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Get max_c argument and bail out early if all case labels are
unique.
(switch_decision_tree::compute_cases_per_edge): Record number of
targets per label and return.
(switch_decision_tree::analyze_switch_statement): ... pass to
find_bit_tests.
* tree-switch-conversion.h: Update prototypes.
Eric Botcazou [Tue, 29 Oct 2024 20:40:34 +0000 (21:40 +0100)]
Fix miscompilation of function containing __builtin_unreachable
This is a wrong-code generation on the SPARC for a function containing
a call to __builtin_unreachable caused by the delay slot scheduling pass,
and more specifically the find_end_label function which has these lines:
/* Otherwise, see if there is a label at the end of the function. If there
is, it must be that RETURN insns aren't needed, so that is our return
label and we don't have to do anything else. */
The comment was correct 20 years ago but no longer is nowadays in the
presence of RTL epilogues and calls to __builtin_unreachable, so the
patch just removes the associated two lines of code:
else if (LABEL_P (insn))
*plabel = as_a <rtx_code_label *> (insn);
and otherwise contains just adjustments to the commentary.
gcc/
PR rtl-optimization/117327
* reorg.cc (find_end_label): Do not return a dangling label at the
end of the function and adjust commentary.
gcc/testsuite/
* gcc.c-torture/execute/20241029-1.c: New test.
Andrew Pinski [Tue, 29 Oct 2024 20:01:30 +0000 (13:01 -0700)]
aarch64: Remove unnecessary casts to rtx_code [PR117349]
In aarch64_gen_ccmp_first/aarch64_gen_ccmp_next, the casts
were no longer needed after r14-3412-gbf64392d66f291 which
changed the type of the arguments to rtx_code.
In aarch64_rtx_costs, they were no longer needed since r12-4828-g1d5c43db79b7ea which changed the type of code
to rtx_code.
Pushed as obvious after a build/test for aarch64-linux-gnu.
Jakub Jelinek [Tue, 29 Oct 2024 19:14:09 +0000 (20:14 +0100)]
c-family: Handle RAW_DATA_CST in complete_array_type [PR117313]
The following testcase ICEs, because
add_flexible_array_elts_to_size -> complete_array_type
is done only after braced_lists_to_strings which optimizes
RAW_DATA_CST surrounded by INTEGER_CST into a larger RAW_DATA_CST
covering even the boundaries, while I thought it is done before
that.
So, RAW_DATA_CST now can be the last constructor_elt in a CONSTRUCTOR
and so we need the function to take it into account (handle it as
RAW_DATA_CST standing for RAW_DATA_LENGTH consecutive elements).
The function wants to support both CONSTRUCTORs without indexes and with
them (for non-RAW_DATA_CST elts it was just adding 1 for the current
index). So, if the RAW_DATA_CST elt has ce->index, we need to add
RAW_DATA_LENGTH (ce->value) - 1, while if it doesn't (and it isn't cnt == 0
case where curindex is 0), add that plus 1, i.e. RAW_DATA_LENGTH (ce->value).
2024-10-29 Jakub Jelinek <jakub@redhat.com>
PR c/117313
gcc/c-family/
* c-common.cc (complete_array_type): For RAW_DATA_CST elements
advance curindex by RAW_DATA_LENGTH or one less than that if
ce->index is non-NULL. Handle even the first element if
it is RAW_DATA_CST. Formatting fix.
gcc/testsuite/
* c-c++-common/init-6.c: New test.
Jason Merrill [Tue, 22 Oct 2024 21:45:00 +0000 (17:45 -0400)]
c++: printing AGGR_INIT_EXPR args
PR30854 was about wrongly dumping the dummy object argument to a
constructor; r126582 in 4.3 fixed that by skipping the first argument. But
not all functions called by AGGR_INIT_EXPR are constructors, as observed in
PR116634; we shouldn't skip for non-member functions. And let's combine the
printing code for CALL_EXPR and AGGR_INIT_EXPR.
This doesn't make us accept the ill-formed 116634 testcase again with a
pedwarn, just fixes the diagnostic issue.
Andrew Pinski [Tue, 29 Oct 2024 05:05:08 +0000 (22:05 -0700)]
testcase: Add testcase for tree-optimization/117341
Even though PR 117341 was a duplicate of PR 116768, another
testcase this time C++ does not hurt to have.
The testcase is a self-contained and does not use directly libstdc++
except for operator new (it does not even call delete).
Tested on x86_64-linux-gnu with it working.
PR tree-optimization/117341
gcc/testsuite/ChangeLog:
* g++.dg/torture/pr117341-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Pan Li [Wed, 23 Oct 2024 08:52:01 +0000 (16:52 +0800)]
RISC-V: Add testcases for form 1 of MASK_LEN_STRIDED_LOAD{STORE}
Form 1:
void __attribute__((noinline)) \
vec_strided_load_store_##T##_form_1 (T *restrict out, T *restrict in, \
long stride, size_t size) \
{ \
for (size_t i = 0; i < size; i++) \
out[i * stride] = in[i * stride]; \
}
The below test suites are passed for this patch:
* The riscv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add strided folder.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st.h: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st_data.h: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st_run.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Pan Li [Wed, 23 Oct 2024 08:46:53 +0000 (16:46 +0800)]
RISC-V: Implement the MASK_LEN_STRIDED_LOAD{STORE}
This patch would like to implment the MASK_LEN_STRIDED_LOAD{STORE} in
the RISC-V backend by leveraging the vector strided load/store insn.
For example:
void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
for (int i = 0; i < n; i++)
a[i*stride] = b[i*stride] + 100;
}
Before this patch:
38 │ vsetvli a5,a3,e32,m1,ta,ma
39 │ vluxei64.v v1,(a1),v4
40 │ mul a4,a2,a5
41 │ sub a3,a3,a5
42 │ vadd.vv v1,v1,v2
43 │ vsuxei64.v v1,(a0),v4
44 │ add a1,a1,a4
45 │ add a0,a0,a4
After this patch:
33 │ vsetvli a5,a3,e32,m1,ta,ma
34 │ vlse32.v v1,0(a1),a2
35 │ mul a4,a2,a5
36 │ sub a3,a3,a5
37 │ vadd.vv v1,v1,v2
38 │ vsse32.v v1,0(a0),a2
39 │ add a1,a1,a4
40 │ add a0,a0,a4
The below test suites are passed for this patch:
* The riscv fully regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (mask_len_strided_load_<mode>): Add
new pattern for MASK_LEN_STRIDED_LOAD.
(mask_len_strided_store_<mode>): Ditto but for store.
* config/riscv/riscv-protos.h (expand_strided_load): Add new
func decl to expand strided load.
(expand_strided_store): Ditto but for store.
* config/riscv/riscv-v.cc (expand_strided_load): Add new
func impl to expand strided load.
(expand_strided_store): Ditto but for store.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Pan Li [Wed, 23 Oct 2024 08:43:37 +0000 (16:43 +0800)]
RISC-V: Adjust the gather-scatter testcases due to middle-end change
After we have MASK_LEN_STRIDED_LOAD{STORE} in the middle-end, the
strided case need to be adjust for IR check.
The below test suites are passed for this patch:
* The riscv fully regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
Adjust IR for MASK_LEN_LOAD check.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
Ditto but for store.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
The below test suites are passed for this patch:
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression test.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_get_strided_load_store_ops): Handle
MASK_LEN_STRIDED_LOAD{STORE} after supported check.
(vectorizable_store): Generate MASK_LEN_STRIDED_LOAD when the offset
of gater is not vector type.
(vectorizable_load): Ditto but for store.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
The IFN target below code example similar as below
void foo (int * a, int * b, int stride, int n)
{
for (int i = 0; i < n; i++)
a[i * stride] = b[i * stride];
}
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* internal-fn.cc (strided_load_direct): Add new define direct
for strided load.
(strided_store_direct): Ditto but for store.
(expand_strided_load_optab_fn): Add new func to expand the IFN
MASK_LEN_STRIDED_LOAD in middle-end.
(expand_strided_store_optab_fn): Ditto but for store.
(direct_strided_load_optab_supported_p): Add define for stride
load optab supported.
(direct_strided_store_optab_supported_p): Ditto but for store.
(internal_fn_len_index): Add strided load/store len index.
(internal_fn_mask_index): Ditto but for mask.
(internal_fn_stored_value_index): Add strided store value index.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Add new IFN for
strided load.
(MASK_LEN_STRIDED_STORE): Ditto but for store.
* optabs.def (mask_len_strided_load_optab): Add strided load optab.
(mask_len_strided_store_optab): Add strided store optab.
Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Richard Biener [Sat, 26 Oct 2024 12:27:14 +0000 (14:27 +0200)]
Remove dead vect_recog_mixed_size_cond_pattern
vect_recog_mixed_size_cond_pattern only applies to COMPARISON_CLASS_P
rhs1 COND_EXPRs which no longer appear - the following removes it.
Its testcases still pass, I believe the situation is mitigated by
bool pattern handling of the compare use in COND_EXPRs.
Patrick Palka [Tue, 29 Oct 2024 13:26:19 +0000 (09:26 -0400)]
libstdc++: Fix complexity of drop_view::begin() const [PR112641]
Views are required to have a amortized O(1) begin(), but our drop_view's
const begin overload is O(n) for non-common ranges with a non-sized
sentinel. This patch reimplements it so that it's O(1) always. See
also LWG 4009.
PR libstdc++/112641
libstdc++-v3/ChangeLog:
* include/std/ranges (drop_view::begin): Reimplement const
overload so that it's O(1) always.
* testsuite/std/ranges/adaptors/drop.cc (test10): New test.
David Malcolm [Tue, 29 Oct 2024 12:25:56 +0000 (08:25 -0400)]
jit: fix leak of pending_assemble_externals_set [PR117275]
My recent r15-4580-g779c0390e3b57d fix for resetting state in
varasm.cc introduced some noise to "make selftest-valgrind" and,
presumably, a memory leak in libgccjit:
==2462086== 160 (56 direct, 104 indirect) bytes in 1 blocks are definitely lost in loss record 248 of 352
==2462086== at 0x5270E7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==2462086== by 0x1D1EB89: init_varasm_once() (varasm.cc:6806)
==2462086== by 0x181C845: backend_init() (toplev.cc:1826)
==2462086== by 0x181D41A: do_compile() (toplev.cc:2193)
==2462086== by 0x181D99C: toplev::main(int, char**) (toplev.cc:2371)
==2462086== by 0x378391D: main (main.cc:39)
Fixed thusly.
gcc/ChangeLog:
PR jit/117275
* varasm.cc (process_pending_assemble_externals): Reset
pending_assemble_externals_set to nullptr after deleting it.
(varasm_cc_finalize): Delete pending_assemble_externals_set.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Biener [Tue, 29 Oct 2024 10:26:13 +0000 (11:26 +0100)]
tree-optimization/117343 - decide_masked_load_lanes and stale graph
It turns out decide_masked_load_lanes accesses a stale SLP graph
so the following re-builds it instead.
PR tree-optimization/117343
* tree-vect-slp.cc (vect_optimize_slp_pass::build_vertices):
Support re-building the SLP graph.
(vect_optimize_slp_pass::run): Re-build the SLP graph before
decide_masked_load_lanes.
Jakub Jelinek [Tue, 29 Oct 2024 10:14:12 +0000 (11:14 +0100)]
libstdc++: Use if consteval rather than if (std::__is_constant_evaluated()) for {,b}float16_t nextafter [PR117321]
The nextafter_c++23.cc testcase fails to link at -O0.
The problem is that eventhough std::__is_constant_evaluated() has
always_inline attribute, that at -O0 just means that we inline the
call, but its result is still assigned to a temporary which is tested
later, nothing at -O0 propagates that false into the if and optimizes
away the if body. And the __builtin_nextafterf16{,b} calls are meant
to be used solely for constant evaluation, the C libraries don't
define nextafterf16 these days.
As __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__ are predefined right
now only by GCC, not by clang which doesn't implement the extended floating
point types paper, and as they are predefined in C++23 and later modes only,
I think we can just use if consteval which is folded already during the FE
and the body isn't included even at -O0. I've added a feature test for
that just in case clang implements those and implements those in some weird
way. Note, if (__builtin_is_constant_evaluted()) would work correctly too,
that is also folded to false at gimplification time and the corresponding
if block not emitted at all. But for -O0 it can't be wrapped into a helper
inline function.
2024-10-29 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/117321
* include/c_global/cmath (nextafter(_Float16, _Float16)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16 call.
(nextafter(__gnu_cxx::__bfloat16_t, __gnu_cxx::__bfloat16_t)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16b call.
* testsuite/26_numerics/headers/cmath/117321.cc: New test.
Implement the mve vld and vst intrinsics using the MVE builtins framework.
The main part of the patch is to reimplement to vstr/vldr patterns
such that we now have much fewer of them:
- non-truncating stores
- predicated non-truncating stores
- truncating stores
- predicated truncating stores
- non-extending loads
- predicated non-extending loads
- extending loads
- predicated extending loads
This enables us to update the implementation of vld1/vst1 and use the
new vldr/vstr builtins.
The patch also adds support for the predicated vld1/vst1 versions.
gcc.target/arm/pr112337.c needs an update, to call the intrinsic
instead of the builtin, which this patch deletes.
arm: [MVE intrinsics] Add support for predicated contiguous loads and stores
This patch extends
function_expander::use_contiguous_load_insn and
function_expander::use_contiguous_store_insn functions to
support predicated versions.
* config/arm/arm-mve-builtins.cc
(function_expander::use_contiguous_load_insn): Add support for
PRED_z.
(function_expander::use_contiguous_store_insn): Add support for
PRED_p.
* config/arm/arm-mve-builtins-functions.h
(load_extending): New class.
(store_truncating): New class.
* config/arm/arm-protos.h (arm_mve_data_mode): New helper function.
* config/arm/arm.cc (arm_mve_data_mode): New helper function.
The tests for vst* instrinsics use functions which return a void
expression which can generate a warning. This hasn't come up previously
as the inlining presumably prevents the warning.
This change removed the uneccessary and incorrect returns.
I believe the new C2Y <stdbit.h> type-generic functions
stdc_rotate_{left,right} have the same problems the other stdc_*
type-generic functions had. If we want to support arbitrary
unsigned _BitInt(N), don't want to use statement expressions
(so that one can actually use them in static variable initializers),
don't want to evaluate the arguments multiple times and don't want
to expand the arguments multiple times during preprocessing to avoid the
old tgmath preprocessing bloat, we need a built-in for those.
The following patch adds those. And as we need to support rotations by 0
and tree-ssa-forwprop.cc is only able to pattern recognize with BIT_AND_EXPR
for that case (i.e. for power of two widths), the patch just constructs
LROTATE_EXPR/RROTATE_EXPR right away. Negative second arguments are
considered UB, while positive ones are modulo precision.
2024-10-29 Jakub Jelinek <jakub@redhat.com>
PR c/117030
gcc/
* doc/extend.texi (__builtin_stdc_rotate_left,
__builtin_stdc_rotate_right): Document.
gcc/c-family/
* c-common.cc (c_common_reswords): Add __builtin_stdc_rotate_left
and __builtin_stdc_rotate_right.
* c-ubsan.cc (ubsan_instrument_shift): For {L,R}ROTATE_EXPR
just check if op1 is negative.
gcc/c/
* c-parser.cc: Include asan.h and c-family/c-ubsan.h.
(c_parser_postfix_expression): Handle __builtin_stdc_rotate_left
and __builtin_stdc_rotate_right.
* c-fold.cc (c_fully_fold_internal): Handle LROTATE_EXPR and
RROTATE_EXPR.
gcc/testsuite/
* gcc.dg/builtin-stdc-rotate-1.c: New test.
* gcc.dg/builtin-stdc-rotate-2.c: New test.
* gcc.dg/ubsan/builtin-stdc-rotate-1.c: New test.
* gcc.dg/ubsan/builtin-stdc-rotate-2.c: New test.
David Malcolm [Mon, 28 Oct 2024 22:43:11 +0000 (18:43 -0400)]
testsuite: drop the "test-" prefix from sarif-output python scripts
Drop the "text-" prefix from the various gcc.dg/sarif-output/test-*.py
scripts so that the scripts are close to the .c files they are used by
when the files are sorted by name.
Jonathan Wakely [Mon, 28 Oct 2024 13:05:53 +0000 (13:05 +0000)]
libstdc++: Fix tests for std::vector range operations
The commit I pushed was not the one I'd tested, so it had older versions
of the tests, with bugs that I'd already fixed locally. This commit has
the fixed tests that I'd intended to push in the first place.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/vector/bool/cons/from_range.cc: Use
dg-do run instead of compile.
(test_ranges): Use do_test instead of do_test_a for rvalue
range.
(test_constexpr): Call function template instead of just
instantiating it.
* testsuite/23_containers/vector/bool/modifiers/assign/assign_range.cc:
Use dg-do run instead of compile.
(do_test): Use same test logic for vector<bool> as for primary
template.
(test_constexpr): Call function template instead of just
instantiating it.
* testsuite/23_containers/vector/bool/modifiers/insert/append_range.cc:
Use dg-do run instead of compile.
(test_ranges): Use do_test instead of do_test_a for rvalue
range.
(test_constexpr): Call function template instead of just
instantiating it.
* testsuite/23_containers/vector/bool/modifiers/insert/insert_range.cc:
Use dg-do run instead of compile.
(do_test): Fix incorrect function arguments to match intended
results.
(test_ranges): Use do_test instead of do_test_a for rvalue
range.
(test_constexpr): Call function template instead of just
instantiating it.
* testsuite/23_containers/vector/cons/from_range.cc: Use dg-do
run instead of compile.
(test_ranges): Fix ill-formed call to do_test.
(test_constexpr): Call function template instead of just
instantiating it.
* testsuite/23_containers/vector/modifiers/append_range.cc:
Use dg-do run instead of compile.
(test_constexpr): Likewise.
* testsuite/23_containers/vector/modifiers/assign/assign_range.cc:
Use dg-do run instead of compile.
(do_test): Do not reuse input ranges.
(test_constexpr): Call function template instead of just
instantiating it.
* testsuite/23_containers/vector/modifiers/insert/insert_range.cc:
Use dg-do run instead of compile.
(do_test): Fix incorrect function arguments to match intended
results.
(test_constexpr): Call function template instead of just
instantiating it.
Jason Merrill [Tue, 17 Sep 2024 21:38:35 +0000 (17:38 -0400)]
build: update bootstrap req to C++14
We moved to a bootstrap requirement of C++11 in GCC 11, 8 years after
support was stable in GCC 4.8.
It is now 8 years since C++14 was the default mode in GCC 6 (and 9 years
since support was complete in GCC 5), and we have a few bits of optional
C++14 code in the compiler, so it seems a good time to update the bootstrap
requirement again.
The big benefit of the change is the greater constexpr power, but C++14 also
added variable templates, generic lambdas, lambda init-capture, binary
literals, and numeric literal digit separators.
C++14 was feature-complete in GCC 5, and became the default in GCC 6. 5.4.0
bootstraps trunk correctly; trunk stage1 built with 5.3.0 breaks in
eh_data_format_name due to PR69995.
gcc/ChangeLog:
* doc/install.texi (Prerequisites): Update to C++14.
ChangeLog:
* configure.ac: Update requirement to C++14.
* configure: Regenerate.
Jeff Law [Mon, 28 Oct 2024 11:39:24 +0000 (05:39 -0600)]
[target/117316] Fix initializer for riscv code alignment handling
The construct used for initializing the code alignments in a recent change is
causing bootstrap problems on riscv64 as seen in the referenced bugzilla.
This patch adjusts the initializer by pushing the NULL down into each uarch
clause. Bootstrapped on riscv64, regression test in flight, but given
bootstrap is broken it seemed advisable to move this forward now.
I'm so much looking forward to the day when we have performant hardware for
bootstrap testing... Sigh.
Anyway, bootstrapped and installing on the trunk.
PR target/117316
gcc/
* config/riscv/riscv.cc (riscv_tune_param): Drop initializer.
(*_tune_info): Add initializers for code alignments.
STMT_VINFO_SLP_VECT_ONLY isn't properly computed as union of all
group members and when the group is later split due to duplicates
not all sub-groups inherit the flag.
PR tree-optimization/117307
* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
Properly compute STMT_VINFO_SLP_VECT_ONLY. Set it on all
parts of a split group.
Andrew Pinski [Sun, 27 Oct 2024 20:16:22 +0000 (13:16 -0700)]
vec-lowering: Fix ABSU lowering [PR111285]
ABSU_EXPR lowering incorrectly used the resulting type
for the new expression but in the case of ABSU the resulting
type is an unsigned type and with ABSU is folded away. The fix
is to use a signed type for the expression instead.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/111285
gcc/ChangeLog:
* tree-vect-generic.cc (do_unop): Use a signed type for the
operand if the operation was ABSU_EXPR.
gcc/testsuite/ChangeLog:
* g++.dg/torture/vect-absu-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 27 Oct 2024 03:37:36 +0000 (20:37 -0700)]
phiopt: Move check for maybe_undef_p slightly earlier
This moves the check for maybe_undef_p in match_simplify_replacement
slightly earlier before figuring out the true/false arg using arg0/arg1
instead.
In most cases this is no difference in compile time; just in the case
there is an undef in the args there would be a slight compile time
improvement as there is no reason to figure out which arg corresponds
to the true/false side of the conditional.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement): Move
check for maybe_undef_p earlier.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Sat, 26 Oct 2024 12:18:37 +0000 (14:18 +0200)]
Remove code in vectorizer pattern recog relying on vec_cond{u,eq,}
With the intent to rely on vec_cond_mask and vec_cmp patterns
comparisons do not need rewriting into COND_EXPRs that eventually
combine to vec_cond{u,eq,}.
* tree-vect-patterns.cc (check_bool_pattern): For comparisons
we do nothing if we can expand them or we can't replace them
with a ? -1 : 0 condition - but the latter would require
expanding the comparison which we proved we can't. So do
nothing, aka not think vec_cond{u,eq,} will save us.
Jonathan Wakely [Tue, 8 Oct 2024 20:15:18 +0000 (21:15 +0100)]
libstdc++: Add P1206R7 from_range members to std::vector [PR111055]
This is another piece of P1206R7, adding new members to std::vector and
std::vector<bool>.
The __uninitialized_copy_a extension needs to be enhanced to support
passing non-common ranges (i.e. a sentinel that is a different type from
the iterator) and move-only input iterators.
libstdc++-v3/ChangeLog:
PR libstdc++/111055
* include/bits/ranges_base.h (__container_compatible_range): New
concept.
* include/bits/stl_bvector.h (vector(from_range, R&&, const Alloc&))
(assign_range, insert_range, append_range): Define.
* include/bits/stl_uninitialized.h (__do_uninit_copy): Support
non-common ranges.
(__uninitialized_copy_a): Likewise.
* include/bits/stl_vector.h (_Vector_base::_M_append_range_to):
New function.
(_Vector_base::_M_append_range): Likewise.
(vector(from_range, R&&, const Alloc&), assign_range): Define.
(append_range): Define.
(insert_range): Declare.
* include/debug/vector (vector(from_range, R&&, const Alloc&))
(assign_range, insert_range, append_range): Define.
* include/bits/vector.tcc (insert_range): Define.
* testsuite/util/testsuite_iterators.h (input_iterator_wrapper_rval):
New class template.
* testsuite/23_containers/vector/bool/cons/from_range.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/assign/assign_range.cc:
New test.
* testsuite/23_containers/vector/bool/modifiers/insert/append_range.cc:
New test.
* testsuite/23_containers/vector/bool/modifiers/insert/insert_range.cc:
New test.
* testsuite/23_containers/vector/cons/from_range.cc: New test.
* testsuite/23_containers/vector/modifiers/append_range.cc: New test.
* testsuite/23_containers/vector/modifiers/assign/assign_range.cc:
New test.
* testsuite/23_containers/vector/modifiers/insert/insert_range.cc:
New test.
Jonathan Wakely [Sat, 26 Oct 2024 20:24:58 +0000 (21:24 +0100)]
libstdc++: Fix std::vector<bool>::emplace to forward parameter
If the parameter is not lvalue-convertible to bool then the current code
will fail to compile. The parameter should be forwarded to restore the
original value category.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (emplace_back, emplace): Forward
parameter pack to preserve value category.
* testsuite/23_containers/vector/bool/emplace_rvalue.cc: New
test.
Fangrui Song [Sun, 27 Oct 2024 19:37:21 +0000 (12:37 -0700)]
arm: Support -mfdpic for more targets
Targets that are not arm*-*-uclinuxfdpiceabi can use -S -mfdpic, but -c
-mfdpic does not pass --fdpic to gas. This is an unnecessary
restriction. Just define the ASM_SPEC in bpabi.h.
Additionally, use armelf[b]_linux_fdpiceabi emulations for -mfdpic in
linux-eabi.h. This will allow a future musl fdpic port to use the
desired BFD emulation.
gcc/ChangeLog:
* config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): Transform -mfdpic.
* config/arm/linux-eabi.h (TARGET_FDPIC_LINKER_EMULATION): Define.
(SUBTARGET_EXTRA_LINK_SPEC): Use TARGET_FDPIC_LINKER_EMULATION
if -mfdpic.
In commit bc5a9dab55d13f888a3cdd150c8cf5c2244f35e0 ("gcc: xtensa: reorder
movsi_internal patterns for better code generation during LRA"), the
instruction order in "movsi_internal" MD definition was changed to make LRA
use load/store instructions with larger memory address displacements, but as
a side effect, it now uses the larger displacements (ie., the larger
instructions) even outside of reload operations.
The underlying problem is that LRA assumes by default that there is only one
maximal legitimate displacement for the same address structure, meaning that
it has no choice but to use the first load/store instruction it finds.
To fix this, define TARGET_DIFFERENT_ADDR_DISPLACEMENT_P hook to always
return true.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (TARGET_DIFFERENT_ADDR_DISPLACEMENT_P):
Add new target hook to always return true.
* config/xtensa/xtensa.md (movsi_internal):
Revert the previous changes.
Jakub Jelinek [Sun, 27 Oct 2024 15:44:35 +0000 (16:44 +0100)]
genmatch: Add selftests to genmatch for diag_vfprintf
The following patch adds selftests to genmatch to verify the new printing
routine there.
So that I can rely on HAVE_DECL_FMEMOPEN (host test), the tests are done
solely in stage2+ where we link the host libcpp etc. to genmatch.
The tests have been adjusted from pretty-print.cc (test_pp_format),
and I've added to that function two new tests because I've noticed nothing
was testing the %M$.*N$s etc. format specifiers.
2024-10-27 Jakub Jelinek <jakub@redhat.com>
* configure.ac (gcc_AC_CHECK_DECLS): Add fmemopen.
* configure: Regenerate.
* config.in: Regenerate.
* Makefile.in (build/genmatch.o): Add -DGENMATCH_SELFTESTS to
BUILD_CPPFLAGS for stage2+ genmatch.
* genmatch.cc (test_diag_vfprintf, genmatch_diag_selftests): New
functions.
(main): Call genmatch_diag_selftests.
* pretty-print.cc (test_pp_format): Add two tests, one for %M$.*N$s
and one for %M$.Ns.
Jakub Jelinek [Sun, 27 Oct 2024 15:42:53 +0000 (16:42 +0100)]
c-family: -Wleading-whitespace= argument spelling
On Thu, Oct 24, 2024 at 03:33:25PM -0400, Eric Gallager wrote:
> On Thu, Oct 24, 2024 at 4:17 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > I've tried to build stage3 with
> > -Wleading-whitespace=blanks -Wtrailing-whitespace=blank -Wno-error=leading-whitespace=blanks -Wno-error=trailing-whitespace=blank
>
> So wait, it's "blanks" (plural) when it's leading, but "blank"
> (singular) when it's trailing? That inconsistency bothers me...
I've mentioned it already in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664664.html
Citing that here:
Not sure about the kinds for the option, given -Wleading-whitespace=
uses plural and this option singular and -Wleading-whitespace= spaces
means literally just ' ' characters, while space in
-Wtrailing-whitespace= was ' ', '\t', '\v' and '\f'; so category;
perhaps just use any and blanks?
Other preferences?
Here is a patch to do the blank->blanks and space->any changes.
2024-10-27 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/invoke.texi (Wtrailing-whitespace=): Change
blank argument to blanks and space argument to any.
gcc/c-family/
* c.opt (warn_trailing_whitespace_kind): Change blank
to blanks and space to any.
gcc/testsuite/
* c-c++-common/cpp/Wtrailing-whitespace-2.c: Use
-Wtrailing-whitespace=blanks rather than -Wtrailing-whitespace=blank.
* c-c++-common/cpp/Wtrailing-whitespace-3.c: Use
-Wtrailing-whitespace=any rather than -Wtrailing-whitespace=space.
* c-c++-common/cpp/Wtrailing-whitespace-7.c: Use
-Wtrailing-whitespace=blanks rather than -Wtrailing-whitespace=blank.
* c-c++-common/cpp/Wtrailing-whitespace-8.c: Use
-Wtrailing-whitespace=any rather than -Wtrailing-whitespace=space.
Jakub Jelinek [Sun, 27 Oct 2024 15:41:28 +0000 (16:41 +0100)]
testsuite: Fix up gcc.dg/vec-perm-lower.c test
On Tue, Oct 15, 2024 at 12:45:35PM +0000, Tamar Christina wrote:
> I'll write a gimple one and commit with this then.
The new test FAILs on i686-linux, with the usual
FAIL: gcc.dg/vec-perm-lower.c (test for excess errors)
Excess errors:
.../gcc/testsuite/gcc.dg/vec-perm-lower.c:9:1: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]
.../gcc/testsuite/gcc.dg/vec-perm-lower.c:8:1: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
The following patch fixes that.
Tested on x86_64-linux with
make check-gcc RUNTESTFLAGS='--target_board=unix/\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} dg.exp=vec-perm-lower.c'
which previously FAILed, now PASSes, ok for trunk?
2024-10-27 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/vec-perm-lower.c: Add -Wno-psabi to dg-options.
Paul Thomas [Sun, 27 Oct 2024 12:40:42 +0000 (12:40 +0000)]
Fortran: Fix regressions with intent(out) class[PR115070, PR115348].
2024-10-27 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/115070
PR fortran/115348
* trans-expr.cc (gfc_trans_class_init_assign): If all the
components of the default initializer are null for a scalar,
build an empty statement to prevent prior declarations from
disappearing.
gcc/testsuite/
PR fortran/115070
* gfortran.dg/pr115070.f90: New test.
PR fortran/115348
* gfortran.dg/pr115348.f90: New test.
Andrew Pinski [Sat, 26 Oct 2024 09:14:18 +0000 (02:14 -0700)]
tree: Mark PAREN_EXPR and VEC_DUPLICATE_EXPR as non-trapping [PR117234]
While looking to fix a possible trapping issue in PHI-OPT's factor,
I noticed that some tree codes could be marked as trapping even
though they don't have a possibility to trap. In the case of PAREN_EXPR,
it is basically a nop except when it comes to association across it so
it can't trap.
In the case of VEC_DUPLICATE_EXPR, it is similar to a CONSTRUCTOR, so it
can't trap.
This fixes those 2 issues and adds 4 testcases, 2 which are specific to aarch64
since the only way to get a VEC_DUPLICATE_EXPR is to use intrinsics currently.
Build and tested for aarch64-linux-gnu.
PR tree-optimization/117234
gcc/ChangeLog:
* tree-eh.cc (operation_could_trap_helper_p): Treat
PAREN_EXPR and VEC_DUPLICATE_EXPR like constructing
expressions.
gcc/testsuite/ChangeLog:
* g++.dg/eh/noncall-fp-1.C: New test.
* g++.target/aarch64/sve/noncall-eh-fp-1.C: New test.
* gcc.dg/tree-ssa/trapping-1.c: New test.
* gcc.target/aarch64/sve/trapping-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Thomas Koenig [Sat, 26 Oct 2024 17:20:14 +0000 (19:20 +0200)]
Add UNSIGNED for intrinsics.
gcc/fortran/ChangeLog:
* gfortran.texi: Correct reference to make clear that UNSIGNED
will not be part of F202Y.
Other clarifications.
Extend table of intrinsics, add links.
* intrinsic.texi: Add descriptions for UNSIGNED arguments.
* invoke.texi: Add anchor for -funsigned.
But more is needed: this gets the test working properly in terms of scanning
the dump and handling the interaction w/ LTO with not producing an executable
(did try ltrans scan but that didn't work either).
Unfortunately, the test seems to fail for me on godbolt even going back to
GCC 7.1 or thereabouts, hence XFAIL. However, if I revert r9-3870-g2a98b4bfc3d952,
I do get an ICE in fld_incomplete_type_of -- because we do far more checking
with LTO now on (in)complete types. And reverting it on releases/gcc-9 actually
makes it give 0.
In summary: fix the test fully so it really does run and we get a check
for ICEing at least, and mark the dg-final scan as XFAIL so Honza can
comment on that.
Andrew Pinski [Sun, 20 Oct 2024 17:44:14 +0000 (10:44 -0700)]
simplify-rtx: Handle `a != 0 ? -a : 0` [PR58195]
The gimple (and generic) levels have this optmization since r12-2041-g7d6979197274a662da7bdc5.
It seems like a good idea to add a similar one to rtl just in case it is not caught at the
gimple level.
Note the loop case in csel-neg-1.c is not handled at the gimple level (even with phiopt turned back on),
this is because of casts to avoid signed integer overflow; a patch to fix this at the gimple level will be
submitted seperately.
Changes since v1:
* v2: Use `CONST0_RTX (mode)` instead of const0_rtx. Add csel-neg-2.c for float testcase which now passes.
Build and tested for aarch64-linux-gnu.
PR rtl-optimization/58195
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_ternary_operation): Handle
`a != 0 ? -a : 0` and `a == 0 ? 0 : -a`.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/csel-neg-1.c: New test.
* gcc.target/aarch64/csel-neg-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Sam James [Fri, 25 Oct 2024 20:12:21 +0000 (21:12 +0100)]
testsuite: lto: fix pr47333 test
This failure was hidden until we started to run the test by fixing
the filename earlier: ignore -Wtemplate-body using a pragma like
e.g. g++.dg/lto/20101010-1_0.C does because lto.exp doesn't support
dg-additional-options.
Sam James [Fri, 25 Oct 2024 16:59:31 +0000 (17:59 +0100)]
testsuite: lto: fix tbaa_0 test
These failures were hidden until we started to run the test by fixing
the filename earlier: use dg-lto directives, pass -std=gnu89 for
implicit-int, and use -flto-partition=none like c-c++-common/hwasan/builtin-special-handling.c.
gcc/testsuite/ChangeLog:
* gcc.dg/lto/tbaa_0.c: Use dg-lto directives, pass -std=gnu89, and
use -flto-partition=none.
Andrew Pinski [Thu, 22 Aug 2024 21:34:03 +0000 (14:34 -0700)]
toplevel: Error out if using --disable-libstdcxx with bootstrap [PR105474]
Bootstrapping and using --disable-libstdcxx will cause a build failure deep in compiling
stage2 so instead error out early in the toplevel configure so it is more user friendly.
Bootstrapped and tested on x86_64-linux-gnu.
Also made sure --disable-libstdcxx without --disable-bootstrap failed.
PR bootstrap/105474
ChangeLog:
* configure: Regenerate.
* configure.ac: Error out if libstdc++ is not enabled
with bootstrapping.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 4 May 2024 09:03:16 +0000 (02:03 -0700)]
aarch64: Support multiple variants including up to 3
On some of the Qualcomm's SoC that includes oryon-1 core, the variant
will be different on the cores due to big.little config. Though
the difference between big and little is not significant enough
to have seperate cost/scheduling models for them and the feature set
is the same across all variants.
Also on some SoCs, there are 3 variants of the core, big.middle.little
so this increases the support there for up to 3 cores and 3 variants
in the original parsing loop but it does not change the support for max
of 2 different cores.
After this patch and the patch that adds oryon-1, -mcpu=native works
on the SoCs I am working with.
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Support
3 cores and 3 variants. If there is one core but multiple variant,
then treat the variant as being all.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/info_25: New file.
* gcc.target/aarch64/cpunative/info_26: New file.
* gcc.target/aarch64/cpunative/native_cpu_25.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_26.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Wilco Dijkstra [Fri, 25 Oct 2024 14:53:58 +0000 (14:53 +0000)]
AArch64: Add more accurate constraint [PR117292]
As shown in the PR, reload may only check the constraint in some cases and
and not check the predicate is still valid for the resulting instruction.
To fix the issue, add a new constraint which matches the predicate exactly.
gcc/ChangeLog:
PR target/117292
* config/aarch64/aarch64-simd.md (xor<mode>3<vczle><vczbe>): Use
'De' constraint.
* config/aarch64/constraints.md (De): Add new constraint.
Paul Thomas [Fri, 25 Oct 2024 16:59:03 +0000 (17:59 +0100)]
Fortran: Fix ICE with structure constructor in data statement [PR79685]
2024-10-25 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/79685
* decl.cc (match_data_constant): Find the symtree instead of
the symbol so the use renamed symbols are found. Pass this and
the derived type to gfc_match_structure_constructor.
* match.h: Update prototype of gfc_match_structure_contructor.
* primary.cc (gfc_match_structure_constructor): Remove call to
gfc_get_ha_sym_tree and use caller supplied symtree instead.
gcc/testsuite/
PR fortran/79685
* gfortran.dg/use_rename_13.f90: New test.
Jennifer Schmitz [Thu, 17 Oct 2024 15:40:34 +0000 (08:40 -0700)]
match.pd: Add std::pow folding optimizations.
This patch adds the following two simplifications in match.pd for
POW_ALL and POWI:
- pow (1.0/x, y) to pow (x, -y), avoiding the division
- pow (0.0, x) to 0.0, avoiding the call to pow.
The patterns are guarded by flag_unsafe_math_optimizations,
!flag_trapping_math, and !HONOR_INFINITIES.
The POW_ALL patterns are also gated under !flag_errno_math.
The second pattern is also guarded by !HONOR_NANS and
!HONOR_SIGNED_ZEROS.
Tests were added to confirm the application of the transform for
builtins pow, powf, powl, powi, powif, powil, and powf16.
The patch was bootstrapped and regtested on aarch64-linux-gnu and
x86_64-linux-gnu, no regression.
OK for mainline?
Pan Li [Thu, 24 Oct 2024 13:57:04 +0000 (21:57 +0800)]
Match: Simplify branch form 3 of unsigned SAT_ADD into branchless
There are sorts of forms for the unsigned SAT_ADD. Some of them are
complicated while others are cheap. This patch would like to simplify
the complicated form into the cheap ones. For example as below:
From the form 3 (branch):
SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1.
The simplify doesn't need to check if target support the SAT_ADD, it
is somehow the optimization in gimple level.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Remove unsigned branch form 3 for SAT_ADD, and
add simplify to branchless instead.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/sat_u_add-simplify-1-u16.c: New test.
* gcc.dg/tree-ssa/sat_u_add-simplify-1-u32.c: New test.
* gcc.dg/tree-ssa/sat_u_add-simplify-1-u64.c: New test.
* gcc.dg/tree-ssa/sat_u_add-simplify-1-u8.c: New test.
Jakub Jelinek [Fri, 25 Oct 2024 12:09:42 +0000 (14:09 +0200)]
Assorted --disable-checking fixes [PR117249]
We have currently 3 different definitions of gcc_assert macro, one used most
of the time (unless --disable-checking) which evaluates the condition at
runtime and also checks it at runtime, then one for --disable-checking GCC 4.5+
which looks like
((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
and a fallback one
((void)(0 && (EXPR)))
Now, the last one actually doesn't evaluate any of the side-effects in the
argument, just quiets up unused var/parameter warnings.
I've tried to replace the middle definition with
({ [[assume (EXPR)]]; (void) 0; })
for compilers which support assume attribute and statement expressions
(surprisingly quite a few spots use gcc_assert inside of comma expressions),
but ran into PR117287, so for now such a change isn't being proposed.
The following patch attempts to move important side-effects from gcc_assert
arguments.
Bootstrapped/regtested on x86_64-linux and i686-linux with normal
--enable-checking=yes,rtl,extra, plus additionally I've attempted to do
x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
((void)(0 && (EXPR)))
version when --disable-checking. That version ran into spurious middle-end
warnings
../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
int op_num = ops.length ();
int op_normal_num = op_num;
gcc_assert (op_num > 0);
int stmt_num = op_num - 1;
gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
where we have gcc_assert exactly to work-around middle-end warnings.
Guess I'd need to also disable -Werror for this experiment, which actually
isn't a problem with unmodified system.h, because even for
--disable-checking we use the __builtin_unreachable at least in
stage2/stage3 and so the warnings aren't emitted, and even if it used
[[assume ()]]; it would work too because in stage2/stage3 we could again
rely on assume and statement expression support.
2024-10-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117249
* tree-ssa-structalias.cc (insert_vi_for_tree): Move put calls out of
gcc_assert.
* lto-cgraph.cc (lto_symtab_encoder_delete_node): Likewise.
* gimple-ssa-strength-reduction.cc (get_alternative_base,
add_cand_for_stmt): Likewise.
* tree-eh.cc (add_stmt_to_eh_lp_fn): Likewise.
* except.cc (duplicate_eh_regions_1): Likewise.
* tree-ssa-reassoc.cc (insert_operand_rank): Likewise.
* config/nvptx/nvptx.cc (nvptx_expand_call): Use == rather than = in
gcc_assert.
* opts-common.cc (jobserver_info::disconnect): Call close outside of
gcc_assert and only check result in it.
(jobserver_info::return_token): Call write outside of gcc_assert and
only check result in it.
* genautomata.cc (output_default_latencies): Move j++ side-effect
outside of gcc_assert.
* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Use
== rather than = in gcc_assert.
* cgraph.cc (symbol_table::create_edge): Move ++edges_max_uid
side-effect outside of gcc_assert.
Jakub Jelinek [Fri, 25 Oct 2024 12:05:37 +0000 (14:05 +0200)]
lto: Handle RAW_DATA_CST in compare_tree_sccs_1 [PR117201]
I've missed I need to add RAW_DATA_CST support in compare_tree_sccs_1,
because without that it considers all RAW_DATA_CSTs to be equivalent,
regardless of their length or content.
Richard Biener [Fri, 25 Oct 2024 10:38:24 +0000 (12:38 +0200)]
Default expand_vec_cond_expr_p code to ERROR_MARK
As we want to transition to only vcond_mask expanders the following
makes it possible to easier distinguish queries that rely on
vcond queries for expand_vec_cond_expr_p from those of vcond_mask
by for the latter having the comparison code defaulted to ERROR_MARK.
My recent gcc.dg/tree-ssa/shifts-3.c test failed on arm-linux-gnu
because it used widen_mult_expr to do a multiplication on chars.
This patch generalises the regexp in the same way as for f3.
gcc/testsuite/
* gcc.dg/tree-ssa/shifts-3.c: Accept widen_mult for f2 too.
libstdc++: implement concatenation of strings and string_views
This adds support for P2591R5, merged for C++26.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h: Implement the four operator+
overloads between basic_string and (types convertible to)
basic_string_view.
* include/bits/version.def: Bump the feature-testing macro.
* include/bits/version.h: Regenerate.
* testsuite/21_strings/basic_string/operators/char/op_plus_fspath_neg.cc: New test.
* testsuite/21_strings/basic_string/operators/char/op_plus_string_view.cc: New test.
* testsuite/21_strings/basic_string/operators/char/op_plus_string_view_compat.cc:
New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Richard Biener [Thu, 24 Oct 2024 15:06:29 +0000 (17:06 +0200)]
Restrict :c to commutative ops as intended
genmatch was supposed to restrict :c to verifiable commutative
operations while leaving :C to the "I know what I'm doing" case.
The following enforces this, cleaning up parsing and amending
the commutative_op helper. There's one pattern that needs adjustment,
the pattern optimizing fmax (x, NaN) or fmax (NaN, x) to x since
fmax isn't commutative.
* genmatch.cc (commutative_op): Add paramter to indicate whether
all compares should be considered commutative. Handle
hypot, add_overflow and mul_overflow.
(parser::parse_expr): Simplify 'c' handling by using
commutative_op and error out when the operation is not.
* match.pd ((minmax:c @0 NaN@1) -> @0): Use :C, we know
what we are doing.
Richard Biener [Thu, 24 Oct 2024 14:15:43 +0000 (16:15 +0200)]
tree-optimization/117277 - remove CLOBBERs before SLP code generation
We have to remove CLOBBERs before SLP is code generated since for
store-lanes we are inserting our own CLOBBERs that we want to survive.
So the following refactors vect_transform_loop to remove unwanted
stmts first.
This resolves the gcc.target/aarch64/sve/store_lane_spill_1.c FAIL.
PR tree-optimization/117277
* tree-vect-loop.cc (vect_transform_loop): Remove CLOBBERs
and prefetches before doing any code generation.
The following implements masked load-lane discovery for SLP. The
challenge here is that a masked load has a full-width mask with
group-size number of elements when this becomes a masked load-lanes
instruction one mask element gates all group members. We already
have some discovery hints in place, namely STMT_VINFO_SLP_VECT_ONLY
to guard non-uniform masks, but we need to choose a way for SLP
discovery to handle possible masked load-lanes SLP trees.
I have this time chosen to handle load-lanes discovery where we
have performed permute optimization already and conveniently got
the graph with predecessor edges built. This is because unlike
non-masked loads masked loads with a load_permutation are never
produced by SLP discovery (because load permutation handling doesn't
handle un-permuting the mask) and thus the load-permutation lowering
which handles non-masked load-lanes discovery doesn't trigger.
With this SLP discovery for a possible masked load-lanes, thus
a masked load with uniform mask, produces a splat of a single-lane
sub-graph as the mask SLP operand. This is a representation that
shouldn't pessimize the mask load case and allows the masked load-lanes
transform to simply elide this splat.
This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
--param vect-force-slp=1
PR tree-optimization/116575
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle
gaps, aka NULL scalar stmt.
(vect_build_slp_tree_2): Allow gaps in the middle of a
grouped mask load. When the mask of a grouped mask load
is uniform do single-lane discovery for the mask and
insert a splat VEC_PERM_EXPR node.
(vect_optimize_slp_pass::decide_masked_load_lanes): New
function.
(vect_optimize_slp_pass::run): Call it.
Richard Biener [Wed, 23 Oct 2024 09:55:31 +0000 (11:55 +0200)]
Relax vect_check_scalar_mask check
When the mask is not a constant or external def there's no need to
check the scalar type, in particular with SLP and the mask being
a VEC_PERM_EXPR there isn't a scalar operand ready to check
(not one vect_is_simple_use will get you). We later check the
vector type and reject non-mask types there.
* tree-vect-stmts.cc (vect_check_scalar_mask): Only check
the scalar type for constant or extern defs.
Tom Tromey [Wed, 31 Jul 2024 15:01:45 +0000 (09:01 -0600)]
ada: Change scope of XUB type
An earlier patch in the "nameless" series caused a regression with
-fgnat-encodings=all. Previously, all artificial types were emitted
in the CU scope in the DWARF, but with the patch, an "XUB" type is
emitted in the function scope. This causes gdb lookups to erroneously
find the XUB type rather than the type that gdb expects to find.
Note that I don't know why the earlier code worked, because decl.cc
clearly sets the XUB type's context to be the current function.
This patch changes the type's context so that it is nested in a type
that is conveniently available.
gcc/ada/ChangeLog:
* gcc-interface/decl.cc (gnat_to_gnu_entity): Use gnu_fat_type as the type
context for a XUB type.